There is more data to come!
The definition of spam varies from user to user. I'm also curating a list of accounts to flag as "non-human" in my data set.
Longer term, we might need to see "competition" for inclusion in blocks, perhaps based on resource credits.
There is a lot of spam, and the resource credits you get for your HP are an absolute bargain given the amount of crap they let you inject into the chain.