Biggest Spammers Report

in #steemdev, 3 years ago (edited)

Here's a user-friendly view of the Sincerity API's Biggest Spammers list.

It focuses mostly on comment spam. The list is updated automatically with the accounts that have the maximum spammer score, sorted by the number of comments they made in the last calculation period (currently 14 days).
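
For anyone who wants to reproduce this view from their own data, the selection described above can be sketched roughly as follows. This is only an illustration: the `AccountStats` record, its field names, and the `biggest_spammers` function are hypothetical and are not the Sincerity API's actual schema or code.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    # Hypothetical record; the field names are assumptions, not the real API schema.
    name: str
    spammer_score: float   # higher means more spam-like
    recent_comments: int   # comments made in the last calculation period


def biggest_spammers(accounts, limit=10):
    """Keep accounts with the maximum spammer score, busiest commenters first."""
    if not accounts:
        return []
    top_score = max(a.spammer_score for a in accounts)
    worst = [a for a in accounts if a.spammer_score == top_score]
    worst.sort(key=lambda a: a.recent_comments, reverse=True)
    return worst[:limit]


if __name__ == "__main__":
    sample = [
        AccountStats("alpha", 100.0, 40),
        AccountStats("beta", 100.0, 900),
        AccountStats("gamma", 72.5, 5000),
    ]
    for account in biggest_spammers(sample):
        print(account.name, account.recent_comments)
```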

This is generated by a machine learning algorithm, which may produce a few 'false positives' where non-spammer accounts are detected as spammers. If you find any such accounts, I'd be pleased if you comment about it, as it will help me improve the spam classification algorithm.


I'm flagging those who hit my posts and reporting the worst ones. A few whales using a tiny amount of their SP could deal with it easily. If spam doesn't earn and gets flagged to invisibility then it will go away.

Yes. It is surprising and annoying how little these accounts seem to need to make to keep spamming though!

I'm starting to think comments should cost something like 0.001 SBD/STEEM/SP if they end their voting window with a net negative vote (a negative number of rshares). For accounts with only delegation, perhaps the source account should pay the fee.
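
As a rough sketch of how that rule might look, assuming a flat fee and a simple way of telling whether an account holds only delegated SP (the function names, the `own_sp` and `delegator` parameters, and the fee constant are all made up for illustration; nothing like this exists in the blockchain today):

```python
from decimal import Decimal
from typing import Optional

# Hypothetical flat fee, taken from the figure suggested above.
COMMENT_FEE = Decimal("0.001")


def comment_fee(net_rshares: int) -> Decimal:
    """Charge the fee only if the comment closes its voting window net negative."""
    return COMMENT_FEE if net_rshares < 0 else Decimal("0")


def fee_payer(author: str, own_sp: float, delegator: Optional[str] = None) -> str:
    """Pick who pays: the delegating account if the author holds no SP of its own."""
    if own_sp <= 0 and delegator is not None:
        return delegator
    return author


if __name__ == "__main__":
    print(comment_fee(-1500))                           # fee applies
    print(comment_fee(2000))                            # no fee
    print(fee_payer("spambot1", 0, delegator="funder"))  # delegator pays
```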

Fake accounts? If I set up a bunch of fake accounts with bots (too much work for me), they would only need to make a few dollars a week each to make some money.

I need to dig into the API and use it to validate the vote buyers :) tipuvote!

Sorry, @tipU is currently recovering voting power. Please try again later! ;)

Hush, I have special privileges!

You might need to watch your spam score if you go around talking to bots! ;)

Better than to myself, lol.


I hate those types of responses. I hate to use up voting power to flag them, but I guess I am going to have to, as they are becoming more and more prevalent in many posts.

I have to admit that I've never actually downvoted someone (except 1 phishing comment). With just 250 SP, it wouldn't really have that big of an effect.

Might have to rethink that, if they're really just earning a couple of cents every 100 posts or so.

I was wondering what data points you are using to flag something as spam. One that I thought might be useful is mean time between posts divided by word count, or some variant. I saw an account today that was posting a 500-word article every 10 minutes or so. Sorry, but nobody writes that fast and maintains quality. These had to be cut and paste, maybe even blatant plagiarism.
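
To make the idea concrete, here is a minimal sketch of one such variant: words published per minute across an account's recent posts. The `words_per_minute` function and the `(timestamp, text)` input format are assumptions for illustration only, and any cutoff for "too fast" would have to be tuned against real data.

```python
from datetime import datetime, timedelta


def words_per_minute(posts):
    """Estimate sustained output from a list of (timestamp, text) tuples.

    Counts the words in every post after the first and divides by the time
    elapsed between the first and last post. Returns None when there are
    fewer than two posts or no time has elapsed.
    """
    ordered = sorted(posts, key=lambda p: p[0])
    if len(ordered) < 2:
        return None
    elapsed_minutes = (ordered[-1][0] - ordered[0][0]).total_seconds() / 60
    if elapsed_minutes <= 0:
        return None
    words = sum(len(text.split()) for _, text in ordered[1:])
    return words / elapsed_minutes


if __name__ == "__main__":
    start = datetime(2018, 1, 1, 12, 0)
    # Three 500-word articles posted 10 minutes apart, as in the example above.
    history = [(start + timedelta(minutes=10 * i), "word " * 500) for i in range(3)]
    print(words_per_minute(history))
```

Measuring from the first post onward avoids guessing how long the first article took to write, at the cost of needing at least two posts per account.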

In theory, you could trap plagiarism by comparing consecutive posts and seeing if there is linguistic consistency between them. Someone cutting and pasting content from other sites would show variations in vocabulary, sentence structure and other linguistic markers. These markers would be similar if posted by the same person.

I know there is academic work that has done this very thing but I suspect it would be a difficult task to do in real time.
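
A very simplified version of the consistency check might look like the sketch below. It compares function-word frequencies and average sentence length between consecutive posts using cosine similarity. The word list, the scaling of the sentence-length feature, and all function names are my own assumptions, and real stylometry work uses far richer features than this.

```python
import math
import re
from collections import Counter

# A small, arbitrary set of common English function words (an assumption).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "on", "not")


def style_vector(text):
    """Build a crude style fingerprint: function-word rates plus sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = len(words) / (len(sentences) or 1)
    # Dividing by 100 is an arbitrary scale so this feature does not dominate.
    return [counts[w] / total for w in FUNCTION_WORDS] + [avg_sentence_len / 100]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consecutive_similarities(posts):
    """Similarity between each post and the one that follows it."""
    vectors = [style_vector(p) for p in posts]
    return [cosine_similarity(u, v) for u, v in zip(vectors, vectors[1:])]


if __name__ == "__main__":
    history = [
        "It is the first post. It is short and to the point.",
        "It is the second post. It is also short and to the point.",
        "Notwithstanding considerable prior deliberation, outcomes varied enormously.",
    ]
    print(consecutive_similarities(history))
```

A run of low scores between consecutive posts from one account would only be a hint, not proof, since a single author's style also varies with topic and length.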

Anyway, good work on this project. Keep it up!

That kind of feature isn't currently used, but I've considered similar, and it's still a work in progress, so thanks for your thoughts.

Very good short