Steem Sincerity - Working with SteemPlus to crowdsource spam classification training data

jumowa (65) 8 years ago

Oh, instant love.

therealwolf (76) 8 years ago

Great teamwork @andybets and @stoodkev!

I will integrate the spammer-score API into @smartmarket (smartsteem.com) as soon as possible, and maybe into @smartsteem as well.

However, I expect I'll have to experiment with the spammer-score cutoff.

Nevertheless - really great job @andybets!

andybets (62) 8 years ago

Thanks very much! I'm working to improve the classification process as we speak, but that's the reason the API provides an estimate of probabilities rather than an absolute classification. Apps can decide for themselves how to use the imperfect data responsibly.
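Because the API returns a probability rather than a verdict, each consuming app has to choose its own cutoffs. Below is a minimal Python sketch of that decision, assuming a spammer probability in [0, 1] (like the 1.0 and 0.42 scores mentioned in this thread). The function name, the actions, and the threshold values are hypothetical and are not part of the API.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "review"  # hand the account to a human moderator
    HIDE = "hide"


def decide(spammer_probability: float,
           review_at: float = 0.7,
           hide_at: float = 0.95) -> Action:
    """Map a spammer probability to an app-level action.

    The thresholds are hypothetical defaults; each app should tune its own.
    """
    if not 0.0 <= spammer_probability <= 1.0:
        raise ValueError("spammer_probability must be in [0, 1]")
    if spammer_probability >= hide_at:
        return Action.HIDE
    if spammer_probability >= review_at:
        return Action.REVIEW
    return Action.ALLOW
```

Sending borderline accounts to manual review instead of acting on them automatically is one way to limit the damage a misclassification can do.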

duplibot (55) 8 years ago

Any documentation on the classification endpoint? I could start posting my work to add to the training data.

None of my work is automated, so I would be feeding you pre-reviewed data; you might want to consider a separate private endpoint to save yourself the extra effort. I'm sure there are others like me who could contribute great data here and speed up your training.

andybets (62) 8 years ago

Thank you, I will be in touch. I just need to work out how to assimilate all the information in the most appropriate way.

duplibot (55) 8 years ago

I totally understand, this is no small task! Let me know how I can help.

cedricguillas (52) 8 years ago

Amazing job! It's a pleasure to work with you!

andybets (62) 8 years ago

Likewise :)

fraenk (63) 8 years ago (edited)

Something needs to be done to seriously improve that classification.

Bots that give three times as many upvotes as comments (and we are talking about just a few hundred) are classified as top-500 spammers with a 1.0 spammer score (yes, I am talking about my poor @cuddlekitten), while actual spambots leaving thousands of identical comments get a 0.42 human score (see @tomole444).

For the time being, maybe you could put a huge warning sticker on the API, as it's still pretty damn inaccurate. A lot of people are already embracing the API, and yes, it is a great step forward for the Steem ecosystem... but with inaccuracies like these it could do real damage to the wrong accounts.

andybets (62) 8 years ago

Thanks for the feedback. I'm working on improving classification, and I have described the limitations of the software in my posts on the subject. I agree that the classification for tomole444 is not correct, and I also disagree with the scores received by cuddlekitten.

fraenk (63) 8 years ago

I think the classification should weight the total number of comments and the downvote ratio much more heavily.

I'll be curious to see how this progresses, but I do think it should be used with caution as it stands. Once it is available to end users, the labels will be taken as true beyond doubt: cuddlekitten has already been told about her new spammer label by a few confused cuddle friends.
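One way to experiment with fraenk's suggestion is to compute those signals explicitly so they can be inspected and weighted. Below is a minimal Python sketch, assuming the account's comment bodies and vote counts have already been fetched. The class and field names are hypothetical. "Downvote ratio" is interpreted here as downvotes received over all votes received, which is an assumption. The duplicate-comment ratio is an added signal motivated by the identical-comment spambots mentioned above. The sketch produces features only, not a classification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ActivitySummary:
    # Hypothetical input record; the data is assumed to be fetched elsewhere.
    comments: list[str]      # bodies of the account's comments
    votes_received: int      # all votes on the account's content
    downvotes_received: int  # the subset of votes_received that were downvotes


def activity_features(summary: ActivitySummary) -> dict[str, float]:
    """Compute simple, inspectable spam signals for one account."""
    total_comments = len(summary.comments)

    # Share of received votes that were downvotes (0.0 if no votes at all).
    downvote_ratio = (
        summary.downvotes_received / summary.votes_received
        if summary.votes_received
        else 0.0
    )

    # Share of comments whose normalized text appears more than once,
    # aimed at bots that post the same comment over and over.
    counts = Counter(body.strip().lower() for body in summary.comments)
    repeated = sum(n for n in counts.values() if n > 1)
    duplicate_ratio = repeated / total_comments if total_comments else 0.0

    return {
        "total_comments": float(total_comments),
        "downvote_ratio": downvote_ratio,
        "duplicate_comment_ratio": duplicate_ratio,
    }
```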

hashcash (63) 8 years ago

Okay, you need a larger training set. More people need to participate in this effort for it to produce more accurate results. This is a great project and I will try to follow it as much as I can. Thanks :-)
Came here after @steevc resteemed this post. Thanks to him as well.

cardboard (65) 8 years ago

@tipu upvote this post for 1 sbd :)

tipu (67) 8 years ago

Hi @andybets! You have received 1.0 SBD @tipU upvote from @cardboard !

@tipU! upvotes with 220% profit and pays 100% profit + 50% curation rewards to investors :)

steevc (80) 8 years ago

That's really cool. Could save me time checking up on some commenters.

teamhumble (74) 8 years ago

oh that's neat! :)

schrosct (44) 8 years ago

"Sir, are you classified as human?"
"Negative, I am a meat popsicle."

cardboard (65) 8 years ago (edited)

Steemit should make it the default view :D I finally need to try it for @tipu :)

sebbbl (63) 8 years ago

Great idea! In the same way, bots could be identified in the list of rewards on a post on Busy (I don't know whether this kind of list exists on Steemit).
Sorry @eroche for choosing you as an example, it was totally random!

dailypick (49) 8 years ago

This post has been upvoted and picked by Daily Picked #23! Thank you for the cool and quality content. Keep going!

Don’t forget I’m not a robot. I explore, read, upvote and share manually ☺️