Introducing photomatcher: reverse image search bot for photo posts

in #steemdev, 6 years ago


Plagiarism is a hot topic in the Steem community. As an experiment I tried to write a bot that attempts to reverse image search photo posts using various photography-related tags and posts a comment when matches are found.

Under the hood it uses @xeroc's piston-lib and Google Cloud Vision API.

Example: this photo posted via Steepshot received a comment by the photomatcher bot.

You can find the most recent comments here

This is a prototype and any ideas on how to improve this bot are welcome, just let me know in the comments!

Photo by Kaique Rocha (CC0 License)


I actually researched reverse image search the other day and found out that plagiarists crop the image so that a reverse image search can't locate its original source. I don't know whether cropping works on all images, but if it does, the copy is untraceable. So my question is: can this be fixed? Plagiarism is destroying art! We need to stop it at once. Great job making the bot.

Yes, you are correct! Unfortunately. Another known trick: flip/reverse the image! There's no further editing needed, bots won't recognise a flipped image as plagiarism either. So, although @photomatcher will have some challenges to create the 'perfect' bot, I still think more than 90% of plagiarism is done by just plain stupid/ignorant people :-) And at least those will be caught by the bot ;-)

I hope so. Plagiarism is a known problem in blogging/photography communities, and it is actually very hard to catch an experienced plagiarist (as you've mentioned, by flipping/reversing an image). We can only hope that they are careless enough not to check whether anyone has flipped the same image before they did, which would result in the plagiarist being caught instantly. :) And if you ask me, I would permanently ban every plagiarist around here. It may sound harsh, but that is the only way of telling them that plagiarism is not welcome.
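To make the flipping trick concrete, here is a minimal sketch of a flip-tolerant comparison. It is not how the bot or Google's search works; it assumes a simple "average hash" over a tiny grayscale pixel grid (a real image would first be downscaled with an imaging library), and all function names are hypothetical. It does nothing about cropping.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def mirror(pixels):
    """Flip the grid horizontally."""
    return [list(reversed(row)) for row in pixels]


def hamming(a, b):
    """Count the positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def flip_tolerant_distance(candidate, original):
    """Compare the candidate both as-is and mirrored; keep the smaller distance."""
    target = average_hash(original)
    return min(
        hamming(average_hash(candidate), target),
        hamming(average_hash(mirror(candidate)), target),
    )


# Toy 4x4 grayscale "image": dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
flipped = mirror(original)
```

A plain hash comparison treats `flipped` as a completely different image, while checking the mirrored variant as well brings the distance back to zero.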

This is not a plagiarism control bot. He is attacking ALL posters (many hundreds within a few days) who use a web image, for example from pixabay, which allows it, knowing it will cause them embarrassment, so that other viewers think the poster stole the picture. This is all for his own profit. Just look at how quickly his rep has gone up: did you go from 25 to 44 within a couple of days? He is using technology for his own benefit, without caring how many innocent people he maligns by implication.

Please do not just react to your fears of your work being stolen and think about what he is really doing.

I actually hope that his intentions are good, because royalty-free images are given away for free use with the author's consent. So if you do a reverse search on a royalty-free image, it will produce hundreds if not thousands of results (people who used the same royalty-free photo). You have a good point here. Update: I have now tested it, and for a single royalty-free Pexels image it found an astonishing 100,000+ similar images. Wow!

I'm sorry, it may be cynical of me, but the way he went about this, from the first day he joined steemit, it is obvious he came aboard with a plan for making his Rep and profit quickly. He may improve his bots as he goes, but what about those he damaged deliberately, without caring about them?

He should reverse ALL his comments (over 500 last time I looked) and only once he has perfected it, should he ask those who deal with this kind of problem and once they approve and appoint him, can he go ahead. Otherwise, I am letting all posters know that they should Flag him.

This is GREAT! I find plagiarism annoying and am adding a lot of complaints to steemcleaners.

My feature request would be: let photographers like me add their website/social media accounts to your bot so your bot knows if they find something on those websites/social media it's actually NO plagiarism. I have hundreds of images uploaded on several places online!

Will continue following this development!

This is NOT great. Please think about what he is doing, instead of just reacting to your own anger with plagiarists.

Hi Arthur,

I appreciate you adding a critical note. I read back your comments on this post and on other users' comments where you encourage them to flag and let it sink in. I can see what your concern is.

What I'm thinking now regarding your 'fight' against @photomatcher:

  • He seems open to discussion, judging by his replies to other users' suggestions. Maybe start with that? Ask your questions and see the response first?
  • It is annoying to see users get these 'scare comments' if they're innocent, and this is mostly caused by the pace with which the bot is testing now. I understand the need for testing though, so I would give a user I know nothing about the benefit of the doubt. (I am by the way suspecting this is a longer term user trying out a bot on a fresh account, but of course, I can also not be sure about that.)

In the big scheme of things I do see the problem with Plagiarism as one of the biggest on Steemit. I use Pinterest, for example, and on that website I sometimes get an e-mail saying 'I'm sorry, the pin you saved on your account has to be removed for legal reasons'... And 'non-blockchain blogs' get sued all the time for not appropriately crediting the use of images, even professional blogs (which I suspect will not be the biggest part of Steemit users in the end.) What does it mean for Steemit once plagiarised things get locked (after 7 days) in the blockchain? Who is responsible for it, who can remove/edit?

Anyway, lots of questions, I'd like to have a discussion on it, especially if we can do that in a civilised and calm manner. I do hope @photomatcher will join in and shed some light on the concerns mentioned in yours and my comments.

Thank you for your feedback, I just stopped the bot for now, read here for an explanation.

Of course I'm always open for discussion and I would love to learn from anyone of you about how to take on the subject of copyright and plagiarism. I believe we all benefit on this platform from discouraging people that try to make some quick money with content that they have no rights to use.

So for now there's space for much improvement on the bot and I will develop further until I'm confident enough to announce and test the new version. Thanks again for all the feedback!

Hi, thanks for responding and being open for feedback :-) It's good to test and then take time to develop further! I'd love to hear your adaptations on the next version and will always give my feedback.

Even though English is not my first language I'm very sensitive to 'tone', so if you want me to give feedback on the tone of the automated text I'd gladly do so. For now / the first rounds at least, I would suggest adding a sentence like 'Photomatcher Bot is still in beta testing, so if I made a mistake by suggesting your picture is not yours please let me know by adding a comment to [link to dedicated feedback article]'

Good luck on developing!

Great work! Checking through your comments, it seems like you've done a good job. Perhaps turn down the threshold a bit. I checked out a few of the posts that you've flagged and definitely noticed a number of false positives. Too many of these can be annoying to the community.

Thank you! I'm monitoring the comments and hope to improve the bot very soon to have fewer false positives.

That is so helpful! You could add the function of matching the hash values of the images. If they are the same, you definitely know that they are a match.
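A minimal sketch of the hash idea, assuming the bot has the raw bytes of both files available (the function names are hypothetical). Note the limitation: equal digests confirm a byte-for-byte copy, but a resized or re-encoded copy produces a different digest, so this check can only confirm a match, not rule one out.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_exact_copy(first: bytes, second: bytes) -> bool:
    """True only when both files are byte-for-byte identical."""
    return file_digest(first) == file_digest(second)


# Stand-in bytes; in practice these would be the downloaded image files.
posted = b"\x89PNG fake image bytes"
same_file = b"\x89PNG fake image bytes"
re_encoded = b"\x89PNG fake image bytes, re-encoded"
```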

also... if the text of the post contains a link to one of the found matches, you could ignore it. Since they already refer to their source.

I added this feature just now and looks to be working, thanks for the suggestion!
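The bot's code for this feature is not shown in the thread, so the following is only a sketch of how such a check might look. It assumes the post body is plain text or Markdown and that the reverse image search returns a list of match URLs; `unattributed_matches` and `normalize` are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Rough URL matcher; stops at whitespace, brackets and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def normalize(url):
    """Reduce a URL to host + path so http/https, 'www.' and trailing slashes don't matter."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def unattributed_matches(post_body, match_urls):
    """Return only those matches that the post does not already link to."""
    cited = {
        normalize(found.rstrip(".,;:!?"))
        for found in URL_PATTERN.findall(post_body)
    }
    return [url for url in match_urls if normalize(url) not in cited]


body = "Nice shot! Source: https://www.example.com/photos/sunset/ (CC0)"
matches = [
    "http://example.com/photos/sunset",
    "https://other.example.org/copy.jpg",
]
remaining = unattributed_matches(body, matches)
```

Here only the second match is left over, because the post already links to the first one.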

Thank you for building this!! :)

Hi, can you build and share with me the project for a red-rose gifts bot?
It is a voting service for fees of 0.01 to 0.02:
the member will get a 250% upvote, resteeming to 685 real followers, and sharing of the post on social media.
If you like, check my wallet; you can see that many good members are using my service, and they love and trust me.
Check my replies section and you will see more than 1300 thank-you replies from happy members I have supported.

I will wait for your reply.

Thank you

You are welcome! I hope you get the most out of this platform

Ok, there are some features that need to be added (and tested) before it can be activated broadly. I notice that some posts use photos that Google marks as "labeled for reuse". Example vs Google. Are you taking this into account, and is this just a glitch? Or is this not part of the algorithm that you developed?

What I think should be a good plan:

  • Stop the bot for time being
  • Create a new post that covers the checks that you do
  • Gather feedback
  • Implement useful ideas
  • Affiliate with steemcleaners if possible
  • Do proper testing
  • Then launch

I have to agree with @arthur.grafo that it smells fishy this way. Is it about getting the bot into action asap (without considering people's reputation)? Or is it about having a good working bot?

I stopped the bot for now and I agree I have to test it more thoroughly. I appreciate all the feedback so far.

I'd like to emphasize that I got this idea because I read many complaints on Steem about people taking advantage of someone else's content without permission. Maybe I should rewrite the bot's texts a bit, I see some people get offended by it (even though I never meant to accuse anyone of plagiarism), so it might need a different 'tone of voice'.

Again, thank you for your feedback. I wanted to test the bot as much as possible 'in the open' and gathered many new insights this way. Now it's time for further development and I will update you all soon!

This bot will be a great help to avoid plagiarism.

This is really cool man. Would love to understand this more if you don't mind the questions.

So each time the bot comments, it's because it found a match of a new posted photo to one that already exists on Steemit?

I would imagine that in matching photos like this, there's some sort of probability metric that determines how likely it is to be a match. If that's the case, what do you have your thresholds set to, like 95%+?
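As an illustration of the kind of threshold being asked about, here is a minimal sketch. It assumes each candidate match comes with a similarity score between 0 and 1; the scores, the `score` field and the 0.95 default are hypothetical and not taken from the bot.

```python
def confident_matches(candidates, threshold=0.95):
    """Keep only candidates whose score reaches the threshold."""
    return [c for c in candidates if c["score"] >= threshold]


# Made-up candidates for illustration.
candidates = [
    {"url": "https://example.com/a.jpg", "score": 0.99},
    {"url": "https://example.com/b.jpg", "score": 0.80},
    {"url": "https://example.com/c.jpg", "score": 0.95},
]
kept = confident_matches(candidates)
```

Raising the threshold reduces false positives at the cost of missing some genuine matches.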

The bot checks the posted photo against Google reverse image search. So you will know if the photo is original content or taken from somewhere else on the web.

Right now the matching is done by Google, but I'm already working on additional checks to be more sure about the matches. For example, it's not 100% perfect with sunset photos right now, because they all look so similar :-)
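The thread does not show how the bot reads Google's results. As a rough sketch, assuming the bot receives the web detection part of a Cloud Vision REST response as JSON, the matching pages could be pulled out like this. The sample response is made up, and the helper names are hypothetical.

```python
import json


def matching_pages(response_json):
    """Return the URLs of pages that contain a matching image."""
    detection = json.loads(response_json).get("webDetection", {})
    pages = detection.get("pagesWithMatchingImages", [])
    return [page["url"] for page in pages if "url" in page]


def has_full_match(response_json):
    """True when at least one full (not partial) image match is reported."""
    detection = json.loads(response_json).get("webDetection", {})
    return bool(detection.get("fullMatchingImages"))


# Made-up response in the shape of a web detection annotation.
sample = json.dumps({
    "webDetection": {
        "fullMatchingImages": [{"url": "https://example.com/img/sunset.jpg"}],
        "pagesWithMatchingImages": [
            {"url": "https://example.com/gallery", "pageTitle": "Gallery"},
            {"url": "https://example.org/blog/sunsets"},
        ],
    }
})
pages = matching_pages(sample)
```

Visually similar images, such as the sunset photos mentioned above, would need extra checks beyond this simple listing.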

I never realized that Google made this functionality open to the public. I've been seeing them use it for a number of purposes in Google Images lately. It's so good that it's a little spooky at times. I think it's time for me to deep dive on Google Cloud Vision.

BTW, what are you tying into that triggers the bot on each post?

For getting the latest posts I use piston-lib, here's an example: http://lib.piston.rocks/en/develop/quickstart.html#waiting-for-new-comments

Wow, this is great! Thanks for pointing me to this documentation.

Please do not just react to your fears of your work being stolen and think about what he is really doing.

Please read my other comments on the page you made your post. He is not using his bot to stop plagiarism. If he were, he would not be embarrassing posters who do give source or get their images from sites like pixabay, who allow their images to be used.

All he cares about is that he has found a way to grow himself here super quick, and he does not care how many posters he damages with his suggestive comment (plus the implied threat by his posting the link to steemcleaners.). Most posters lose it, get afraid, because most are new. This is bad and I just wish I knew who to contact to stop this abuse of posters by him. I guess I should maybe ask @patrice, she'll know.

I'm trying to go to the posters he sent his bot, to reassure them (if they are legally using the images) but I cannot keep up, the bot is distressing hundreds of posters without reason.

Pretty kool set up .. thanks jkenny

Great Post! Thanks for sharing @photomatcher

What if we indicate the source of the image? How is that?

Nice to meet you!

I just authored a post and I used a CC0 Creative Commons image from Pixabay. I indicated the source too. I received a comment from your bot. Remove that comment immediately or I will downvote it at full power.
Your bot doesn't work properly.

I updated the bot to make sure that Pixabay and others are whitelisted, thank you for your feedback!
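How the bot's whitelist is implemented is not shown here. A minimal sketch, assuming matches arrive as URLs and the whitelist is a set of domains, might look like the following; the domain list and function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of free-to-use image sites.
ALLOWED_SOURCES = {"pixabay.com", "pexels.com"}


def is_allowed_source(url, allowed=ALLOWED_SOURCES):
    """True when the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def reportable_matches(match_urls, allowed=ALLOWED_SOURCES):
    """Drop matches that come from whitelisted sources."""
    return [url for url in match_urls if not is_allowed_source(url, allowed)]


found = [
    "https://cdn.pixabay.com/photo/123.jpg",
    "https://notpixabay.com/photo/123.jpg",
    "https://www.pexels.com/photo/456/",
]
to_report = reportable_matches(found)
```

The same mechanism could cover the earlier request to let photographers register their own sites, for example by passing `ALLOWED_SOURCES | {"myportfolio.example"}` as `allowed` for that author.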

Please read my reply to many about this bot. Now he pretends he has discovered there are legitimate users and says he is changing his bot? I have tried going around to all posters he has attacked with this bot, to tell the legit ones to Flag him. Glad to see you had the same idea. Go for it if he comes around again.

Incredible picture. How an image can convey so much. Greetings from Venezuela.

I disagree with all of you. This is not being used to fight plagiarism, it is just a quick easy way for him to use a bot and make hundreds of posts, so as to up his rep.

It is totally accepted that almost everyone uses images from the web. What is asked is that we only use images that are not copyrighted and are free for us to use.

He did not build into his search any controls to prevent him making accusations where they are not warranted. All it does is report other copies of the same image in our posts and then makes its allegations, so that everyone else thinks that poster stole the image. It does not even check to see whether the poster has provided the source. He is going to end up attacking nearly every poster in steemit!

Exactly because it caters to the fears of people like you (people who produce their own photos), it is accepted by you, without you taking the time to think about it and see the damage he is deliberately causing hundreds of posters per day.

I am considering contacting every person he has sent his damaging comment to and suggesting they all flag his comment.

Sweet Idea!

Wow, amazing photograph, I like it. Help me, follow.
