Image recognition with Python

This is the third part of the series "Automating games with Python".

On my account you can find part 1, "How I made my own Python bot to automate complex games", which explains my motivation and the game I'm automating. Part 2, "How to control the mouse and keyboard with Python for automation", digs into the core functions needed for automation.

So here we go with another core functionality: how to find images on the screen and react to them.

I haven't found any library that does this well or that lets me do everything I want, so I went ahead and wrote my own!

Introducing: Python-Imagesearch

https://github.com/drov0/python-imagesearch

This is a wrapper around OpenCV, a great library for image processing, and pyautogui, which we covered in part 2 to move the mouse and so on.

Basically, what we need is simple:

take a screenshot of the screen
look for the image inside
return the position of said image

This is pretty easy. But as development went on, I had other needs, such as tuning the precision (the lower the precision, the more forgiving the image search is with slight differences) or looking only at specific places on the screen.

I encourage you to read the actual code; it's straightforward, short and commented: https://github.com/drov0/python-imagesearch/blob/master/imagesearch.py

Below are a few code snippets from the example file in the git repo and from the bot itself, to show how I use it.

The basics

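This is incredibly straightforward, and yet it works well.

import pyautogui
from imagesearch import imagesearch  # assumes imagesearch.py from the repo sits next to your script

pos = imagesearch("github.png")
if pos[0] != -1:
    print("position : ", pos[0], pos[1])
    pyautogui.moveTo(pos[0], pos[1])
else:
    print("image not found")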

This is how I use it most of the time, but I also use it to check whether an element is present.
Here, if we right-click on a dead body, the "harvest" icon pops up.

So we can check whether there is a dead body where we click.

pyautogui.click(button="right")
time.sleep(r(0.4, 0.2))  # the r function simply returns 0.4 + 0.2 * random.random()
presence = imagesearch("harvest_icon.png")

if presence[0] == -1:
    pass  # icon not found: do things
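The `r` helper described in the comment of that snippet can be written in a couple of lines. The parameter names `base` and `spread` are my own labels for illustration; only the formula comes from the comment.

```python
import random


def r(base, spread):
    """Return base plus a random extra drawn from [0, spread)."""
    return base + spread * random.random()
```

So `time.sleep(r(0.4, 0.2))` pauses for somewhere between 0.4 and 0.6 seconds, which avoids perfectly regular timing between actions.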

Same goes for imagesearcharea: I use it to search part of the screen for a specific element that may appear in several places.

For example:

This is the timeline where you see whose turn it is. If I search the whole screen for a monster to see whether it's its turn, I will get two or three matches, but if I look precisely where the end of the timeline is (close to the 57), I'll get only one match, and I can use that to tell whether it's the monster's turn or mine.

This can also be used to avoid compatibility issues with screen resolutions.

An icon obviously takes more pixels when displayed at 1920x1080 than at 800x600, so imagesearch won't recognize images captured on an 800x600 screen if you search for them on a higher-resolution screen.

This way you can just say: "The images were captured at 800x600, so I'll set the lookup zone to 800x600." Everyone else can then resize their game window to that size, so the resolution is no longer a problem and everyone can use it.

The code is still as straightforward:

pos = imagesearcharea("github.png", 0, 0, 800, 600)
if pos[0] != -1:
    print("position : ", pos[0], pos[1])
    pyautogui.moveTo(pos[0], pos[1])
else:
    print("image not found")
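One thing worth checking when the region does not start at (0, 0): depending on the version of the wrapper you use, the returned position may be relative to the region's top-left corner rather than to the whole screen. Here is a small sketch of the conversion, assuming region-relative results. `to_screen_coords` is a hypothetical helper, not part of the library.

```python
def to_screen_coords(pos, x1, y1):
    """Convert a region-relative hit to absolute screen coordinates.

    pos is the [x, y] result of a region search and (x1, y1) is the
    region's top-left corner. A miss ([-1, -1]) is passed through unchanged.
    """
    if pos[0] == -1:
        return [-1, -1]
    return [pos[0] + x1, pos[1] + y1]
```

With a region starting at (0, 0), as in the example above, the conversion changes nothing, which is why the snippet can move the mouse to `pos` directly.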

Loops

Sometimes you may want to look for something until it pops up. For instance, say you launch Photoshop: depending on your computer, it may take 10 seconds or 2 minutes to load. You can simply wait until you detect the Photoshop toolbox and then do whatever you need.

I use the loops for this exact use case; it saves me a few lines and it's clearer.

pos = imagesearch_loop("github.png", 0.5)  # 0.5 = seconds to wait between two searches

print("image found ", pos[0], pos[1])

Of course it works with regions as well:

pos = imagesearch_region_loop("github.png", 0.5, 0, 0, 800, 600)
print("image found ", pos[0], pos[1])
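These loops wait until the image appears. If there is a chance it never will, a timeout keeps the bot from hanging. Here is a sketch under assumptions: `wait_for` and its parameters are hypothetical, and `find` is any zero-argument callable that returns a position, for example `lambda: imagesearch("github.png")`.

```python
import time


def wait_for(find, interval=0.5, timeout=30.0):
    """Poll find() until it reports a hit or the timeout expires.

    find must return [x, y], with x == -1 meaning "not found".
    Returns the hit position, or [-1, -1] on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        pos = find()
        if pos[0] != -1:
            return pos
        if time.monotonic() >= deadline:
            return [-1, -1]
        time.sleep(interval)
```

The caller can then treat a `[-1, -1]` result like any other "image not found" case, for instance by restarting the program being waited on.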

Randomly clicking on an image

This is a rather specific use case. A bot will always click at exactly the same point. That is not a problem in itself, but when you try to imitate a human, it's important to add a bit of randomness to the clicks. It can be very annoying, though, to add some randomness and then notice 20 hours later that the bot crashed because the randomness made it click outside the image.

So I designed a function to click close to the center of the image with an offset to prevent this from happening :

# click image is to be used after having found the image

pos = imagesearch("github.png")
if pos[0] != -1:
    click_image("github.png", pos, "right", 0.2, offset=5)

The offset is the number of pixels used for randomization.

Quoting the documentation:

"eg, if an image is 100*100 with an offset of 5 it may click at 52,50 the first time and then 55,53 etc". It will go to 55,55 at most, and never backward (so 45,45 is impossible), so plan accordingly!

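The behaviour described in that quote can be sketched as "center of the image plus a random shift between 0 and the offset". This is an illustration built from the documented example, not the library's actual implementation, and `random_click_point` is a hypothetical name.

```python
import random


def random_click_point(pos, width, height, offset=5):
    """Pick a click point near the center of a found image.

    pos is the image's top-left corner. The random shift is drawn from
    [0, offset) on each axis, so it never moves up or left of the center.
    """
    x = pos[0] + width / 2 + offset * random.random()
    y = pos[1] + height / 2 + offset * random.random()
    return x, y
```

For a 100*100 image found at (0, 0) with an offset of 5, every point lands between 50 and 55 on both axes. Under this model, keeping the offset below half of the image's smaller side guarantees the click stays inside the image.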
Optimization

You'll notice that imagesearch is not without its compute time, and if you want to search the same screenshot for several different images, it can cost you a few seconds of run time. Since time is money, here is how to optimize:

Use region_grabber to grab the screenshot once and then run imagesearcharea on it several times. This can provide a good improvement; here I got a 4x speedup over 20 searches:

# non-optimized way:
time1 = time.perf_counter()  # time.clock() was removed in Python 3.8
for i in range(10):
    imagesearcharea("github.png", 0, 0, 800, 600)
    imagesearcharea("panda.png", 0, 0, 800, 600)
print(str(time.perf_counter() - time1) + " seconds (non optimized)")

# optimized way:

time1 = time.perf_counter()
im = region_grabber((0, 0, 800, 600))  # grab the region once and reuse it
for i in range(10):
    imagesearcharea("github.png", 0, 0, 800, 600, 0.8, im)
    imagesearcharea("panda.png", 0, 0, 800, 600, 0.8, im)
print(str(time.perf_counter() - time1) + " seconds (optimized)")

# sample output :

# 1.6233619831305721 seconds (non optimized)
# 0.4075934110084374 seconds (optimized)

I use that when I need to check for several elements in the same screenshot.

It was a necessity when I used another function to grab screenshots, which took about 1 second per screenshot, so doing imagesearch on the same spot a few times quickly got out of hand. Now it's less important.

Well, that's about it. If you have any questions, feel free to write them in the comments, and if you find a problem with my wrapper, feel free to open an issue. If you want to use the library, I released it under the MIT license, so don't worry about it :)

Thanks for reading and see you next time for the machine learning part of this series.

phil-coding (31) 8 years ago

This is extremely interesting, thank you. I also work with computer vision in MATLAB :D

howo (76) 8 years ago

Thanks a lot for your input :D If you find yourself without a MATLAB licence, give Octave a try. It's basically MATLAB but open source, and the code you write for MATLAB works in Octave 99% of the time. But it's still in beta and has a few rough edges.
