Using Machine Learning and Python to Keyword Images

in #python • 8 years ago (edited)


You may have seen earlier that I am building an image viewer application.

One of the reasons for this application is my store of purchased stock photography, and also my library of personally taken images. When I need an appropriate image for an article, I have to scroll through visually.

What if I could add keywords to the file names so I could search more easily?
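The renaming step itself is straightforward. The sketch below assumes a list of keywords has already been obtained somehow; the helper name `keyworded_name` and the double-underscore separator are my own arbitrary choices:

```python
from pathlib import Path

def keyworded_name(path, keywords):
    """Return a new path with keywords appended before the file extension."""
    p = Path(path)
    tags = "_".join(keywords)
    return p.with_name("{}__{}{}".format(p.stem, tags, p.suffix))

# The actual rename would then be something like:
# src = Path("photos/2.jpg")
# src.rename(keyworded_name(src, ["lakeside", "boathouse"]))
```

Searching then becomes a plain filename match, e.g. a glob for `*lakeside*`.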

TensorFlow and Machine Learning with Python

It turns out there has been a heck of a lot of work done in the field of categorizing and analyzing images. While image recognition is still being developed, the last 5 or so years have seen great strides from companies such as Microsoft and Google, as well as universities such as the University of Montreal.

For Python users, we can leverage all this hard work even on a tool as humble as a Raspberry Pi or an old laptop, because we don't need to train the model ourselves!

It's not R2-D2

Now don't get too excited; it's not quite there yet. This is how it interpreted my first image:

1.jpg

YOUR PICTURE IS OF A:
 - fountain: 0.533854 likelihood
 - paddlewheel: 0.097188 likelihood
 - park_bench: 0.044255 likelihood
 - birdhouse: 0.031683 likelihood
 - breakwater: 0.021008 likelihood
 - paintbrush: 0.018073 likelihood
 - boathouse: 0.014875 likelihood
 - barn: 0.014575 likelihood
 - mailbox: 0.012850 likelihood
 - shopping_cart: 0.012731 likelihood

Then I realized it had been exported from my camera rotated by 90 degrees, so I ran it again ...

YOUR PICTURE IS OF A:
 - lakeside: 0.342271 likelihood
 - freight_car: 0.259517 likelihood
 - boathouse: 0.118738 likelihood
 - paddlewheel: 0.054276 likelihood
 - container_ship: 0.032019 likelihood
 - steam_locomotive: 0.031085 likelihood
 - trailer_truck: 0.017880 likelihood
 - tractor: 0.013872 likelihood
 - canoe: 0.013488 likelihood
 - electric_locomotive: 0.008250 likelihood

Some interpretations were way off, but lakeside and boathouse were excellent matches, as was canoe further down.
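The rotation problem can also be handled automatically: cameras usually record the orientation in a photo's EXIF data, and Pillow can apply it before the image ever reaches the classifier. A minimal sketch, assuming Pillow is installed (the code below loads images via Keras instead):

```python
from PIL import Image, ImageOps

def load_upright(path):
    """Open an image and apply any rotation recorded in its EXIF Orientation tag."""
    img = Image.open(path)
    return ImageOps.exif_transpose(img)
```

Images without EXIF orientation data come back unchanged, so it is safe to run on everything.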

Python Code

The full code is in this Gist.

Let's load in one image and get the top 10 results.

We will use this cat selfie:

2.jpg

import numpy as np
from keras.preprocessing import image
from keras.applications import resnet50

# Load the ResNet50 model, pre-trained on the ImageNet database
model = resnet50.ResNet50()

# Load the picture at 224x224 (the input size this model expects)
picture = image.load_img("2.jpg", target_size=(224, 224))

# Convert the image to a numpy array
x = image.img_to_array(picture)

# Add a batch dimension (the model expects an array of images)
x = np.expand_dims(x, axis=0)

# Pre-process the pixel values to match what the network was trained on
x = resnet50.preprocess_input(x)

# Run the prediction
predictions = model.predict(x)

# Get the classes of the top 10 results
predicted_classes = resnet50.decode_predictions(predictions, top=10)

print("YOUR PICTURE IS OF A:")

for imagenet_id, name, likelihood in predicted_classes[0]:
    print(" - {}: {}".format(name, likelihood))

Results

YOUR PICTURE IS OF A:
 - tabby: 0.37501177191734314
 - Egyptian_cat: 0.19036126136779785
 - lynx: 0.09264399111270905
 - tiger_cat: 0.07313445210456848
 - Persian_cat: 0.07192806154489517
 - Siamese_cat: 0.04455263167619705
 - carton: 0.027719130739569664
 - window_screen: 0.014117571525275707
 - plastic_bag: 0.013977281749248505
 - bow_tie: 0.008933120407164097

It even knew it was a tabby! It would have been useful to also say "cat", though.
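One way to get that broader "cat" keyword is to map the specific ImageNet labels onto generic ones. The dictionary below is a hypothetical hand-built example covering only a few of the labels seen above; a fuller version could derive this from WordNet hypernyms, since ImageNet labels come from WordNet:

```python
# Hand-built mapping from specific ImageNet labels to broader keywords (illustrative only)
GENERIC = {
    "tabby": "cat", "Egyptian_cat": "cat", "tiger_cat": "cat",
    "Persian_cat": "cat", "Siamese_cat": "cat",
    "cocker_spaniel": "dog", "Labrador_retriever": "dog",
}

def broaden(predictions, threshold=0.05):
    """Collect generic keywords for predictions above a likelihood threshold."""
    keywords = []
    for name, likelihood in predictions:
        generic = GENERIC.get(name)
        if likelihood >= threshold and generic and generic not in keywords:
            keywords.append(generic)
    return keywords
```

Feeding in the cat results above would yield just `["cat"]`, since the low-likelihood "carton" and "plastic_bag" labels fall below the threshold and have no generic mapping anyway.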

What about Benji? He is an English Cocker Spaniel.

3.jpg

YOUR PICTURE IS OF A:
 - cocker_spaniel: 0.48787274956703186
 - curly-coated_retriever: 0.2491113841533661
 - bluetick: 0.048936877399683
 - standard_poodle: 0.045590486377477646
 - Irish_water_spaniel: 0.026697995141148567
 - Labrador_retriever: 0.022910259664058685
 - Chesapeake_Bay_retriever: 0.012821889482438564
 - Bouvier_des_Flandres: 0.012804904952645302
 - Great_Dane: 0.009580017998814583
 - American_Staffordshire_terrier: 0.006594178732484579

Perfect first result!

Conclusion

It still needs some human interpretation. The cat and dog pictures could definitely use the cat and dog keywords ;)

But still, impressive!




The results depend heavily on how well trained it is on sample images. You need thousands and thousands of images to train it properly.

If it has enough examples of the exact category you are looking for, you should get pretty accurate results, but you also need a lot of samples that are not the category, or it will think everything is the category you are looking for.

You are using a pre-built model, but if you want to look for things it doesn't cover, you can build your own model with your own sample images.

Even then, though, a lot of the time you start with the pre-trained data and retrain it for the more specific cases you will be looking for; this is where you save a ton of time and get really good results.

For example, the data set you are using has a lot of cats, dogs, other animals, vehicles, and some real-world objects. Let's say you want to change it to detect GPUs based on pictures. You can take the existing trained model, remove the last few layers that branch out to the categories, and do additional transfer learning from there. This will allow you to make new categories and add to what it knows already.

It would be a good learning experience to train that model to look for categories not in the original data set. For example, detecting the difference between FedEx and UPS trucks. The model is already well trained in detecting trucks, so it would be fairly easy to get it to detect specific companies' trucks.

But whatever you do, make sure you use notebooks; it will make your life sooo much easier.

I'm thinking of building a trained model to classify and tag images from construction projects that I'm involved in. It would save me hours and hours of time in writing photo descriptions, as well as in trying to find images later on to remember what happened when.