Tuesday, January 15, 2008

Image Search

How many of you have been frustrated by the image search results from any of the search engines? Did you find what you were looking for on the first page?
I guess the answer in most cases is 'yes' (to the former) and 'no' (to the latter). So why does this happen?
Most search engines handle images in much the same way. Whenever an 'ImageBot' encounters an image on the web, it tries to gather information about the image's content in the following manner:
1. It tries to relate the image content to the file name (e.g. if the image file is named Britney_Spears.jpg, it stores this information; in other words, it labels the image with the name of the file).
2. It looks into the HTML META tags and labels the image with the information inside them.
3. It looks at the text around the image and, using some heuristics, gives the image some more labels.
4. It can apply some further heuristics, e.g. looking at which page linked to the page containing the image.
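
To make the first three steps concrete, here is a minimal sketch in Python. It is only my guess at the idea, not how any real search engine works: the names ImageLabeler, tokenize, filename_labels and label_page are made up for illustration, "text around the image" is reduced to the first piece of text after the image, and step 4 is left out.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


def tokenize(text):
    """Split text into lowercase alphabetic words (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def filename_labels(src):
    """Step 1: derive labels from the image file name."""
    name = posixpath.basename(urlparse(src).path)
    stem, _ext = posixpath.splitext(name)
    return tokenize(stem)


class ImageLabeler(HTMLParser):
    """Hypothetical 'ImageBot' that collects labels for each <img> on a page."""

    def __init__(self):
        super().__init__()
        self.page_keywords = []  # step 2: words from <meta name="keywords">
        self.images = {}         # image src -> list of labels
        self._last_src = None    # image still waiting for nearby text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.page_keywords = tokenize(attrs.get("content") or "")
        elif tag == "img" and attrs.get("src"):
            src = attrs["src"]
            self.images[src] = filename_labels(src)
            self._last_src = src

    def handle_data(self, data):
        # Step 3 (very crude): attach the first text that follows the image.
        if self._last_src and data.strip():
            self.images[self._last_src] += tokenize(data)
            self._last_src = None


def label_page(html):
    """Return {image src: labels} for one HTML page."""
    parser = ImageLabeler()
    parser.feed(html)
    parser.close()
    # The META keywords apply to every image on the page.
    return {src: labels + parser.page_keywords
            for src, labels in parser.images.items()}
```

Feeding it a page that contains Britney_Spears.jpg followed by a caption gives that image useful labels, while an image named 1.jpg on the same page ends up with nothing but the page's META keywords.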

As one can see, if somebody creates an online image gallery for his favorite actress (let's say 'Angelina Jolie'), names it 'My First Crush', and names all the image files 1.jpg, 2.jpg, etc., then the labeling of all the images in the gallery would be awesome :-). Things can get worse if he names all the images after breeds of dog. (An "exceptionally" bad example ;-> but there are people who would do this.)

So what are the possible solutions?
1. Keep relying on content generators (the people publishing on the web; I am not talking about the person who actually takes the photo) and hope they get smarter and smarter and put much more relevant information in META tags and around the image.
2. Search engines should make the first move. They can ask users for input about the visual content of the image, e.g. Google Image Labeler.
3. The third possibility is controlled by the actual media generators (the photographers who take thousands of photos every day). They are the best judges of the content of an image, so let them put the image-related information inside the image file itself. Almost every image file format supports a metadata field, so just put all the information there. There is no need to worry about whether web developers are smart or whether the search engines' heuristics are good.
4. Last but not least, do image recognition: recognize all the objects present in the image and label the image accordingly. (Not as hard as finding the answer to the ultimate question, but still...)
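
As a rough illustration of the third option, here is a small Python sketch that stores a description inside a PNG file using the format's standard tEXt chunk, and reads it back. The helper names (tiny_png, add_text, read_text) are my own for this example, and the 1x1 test image stands in for a real photo; JPEG files would use Exif or IPTC fields instead, which this sketch does not cover.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png():
    """Return the bytes of a 1x1 grayscale PNG (stand-in for a real photo)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def iter_chunks(png):
    """Yield (type, data) for every chunk in the PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png, keyword, text):
    """Insert a tEXt chunk (Latin-1 keyword and text) right after IHDR."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    ihdr_end = 8 + 12 + 13  # signature + IHDR chunk, whose data is 13 bytes
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:ihdr_end] + _chunk(b"tEXt", payload) + png[ihdr_end:]


def read_text(png):
    """Return {keyword: text} for all tEXt chunks, as an ImageBot might."""
    labels = {}
    for kind, data in iter_chunks(png):
        if kind == b"tEXt":
            key, _sep, value = data.partition(b"\x00")
            labels[key.decode("latin-1")] = value.decode("latin-1")
    return labels
```

With this, a photographer's tool could call add_text(photo_bytes, "Description", "Angelina Jolie") once, and the label would travel with the file no matter how the gallery page names or describes it.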

All in all, the future of image search is bright, and there are lots of possibilities for making it more and more efficient. cya... :)

and the reasons...

there are no reasons... :)