A review of Describable Visual Attributes for Face Images, by Neeraj Kumar.
Digital images are integral to and ubiquitous in modern information technology, and our production and consumption of them today is staggering. Flickr, Yahoo’s popular image hosting site, announced in 2011 that it hosted 6 billion images and had 80 million unique visitors. However, navigating and searching these massive collections of online imagery is far from trivial. While humans are incredibly good at identifying the content of an image, computers are not. Hence, searching for a specific image and identifying its content remain major challenges in computer science.
In Describable Visual Attributes for Face Images, Neeraj Kumar presents a novel technique for automatically annotating images with something computers deal with easily, namely labels. These labels are called describable visual attributes, and in the case of images of human faces they include gender, age, hair color, nose size, etc. Labels can be combined: “white male, thirties, scar-on-forehead, blue-eyes” (p. 4). It is easy to see the potential of such a search in law enforcement, for example. The core application area of the work presented in Kumar’s dissertation is face images, and Kumar demonstrates three applications: searching, replacing, and comparing face images. However, Kumar’s work reaches well beyond faces and demonstrates how describable visual attributes can be applied to identify the species of a leaf from a snapshot taken with a mobile phone.
The key to Kumar’s work lies in labeling images, and what better resource to use than those who are actually good at it, namely us, humans. To do so, Kumar uses crowdsourcing to have real people label a set of images, creating a labeled image dataset that can serve as a point of reference for automatically labeling future images without human intervention.
In Chapter 3, Kumar describes how this labeled image dataset was created using publicly available face images on Flickr and Picasa and by crowdsourcing labeling tasks to real people through Amazon’s Mechanical Turk. Two labeled face datasets have been made publicly available as a result of the work described in Chapter 3. Chapter 4 explains the algorithm for using this data to train attribute classifiers that map low-level features of images to human-understandable attributes, e.g. mapping pixel values and gradient directions to a jaw size attribute. In Chapters 5 to 7, Kumar describes three applications of the visual attribute technique to face images. Chapter 5 describes a search engine capable of answering queries such as “smiling Asian men with glasses” with extremely high accuracy. Chapter 6 demonstrates how the technique can be used in an algorithm to automatically replace faces in images, e.g. to conceal identity with stock photos or to synthesize a group image where everyone is smiling from a series of images taken in a burst. Finally, in Chapter 7, Kumar demonstrates that the technique can be applied to verify whether two face images are of the same individual, which is central to face recognition algorithms.
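To give a rough feel for what training an attribute classifier means, the following is a minimal sketch and not Kumar's actual pipeline. It assumes a single binary attribute (say, "smiling"), made-up two-dimensional feature vectors standing in for real low-level image features, and crowd-provided labels of +1 or -1. It trains a linear classifier by subgradient descent on a regularized hinge loss, the loss used by linear SVMs. All function names, feature values, and hyperparameters here are hypothetical.

```python
# Toy sketch (assumptions throughout): one binary attribute, tiny hand-made
# feature vectors, and a linear classifier trained with the SVM hinge loss.

def train_attribute_classifier(features, labels, epochs=200, lr=0.1, reg=0.01):
    """Return (weights, bias) learned by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # Margin is computed with the weights from before this step.
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: only examples inside the margin push the model.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def attribute_score(w, b, x):
    """Signed score; larger values mean stronger evidence for the attribute."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def has_attribute(w, b, x):
    """Binary decision derived from the signed score."""
    return attribute_score(w, b, x) > 0.0


# Hypothetical low-level features for six face images, with labels of the
# kind crowd workers might supply (+1: attribute present, -1: absent).
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_attribute_classifier(train_x, train_y)
predictions = [has_attribute(w, b, x) for x in train_x]
```

Once trained, such a classifier needs no further human input: a new image's features go through `has_attribute`, and the resulting labels can then be searched or compared like any other text.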
Chapter 8 shows how the technique presented in the dissertation is applicable beyond face imagery. Using describable visual attributes, Kumar and colleagues have developed Leafsnap, a mobile application that lets the user take a snapshot of a leaf and within seconds retrieve a set of top candidates for its species, with textual descriptions and additional images. Leafsnap has been released to the public with great success and quite a lot of media attention. It is here that the author really has to be praised for his work. Not only have Kumar and colleagues developed an elegant technique for identifying the species of a leaf (find all the details of computing histograms of curvature values in Chapter 8), but they have actually proved the real-world benefit of their technique beyond academic toy examples and created a simple yet extremely useful tool – and it is free!
In Describable Visual Attributes for Face Images, Neeraj Kumar presents a novel technique for training classifiers for describable aspects of visual appearance in images and demonstrates the real-world value of this technique. Given the topic of the dissertation, it is inevitably a challenging read for the layperson, yet Kumar leaves the reader in no doubt about the potential of his work. I am certain that Kumar’s peers would easily be able to pick up the dissertation, understand the potential of the technique through the elaborate examples, and find enough information and detail to replicate the work in another domain.
This dissertation is not only a strong contribution to computer vision and machine learning, but also a great example of bringing an academic contribution to real world use.
Clemens Nylandsted Klokmose PhD
Department of Computer Science
Academic papers in the field of computer vision and machine learning, in particular on the Support Vector Machines (SVMs) learning model
Digital images extracted from sources such as Flickr and Yahoo Images
Amazon’s Mechanical Turk for web-based labeling
Benchmarking for face comparison with the Labeled Faces in the Wild data set
Columbia University. 2011. 127pp. Primary Advisors: Peter N. Belhumeur and Shree K. Nayar.
Image: Image by Neeraj Kumar, from Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, Shree K. Nayar, “Attribute and Simile Classifiers for Face Verification,” Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV), October 2009.