Abstract - Computer Vision
Capturing Human Insight for Visual Learning

The University of Texas at Austin

 

Abstract

How should visual learning algorithms exploit human knowledge? Existing approaches allow only a narrow channel of input to the system, typically treating human annotators as “label machines” who provide category names for image or video exemplars. While there is no question that human understanding of visual content is much richer than mere labels, the challenge is how to elicit that richer insight in ways that learning algorithms can readily exploit.

We propose to widen the channel of communication to visual recognition systems beyond traditional labels. I will present new techniques that allow a human annotator to teach the system more fully---using either descriptive relative comparisons (e.g., “bears are furrier than rats”), explanations behind the label he assigns to an exemplar (e.g., “this region is too round for it to be a chair”), or even through implicit cues about relative object importance that are revealed in the way an annotator tags an image. In developing these techniques, we introduce a novel approach to model relative visual attributes using learned ranking functions, which generalizes previous strategies restricted to categorical properties. In addition, we investigate new cross-modal textual/visual representations that can capture what human viewers find most noteworthy in images. Through results on challenging image recognition and retrieval tasks, I will demonstrate the clear advantage of incorporating such richer forms of input for visual learning.
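To make the relative-attribute idea concrete: the ranking functions mentioned above are trained from ordered image pairs (e.g., “image i is furrier than image j”) rather than per-image labels. The sketch below is a simplified, illustrative version of that idea only, using a pairwise hinge loss with subgradient descent and toy placeholder data; it is not the formulation or code from the talk, which uses a large-margin ranking objective.

```python
# Minimal sketch of learning a relative-attribute ranking function from
# ordered image pairs (RankSVM-style pairwise hinge loss).
# All data, names, and parameters here are illustrative placeholders.
import numpy as np

def train_ranker(X, ordered_pairs, lam=0.1, lr=0.01, epochs=200):
    """Learn w so that w @ X[i] > w @ X[j] for each (i, j) in ordered_pairs,
    where (i, j) means image i shows MORE of the attribute than image j.
    Uses a pairwise hinge loss with L2 regularization (subgradient descent)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        grad = lam * w
        for i, j in ordered_pairs:
            margin = w @ (X[i] - X[j])
            if margin < 1.0:              # pair is violated or within the margin
                grad -= (X[i] - X[j])
        w -= lr * grad
    return w

# Toy example: an underlying "furriness" score recoverable from 2-D features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
true_w = np.array([1.0, -0.5])
scores = X @ true_w
# Ordered pairs (i, j): image i is "furrier" than image j.
pairs = [(i, j) for i in range(6) for j in range(6) if scores[i] > scores[j] + 0.5]
w = train_ranker(X, pairs)
print("learned ranking:", np.argsort(-(X @ w)))
print("true ranking:   ", np.argsort(-scores))
```

The learned w assigns each image a real-valued attribute strength, so images can be compared or ordered along the attribute rather than merely labeled as having it or not.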

This talk describes work with Sung Ju Hwang, Devi Parikh, and Jeff Donahue.


Bio

Kristen Grauman is a Clare Boothe Luce Assistant Professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual search and object recognition. Before joining UT-Austin in 2007, she received her Ph.D. in the EECS department at MIT, in the Computer Science and Artificial Intelligence Laboratory. She is a Microsoft Research New Faculty Fellow, and a recipient of an NSF CAREER award and the Howes Scholar Award in Computational Science. Grauman and her collaborators were awarded the CVPR Best Student Paper Award in 2008 for work on hashing algorithms for large-scale image retrieval, and the Marr Prize at ICCV in 2011 for work on modeling relative visual attributes.