Mining visual representations from unlabeled and weakly-labeled image collections

Carl Doersch

Machine Learning Department at Carnegie Mellon University


Abstract
Data mining, i.e., finding repeated, informative patterns in large datasets, has proven extremely difficult for visual data. A key issue is the lack of a reliable way to tell whether two images or image patches depict the same thing. In this talk, I will cover two algorithms for clustering visually coherent sets of image patches, in both weakly-supervised and fully unsupervised settings, and show how the resulting clusters provide powerful image representations.
Our first work proposes discriminative mode seeking, an extension of Mean Shift to weakly-labeled data. Instead of finding the local maxima of a density, we exploit a weak label to partition the data into two sets and find the maxima of the density ratio. Given a dataset with weak labels such as scene categories, these 'discriminative modes' correspond to remarkably meaningful visual patterns, including objects and object parts. Using these discriminative patches as an image representation, we obtain state-of-the-art results on a challenging indoor scene classification benchmark.
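To make the density-ratio idea concrete, here is a minimal sketch (not the code from the paper) of mode seeking on the ratio of two Gaussian kernel density estimates: one over patch descriptors from the weakly-labeled class of interest and one over patches from everything else. The descriptors, bandwidth, step size, and iteration count are all illustrative assumptions.

```python
import numpy as np

def log_density_ratio_grad(x, pos, neg, bandwidth=1.0):
    """Gradient of log( p_pos(x) / p_neg(x) ) under Gaussian kernel density estimates."""
    def kde_log_grad(x, data):
        diffs = data - x                                      # (n, d) offsets toward samples
        w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * bandwidth ** 2))
        # mean-shift-style vector: weighted average offset toward nearby samples
        return (w[:, None] * diffs).sum(axis=0) / ((w.sum() + 1e-12) * bandwidth ** 2)
    return kde_log_grad(x, pos) - kde_log_grad(x, neg)

def discriminative_mode(x0, pos, neg, step=0.5, iters=100):
    """Ascend the density ratio starting from an initial patch descriptor x0."""
    x = x0.copy()
    for _ in range(iters):
        x = x + step * log_density_ratio_grad(x, pos, neg)
    return x

# Hypothetical usage: pos holds descriptors of patches from the labeled scene
# category, neg holds descriptors of patches from all other categories.
pos = np.random.randn(500, 64)
neg = np.random.randn(500, 64)
mode = discriminative_mode(pos[0], pos, neg)
```

Whereas standard mean shift climbs the positive density alone, the update above climbs the log of the ratio, so it is pulled toward regions that are dense in the labeled set and sparse in the rest of the data.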
In the second part of the talk, I will discuss how we can extend this formulation to a fully unsupervised setting. Instead of using weak labels as supervision, we use the ability of an object patch to predict the rest of the object (its context) as a supervisory signal to help discover visually consistent object clusters. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and can discover objects even within extremely difficult datasets intended for benchmarking fully supervised object detection algorithms (e.g., PASCAL VOC).
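As a loose illustration of the "predictable context" idea (not the method presented in the talk), a candidate cluster can be scored by how well each member's surrounding context is predicted from the other members, compared against a generic background model. The context descriptors and background statistics below are placeholder assumptions.

```python
import numpy as np

def context_score(cluster_contexts, background_contexts):
    """cluster_contexts: (n, d) context descriptors of candidate cluster members.
       background_contexts: (m, d) context descriptors of random patches."""
    bg_mean = background_contexts.mean(axis=0)
    score = 0.0
    for i in range(len(cluster_contexts)):
        others = np.delete(cluster_contexts, i, axis=0)
        pred = others.mean(axis=0)                 # leave-one-out context prediction
        err_cluster = np.sum((cluster_contexts[i] - pred) ** 2)
        err_background = np.sum((cluster_contexts[i] - bg_mean) ** 2)
        # a member counts toward the score when the cluster predicts its
        # context better than the generic background model does
        score += err_background - err_cluster
    return score
```

The point of the comparison against the background model is that a cluster is only credited for context it predicts better than chance, which filters out clusters of patches that merely look alike but sit in unrelated surroundings.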

Bio:
Carl Doersch is a PhD student in the Machine Learning Department at Carnegie Mellon University, co-advised by Alexei Efros (UC Berkeley) and Abhinav Gupta (CMU Robotics Institute). His research interests are in computer vision and machine learning, focusing on learning useful image representations while minimizing reliance on expensive human annotations. Carl graduated with a BS in Computer Science from Carnegie Mellon in 2010. He was awarded the NDSEG Fellowship in 2011 and the Google Fellowship in Computer Vision in 2014. More information can be found at his website: http://www.cs.cmu.edu/~cdoersch/