Learning from Descriptive Text

Tamara Berg
Stony Brook University


Abstract:
People communicate using language, whether spoken, written, or typed. A significant amount of this language describes the world around us, especially the visual world, whether observed directly in an environment or depicted in images and video. Such visually descriptive language is potentially a rich source of 1) information about the visual world, and 2) training data for how people construct natural language to describe imagery. In addition, billions of photographs with associated text are available on the web; examples include web pages, captioned or tagged photographs, and video with speech or closed captioning. In this talk I will describe several projects relating images to descriptive text, including our recent approaches to automatically generating natural language descriptions of images, our collection of 1 million captioned images, and explorations of how visual content relates to what people find important in images.

All papers, datasets, and demos are available on my webpage at: http://tamaraberg.com/

Bio:
Tamara Berg received her B.S. in Mathematics and Computer Science from the University of Wisconsin, Madison in 2001. She completed her PhD at the University of California, Berkeley in 2007 and then spent a year as a research scientist at Yahoo! Research. She is currently an Assistant Professor in the Computer Science Department at Stony Brook University and a core member of the consortium for Digital Art, Culture, and Technology (cDACT). Her research straddles the boundary between Computer Vision and Natural Language Processing, with applications to large-scale recognition and multimedia retrieval.