Abstract - Natural Language Processing
Semantic text kernels for sentence-based image retrieval and annotation

UIUC


When someone is asked to succinctly describe what is depicted in a photograph, the description will not simply list everything shown in the image. Moreover, two people will typically not produce the same description of a photo: they will mention different things and use different phrasing. Current approaches to sentence-based image annotation use detectors to map images to an explicit representation of objects, scenes, and events, and cannot be applied directly to the converse task of sentence-based image retrieval. By contrast, in this work we model the association between photographs and their English-language sentence descriptions by inducing a common semantic space of images and sentences, using Kernel Canonical Correlation Analysis (KCCA), which supports both image annotation and image retrieval. We investigate which text representations are most appropriate for this task. Although we rely only on low-level image features, images appear near sentences that describe them well: for over a quarter of unseen test images, the closest sentence among a large pool of unseen candidates describes them perfectly or with only minor errors.
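For readers unfamiliar with KCCA, the sketch below shows one standard way to compute such a shared space from two precomputed kernel matrices, via the regularised dual eigenproblem in the style of Hardoon et al. It is a minimal illustration under assumed conventions, not the implementation behind this work, and the names (kcca, Kx, Ky, kappa) are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, kappa=0.1, n_components=10):
    """Regularised kernel CCA between two views (illustrative sketch).

    Kx, Ky : (n, n) centered kernel matrices (e.g. an image kernel and a
             sentence kernel over the same n training pairs).
    Returns dual weight matrices (alphas, betas), one column per component.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Z = np.zeros((n, n))
    # Generalised symmetric eigenproblem  A w = rho * B w  with
    #   A = [[0, Kx Ky], [Ky Kx, 0]],  B = diag((Kx + kI)^2, (Ky + kI)^2).
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = (Kx + kappa * I) @ (Kx + kappa * I)
    Ry = (Ky + kappa * I) @ (Ky + kappa * I)
    B = np.block([[Rx, Z], [Z, Ry]])
    rho, W = eigh(A, B)                   # eigenvalues in ascending order
    W = W[:, ::-1][:, :n_components]      # keep the most correlated directions
    return W[:n], W[n:]                   # alphas (view x), betas (view y)
```

A new image (or sentence) is then mapped into the shared space by taking its kernel similarities to the training items and multiplying by the corresponding dual weights; both retrieval and annotation then reduce to nearest-neighbour search between the projected images and sentences.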
* presenting joint work with Peter Young *