Alexandre Klementiev

Learning with Incidental Supervision


Moving toward understanding and automatic generation of natural human language requires a toolbox of core capabilities. It is widely accepted today that many of these capabilities are essentially impossible to encode by hand; instead, machine learning techniques acquire them automatically from available natural language data. Corpus-based supervised learning has emerged as the dominant approach, and it relies crucially on the availability of labeled data. However, while unlabeled data is usually plentiful, annotating it is a laborious process for many realistic Natural Language Processing tasks, especially those dealing with structured output spaces.

In this talk, I will argue that it is often possible to derive a surrogate supervision signal from a small amount of background knowledge together with plentiful, weakly structured unlabeled data. We call this setting "learning with incidental supervision" and study it in the context of two tasks. First, we consider the problem of transferring Named Entity (NE) annotation to a resource-poor language in a bilingual corpus. We demonstrate that the temporal similarity of NE counterparts across languages can be used as an incidental supervision signal to drive the learning of a discriminative transliteration model. Second, we consider the task of unsupervised aggregation of structured output models. We demonstrate, for ranked data, that agreement among the constituent models can serve as an incidental supervision signal sufficient to learn an effective aggregation model.
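To make the temporal-similarity idea concrete, here is a minimal sketch (an illustration, not the talk's actual method) of how mention frequencies, binned over time in comparable news streams, can score candidate transliterations: names denoting the same entity tend to spike in frequency at the same times. The document format, function names, and the choice of cosine similarity are all assumptions for the sake of the example.

    from collections import Counter
    from math import sqrt

    def temporal_profile(documents, term):
        # `documents` is assumed to be a list of (time_bin, text) pairs,
        # e.g. one entry per news article keyed by publication day.
        counts = Counter()
        for time_bin, text in documents:
            counts[time_bin] += text.count(term)
        return counts

    def temporal_similarity(profile_a, profile_b):
        # Cosine similarity between two sparse time-binned count vectors;
        # Counter returns 0 for bins missing from either profile.
        bins = set(profile_a) | set(profile_b)
        dot = sum(profile_a[b] * profile_b[b] for b in bins)
        norm_a = sqrt(sum(v * v for v in profile_a.values()))
        norm_b = sqrt(sum(v * v for v in profile_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def best_candidate(src_docs, tgt_docs, src_name, candidates):
        # Candidates whose temporal profile best matches the source name's
        # profile supply noisy positive examples for training a
        # discriminative transliteration model.
        src = temporal_profile(src_docs, src_name)
        return max(candidates, key=lambda c:
                   temporal_similarity(src, temporal_profile(tgt_docs, c)))

For the aggregation task, the sketch below (again an assumption-laden stand-in, not the learned aggregation model discussed in the talk) captures the agreement intuition: each constituent ranker is weighted by its average Kendall-tau agreement with the others, and the rankings are then combined with weighted Borda counts.

    from itertools import combinations

    def kendall_tau(rank_a, rank_b):
        # Kendall tau between two rankings given as permutations of the
        # same set of items.
        pos_a = {item: i for i, item in enumerate(rank_a)}
        pos_b = {item: i for i, item in enumerate(rank_b)}
        concordant = discordant = 0
        for x, y in combinations(rank_a, 2):
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
                concordant += 1
            else:
                discordant += 1
        total = concordant + discordant
        return (concordant - discordant) / total if total else 0.0

    def aggregate(rankings):
        # Weight each ranker by its mean agreement with the rest, then
        # combine item scores via weighted Borda counts.
        weights = [sum(kendall_tau(r, other)
                       for other in rankings if other is not r)
                   / (len(rankings) - 1)
                   for r in rankings]
        n = len(rankings[0])
        scores = {}
        for w, r in zip(weights, rankings):
            for i, item in enumerate(r):
                scores[item] = scores.get(item, 0.0) + w * (n - i)
        return sorted(scores, key=scores.get, reverse=True)

In both sketches the point is the same: the supervision signal comes for free from the structure of the data (publication dates, inter-model agreement) rather than from manual annotation.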


Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007