Kevin Small

Interactive Learning Protocols for Natural Language Applications

Statistical machine learning has become an integral technology for many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks, including parsing, machine translation, and information extraction. However, while supervised machine learning is well understood, applying it successfully in practical scenarios incurs significant costs for annotating large data sets and for feature engineering.

In this talk, I will describe methods for reducing annotation costs and improving system performance through interactive learning protocols. The first part of the talk describes my research on active learning strategies for the structured output and pipeline model settings, two widely used models for complex application scenarios where obtaining labeled data is particularly expensive. The second part introduces the interactive feature space construction protocol, which uses a more sophisticated interaction to incrementally add application-targeted domain knowledge to the feature space, improving performance and reducing the need for labeled data. I will also present empirical results for the semantic role labeling and named entity/relation extraction NLP tasks, demonstrating state-of-the-art performance with significantly reduced annotation requirements.

Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007