Abstract - Natural Language Processing
Learning the Discourse Structure of Text

University of Illinois at Chicago

Discourse parsing, the automatic segmentation of a text and the inference of its structure and rhetorical relations, remains a highly challenging task. Previous efforts have relied mostly on syntactic and lexical information. The use of semantics has been restricted to shallow features such as lexical chains and similarity measures based on word co-occurrences.

I will present an innovative discourse parser that uses rich verb semantics and relational information on the structure of the segment being built. Our discourse parser, based on a modified shift-reduce algorithm, crucially uses a rhetorical relation classifier to determine both the attachment site for each new incoming chunk and the appropriate relation label. Another novel aspect of our work is that the relation classifier uses Inductive Logic Programming, a method that learns from first-order logic representations. We show that, on classifying rhetorical relations, our results are significantly better than those of attribute-value learning paradigms such as Decision Trees, RIPPER and Naive Bayes. Our work demonstrates that semantic information, when available, can be used effectively for discourse parsing.
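
To make the idea of classifier-guided attachment concrete, here is a minimal Python sketch. It is not the parser presented in the talk: the abstract does not specify how the shift-reduce algorithm is modified, so this sketch assumes that candidate attachment sites are the nodes on the right frontier of the tree built so far. The names `Node`, `right_frontier`, `attach`, `parse` and `toy_classifier` are all hypothetical, and the toy classifier is a hand-written cue-word rule standing in for a learned (for example, ILP-based) relation classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A discourse tree node: a leaf segment or a relation over two children."""
    text: Optional[str] = None
    relation: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        if self.text is not None:
            return [self.text]
        return [t for c in self.children for t in c.leaves()]


# Hypothetical classifier interface: (candidate site, new segment) -> (label, score).
Classifier = Callable[[Node, Node], Tuple[str, float]]


def right_frontier(root: Node) -> List[Node]:
    """Nodes on the path from the root to the rightmost leaf."""
    path = [root]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path


def attach(root: Node, segment: Node, classify: Classifier) -> Node:
    """Attach `segment` at the best-scoring right-frontier site (assumed policy)."""
    frontier = right_frontier(root)
    scored = [(classify(site, segment), depth) for depth, site in enumerate(frontier)]
    (label, _), depth = max(scored, key=lambda item: item[0][1])
    new_node = Node(relation=label, children=[frontier[depth], segment])
    if depth == 0:
        return new_node  # the new relation node becomes the root
    frontier[depth - 1].children[-1] = new_node  # splice into the parent
    return root


def parse(segments: List[str], classify: Classifier) -> Node:
    """Build a tree incrementally, one incoming segment at a time."""
    root = Node(text=segments[0])
    for text in segments[1:]:
        root = attach(root, Node(text=text), classify)
    return root


def toy_classifier(site: Node, segment: Node) -> Tuple[str, float]:
    """Placeholder rule, not a learned model: a cue word picks the label."""
    if segment.text.lower().startswith("because"):
        # Prefer attaching a cause to the most recent single segment.
        return ("cause", 1.0 if site.text is not None else 0.5)
    return ("elaboration", 0.8 if site.text is None else 0.6)
```

Calling `parse(["It rained.", "The game was cancelled", "because the field flooded."], toy_classifier)` attaches the third segment to the second one rather than to the root, because the toy classifier scores that site highest. Swapping in a different `classify` function changes both the attachment decisions and the labels without touching the tree-building code.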

Bio:
Barbara Di Eugenio is Associate Professor in the Department of Computer Science at the University of Illinois, Chicago campus. There she leads the NLP laboratory (http://nlp.cs.uic.edu/). She obtained her laurea in Informatica in 1985, from Università di Torino, Italy, and her PhD in Computer Science in 1993, from the University of Pennsylvania. She is an NSF CAREER awardee, and a past treasurer of the North American Chapter of the Association for Computational Linguistics; she is also one of the founding and managing editors of the Journal of Discourse and Dialogue Research.

Her research has been supported by NSF, ONR, Motorola and Yahoo!