Regina Barzilay

Statistical Models of Discourse Structure

Bag-of-words representations are used in many NLP applications, such as text classification and sentiment analysis. These representations ignore relations across sentences in a text and disregard the underlying structure of documents. While more linguistically elaborate models of text structure have been studied for decades, they are rarely used in text analysis applications. Because these models rely on handcrafted rules and have limited portability and scalability, bag-of-words approaches remain the representation of choice in practical settings.
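The limitation above can be illustrated with a minimal Python sketch. This is not code from the talk: the helper names `tokenize`, `bag_of_words`, and `sentence_sequence`, the crude regex tokenizer, and the example review sentences are all hypothetical. The sketch shows that shuffling the sentences of a document leaves its bag-of-words counts unchanged, while even a trivial order-preserving representation (one bag per sentence, kept in document order) can tell the two versions apart. The per-sentence list is only a stand-in for "structure", not the discourse models described in the talk.

```python
from collections import Counter
import re


def tokenize(text):
    # Hypothetical, deliberately crude tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(sentences):
    # Pool tokens from every sentence into a single multiset of word counts.
    # Sentence boundaries and sentence order are discarded.
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def sentence_sequence(sentences):
    # Hypothetical order-preserving alternative: one bag per sentence, in document order.
    return [Counter(tokenize(s)) for s in sentences]


# Hypothetical example document and a reordering of the same sentences.
original = [
    "The plot starts slowly.",
    "However, the ending is brilliant.",
    "Overall, I recommend it.",
]
shuffled = [original[2], original[0], original[1]]

print(bag_of_words(original) == bag_of_words(shuffled))            # True
print(sentence_sequence(original) == sentence_sequence(shuffled))  # False
```

Both documents contain exactly the same words, so any classifier that sees only the pooled counts must treat them identically, even though the order of the sentences (and hence the argument they make) differs.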

In this talk, I will demonstrate that incorporating structural information in document analysis yields significant performance gains over bag-of-words approaches. First, I will describe how models of text structure can be learned from a collection of unannotated texts, bypassing the need for complex annotations. The key premise of our work is that, by analyzing patterns in word distribution, we can predict high-level patterns in discourse organization. Second, I will show how these models can be effectively integrated into core text processing applications such as topical classification and sentiment analysis.

This is joint work with Amir Globerson, Zoran Dzunic, Yoong Keok Lee, Igor Malioutov and Benjamin Snyder.

Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007