Andrew McCallum

Information Extraction, Data Mining and Joint Inference


Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually unaware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from or of its inherent uncertainties. The result is that the accuracy of both suffers, and accurate mining of complex text sources has remained beyond reach.
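The cost of serial juxtaposition can be made concrete with a back-of-the-envelope calculation (my illustration, not part of the abstract): if each of k independent pipeline stages is correct with probability p, and errors are never revisited downstream, end-to-end accuracy decays as p**k.

```python
# Toy illustration (not from the talk): how per-stage errors compound
# in a serial pipeline when no stage can correct an upstream mistake.
def pipeline_accuracy(p, k):
    """End-to-end accuracy of k serial stages, each independently correct
    with probability p."""
    return p ** k

# Five stages at 90% accuracy each leave only ~59% end-to-end accuracy.
print(round(pipeline_accuracy(0.9, 5), 2))  # 0.59
```

Joint inference, by contrast, lets downstream evidence flow back to revise upstream decisions rather than multiplying independent error rates.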

In this talk I will describe work in probabilistic models that perform joint inference across multiple components of an information processing pipeline in order to avoid the brittle accumulation of errors. The need for joint inference appears not only in extraction and data mining, but also in natural language processing, computer vision, robotics and elsewhere. I will argue that joint inference is one of the most fundamental issues in artificial intelligence.

I will present recent work in conditional random fields for information extraction and integration, with a focus on joint inference through stochastic approximations, weighted first-order logic, and new methods of probabilistic programming that enable reasoning about large-scale data. I'll close with a demonstration of Rexa.info, our research paper digital library that leverages these techniques.
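As a heavily simplified illustration of the kind of sequence model involved, the sketch below decodes a linear-chain model with Viterbi dynamic programming. The labels, tokens, and hand-set scores are invented for illustration; this is not the trained CRF used in Rexa, whose features and weights are learned from data.

```python
def viterbi(obs_scores, trans_scores):
    """Highest-scoring label sequence under a linear-chain model.

    obs_scores: list (one dict per token) mapping label -> local score.
    trans_scores: dict mapping (prev_label, label) -> transition score.
    """
    labels = list(obs_scores[0])
    # best[t][y] = (best score of any path ending in label y at position t,
    #               backpointer to the previous label on that path)
    best = [{y: (obs_scores[0][y], None) for y in labels}]
    for t in range(1, len(obs_scores)):
        layer = {}
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p][0] + trans_scores[(p, y)]) for p in labels),
                key=lambda pair: pair[1])
            layer[y] = (score + obs_scores[t][y], prev)
        best.append(layer)
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[-1][lab][0])
    path = [y]
    for t in range(len(obs_scores) - 1, 0, -1):
        y = best[t][y][1]
        path.append(y)
    return path[::-1]

# Toy example: tag author-name tokens in "talk by Andrew McCallum".
obs = [{"O": 1, "NAME": 0},   # talk
       {"O": 1, "NAME": 0},   # by
       {"O": 0, "NAME": 2},   # Andrew
       {"O": 0, "NAME": 2}]   # McCallum
trans = {("O", "O"): 1, ("O", "NAME"): 0,
         ("NAME", "O"): 0, ("NAME", "NAME"): 1}
print(viterbi(obs, trans))  # ['O', 'O', 'NAME', 'NAME']
```

The transition scores reward label persistence, so the two name tokens are tagged jointly rather than token by token; the joint-inference work described above extends this idea far beyond a single chain, to decisions spanning extraction, coreference, and database integration.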

Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Khashayar Rohanimanesh, Chris Pal, Greg Druck, Karl Schultz, Sameer Singh, Pallika Kanani, Kedar Bellare, Michael Wick, Rob Hall, David Mimno and Gideon Mann.


Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007