Abstract - Information Extraction || Heng Ji

City University of New York


Leveraging Redundancy for Cross-Source Information Extraction, Fusion and Inference

One of the initial goals of Information Extraction (IE) was to create a knowledge base from the entire input corpus, such as a profile or a series of activities for any entity, and to allow further logical reasoning over that knowledge base. In practice, such information may be scattered among a variety of sources (large-scale documents, languages, genres and data modalities). This requires the ability to identify topically related documents and to integrate facts, possibly redundant, possibly complementary, possibly in conflict, coming from these documents. Unfortunately, the knowledge base constructed from a typical IE pipeline often contains many erroneous and conflicting facts. Interestingly, when the data grows beyond a certain size, the extracted facts become inter-dependent, and we can thus take advantage of information redundancy to conduct reasoning across sources and improve the performance of IE. This talk will describe and compare four general frameworks that leverage redundancy, based on Information Networks, to conduct more complete information fusion and more robust inference. Experiments on cross-document, cross-lingual and cross-media IE will be presented and discussed.

Heng Ji is an assistant professor and doctoral faculty member in Computer Science at Queens College and the Graduate Center of the City University of New York. She received her Ph.D. in Computer Science from New York University in 2007. Her research interests focus on Information Extraction and Knowledge Discovery. She received an NSF CAREER Award in 2010. She has served as the coordinator of the NIST TAC Knowledge Base Population task in 2010 and 2011, and as the IE area chair of NAACL-HLT 2012.