Abstract - Information Extraction
Heng Ji
City University of New York
Leveraging Redundancy for Cross-Source Information Extraction, Fusion and Inference
One of the initial goals of Information Extraction (IE) was to create a knowledge base from the entire input corpus, such as a profile or a series of activities for any entity, and to allow further logical reasoning over that knowledge base. In practice, such information may be scattered among a variety of sources (large-scale documents, languages, genres and data modalities). This requires the ability to identify topically related documents and to integrate the facts coming from these documents, which may be redundant, complementary, or in conflict. Unfortunately, the knowledge base constructed from a typical IE pipeline often contains many erroneous and conflicting facts. Interestingly, when the data grows beyond a certain size, the extracted facts become inter-dependent, and thus we can take advantage of information redundancy to conduct reasoning across sources and improve the performance of IE. This talk will describe and compare four general frameworks that leverage redundancy based on Information Networks to conduct more complete information fusion and more robust inference. Experiments on cross-document, cross-lingual and cross-media IE will be presented and discussed.