Abstract - Natural Language Processing
Scaling Markov Logic to 500 Million Documents

University of Wisconsin-Madison


The main question driving my research is: how does one deploy statistical data-analysis tools to enhance data-driven systems? Our goal is to identify the abstractions needed to deploy and maintain such systems. My group is attacking this question by building a diverse set of statistical, data-driven applications: a Machine Reading system whose goal is to read the Web and answer complex questions; a muon detector for a neutrino telescope called IceCube, in collaboration with physicists; and querying over rich content (OCR and speech data), in collaboration with social scientists. Even across this diverse set, we have found common abstractions that we are exploiting to build systems.

In the technical portion of the talk, I will discuss our framework for a language called Markov Logic (think: logic queries with weights) that we have been using to build text-processing applications. A key feature of Markov Logic is that it allows the developer to express rules that are likely, but not certain, to be correct. This has allowed Markov-Logic-based approaches to achieve state-of-the-art quality on challenging tasks such as Machine Reading. A key challenge is that previous implementations of Markov Logic have been confined to hundreds of documents; in contrast, I will describe the techniques we use to process over 500 million documents with Markov Logic.
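
To make the "logic queries with weights" idea concrete, the toy Python sketch below spells out Markov Logic's standard log-linear semantics: each weighted first-order rule contributes its weight times the number of its satisfied groundings, and a world's probability is exp(sum_i w_i n_i(x)) / Z. The domain, predicates, and weight here are invented for illustration; this is not the system described in the talk.

    import itertools
    import math

    # Toy Markov Logic example (hypothetical, for illustration only):
    # one weighted rule over a two-person domain,
    #     Smokes(x) => Cancer(x), weight 1.5.
    # A rule that is likely but not certain gets a finite weight;
    # a hard (certain) rule would get an infinite one.
    PEOPLE = ["anna", "bob"]
    RULE_WEIGHT = 1.5

    def satisfied_groundings(world):
        """Count groundings of Smokes(x) => Cancer(x) true in `world`.

        `world` maps ground atoms like ("Smokes", "anna") to True/False.
        """
        return sum(
            1
            for p in PEOPLE
            if (not world[("Smokes", p)]) or world[("Cancer", p)]
        )

    def all_worlds():
        """Enumerate every truth assignment to the ground atoms."""
        atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in PEOPLE]
        for values in itertools.product([False, True], repeat=len(atoms)):
            yield dict(zip(atoms, values))

    # Log-linear model: P(world) = exp(w * n(world)) / Z, where n counts
    # the satisfied groundings of the weighted rule.
    worlds = list(all_worlds())
    scores = [math.exp(RULE_WEIGHT * satisfied_groundings(w)) for w in worlds]
    Z = sum(scores)

    # A world that violates the rule for anna is penalized, not ruled out --
    # the essence of a weighted (soft) rule.
    p_violation = sum(
        s for w, s in zip(worlds, scores)
        if w[("Smokes", "anna")] and not w[("Cancer", "anna")]
    ) / Z
    print(f"P(anna smokes but has no cancer) = {p_violation:.3f}")

The finite weight makes rule-violating worlds less probable rather than impossible, which is exactly what lets a developer state rules that are likely, but not certain, to be correct.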

Video demos, papers, software, virtual machines with our software and data pre-installed, and links to the applications discussed in this talk are available at http://www.cs.wisc.edu/hazy.