Abstract - Machine Learning || Chin-Yew Lin

Institute at University of Southern California



Question Answering - The Easy Way?

With the boom in user-participation online services such as Yahoo! Answers, Facebook Questions, MSDN forums, and plain old online discussion forums, a great deal of valuable knowledge has been accumulated. To explore the potential impact of this user-generated content (UGC), we focus our research on utilizing established user communities, their networks, and their content to facilitate three activities: assimilation, dissemination, and elicitation of community knowledge.
In this talk, I will introduce recent progress on Scalable Question Answering and Distillation (SQuAD) - a question answering (QA) project that aims to crawl, index, and serve question-answer pairs at web scale. I will use question answering as an example to illustrate the approach that we are taking toward solving real-world problems. Question answering has been an active research field in information retrieval and natural language processing. Despite the "success" of the TREC QA track, large-scale, robust QA systems have yet to appear in the real world. Instead of taking a traditional QA approach, SQuAD focuses on mining and organizing existing QA pairs from the web. I will address six main challenges of the project and present our solutions. In particular, I will show the importance of question type, drill down into comparative questions, and summarize lessons learned from our participation in the NTCIR Pilot Community Question Answering track.
