Text Understanding with Latent and Explicit Semantics

Yangqiu Song

UIUC

 

Abstract
Psychologist Gregory Murphy began his highly acclaimed book with the statement “Concepts are the glue that holds our mental world together.” Machines likewise need to understand large corpora of text data in terms of abstract concepts in order to better support human analysis. In this talk, I will present my research experience in text understanding with latent and explicit semantic concepts. To extract latent semantics, I use non-parametric Bayesian methods, including evolutionary hierarchical Dirichlet processes and Bayesian rose trees, to analyze and visualize how latent topics evolve over time. To find explicit concepts in texts (especially short texts), I use a Web-scale, automatically constructed knowledge base to perform conceptualization, so that a piece of text can be mapped to a set of concepts with typicality scores. I will present the methodologies and experiments, along with several interesting examples and results.

Bio
Dr. Yangqiu Song has been a post-doctoral researcher at UIUC since October 2013. Before that, he was a post-doctoral fellow and visiting researcher at Huawei Noah's Ark Lab, Hong Kong (2012-2013), an associate researcher at Microsoft Research Asia (2010-2012), and a staff researcher at IBM Research China (2009-2010). He received his B.E. and Ph.D. degrees from Tsinghua University, China, in July 2003 and January 2009, respectively. He also worked as an intern at IBM (2006-2007) and at Google (2007-2008). His current research focuses on using machine learning and data mining to extract and infer insightful knowledge from big data, including large-scale learning algorithms, natural language understanding, text mining, and knowledge engineering.