Multi-Relational Latent Semantic Analysis for Lexical Semantics and Knowledge Base Embedding

Scott Yih

Microsoft Research


Slides.pptx

Abstract
In this talk, I will introduce our recent work on continuous-space word representations and knowledge base embedding. All the models presented in this talk can be viewed as different levels of generalization of latent semantic analysis (LSA). The first model, polarity-inducing latent semantic analysis (PILSA), extends LSA by inducing polarity information [1]. As a result, PILSA goes beyond simple word similarity and can measure the degree of both word synonymy and antonymy. The second model, multi-relational latent semantic analysis (MRLSA), combines multiple relations between words by constructing a 3-way tensor [2]. As in LSA, a low-rank approximation is derived, here by decomposing the tensor. Each word in the vocabulary is thus represented by a vector in the latent semantic space, and each relation is captured by a latent square matrix. The degree to which two words hold a specific relation can then be measured through simple linear algebraic operations. Finally, the third model, TRESCAL, applies the idea of MRLSA to knowledge base embedding by incorporating entity type information [3]. As a consequence, our learning algorithm is significantly faster than previous approaches and is better at discovering relations missing from the knowledge base. These models have not only achieved state-of-the-art performance on tasks such as answering GRE closest-opposite questions and knowledge base completion, but have also proven useful in end applications such as question answering and relation extraction.
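To make the linear-algebra intuition concrete, here is a minimal NumPy sketch of the two core operations the abstract alludes to: PILSA's sign-flipped thesaurus matrix followed by a truncated SVD, and a bilinear relation score of the kind the tensor models factorize. This is an illustration only, not the authors' code; the toy thesaurus matrix, the random embeddings A and relation matrix R, and the helper names cosine and relation_score are all invented for this example.

    import numpy as np

    # --- PILSA-style polarity induction: start from a thesaurus-derived
    # group-by-word matrix, flip the sign of antonym entries, then take a
    # low-rank approximation via truncated SVD as in LSA.
    # (Toy +/-1 values stand in for real term weights.)
    vocab = ["happy", "glad", "sad", "cheerful"]
    M = np.array([
        [ 1.0,  1.0, -1.0,  1.0],   # thesaurus group headed by "happy"
        [-1.0, -1.0,  1.0, -1.0],   # thesaurus group headed by "sad"
    ])
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = 2                                      # rank of the latent space
    word_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # one latent vector per word

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

    w = {word: word_vecs[i] for i, word in enumerate(vocab)}
    print(cosine(w["happy"], w["glad"]))   # ~ +1: synonyms
    print(cosine(w["happy"], w["sad"]))    # ~ -1: antonyms

    # --- Relation scoring in the tensor models: after decomposition, each
    # word/entity i has a latent vector A[i] and each relation rel a latent
    # square matrix R[rel]; a simple bilinear form scores candidate triples.
    # (Random stand-ins below; in practice A and R come from the learned
    # decomposition, not from a random generator.)
    rng = np.random.default_rng(0)
    n_entities, d = 5, 3
    A = rng.normal(size=(n_entities, d))
    R = {"hypernym": rng.normal(size=(d, d))}

    def relation_score(i, rel, j):
        return A[i] @ R[rel] @ A[j]

    print(relation_score(0, "hypernym", 1))

Note that MRLSA [2] measures relation strength with a cosine after applying the latent relation matrix, while the RESCAL-style factorization that TRESCAL [3] builds on reconstructs each relation slice as A R A^T; the raw bilinear form above is the common core of both.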

[1] Yih, Zweig & Platt. Polarity Inducing Latent Semantic Analysis. In EMNLP-CoNLL 2012.
[2] Chang, Yih & Meek. Multi-Relational Latent Semantic Analysis. In EMNLP 2013.
[3] Chang, Yih, Yang & Meek. Typed Tensor Decomposition of Knowledge Bases for Relation Extraction. In EMNLP 2014.


Bio:
Scott Wen-tau Yih is a Researcher in the Machine Learning Group at Microsoft Research Redmond. His research interests include natural language processing, machine learning, and information retrieval. Yih received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign. His work on joint inference using integer linear programming (ILP) [Roth & Yih, 2004] helped the UIUC team win the CoNLL-05 shared task on semantic role labeling, and the approach has been widely adopted in the NLP community. Since joining MSR in 2005, he has worked on email spam filtering, keyword extraction, and search & ad relevance. His recent work focuses on continuous semantic representations using neural networks and matrix/tensor decomposition methods, with applications in lexical semantics and question answering. Yih received the best paper award at CoNLL-2011 and has served as an area chair (HLT-NAACL-12, ACL-14) and a program co-chair (CEAS-09, CoNLL-14) in recent years.