Cognitive Computation Group

Tutorial: AAAI-16: Learning and Inference in Structured Prediction Models

Instructors

Kai-Wei Chang, Gourab Kundu, Dan Roth, and Vivek Srikumar.

Date and Time

2:00pm-6:00pm, Feb. 13, 2016.

Introduction:

Many prediction problems require structured decisions. That is, the goal is to assign values to multiple interdependent variables. The relationships between the output variables could represent a sequence, a set of clusters, or in the general case, a graph. When solving these problems, it is important to make consistent decisions that take the interdependencies among output variables into account. Such problems are often referred to as structured prediction problems. In past decades, multiple structured prediction models have been proposed and studied, and success has been demonstrated in a range of applications, including natural language processing, information extraction, computer vision, and computational biology. However, the high computational cost often limits both the models' expressive power and the size of the data that can be handled. Therefore, designing efficient inference and learning algorithms for these models is a key challenge for structured prediction.

In this tutorial, we will focus on recent developments in discriminative structured prediction models such as Structured SVMs and the Structured Perceptron. Beyond introducing the algorithmic approaches in this domain, we will discuss ideas that result in significant improvements in both the learning and the inference stages of these algorithms. In particular, we will discuss the use of caching techniques to reuse computations and methods for decomposing complex structures, along with learning procedures that make use of them to simplify the learning stage. We will also present a recently proposed formulation that captures similarities between structured labels by using distributed representations. Participants will learn about existing trends in learning and inference for structured prediction models, recent tools developed in this area, and how they can be applied to AI applications.

Tutorial Outline:

Introduction [60 min]
handout

We will present several AI applications with structured outputs to motivate the need for structured prediction models. We then present Constrained Conditional Models (CCM), a framework that is used to model interdependencies between output variables using constraints and features. We then discuss how to formulate an inference problem as an Integer Linear Program. We will also describe several paradigms for learning the parameters of a CCM. Necessary background about structured prediction will be provided in this section.
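As a rough illustration of what such a formulation can look like (a sketch under assumed notation, not necessarily the one used in the tutorial), consider tagging a sentence of n words with tags from a set T. Here s_i(t) is a hypothetical local score for giving word i the tag t, z_{i,t} is a binary indicator for that decision, and the constraint "the sentence contains at least one verb" is an example of declarative knowledge added on top of the local scores:

```latex
\begin{aligned}
\max_{z}\quad & \sum_{i=1}^{n} \sum_{t \in T} s_i(t)\, z_{i,t} \\
\text{s.t.}\quad & \sum_{t \in T} z_{i,t} = 1 \quad \text{for } i = 1, \dots, n, \\
& \sum_{i=1}^{n} z_{i,\mathrm{VERB}} \ge 1, \\
& z_{i,t} \in \{0, 1\} \quad \text{for all } i, t.
\end{aligned}
```

The first constraint forces exactly one tag per word. The second couples all positions, which is what makes the decision structured rather than a set of independent classifications.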

Efficient Learning for Structured Prediction Models [45 min]
handout

We will first discuss global learning vs. local learning. We then describe several structured learning approaches such as Structured SVMs and the Structured Perceptron. Next, we describe efficient learning algorithms for Structured SVMs based on a dual coordinate descent method. Finally, we will present methods that make use of amortized inference techniques.
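To make the Structured Perceptron concrete, the following is a minimal sketch for sequence tagging, assuming a first-order model with emission and transition features and exact Viterbi inference. The feature names, the toy data, and the function names are illustrative assumptions and are not taken from the tutorial materials.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sequence marker


def features(words, tags):
    """Count emission (word, tag) and transition (prev_tag, tag) features."""
    counts = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        counts[("emit", word, tag)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts


def viterbi(words, tagset, w):
    """Exact argmax over tag sequences for the first-order model above."""
    # scores[t] = best score of any tagged prefix that ends in tag t
    scores = {t: w[("emit", words[0], t)] + w[("trans", START, t)] for t in tagset}
    backptrs = []
    for word in words[1:]:
        new_scores, ptr = {}, {}
        for t in tagset:
            best_prev = max(tagset, key=lambda p: scores[p] + w[("trans", p, t)])
            new_scores[t] = (scores[best_prev] + w[("trans", best_prev, t)]
                             + w[("emit", word, t)])
            ptr[t] = best_prev
        scores = new_scores
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    tag = max(tagset, key=lambda t: scores[t])
    path = [tag]
    for ptr in reversed(backptrs):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def train(data, tagset, epochs=5):
    """Structured perceptron: on a mistake, add gold features, subtract predicted."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != gold:
                for f, v in features(words, gold).items():
                    w[f] += v
                for f, v in features(words, pred).items():
                    w[f] -= v
    return w


TAGSET = ["DET", "NOUN", "VERB"]
DATA = [
    ("the dog barks".split(), ["DET", "NOUN", "VERB"]),
    ("the cat sleeps".split(), ["DET", "NOUN", "VERB"]),
    ("dogs bark".split(), ["NOUN", "VERB"]),
]

weights = train(DATA, TAGSET)
for words, gold in DATA:
    print(words, "->", viterbi(words, TAGSET, weights))
```

The update is global in the sense that the whole predicted sequence is compared with the whole gold sequence. A local learner would instead train a per-token classifier and combine its decisions only at prediction time.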

Coffee Break

Amortized Inference for Structured Prediction Models [45 min]
handout

We will describe a recently developed technique, amortized inference, which speeds up inference for structured prediction models by caching previously solved inference problems and reusing their solutions. We will also discuss how to further improve amortized inference by incorporating a dual decomposition approach, which decomposes the output structure and makes use of Lagrangian relaxation methods.
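The caching idea can be sketched as follows, assuming inference is a maximization of a linear objective over 0/1 variables with a fixed feasible set. The reuse test below is one simple sufficient condition: if, relative to a cached problem, the objective coefficients did not decrease for variables that were on in the cached solution and did not increase for variables that were off, the cached solution is still optimal. The class and function names are hypothetical, and a brute-force search stands in for a real ILP solver.

```python
def solve_exact(c, feasible):
    """Brute-force argmax of c . y over an explicit list of feasible 0/1 tuples."""
    return max(feasible, key=lambda y: sum(ci * yi for ci, yi in zip(c, y)))


def reusable(c_old, y_old, c_new):
    """Sufficient condition for y_old to remain optimal under c_new.

    For each variable, (2*y - 1) is +1 if it was on and -1 if it was off, so
    the test requires coefficients of "on" variables not to decrease and
    coefficients of "off" variables not to increase.
    """
    return all((2 * y - 1) * (cn - co) >= 0
               for co, y, cn in zip(c_old, y_old, c_new))


class AmortizedSolver:
    """Wraps an exact solver with a cache of (objective, solution) pairs."""

    def __init__(self, feasible):
        self.feasible = list(feasible)
        self.cache = []
        self.solver_calls = 0

    def solve(self, c):
        for c_old, y_old in self.cache:
            if reusable(c_old, y_old, c):
                return y_old  # cache hit: no call to the exact solver
        self.solver_calls += 1
        y = solve_exact(c, self.feasible)
        self.cache.append((tuple(c), y))
        return y


# Toy feasible set: exactly one of three variables is switched on.
FEASIBLE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

solver = AmortizedSolver(FEASIBLE)
for c in [(1.0, 3.0, 2.0), (0.5, 4.0, 1.0), (5.0, 3.0, 2.0)]:
    print(c, "->", solver.solve(c))
print("exact solver calls:", solver.solver_calls)
```

In this toy run the second problem is answered from the cache because its objective only moved in directions that favor the cached solution, while the third problem fails the test and falls back to the exact solver.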

Distributed Representation for Structured Prediction [30 min]
handout

We will present a recently proposed structured learning formulation, Distro, which models the meaning of labels using real-valued vectors. We will also describe inference and learning algorithms for this model.

Structured Prediction Software [15 min]
handout

We will introduce IllinoisSL -- a Java-based discriminative structured learning library. We will use the problem of part-of-speech tagging as a running example and demonstrate how to implement a sequential tagging model using the library.

Conclusion and Future Research Directions [15 min]
handout

Structured prediction models are widely used in AI. Therefore, designing efficient learning and inference algorithms for them could have a great impact. We will conclude the tutorial by presenting some challenges and potential research topics in designing and applying structured prediction models.

Resources:

Tutorial syllabus

Tutorial slides

Reference (.bib)

Instructors' bio:

Kai-Wei Chang is a post-doctoral researcher at Microsoft Research. He will be joining the Department of Computer Science at the University of Virginia as an Assistant Professor in Fall 2016. His research interests are in designing practical machine learning techniques for large and complex data, and applying them to real-world applications. He has been working on various topics in Machine Learning and Natural Language Processing, including large-scale classification, structured learning, co-reference resolution, and relation extraction. He has also been involved in developing machine learning packages such as LIBLINEAR, Vowpal Wabbit, and Illinois-SL. He was awarded the KDD Best Paper Award in 2010 and won the Yahoo! Key Scientific Challenges Award in 2011.

Gourab Kundu is a research staff member at IBM Research. He is broadly interested in all aspects of machine learning and natural language processing. He has publications in top-tier machine learning and natural language processing conferences, along with a best student paper at CoNLL 2011.

Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. Roth is a Fellow of the AAAS, ACM, AAAI and ACL, for his contributions to Machine Learning and to Natural Language Processing. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine-learning-based tools for natural language applications that are widely used by the research community and commercially. Roth is the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR) and has served on the editorial boards of several of the major journals in his research areas. He was the program chair of AAAI'11, ACL'03 and CoNLL'02 and serves regularly as an area chair and senior program committee member in the major conferences in his research areas. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

Vivek Srikumar is an assistant professor in the School of Computing at the University of Utah. His research interests are in the areas of machine learning and natural language processing in the context of structured learning and prediction. His research has primarily been driven by questions arising from the need to learn structured representations of text using little or indirect supervision and to scale inference to large problems. His work has been published in various AI, NLP and machine learning venues and recently received the best paper award at EMNLP. Previously, he obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2013 and was a post-doctoral scholar at Stanford University.