Xavier Carreras

Full Syntactic Parsing with Dynamic Programming and the Perceptron


I will describe an efficient, expressive model for natural language parsing that uses the perceptron algorithm in conjunction with dynamic programming search. An advantage of dynamic programming approaches is that a very large set of candidate parse structures can be considered for a given sentence. Moreover, the discriminative nature of our model allows a great deal of flexibility in representing candidate parse trees, including PCFG-based features, dependency relations, and surface features of the sentence.
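To make the representational flexibility concrete, the sketch below builds a feature map for a single head-modifier dependency. It is purely illustrative: the feature names and the function itself are assumptions for exposition, not the feature set used in this work.

```python
# Hypothetical features for one head-modifier dependency, mixing
# lexical, categorical, and surface information; illustrative only.

def arc_features(words, tags, head, mod, label):
    """Return a sparse feature map for the arc (head -> mod) with a label."""
    feats = {}
    feats[f"hw={words[head]}_mw={words[mod]}"] = 1.0          # word pair
    feats[f"ht={tags[head]}_mt={tags[mod]}_l={label}"] = 1.0  # POS pair + label
    feats[f"dist={min(abs(head - mod), 5)}"] = 1.0            # binned surface distance
    feats[f"dir={'R' if mod > head else 'L'}"] = 1.0          # attachment direction
    return feats
```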

A critical problem when training a discriminative model for a task on the scale of natural language parsing is the efficiency of the parsing algorithm involved. Discriminative training algorithms, such as the averaged perceptron, require repeatedly parsing the training set under the current model. But the factors that directly determine parsing efficiency, such as sentence length and the number of syntactic labels, are usually very large in real parsing tasks, making exact inference virtually intractable. As a result, despite the potential advantages of discriminative parsing methods, there has been little previous work on full constituent parsing without fairly severe restrictions.
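As a sketch of why decoding cost dominates training, the following is a minimal structured averaged-perceptron loop in the style of Collins (2002). The feature map phi and the dynamic-programming decoder argmax_parse are hypothetical placeholders standing in for the real components, not the implementation from this work.

```python
# Minimal structured averaged-perceptron sketch. Every update
# requires a full dynamic-programming parse of the sentence, which
# is why decoder efficiency is critical for discriminative training.

from collections import defaultdict

def train(data, phi, argmax_parse, epochs=5):
    """data: list of (sentence, gold_tree) pairs; phi maps a
    (sentence, tree) pair to a sparse feature dict."""
    w = defaultdict(float)      # current weights
    w_sum = defaultdict(float)  # running sum for averaging
    t = 0
    for _ in range(epochs):
        for sentence, gold in data:
            pred = argmax_parse(sentence, w)  # the expensive step
            if pred != gold:
                for f, v in phi(sentence, gold).items():
                    w[f] += v
                for f, v in phi(sentence, pred).items():
                    w[f] -= v
            # Naive averaging, kept simple for clarity.
            for f, v in w.items():
                w_sum[f] += v
            t += 1
    # Averaged weights tend to generalize better than the final w.
    return {f: v / t for f, v in w_sum.items()}
```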

I will show that training models for full constituent parsing is possible, using a formalism closely related to Tree Adjoining Grammars that can be parsed efficiently with dependency parsing algorithms. Key to the efficiency of our approach is a lower-order model that prunes the space of full parse structures and performs remarkably well. Experiments on parsing the Penn WSJ treebank show that the model gives state-of-the-art accuracy.
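A rough sketch of the pruning idea: a cheap first-stage model scores candidate dependencies, and the full model considers only the arcs that survive. The scoring function and threshold here are illustrative assumptions, not the exact scheme described in the talk.

```python
# Illustrative two-pass pruning: keep only the head-modifier arcs
# whose first-stage score clears a threshold, shrinking the search
# space that the expensive full model must explore.

def prune_arcs(sentence, cheap_score, threshold=-5.0):
    """Return the set of (head, mod) arcs the full parser may use."""
    n = len(sentence)
    kept = set()
    for head in range(n + 1):        # position 0 = artificial root
        for mod in range(1, n + 1):
            if head == mod:
                continue
            if cheap_score(sentence, head, mod) >= threshold:
                kept.add((head, mod))
    return kept
```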

Joint work with Michael Collins and Terry Koo.

Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 01/22/2008