Benjamin Snyder

Multilingual Learning for Unsupervised Linguistic Analysis

For centuries, the deep connection between human languages has fascinated linguists, anthropologists, and historians. The study of this connection has made major discoveries about human communication possible: it has revealed the evolution of languages, facilitated the reconstruction of proto-languages, and led to an understanding of language universals. The connection between languages should be a powerful source of information for automatic linguistic analysis as well. In this line of work, I investigate two questions: (i) Can we exploit cross-lingual correspondences to improve unsupervised language learning? (ii) Will this joint analysis provide more or less benefit when the languages belong to the same family?

I will present unsupervised multilingual generative models for morphological segmentation and part-of-speech tagging. In both cases, we model the multilingual data as arising from a combination of language-independent and language-specific probabilistic processes. This structure allows the model to identify recurring cross-lingual patterns and learn from them, improving prediction accuracy in each language.

I will also discuss ongoing work on the unsupervised decoding of the ancient Ugaritic script using data from related (Semitic) languages.

Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 01/22/2008