Unsupervised and Semi-Supervised Learning

Overview:

Supervised learning is costly because it depends on large amounts of labeled data. One can often reduce this cost by using only a small amount of labeled data together with a large pool of unlabeled examples, exploiting regularities present in the data and, where available, domain-specific information. We investigate semi-supervised and unsupervised learning methods that minimize the need for supervision across a variety of learning protocols and NLP problems.

Details:

Examples of learning protocols and problems studied include:

  • We propose a general mathematical and algorithmic framework for unsupervised rank aggregation which can be used to combine rankings over objects. See the Learning to Rank project.
  • We exploit temporal alignment between the two sides of a bilingual corpus to derive a nearly unsupervised learning algorithm for transliteration and automatic discovery of Named Entities in the resource-poor language. See the Transliteration project.
  • We develop an unsupervised, bootstrapped learning approach to context-sensitive lexical paraphrasing. See a recent paper.
  • We suggest a method for incorporating domain knowledge in the context of Constrained Conditional Models. Domain knowledge is encoded as task-specific constraints and incorporated into semi-supervised learning algorithms. See a recent paper.
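
As a rough illustration of the first item above (combining several rankings without any labeled data), the sketch below uses the Borda count, a standard positional baseline. It is not the framework proposed in the Learning to Rank project; the function name `borda_aggregate` and the sample rankings are hypothetical, and every input ranking is assumed to be a full ordering over the same objects.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine several full rankings (best item first) into one ranking.

    Hypothetical helper for illustration only: a plain Borda count,
    not the aggregation framework described on this page.
    """
    scores: Dict[Hashable, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item earns n - 1 points, the last item earns 0.
            scores[item] += n - 1 - position
    # Highest total score first; ties are broken by string form
    # so the result is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


# Made-up rankings from three hypothetical rankers over the same objects.
rankers = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["a", "c", "b", "d"],
]
combined = borda_aggregate(rankers)
```

No ranker is marked as more reliable than another here; each simply contributes points by position, which is what makes the baseline unsupervised.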

Publications: