Unsupervised and Semi-Supervised Learning
Overview:
Supervised learning strategies are resource-intensive, largely because they depend on labeled data. However, one can often reduce this cost, using only a small amount of labeled data along with a large pool of unlabeled examples, by exploiting regularities present in the data and, possibly, domain-specific information. We investigate semi-supervised and unsupervised learning methods that minimize the need for supervision across a variety of learning protocols and NLP problems.
Details:
Examples of learning protocols and problems studied include:
- We propose a general mathematical and algorithmic framework for unsupervised rank aggregation, which can be used to combine multiple rankings over the same set of objects. See the Learning to Rank project.
- We exploit temporal alignment between the two sides of a bilingual corpus to derive a nearly unsupervised learning algorithm for transliteration and automatic discovery of Named Entities in the resource-poor language. See the Transliteration project.
- We develop an unsupervised, bootstrapped learning approach to context-sensitive lexical paraphrasing. See a recent paper.
- We propose a method for incorporating domain knowledge into semi-supervised learning in the context of Constrained Conditional Models. The domain knowledge is encoded as task-specific constraints that guide the semi-supervised learning algorithm. See a recent paper.
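
To illustrate the general idea behind the last item, the following is a minimal sketch of self-training in which a domain constraint filters the labels assigned to unlabeled examples. It is a toy illustration and not the algorithm from the referenced paper: the token-count model, the `DATE`/`PERSON` labels, the constraint that a `DATE` must contain a digit, and all function names are assumptions made up for this example.

```python
from collections import Counter


def train(examples):
    """Toy model: count how often each token appears with each label."""
    counts = {}
    for tokens, label in examples:
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def rank_labels(counts, tokens):
    """Rank labels by token-overlap score, best first (ties broken alphabetically)."""
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    return sorted(scores, key=lambda label: (-scores[label], label))


def satisfies_constraints(tokens, label):
    """Hypothetical domain constraint: a DATE must contain at least one digit."""
    if label == "DATE":
        return any(ch.isdigit() for tok in tokens for ch in tok)
    return True


def constrained_self_training(labeled, unlabeled, rounds=2):
    """Repeatedly pseudo-label the unlabeled pool, keeping for each example
    the best-scoring label that does not violate the constraints."""
    pseudo = []
    for _ in range(rounds):
        model = train(labeled + pseudo)
        pseudo = []
        for tokens in unlabeled:
            for label in rank_labels(model, tokens):
                if satisfies_constraints(tokens, label):
                    pseudo.append((tokens, label))
                    break  # examples with no admissible label are skipped
    return train(labeled + pseudo), pseudo


# Two labeled examples and a small unlabeled pool (all invented for illustration).
labeled = [(["march", "1999"], "DATE"), (["john", "smith"], "PERSON")]
unlabeled = [["march", "2001"], ["john", "march"], ["mary", "smith"], ["june", "2003"]]

model, pseudo = constrained_self_training(labeled, unlabeled)
for tokens, label in pseudo:
    print(" ".join(tokens), "->", label)
```

In this toy setup the model trained on the two labeled examples alone ranks `DATE` first for "john march" (a tie broken alphabetically), but the constraint rejects that label because the phrase contains no digit, so the example is pseudo-labeled `PERSON` instead. The point of the sketch is only that constraints can correct or filter the model's guesses on unlabeled data before they are fed back into training.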