Learning to Rank

[ Overview | Participants | Software | Publications ]

Overview:

The need to meaningfully combine sets of rankings often arises in ranked data sets. Although a number of heuristic and supervised learning approaches to rank aggregation exist, they require domain knowledge or supervised ranked data, both of which are expensive to acquire. We investigate learning methods for aggregation of (partial) rankings without supervision.

Details:

Suppose that each member of a panel of judges independently generates a (partial) ranking over a set of objects, and assume that each judge tries to reproduce a true underlying ranking according to the degree of their expertise. This setting often arises in Information Retrieval (IR) and Natural Language Processing (NLP), among other areas, and one needs to meaningfully combine such expert opinions into an aggregate ranking. For example, in meta-search, the aim is to aggregate a set of Web-search query results produced by multiple engines. In machine translation (MT), combining outputs of multiple MT systems based on different principles may produce a better translation.

One impediment to solving rank aggregation tasks is the high cost associated with acquiring full or partial preference information, reducing the utility of supervised approaches. We propose a general mathematical framework for unsupervised rank aggregation which can be used to learn to combine rankings over objects. We investigate the effectiveness of the framework for various types of (partial) orders.

In an earlier line of work, we studied generalization properties of the area under the ROC curve (AUC). We define the expected accuracy of a ranking function, and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy.

Participants:

Dan Roth, Shivani Agarwal, Kevin Small, Alex Klementiev

Publications: