Statistical analysis of Bayes optimal subset ranking

David Cossock, Tong Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

The ranking problem has become increasingly important in modern applications of statistical methods in automated decision making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the discounted cumulated gain (DCG) criterion that measures the quality of items near the top of the rank-list. Similar to error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to nonconvex optimization problems that can be NP-hard. Therefore, a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the rank-list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
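
As a rough illustration of the idea in the abstract, the sketch below scores a small candidate subset with hypothetical regression outputs, ranks the items by those scores, and evaluates the result with a truncated DCG. The discount `1 / log2(rank + 1)`, the linear gain, the cutoff `k`, and all function names and numbers are assumptions chosen for illustration; they are not taken from the paper, which may define DCG with different gain and discount terms.

```python
import math

def dcg_at_k(relevances, k):
    """DCG of an already-ranked list of grades, truncated at position k.

    Assumes a linear gain and a 1/log2(rank + 1) discount (one common choice).
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def rank_by_score(scores):
    """Return item indices sorted by predicted score, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical data: true relevance grades and regression predictions
# for the four items of one candidate subset.
true_grades = [3.0, 0.0, 2.0, 1.0]
predicted = [2.5, 0.4, 0.2, 1.1]

order = rank_by_score(predicted)
ranked_grades = [true_grades[i] for i in order]

achieved = dcg_at_k(ranked_grades, k=2)
ideal = dcg_at_k(sorted(true_grades, reverse=True), k=2)
print(order, achieved, ideal)
```

In this toy case the regression underestimates the grade-2 item, so the achieved DCG at `k=2` falls short of the ideal ordering. That gap is the kind of quantity the abstract's bounds relate to regression error.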

Original language: English (US)
Pages (from-to): 5140-5154
Number of pages: 15
Journal: IEEE Transactions on Information Theory
Volume: 54
Issue number: 11
State: Published - 2008
Externally published: Yes

Keywords

  • Bayes optimal
  • Consistency
  • Convex surrogate
  • Ranking

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
