Optimal sample complexity of M-wise Data for Top-K ranking

Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh

Research output: Contribution to journalConference article

Abstract

We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.

Original languageEnglish (US)
Pages (from-to)1687-1697
Number of pages11
JournalAdvances in Neural Information Processing Systems
Volume2017-December
StatePublished - Jan 1 2017
Event31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: Dec 4 2017Dec 9 2017

Fingerprint

Maximum likelihood estimation
Error analysis
Agglomeration
Experiments

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Jang, M., Kim, S., Suh, C., & Oh, S. (2017). Optimal sample complexity of M-wise Data for Top-K ranking. Advances in Neural Information Processing Systems, 2017-December, 1687-1697.

Optimal sample complexity of M-wise Data for Top-K ranking. / Jang, Minje; Kim, Sunghyun; Suh, Changho; Oh, Sewoong.

In: Advances in Neural Information Processing Systems, Vol. 2017-December, 01.01.2017, p. 1687-1697.

Research output: Contribution to journalConference article

Jang, M, Kim, S, Suh, C & Oh, S 2017, 'Optimal sample complexity of M-wise Data for Top-K ranking', Advances in Neural Information Processing Systems, vol. 2017-December, pp. 1687-1697.
Jang, Minje ; Kim, Sunghyun ; Suh, Changho ; Oh, Sewoong. / Optimal sample complexity of M-wise Data for Top-K ranking. In: Advances in Neural Information Processing Systems. 2017 ; Vol. 2017-December. pp. 1687-1697.
@article{e1b7f27c219447c8ad6678577e5817d4,
title = "Optimal sample complexity of M-wise Data for Top-K ranking",
abstract = "We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.",
author = "Minje Jang and Sunghyun Kim and Changho Suh and Sewoong Oh",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "2017-December",
pages = "1687--1697",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

TY - JOUR

T1 - Optimal sample complexity of M-wise Data for Top-K ranking

AU - Jang, Minje

AU - Kim, Sunghyun

AU - Suh, Changho

AU - Oh, Sewoong

PY - 2017/1/1

Y1 - 2017/1/1

N2 - We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.

AB - We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.

UR - http://www.scopus.com/inward/record.url?scp=85047017887&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047017887&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85047017887

VL - 2017-December

SP - 1687

EP - 1697

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -