TY - JOUR
T1 - Optimal sample complexity of M-wise Data for Top-K ranking
AU - Jang, Minje
AU - Kim, Sunghyun
AU - Suh, Changho
AU - Oh, Sewoong
N1 - Funding Information:
This work was supported by Institute for Information & communications Technology Promotion(IITP) grant funded by the Korea government(MSIT) (2017-0-00694, Coding for High-Speed Distributed Networks).
Publisher Copyright:
© 2017 Neural information processing systems foundation. All rights reserved.
PY - 2017
Y1 - 2017
N2 - We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.
AB - We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight ℓ∞ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in ℓ∞ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.
UR - http://www.scopus.com/inward/record.url?scp=85047017887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047017887&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85047017887
SN - 1049-5258
VL - 2017-December
SP - 1687
EP - 1697
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
Y2 - 4 December 2017 through 9 December 2017
ER -