TY - GEN
T1 - Candidates vs. noises estimation for large multi-class classification problem
AU - Han, Lei
AU - Huang, Yiheng
AU - Zhang, Tong
N1 - Publisher Copyright:
© 2018 by authors. All rights reserved.
PY - 2018
Y1 - 2018
N2 - This paper proposes a method for multi-class classification problems, where the number of classes K is large. The method, referred to as Candidates vs. Noises Estimation (CANE), selects a small subset of candidate classes and samples the remaining classes. We show that CANE is always consistent and computationally efficient. Moreover, the resulting estimator has low statistical variance approaching that of the maximum likelihood estimator, when the observed label belongs to the selected candidates with high probability. In practice, we use a tree structure with leaves as classes to promote fast beam search for candidate selection. We further apply the CANE method to estimate word probabilities in learning large neural language models. Extensive experimental results show that CANE achieves better prediction accuracy over the Noise-Contrastive Estimation (NCE), its variants and a number of the state-of-the-art tree classifiers, while it gains significant speedup compared to standard O(K) methods.
AB - This paper proposes a method for multi-class classification problems, where the number of classes K is large. The method, referred to as Candidates vs. Noises Estimation (CANE), selects a small subset of candidate classes and samples the remaining classes. We show that CANE is always consistent and computationally efficient. Moreover, the resulting estimator has low statistical variance approaching that of the maximum likelihood estimator, when the observed label belongs to the selected candidates with high probability. In practice, we use a tree structure with leaves as classes to promote fast beam search for candidate selection. We further apply the CANE method to estimate word probabilities in learning large neural language models. Extensive experimental results show that CANE achieves better prediction accuracy over the Noise-Contrastive Estimation (NCE), its variants and a number of the state-of-the-art tree classifiers, while it gains significant speedup compared to standard O(K) methods.
UR - http://www.scopus.com/inward/record.url?scp=85057268496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057268496&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057268496
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 3013
EP - 3029
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -