TY - GEN
T1 - Gradient boosted decision trees for high dimensional sparse output
AU - Si, Si
AU - Zhang, Huan
AU - Keerthi, S. Sathiya
AU - Mahajan, Dhruv
AU - Dhillon, Inderjit S.
AU - Hsieh, Cho Jui
N1 - Publisher Copyright:
Copyright © 2017 by the authors.
PY - 2017
Y1 - 2017
N2 - In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an L-dimensional 0/1 vector, where L is the number of labels, which can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running time in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing L0 regularization. We then discuss in detail how to utilize this sparsity to conduct GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sub-linear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves order-of-magnitude improvements in model size and prediction time over existing methods, while yielding similar performance.
AB - In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an L-dimensional 0/1 vector, where L is the number of labels, which can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running time in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing L0 regularization. We then discuss in detail how to utilize this sparsity to conduct GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sub-linear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves order-of-magnitude improvements in model size and prediction time over existing methods, while yielding similar performance.
UR - http://www.scopus.com/inward/record.url?scp=85048484915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048484915&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85048484915
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 4899
EP - 4908
BT - 34th International Conference on Machine Learning, ICML 2017
PB - International Machine Learning Society (IMLS)
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -