Gradient boosted decision trees for high dimensional sparse output

Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho Jui Hsieh

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper, wc study the gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is a L-dimensional 0/1 vector, where L is number of labels that can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running time in this regime, and propose a new GBDT variant, GBDT-SPARSE, to resolve this problem by employing L0 regularization. We then discuss in detail how to utilize this sparsity to conduct GBDT training, including splitting the nodes, comput-ing the sparse residual, and predicting in sub-linear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance.

Original languageEnglish (US)
Title of host publication34th International Conference on Machine Learning, ICML 2017
PublisherInternational Machine Learning Society (IMLS)
Pages4899-4908
Number of pages10
ISBN (Electronic)9781510855144
StatePublished - 2017
Externally publishedYes
Event34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6 2017Aug 11 2017

Publication series

Name34th International Conference on Machine Learning, ICML 2017
Volume7

Other

Other34th International Conference on Machine Learning, ICML 2017
Country/TerritoryAustralia
CitySydney
Period8/6/178/11/17

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Gradient boosted decision trees for high dimensional sparse output'. Together they form a unique fingerprint.

Cite this