TY - JOUR
T1 - Efficient gradient boosting for prognostic biomarker discovery
AU - Li, Kaiqiao
AU - Yao, Sijie
AU - Zhang, Zhenyu
AU - Cao, Biwei
AU - Wilson, Christopher M.
AU - Kalos, Denise
AU - Kuan, Pei Fen
AU - Zhu, Ruoqing
AU - Wang, Xuefeng
N1 - Publisher Copyright:
© 2022 The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
PY - 2022/3/15
Y1 - 2022/3/15
AB - Motivation: A gradient boosting decision tree (GBDT) is a powerful ensemble machine-learning method that has the potential to accelerate biomarker discovery from high-dimensional molecular data. Recent algorithmic advances, such as extreme gradient boosting (XGB) and light gradient boosting (LGB), have made GBDT training more efficient, scalable and accurate. However, these modern techniques have not yet been widely adopted for discovering biomarkers associated with censored survival outcomes, which are key clinical outcomes or endpoints in cancer studies. Results: In this paper, we present a new R package, 'Xsurv', as an integrated solution that applies two modern GBDT training frameworks, namely XGB and LGB, to the modeling of right-censored survival outcomes. Using simulations, we benchmark the new approaches against traditional methods, including the stepwise Cox regression model and the original gradient boosting function implemented in the package 'gbm'. We also demonstrate the application of Xsurv to a melanoma methylation dataset. Together, these results suggest that Xsurv is a useful and computationally viable tool for screening a large number of candidate prognostic biomarkers, which may facilitate future translational and clinical research.
UR - http://www.scopus.com/inward/record.url?scp=85126569746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126569746&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btab869
DO - 10.1093/bioinformatics/btab869
M3 - Article
C2 - 34978570
AN - SCOPUS:85126569746
SN - 1367-4803
VL - 38
SP - 1631
EP - 1638
JO - Bioinformatics
JF - Bioinformatics
IS - 6
ER -