TY - JOUR
T1 - Dimension Reduction Forests
T2 - Local Variable Importance Using Structured Random Forests
AU - Loyal, Joshua Daniel
AU - Zhu, Ruoqing
AU - Cui, Yifan
AU - Zhang, Xin
N1 - Publisher Copyright:
© 2022 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
PY - 2022
Y1 - 2022
N2 - Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function’s local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.
KW - Random forests
KW - Sufficient dimension reduction
KW - Variable importance
UR - http://www.scopus.com/inward/record.url?scp=85131091276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131091276&partnerID=8YFLogxK
U2 - 10.1080/10618600.2022.2069777
DO - 10.1080/10618600.2022.2069777
M3 - Article
AN - SCOPUS:85131091276
SN - 1061-8600
VL - 31
SP - 1104
EP - 1113
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 4
ER -