On variance estimation of random forests with Infinite-order U-statistics

Research output: Contribution to journal › Article › peer-review

Abstract

Infinite-order U-statistics (IOUS) have been used extensively in subbagging ensemble learning algorithms such as random forests to quantify their uncertainty. While normality results for IOUS have been studied extensively, their variance estimation and its theoretical properties remain mostly unexplored. Existing approaches mainly utilize the leading term dominance property in the Hoeffding decomposition. However, such a view usually leads to biased estimation when the kernel size is large relative to the sample size. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties (e.g., ratio consistency) have never been studied. These limitations mean that the asymptotic coverage of the constructed confidence intervals is not guaranteed. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator’s finite sample performance. Extensive simulation studies show that our estimators enjoy lower bias and achieve targeted coverage rates.
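
For orientation, the classical variance identity for a complete U-statistic of order k (due to Hoeffding) can be written as below. The notation is assumed here for illustration and does not come from the abstract: U_n is the U-statistic on n observations, h is the kernel (for example, a single tree fitted on a subsample), and S_1, S_2 are size-k subsamples sharing exactly d observations.

```latex
\operatorname{Var}(U_n)
  = \binom{n}{k}^{-1} \sum_{d=1}^{k} \binom{k}{d}\binom{n-k}{k-d}\,\xi_{d,k}^{2},
\qquad
\xi_{d,k}^{2} = \operatorname{Cov}\bigl(h(S_1),\, h(S_2)\bigr),
\quad |S_1 \cap S_2| = d .
```

The weights on the covariance terms are hypergeometric probabilities with mean k²/n. For fixed k and large n, the d = 1 term dominates and the variance is approximately (k²/n) ξ²_{1,k}, which is the leading term view. When k grows with n, the weights concentrate around d ≈ k²/n instead of d = 1. That is one plausible reading of the "peak region" mentioned in the abstract, offered as an interpretation and not as the paper's exact construction.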

Original language: English (US)
Pages (from-to): 2135-2207
Number of pages: 73
Journal: Electronic Journal of Statistics
Volume: 18
Issue number: 1
State: Published - 2024

Keywords

  • ensemble learning
  • Hoeffding decomposition
  • Infinite-order U-statistics
  • random forests
  • ratio consistency
  • variance estimation

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
