Abstract
Infinite-order U-statistics (IOUS) have been used extensively in subbagging ensemble learning algorithms such as random forests to quantify their uncertainty. While normality results for IOUS have been studied extensively, variance estimation for IOUS and its theoretical properties remain largely unexplored. Existing approaches mainly rely on the leading-term dominance property of the Hoeffding decomposition. However, this view usually leads to biased estimation when the kernel size is large relative to the sample size. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties (e.g., ratio consistency) have never been studied. As a result, the asymptotic coverage of the constructed confidence intervals is not guaranteed. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading-term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator’s finite-sample performance. Extensive simulation studies show that our estimators have lower bias and achieve the targeted coverage rates.
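For orientation, the classical Hoeffding variance decomposition that the abstract refers to can be written as below. This is the standard textbook form, not a formula quoted from the paper; the symbols $n$ (sample size), $k$ (kernel size), $h$ (kernel), and $\xi_{d,k}^2$ are notation assumed here for illustration, and the paper's own notation and its exact definition of the peak region may differ.

```latex
% U-statistic of order k built from n i.i.d. observations:
%   U_n = \binom{n}{k}^{-1} \sum_{S \subseteq \{1,\dots,n\},\ |S| = k} h(S)
\[
  \operatorname{Var}(U_n)
  = \binom{n}{k}^{-1} \sum_{d=1}^{k} \binom{k}{d}\binom{n-k}{k-d}\, \xi_{d,k}^2,
  \qquad
  \xi_{d,k}^2 = \operatorname{Cov}\bigl(h(S_1),\, h(S_2)\bigr)
  \quad \text{with } |S_1 \cap S_2| = d .
\]
% The weights \binom{k}{d}\binom{n-k}{k-d} / \binom{n}{k} are hypergeometric
% probabilities in d, with mean k^2 / n.
```

When $k$ is fixed and $n$ grows, the $d = 1$ term carries almost all of the weight, which is the leading-term dominance view. When $k$ grows with $n$ so that $k^2/n$ is not small, the hypergeometric weights place most of their mass on overlaps $d$ near $k^2/n$ rather than at $d = 1$, which is one way to read the abstract's remark that leading-term approaches become biased for large kernel sizes.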
Original language | English (US) |
---|---|
Pages (from-to) | 2135-2207 |
Number of pages | 73 |
Journal | Electronic Journal of Statistics |
Volume | 18 |
Issue number | 1 |
State | Published - 2024 |
Keywords
- ensemble learning
- Hoeffding decomposition
- Infinite-order U-statistics
- random forests
- ratio consistency
- variance estimation
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty