TY - JOUR
T1 - A Multi-resolution Theory for Approximating Infinite-p-Zero-n
T2 - Transitional Inference, Individualized Predictions, and a World Without Bias-Variance Tradeoff
AU - Li, Xinran
AU - Meng, Xiao Li
N1 - Publisher Copyright:
© 2020 American Statistical Association.
PY - 2020
Y1 - 2020
N2 - Transitional inference is an empiricism concept, rooted and practiced in clinical medicine since ancient Greece. Knowledge and experiences gained from treating one entity (e.g., a disease or a group of patients) are applied to treat a related but distinctively different one (e.g., a similar disease or a new patient). This notion of “transition to the similar” renders individualized treatments an operational meaning, yet its theoretical foundation defies the familiar inductive inference framework. The uniqueness of entities is the result of potentially an infinite number of attributes (hence (Formula presented.)), which entails zero direct training sample size (i.e., n = 0) because genuine guinea pigs do not exist. However, the literature on wavelets and on sieve methods for nonparametric estimation suggests a principled approximation theory for transitional inference via a multi-resolution (MR) perspective, where we use the resolution level to index the degree of approximation to ultimate individuality. MR inference seeks a primary resolution indexing an indirect training sample, which provides enough matched attributes to increase the relevance of the results to the target individuals and yet still accumulate sufficient indirect sample sizes for robust estimation. Theoretically, MR inference relies on an infinite-term ANOVA-type decomposition, providing an alternative way to model sparsity via the decay rate of the resolution bias as a function of the primary resolution level. Unexpectedly, this decomposition reveals a world without variance when the outcome is a deterministic function of potentially infinitely many predictors. In this deterministic world, the optimal resolution prefers over-fitting in the traditional sense when the resolution bias decays sufficiently rapidly. Furthermore, there can be many “descents” in the prediction error curve, when the contributions of predictors are inhomogeneous and the ordering of their importance does not align with the order of their inclusion in prediction. These findings may hint at a deterministic approximation theory for understanding the apparently over-fitting resistant phenomenon of some over-saturated models in machine learning.
AB - Transitional inference is an empiricism concept, rooted and practiced in clinical medicine since ancient Greece. Knowledge and experiences gained from treating one entity (e.g., a disease or a group of patients) are applied to treat a related but distinctively different one (e.g., a similar disease or a new patient). This notion of “transition to the similar” renders individualized treatments an operational meaning, yet its theoretical foundation defies the familiar inductive inference framework. The uniqueness of entities is the result of potentially an infinite number of attributes (hence (Formula presented.)), which entails zero direct training sample size (i.e., n = 0) because genuine guinea pigs do not exist. However, the literature on wavelets and on sieve methods for nonparametric estimation suggests a principled approximation theory for transitional inference via a multi-resolution (MR) perspective, where we use the resolution level to index the degree of approximation to ultimate individuality. MR inference seeks a primary resolution indexing an indirect training sample, which provides enough matched attributes to increase the relevance of the results to the target individuals and yet still accumulate sufficient indirect sample sizes for robust estimation. Theoretically, MR inference relies on an infinite-term ANOVA-type decomposition, providing an alternative way to model sparsity via the decay rate of the resolution bias as a function of the primary resolution level. Unexpectedly, this decomposition reveals a world without variance when the outcome is a deterministic function of potentially infinitely many predictors. In this deterministic world, the optimal resolution prefers over-fitting in the traditional sense when the resolution bias decays sufficiently rapidly. Furthermore, there can be many “descents” in the prediction error curve, when the contributions of predictors are inhomogeneous and the ordering of their importance does not align with the order of their inclusion in prediction. These findings may hint at a deterministic approximation theory for understanding the apparently over-fitting resistant phenomenon of some over-saturated models in machine learning.
KW - Double descent
KW - Machine learning
KW - Multiple descents
KW - Personalized medicine
KW - Sieve methods
KW - Sparsity
KW - Transition to the similar
KW - Wavelets
UR - http://www.scopus.com/inward/record.url?scp=85098688665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098688665&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1844210
DO - 10.1080/01621459.2020.1844210
M3 - Article
AN - SCOPUS:85098688665
SN - 0162-1459
VL - 116
SP - 353
EP - 367
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 533
ER -