Abstract
In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goal of building a parsimonious classification model while making inferences about the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability, while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation that accommodates uncertainty about which predictors to include. Under near low-rank assumptions, the posterior distribution for the conditional probability is shown to contract at close to the parametric rate, even in ultra high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications. Supplementary materials for this article are available online.
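To make the idea of a Tucker-factorized conditional probability concrete, the following is a minimal sketch, assuming the common form P(y | x_1, ..., x_p) = sum over (h_1, ..., h_p) of lambda_{h_1...h_p}(y) * prod_j pi^{(j)}_{h_j}(x_j), where each predictor x_j is softly assigned to one of k_j latent classes and each core fiber lambda_{h_1...h_p}(.) is a probability vector over y. The function name `tucker_conditional_prob`, the arrays `core` and `pis`, and the toy dimensions are hypothetical illustrations; the paper's exact parameterization, priors, and MCMC algorithm are not reproduced here. The sketch also illustrates one way variable selection can arise in such a model: if k_j = 1, predictor j has no effect on the conditional probability.

```python
import numpy as np


def tucker_conditional_prob(x, core, pis):
    """Return P(y | x) as a vector over the d_y response categories.

    Hypothetical sketch of a Tucker-form conditional probability.
    x    : sequence of p integer levels, x[j] in {0, ..., d_j - 1}
    core : array of shape (k_1, ..., k_p, d_y); each core[h_1, ..., h_p, :]
           is a probability vector over y
    pis  : list of p arrays; pis[j] has shape (d_j, k_j) and each row
           pis[j][level, :] is a probability vector over latent classes h_j
    """
    prob = core
    # Contract one latent index at a time with the soft class membership
    # of x_j; after step j the leading axis k_j has been summed out.
    for j, xj in enumerate(x):
        prob = np.tensordot(pis[j][xj], prob, axes=(0, 0))
    return prob  # shape (d_y,)


def random_simplex(rng, shape):
    """Draw arrays whose last axis lies on the probability simplex."""
    w = rng.gamma(1.0, size=shape)
    return w / w.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
d = [3, 4, 2]   # number of levels of each predictor (toy values)
k = [2, 1, 2]   # latent ranks; k = 1 for the second predictor excludes it
d_y = 2         # number of response categories

core = random_simplex(rng, tuple(k) + (d_y,))
pis = [random_simplex(rng, (dj, kj)) for dj, kj in zip(d, k)]

p_a = tucker_conditional_prob((0, 0, 1), core, pis)
p_b = tucker_conditional_prob((0, 3, 1), core, pis)  # only x_2 changed
print(p_a, p_b)  # identical: the rank-1 predictor has no influence
```

Because every core fiber and every row of each `pis[j]` sums to one, the result is always a valid probability vector over y. Setting a rank k_j to 1 collapses predictor j to a single latent class, which is why changing only the second predictor leaves P(y | x) unchanged in the usage example.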
Original language | English (US) |
---|---|
Pages (from-to) | 656-669 |
Number of pages | 14 |
Journal | Journal of the American Statistical Association |
Volume | 111 |
Issue number | 514 |
DOIs | |
State | Published - Apr 2 2016 |
Externally published | Yes |
Keywords
- Classification
- Convergence rate
- Nonparametric Bayes
- Tensor factorization
- Ultra high-dimensional
- Variable selection
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty