TY - JOUR
T1 - A predictive based regression algorithm for gene network selection
AU - Guerrier, Stéphane
AU - Mili, Nabil
AU - Molinari, Roberto
AU - Orso, Samuel
AU - Avella-Medina, Marco
AU - Ma, Yanyuan
N1 - Publisher Copyright: © 2016 Guerrier, Mili, Molinari, Orso, Avella-Medina and Ma.
PY - 2016/6/15
Y1 - 2016/6/15
N2 - Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. To do so, many of the recently proposed classification methods require some form of dimension reduction, ultimately provide a single model as output and, in most cases, rely on the likelihood function to achieve variable selection. We propose a new prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Based on cross-validation techniques and the idea of importance sampling, our proposal scans low-dimensional models under the assumption of sparsity and, for each model, estimates the objective function to assess its predictive power and thereby select among the models. Two applications to cancer data sets and a simulation study show that the proposal compares favorably with competing alternatives such as the Elastic Net and Support Vector Machines. Indeed, the proposed method not only selects smaller models with better, or at least comparable, classification errors but also provides a set of selected models instead of a single one, making it possible to construct a network of possible models for a target prediction accuracy level.
AB - Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. To do so, many of the recently proposed classification methods require some form of dimension reduction, ultimately provide a single model as output and, in most cases, rely on the likelihood function to achieve variable selection. We propose a new prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Based on cross-validation techniques and the idea of importance sampling, our proposal scans low-dimensional models under the assumption of sparsity and, for each model, estimates the objective function to assess its predictive power and thereby select among the models. Two applications to cancer data sets and a simulation study show that the proposal compares favorably with competing alternatives such as the Elastic Net and Support Vector Machines. Indeed, the proposed method not only selects smaller models with better, or at least comparable, classification errors but also provides a set of selected models instead of a single one, making it possible to construct a network of possible models for a target prediction accuracy level.
KW - Acute leukemia
KW - Biomarker selection
KW - Breast cancer
KW - Disease classification
KW - Genomic networks
KW - Model averaging
UR - http://www.scopus.com/inward/record.url?scp=84977472940&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977472940&partnerID=8YFLogxK
U2 - 10.3389/fgene.2016.00097
DO - 10.3389/fgene.2016.00097
M3 - Article
AN - SCOPUS:84977472940
SN - 1664-8021
VL - 7
JO - Frontiers in Genetics
JF - Frontiers in Genetics
IS - JUN
M1 - 97
ER -