Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

Haohan Wang, Bryon Aragam, Eric P. Xing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 431-438
Number of pages: 8
ISBN (Electronic): 9781509030491
State: Published - Dec 15 2017
Externally published: Yes
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 to Nov 16 2017

Other

Other: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country/Territory: United States
City: Kansas City
Period: 11/13/17 to 11/16/17

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
