TY - GEN

T1 - A parallel evolutionary algorithm for subset selection in causal inference models

AU - Cho, Wendy K.Tam

AU - Liu, Yan Y.

PY - 2016/7/17

Y1 - 2016/7/17

N2 - Science is concerned with identifying causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has occurred through statistical models identified as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. This reformulation to identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and identifies empirically more optimal solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment by enlisting additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.

AB - Science is concerned with identifying causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has occurred through statistical models identified as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. This reformulation to identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and identifies empirically more optimal solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment by enlisting additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.

KW - Combinatorial optimization

KW - Evolutionary algorithm

KW - Message passing

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=84989173852&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989173852&partnerID=8YFLogxK

U2 - 10.1145/2949550.2949568

DO - 10.1145/2949550.2949568

M3 - Conference contribution

AN - SCOPUS:84989173852

T3 - ACM International Conference Proceeding Series

BT - Proceedings of XSEDE 2016

PB - Association for Computing Machinery

T2 - Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016

Y2 - 17 July 2016 through 21 July 2016

ER -