TY - GEN
T1 - A parallel evolutionary algorithm for subset selection in causal inference models
AU - Cho, Wendy K. Tam
AU - Liu, Yan Y.
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/7/17
Y1 - 2016/7/17
N2 - Science is concerned with identifying causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has occurred through statistical models identified as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. This reformulation to identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and identifies empirically more optimal solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment by enlisting additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.
AB - Science is concerned with identifying causal inferences. To move beyond simple observed relationships and associational inferences, researchers may employ randomized experimental designs to isolate a treatment effect, which then permits causal inferences. When experiments are not practical, a researcher is relegated to analyzing observational data. To make causal inferences from observational data, one must adjust the data so that they resemble data that might have emerged from an experiment. Traditionally, this has occurred through statistical models identified as matching methods. We claim that matching methods are unnecessarily constraining and propose, instead, that the goal is better achieved via a subset selection procedure that is able to identify statistically indistinguishable treatment and control groups. This reformulation to identifying optimal subsets leads to a model that is computationally complex. We develop an evolutionary algorithm that is more efficient and identifies empirically more optimal solutions than any other causal inference method. To gain greater efficiency, we also develop a scalable algorithm for a parallel computing environment by enlisting additional processors to search a greater range of the solution space and to aid other processors at particularly difficult peaks.
KW - Combinatorial optimization
KW - Evolutionary algorithm
KW - Message passing
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=84989173852&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84989173852&partnerID=8YFLogxK
U2 - 10.1145/2949550.2949568
DO - 10.1145/2949550.2949568
M3 - Conference contribution
AN - SCOPUS:84989173852
T3 - ACM International Conference Proceeding Series
BT - Proceedings of XSEDE 2016
PB - Association for Computing Machinery
T2 - Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016
Y2 - 17 July 2016 through 21 July 2016
ER -