TY - JOUR
T1 - DPPred
T2 - An Effective Prediction Framework with Concise Discriminative Patterns
AU - Shang, Jingbo
AU - Jiang, Meng
AU - Tong, Wenzhu
AU - Xiao, Jinfeng
AU - Peng, Jian
AU - Han, Jiawei
N1 - Funding Information:
Research was sponsored in part by the U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), US National Science Foundation IIS 16-18481, IIS 17-04532, and IIS 17-41317, grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), and a Google PhD Fellowship. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies of the U.S. Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Data used in the preparation of this article were obtained from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database. As such, the following organizations and individuals within the PRO-ACT Consortium contributed to the design and implementation of the PRO-ACT Database and/or provided data, but did not participate in the analysis of the data or the writing of this report: (1) Neurological Clinical Research Institute, MGH; (2) Northeast ALS Consortium; (3) Novartis; (4) Prize4Life; (5) Regeneron Pharmaceuticals, Inc.; (6) Sanofi; and (7) Teva Pharmaceutical Industries, Ltd.
Publisher Copyright:
© 1989-2012 IEEE.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - In the literature, two series of models have been proposed to address prediction problems, including classification and regression. Simple models, such as generalized linear models, have ordinary performance but strong interpretability on a set of simple features. The other series, including tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure with rich interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) that accomplishes prediction tasks by taking advantage of both series: their effectiveness and their interpretability. Specifically, DPPred adopts the concise discriminative patterns that lie on the prefix paths from the root to the leaf nodes in tree-based models. DPPred selects a limited number of useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts. In particular, taking a clinical application dataset as a case study, DPPred outperforms the baselines by using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
AB - In the literature, two series of models have been proposed to address prediction problems, including classification and regression. Simple models, such as generalized linear models, have ordinary performance but strong interpretability on a set of simple features. The other series, including tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure with rich interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) that accomplishes prediction tasks by taking advantage of both series: their effectiveness and their interpretability. Specifically, DPPred adopts the concise discriminative patterns that lie on the prefix paths from the root to the leaf nodes in tree-based models. DPPred selects a limited number of useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts. In particular, taking a clinical application dataset as a case study, DPPred outperforms the baselines by using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
KW - Discriminative pattern
KW - classification
KW - generalized linear model
KW - regression
KW - tree-based models
UR - http://www.scopus.com/inward/record.url?scp=85030777057&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030777057&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2757476
DO - 10.1109/TKDE.2017.2757476
M3 - Article
C2 - 30745791
AN - SCOPUS:85030777057
VL - 30
SP - 1226
EP - 1239
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 7
ER -