TY - JOUR
T1 - Predictive analytics of insurance claims using multivariate decision trees
AU - Quan, Zhiyu
AU - Valdez, Emiliano A.
N1 - Funding Information:
Acknowledgment: We would like to thank the Society of Actuaries for the funding support of this research project through our Centers of Actuarial Excellence (CAE) grant on data mining. The data used in this paper was provided by Gee Lee and Edward W. (Jed) Frees of the University of Wisconsin in Madison; we extend our appreciation to them for allowing us to use the data. We would also like to thank the participants of the 10th Conference in Actuarial Science and Finance on Samos for their feedback. Zhiyu would like to acknowledge the doctoral student travel award provided by the University of Connecticut (UConn) Graduate School.
Publisher Copyright:
© by Zhiyu Quan, Emiliano A. Valdez, published by De Gruyter 2019.
PY - 2018/12/1
Y1 - 2018/12/1
N2 - Because of their many advantages, decision trees have become an increasingly popular alternative predictive tool for building classification and regression models. Their origins date back about five decades; the algorithm can be broadly described as repeatedly partitioning the regions of the explanatory variables, thereby creating a tree-based model for predicting the response. Innovations to the original methods, such as random forests and gradient boosting, have further improved the capabilities of decision trees as predictive models. In addition, the extension of decision trees to multivariate response variables has started to develop, and the purpose of this paper is to apply multivariate tree models to insurance claims data with correlated responses. This extension inherits several advantages of univariate decision tree models, such as being distribution-free, the ability to rank essential explanatory variables, and high predictive accuracy, to name a few. To illustrate the approach, we analyze a dataset drawn from the Wisconsin Local Government Property Insurance Fund (LGPIF), which offers multi-line insurance coverage of property, motor vehicles, and contractors' equipment. With multivariate tree models, we are able to capture the inherent relationship among the response variables, and we find that the marginal predictive model based on multivariate trees improves prediction accuracy over that based on univariate trees alone.
AB - Because of their many advantages, decision trees have become an increasingly popular alternative predictive tool for building classification and regression models. Their origins date back about five decades; the algorithm can be broadly described as repeatedly partitioning the regions of the explanatory variables, thereby creating a tree-based model for predicting the response. Innovations to the original methods, such as random forests and gradient boosting, have further improved the capabilities of decision trees as predictive models. In addition, the extension of decision trees to multivariate response variables has started to develop, and the purpose of this paper is to apply multivariate tree models to insurance claims data with correlated responses. This extension inherits several advantages of univariate decision tree models, such as being distribution-free, the ability to rank essential explanatory variables, and high predictive accuracy, to name a few. To illustrate the approach, we analyze a dataset drawn from the Wisconsin Local Government Property Insurance Fund (LGPIF), which offers multi-line insurance coverage of property, motor vehicles, and contractors' equipment. With multivariate tree models, we are able to capture the inherent relationship among the response variables, and we find that the marginal predictive model based on multivariate trees improves prediction accuracy over that based on univariate trees alone.
KW - Tree-based models
KW - gradient boosting
KW - multivariate regression trees
KW - multivariate tree boosting
KW - predictive model of insurance claims
KW - random forests
KW - univariate regression trees
UR - http://www.scopus.com/inward/record.url?scp=85060374488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060374488&partnerID=8YFLogxK
U2 - 10.1515/demo-2018-0022
DO - 10.1515/demo-2018-0022
M3 - Article
AN - SCOPUS:85060374488
SN - 2300-2298
VL - 6
SP - 377
EP - 407
JO - Dependence Modeling
JF - Dependence Modeling
IS - 1
ER -