TY - JOUR
T1 - On hybrid tree-based methods for short-term insurance claims
AU - Quan, Zhiyu
AU - Wang, Zhiguo
AU - Gan, Guojun
AU - Valdez, Emiliano A.
PY - 2023/4
Y1 - 2023/4
AB - The two-part framework and the Tweedie generalized linear model (GLM) have traditionally been used to model loss costs for short-term insurance contracts. Most portfolios of insurance claims contain a large proportion of zero claims, and the resulting imbalance lowers the prediction accuracy of these traditional approaches. In this article, we propose as an alternative the use of tree-based methods with a hybrid structure built on a two-step algorithm. In one instance, the first step constructs a classification tree that serves as the probability model for claim frequency. The second step fits elastic net regression models at each terminal node of the classification tree to build the distribution models for claim severity. Because hyperparameters can be tuned at each step of the algorithm, this hybrid structure allows for improved prediction accuracy, and the tuning can be tailored to specific business objectives. A further major advantage of this hybrid structure is improved model interpretability. We examine and compare the predictive performance of this hybrid structure against the traditional Tweedie GLM using both simulated and real datasets. Our empirical results show that these hybrid tree-based methods produce more accurate and informative predictions.
KW - Hyperparameter tuning
KW - Tweedie generalized linear model
KW - Tree-based models
KW - Regularized regression
KW - Pure premium
UR - http://www.scopus.com/inward/record.url?scp=85152635329&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152635329&partnerID=8YFLogxK
U2 - 10.1017/S0269964823000074
DO - 10.1017/S0269964823000074
M3 - Article
SN - 0269-9648
VL - 37
SP - 597
EP - 620
JO - Probability in the Engineering and Informational Sciences
JF - Probability in the Engineering and Informational Sciences
IS - 2
ER -