TY - JOUR
T1 - Recurrent neural networks for early detection of heart failure from longitudinal electronic health record data
T2 - Implications for temporal modeling with respect to time before diagnosis, data density, data quantity, and data type
AU - Chen, Robert
AU - Stewart, Walter F.
AU - Sun, Jimeng
AU - Ng, Kenney
AU - Yan, Xiaowei
N1 - Funding Information:
This work was supported by the National Science Foundation awards IIS-1418511, IIS-1838042, and CCF-1533768, and the National Institutes of Health awards 1R01MD011682-01, R56HL138415, and R01HL116832.
Publisher Copyright:
© 2019 Lippincott Williams and Wilkins. All rights reserved.
PY - 2019/10/1
Y1 - 2019/10/1
AB - Background: We determined the impact of data volume and diversity and training conditions on recurrent neural network methods compared with traditional machine learning methods. Methods and Results: Using longitudinal electronic health record data, we assessed the relative performance of machine learning models trained to detect a future diagnosis of heart failure in primary care patients. Model performance was assessed in relation to data parameters defined by the combination of different data domains (data diversity), the number of patient records in the training data set (data quantity), the number of encounters per patient (data density), the prediction window length, and the observation window length (ie, the time period before the prediction window that is the source of features for prediction). Data on 4370 incident heart failure cases and 30 132 group-matched controls were used. Recurrent neural network model performance was superior under a variety of conditions that included (1) when data were less diverse (eg, a single data domain like medication or vital signs) given the same training size; (2) as data quantity increased; (3) as density increased; (4) as the observation window length increased; and (5) as the prediction window length decreased. When all data domains were used, the performance of recurrent neural network models increased in relation to the quantity of data used (ie, up to 100% of the data). When data are sparse (ie, fewer features or low dimension), model performance is lower, but a much smaller training set size is required to achieve optimal performance compared with conditions where data are more diverse and include more features. Conclusions: Recurrent neural networks are effective for predicting a future diagnosis of heart failure given sufficient training set size. Model performance appears to continue to improve in direct relation to training set size.
KW - diagnosis
KW - electronic health records
KW - heart failure
KW - machine learning
KW - mortality
UR - http://www.scopus.com/inward/record.url?scp=85073180482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073180482&partnerID=8YFLogxK
U2 - 10.1161/CIRCOUTCOMES.118.005114
DO - 10.1161/CIRCOUTCOMES.118.005114
M3 - Article
C2 - 31610714
AN - SCOPUS:85073180482
SN - 1941-7713
VL - 12
JO - Circulation: Cardiovascular Quality and Outcomes
JF - Circulation: Cardiovascular Quality and Outcomes
IS - 10
M1 - e005114
ER -