TY - GEN
T1 - DAPPER
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
AU - Giaquinto, Robert
AU - Banerjee, Arindam
N1 - Funding Information:
We thank the reviewers for their valuable comments, the University of Minnesota Supercomputing Institute for technical support, and CaringBridge for support and collaboration. The research was supported by NSF grants IIS-1563950, IIS-1447566, IIS-1447574, IIS-1422557, CCF-1451986, and CNS-1314560.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
N2 - Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome such challenges, we adapt new ideas in approximate inference to the DAP model, resulting in the DAP Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop Conjugate-Computation Variational Inference (CVI) based variational Expectation-Maximization (EM) for learning the model, yielding fast, closed-form updates for each document that replace the iterative optimization used in earlier work. Our results show significant improvements in model fit and training time without compromising the model's temporal structure or the application of Regularized Variational Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model on multiple datasets, including the CaringBridge corpus, a collection of 9 million journals written by 200,000 authors during health crises.
AB - Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome such challenges, we adapt new ideas in approximate inference to the DAP model, resulting in the DAP Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop Conjugate-Computation Variational Inference (CVI) based variational Expectation-Maximization (EM) for learning the model, yielding fast, closed-form updates for each document that replace the iterative optimization used in earlier work. Our results show significant improvements in model fit and training time without compromising the model's temporal structure or the application of Regularized Variational Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model on multiple datasets, including the CaringBridge corpus, a collection of 9 million journals written by 200,000 authors during health crises.
KW - Approximate inference
KW - Graphical model
KW - Healthcare
KW - Non-conjugate models
KW - Regularized variational inference
KW - Text mining
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85061386408&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061386408&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00120
DO - 10.1109/ICDM.2018.00120
M3 - Conference contribution
AN - SCOPUS:85061386408
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 971
EP - 976
BT - 2018 IEEE International Conference on Data Mining, ICDM 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 November 2018 through 20 November 2018
ER -