TY - GEN
T1 - A latent Hawkes process model for event clustering and temporal dynamics learning with applications in GitHub
AU - Liu, Shengzhong
AU - Yao, Shuochao
AU - Liu, Dongxin
AU - Shao, Huajie
AU - Zhao, Yiran
AU - Fu, Xinzhe
AU - Abdelzaher, Tarek
N1 - Funding Information:
This work was sponsored in part by DARPA under contract W911NF-17-C-0099. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
AB - Large volumes of event data are becoming increasingly available on online social networks. These events are usually causally dependent on each other, reflecting the interactions and collaborations among different parties. Learning and interpreting the temporal patterns and dynamics within these event streams plays an important role in many practical applications, such as trend prediction and anomaly detection. Since causal dependencies can be reflected in both event time (i.e., when) and event content (i.e., who and what), we develop a user-community-based generative model, called the latent Hawkes process (LHP), that takes both kinds of information into account to describe the generation of such inter-dependent event streams on GitHub repositories, where each attribute is assumed to be generated by interplays between correlated latent communities. Learning the model accomplishes two tasks concurrently: event clustering (i.e., community discovery) and learning the temporal dependencies among these clusters (i.e., dependency profiling). To do so, we design an EM-based framework that integrates sequential Monte Carlo sampling to estimate model parameters in an end-to-end manner. Through experiments on practical GitHub event data, we validate the effectiveness of LHP in extracting user community structures and learning their correlated temporal dynamics. This knowledge further enables us to gain new insights into the development status of software, such as project persistence and anomaly detection.
KW - GitHub
KW - Graphical Model
KW - Temporal Point Process
KW - Time Series Analysis
UR - http://www.scopus.com/inward/record.url?scp=85074832673&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074832673&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2019.00128
DO - 10.1109/ICDCS.2019.00128
M3 - Conference contribution
AN - SCOPUS:85074832673
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 1275
EP - 1285
BT - Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Y2 - 7 July 2019 through 9 July 2019
ER -