A latent Hawkes process model for event clustering and temporal dynamics learning with applications in GitHub

Shengzhong Liu, Shuochao Yao, Dongxin Liu, Huajie Shao, Yiran Zhao, Xinzhe Fu, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large volumes of event data are becoming increasingly available on online social networks. These events are usually causally dependent on each other, reflecting the interactions and collaborations among different parties. Learning and interpreting the temporal patterns and dynamics within these event streams plays an important role in many practical applications, such as trend prediction and anomaly detection. Because causal dependencies are reflected in both event time (i.e., when) and event content (i.e., who and what), we develop a user-community-based generative model, called the latent Hawkes process (LHP), that takes both kinds of information into account to describe the generation of such inter-dependent event streams on GitHub repositories, where each attribute is assumed to be generated by the interplay between correlated latent communities. Learning the model fulfills two functions concurrently: event clustering (i.e., community discovery) and temporal dependency learning among these clusters (i.e., dependency profiling). To do so, we design an EM-based framework that integrates sequential Monte Carlo sampling to estimate model parameters in an end-to-end manner. Through experiments on real-world GitHub event data, we validate the effectiveness of LHP in extracting user community structures and learning their correlated temporal dynamics. Such knowledge further enables us to gain new insights into the development status of software, for example in assessing project persistence and detecting anomalies.
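As an illustrative sketch only, and not the LHP model itself, the following shows the conditional intensity of a standard multivariate Hawkes process with an exponential kernel, the kind of self-exciting point process the abstract builds on. The function name `hawkes_intensity`, the parameters `mu`, `alpha`, and `beta`, and the toy events are all assumptions made for this example; the abstract does not specify the kernel or parameterization the authors use.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Return the intensity of every community at time t.

    events: list of (time, community) pairs observed so far.
    mu[k]: baseline rate of community k (hypothetical parameter).
    alpha[j][k]: how strongly an event in community j excites community k.
    beta: decay rate of the exponential kernel.
    """
    rates = list(mu)  # start from the baseline rates
    for t_i, c_i in events:
        if t_i < t:  # only past events contribute
            decay = math.exp(-beta * (t - t_i))
            for k in range(len(mu)):
                rates[k] += alpha[c_i][k] * decay
    return rates

# Toy setup: two communities, two past events (all values are made up).
mu = [0.1, 0.2]
alpha = [[0.5, 0.3],
         [0.0, 0.4]]
beta = 1.0
events = [(0.0, 0), (1.0, 1)]

rates = hawkes_intensity(2.0, events, mu, alpha, beta)
print(rates)
```

Each past event raises the rate of future events in the communities it excites, and that influence fades over time. In this toy setting, the off-diagonal entries of `alpha` play the role of cross-community temporal dependencies.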

Original language: English (US)
Title of host publication: Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1275-1285
Number of pages: 11
ISBN (Electronic): 9781728125190
State: Published - Jul 2019
Event: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 - Richardson, United States
Duration: Jul 7 2019 to Jul 9 2019

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2019-July

Conference

Conference: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Country/Territory: United States
City: Richardson
Period: 7/7/19 to 7/9/19

Keywords

  • GitHub
  • Graphical Model
  • Temporal Point Process
  • Time Series Analysis

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
