Process trace clustering: A heterogeneous information network approach

Phuong Nguyen, Aleksander Slominski, Vinod Muthusamy, Vatche Ishakian, Klara Nahrstedt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Process mining is the task of extracting information from event logs, such as those generated by workflow management or enterprise resource planning systems, in order to discover models of the underlying processes, organizations, and products. Because event logs often contain a variety of process executions, the discovered models can be complex and difficult to comprehend. Trace clustering helps solve this problem by splitting the event logs into smaller subsets and applying process discovery algorithms to each subset, resulting in per-subset discovered processes that are less complex and more accurate. However, state-of-the-art clustering techniques are limited: their similarity measures are not process-aware, and they do not scale well to high-dimensional event logs. In this paper, we propose a conceptualization of a process's event logs as a heterogeneous information network, in order to capture their rich semantic meaning and thereby derive better process-specific features. In addition, we propose SeqPathSim, a meta path-based similarity measure that considers node sequences in the heterogeneous graph and results in better clustering. We also introduce a new dimension reduction method that combines event similarity with regularization by process model structure to deal with event logs of high dimensionality. The experimental results show that our proposed approach outperforms state-of-the-art trace clustering approaches in both accuracy and structural complexity metrics.
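To make the idea of a meta path-based similarity concrete, the following is a minimal sketch of ordinary PathSim (the existing symmetric meta path measure, not the paper's SeqPathSim, whose definition the abstract does not give) applied to the meta path trace → activity → trace. The function name `pathsim_matrix`, the toy event log, and the choice of activity counts as edge weights are all assumptions made for illustration.

```python
from collections import Counter


def pathsim_matrix(traces):
    """Plain PathSim over the meta path trace -> activity -> trace.

    Illustrative only: this is NOT SeqPathSim. It ignores the order of
    activities within a trace and uses occurrence counts as edge weights.
    """
    # Edge weights of the bipartite trace-activity graph.
    counts = [Counter(trace) for trace in traces]

    def path_count(a, b):
        # Number of trace-activity-trace path instances between two traces.
        return sum(a[act] * b[act] for act in a.keys() & b.keys())

    n = len(counts)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # PathSim normalizes by the self-path counts of both endpoints.
            denom = path_count(counts[i], counts[i]) + path_count(counts[j], counts[j])
            sim[i][j] = 2 * path_count(counts[i], counts[j]) / denom if denom else 0.0
    return sim


# Hypothetical toy event log: each trace is a sequence of activity names.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["ship", "invoice"],
]
similarity = pathsim_matrix(toy_log)
```

A matrix like `similarity` could be passed to any standard clustering algorithm. Because this sketch treats each trace as a bag of activities, two traces containing the same activities in different orders are scored as identical, which is the kind of limitation a sequence-aware measure is meant to address.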

Original language: English (US)
Title of host publication: 16th SIAM International Conference on Data Mining 2016, SDM 2016
Editors: Sanjay Chawla Venkatasubramanian, Wagner Meira
Publisher: Society for Industrial and Applied Mathematics Publications
Number of pages: 9
ISBN (Electronic): 9781510828117
State: Published - 2016
Event: 16th SIAM International Conference on Data Mining 2016, SDM 2016 - Miami, United States
Duration: May 5 2016 to May 7 2016

Publication series

Name: 16th SIAM International Conference on Data Mining 2016, SDM 2016



ASJC Scopus subject areas

  • Computer Science Applications
  • Software

