TY - GEN
T1 - Process trace clustering
T2 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
AU - Nguyen, Phuong
AU - Slominski, Aleksander
AU - Muthusamy, Vinod
AU - Ishakian, Vatche
AU - Nahrstedt, Klara
N1 - Publisher Copyright:
Copyright © by SIAM.
PY - 2016
Y1 - 2016
AB - Process mining is the task of extracting information from event logs, such as ones generated from workflow management or enterprise resource planning systems, in order to discover models of the underlying processes, organizations, and products. As the event logs often contain a variety of process executions, the discovered models can be complex and difficult to comprehend. Trace clustering helps solve this problem by splitting the event logs into smaller subsets and applying process discovery algorithms on each subset, resulting in per-subset discovered processes that are less complex and more accurate. However, state-of-the-art clustering techniques are limited: their similarity measures are not process-aware, and they do not scale well to high-dimensional event logs. In this paper, we propose a conceptualization of process event logs as a heterogeneous information network, in order to capture their rich semantic meaning and thereby derive better process-specific features. In addition, we propose SeqPathSim, a meta path-based similarity measure that considers node sequences in the heterogeneous graph and results in better clustering. We also introduce a new dimensionality reduction method that combines event similarity with regularization by process model structure to deal with event logs of high dimensionality. The experimental results show that our proposed approach outperforms state-of-the-art trace clustering approaches in both accuracy and structural complexity metrics.
UR - http://www.scopus.com/inward/record.url?scp=84991687184&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991687184&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974348.32
DO - 10.1137/1.9781611974348.32
M3 - Conference contribution
AN - SCOPUS:84991687184
T3 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
SP - 279
EP - 287
BT - 16th SIAM International Conference on Data Mining 2016, SDM 2016
A2 - Venkatasubramanian, Sanjay Chawla
A2 - Meira, Wagner
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 5 May 2016 through 7 May 2016
ER -