TY - GEN
T1 - MULAN
T2 - 33rd ACM Web Conference, WWW 2024
AU - Zheng, Lecheng
AU - Chen, Zhengzhang
AU - He, Jingrui
AU - Chen, Haifeng
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/5/13
Y1 - 2024/5/13
AB - Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses, and ensuring the smooth operation and management of complex systems. Previous data-driven RCA methods, particularly those employing causal discovery techniques, have primarily focused on constructing dependency or causal graphs for backtracking the root causes. However, these methods often fall short as they rely solely on data from a single modality, thereby resulting in suboptimal solutions. In this work, we propose Mulan, a unified multi-modal causal structure learning method designed to identify root causes in microservice systems. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. To explore intricate relationships across different modalities, we propose a contrastive learning-based approach to extract modality-invariant and modality-specific representations within a shared latent space. Additionally, we introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph. Finally, we employ random walk with restart to simulate system fault propagation and identify potential root causes. Extensive experiments on three real-world datasets validate the effectiveness of our proposed method.
KW - causal structure learning
KW - contrastive learning
KW - large language model
KW - log analysis
KW - microservice systems
KW - multi-modal learning
KW - root cause analysis
KW - system diagnosis
UR - http://www.scopus.com/inward/record.url?scp=85194052979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194052979&partnerID=8YFLogxK
U2 - 10.1145/3589334.3645442
DO - 10.1145/3589334.3645442
M3 - Conference contribution
AN - SCOPUS:85194052979
T3 - WWW 2024 - Proceedings of the ACM Web Conference
SP - 4107
EP - 4116
BT - WWW 2024 - Proceedings of the ACM Web Conference
PB - Association for Computing Machinery
Y2 - 13 May 2024 through 17 May 2024
ER -