MULAN: Multi-modal Causal Structure Learning and Root Cause Analysis for Microservice Systems

Lecheng Zheng, Zhengzhang Chen, Jingrui He, Haifeng Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses, and ensuring the smooth operation and management of complex systems. Previous data-driven RCA methods, particularly those employing causal discovery techniques, have primarily focused on constructing dependency or causal graphs for backtracking the root causes. However, these methods often fall short as they rely solely on data from a single modality, thereby resulting in suboptimal solutions. In this work, we propose Mulan, a unified multi-modal causal structure learning method designed to identify root causes in microservice systems. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. To explore intricate relationships across different modalities, we propose a contrastive learning-based approach to extract modality-invariant and modality-specific representations within a shared latent space. Additionally, we introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph. Finally, we employ random walk with restart to simulate system fault propagation and identify potential root causes. Extensive experiments on three real-world datasets validate the effectiveness of our proposed method.
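The final step described above, random walk with restart over the learned causal graph, can be illustrated with a minimal sketch. It assumes the causal graph is given as a nonnegative weighted adjacency matrix in which adj[i, j] > 0 means component i causes component j. It also assumes the walker starts at the KPI node and steps from an effect to one of its causes. The function name random_walk_with_restart, the restart probability of 0.3, and the toy four-node graph are hypothetical choices for illustration. They are not the paper's actual formulation, parameters, or data.

```python
import numpy as np


def random_walk_with_restart(adj, start, restart_prob=0.3, tol=1e-10, max_iter=1000):
    """Hypothetical sketch: score nodes by RWR visiting probability.

    adj[i, j] > 0 denotes a causal edge i -> j with that weight (an assumption).
    The walker moves backward along edges (effect -> cause), so probability
    mass flows from the symptom node `start` toward upstream candidates.
    """
    n = adj.shape[0]
    back = adj.T.astype(float)  # back[j, i] = weight of i -> j, i.e. a step j -> i
    row_sums = back.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    trans = back / safe  # row-stochastic for nodes that have at least one cause
    # Nodes with no causes get a self-loop so every row is a valid distribution.
    idx = np.flatnonzero(row_sums.ravel() == 0)
    trans[idx, idx] = 1.0

    e = np.zeros(n)
    e[start] = 1.0  # restart vector concentrated on the KPI node
    p = e.copy()
    for _ in range(max_iter):
        # Move one step with prob. (1 - restart_prob), else jump back to start.
        p_next = (1 - restart_prob) * (trans.T @ p) + restart_prob * e
        if np.abs(p_next - p).sum() < tol:
            p = p_next
            break
        p = p_next
    return p


# Hypothetical toy graph: 0 -> 1 -> 2 (node 2 is the KPI), node 3 is unrelated.
adj = np.zeros((4, 4))
adj[0, 1] = 1.0
adj[1, 2] = 1.0
start = 2
scores = random_walk_with_restart(adj, start)
# Rank candidate root causes, excluding the KPI node itself.
ranking = [int(i) for i in np.argsort(-scores) if i != start]
print(ranking)  # → [0, 1, 3]
```

Walking against the causal edges lets the components at the head of the propagation chain accumulate the highest scores, while the restart term keeps the walk anchored near the observed KPI so that distant, unrelated components are not favored.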

Original language: English (US)
Title of host publication: WWW 2024 - Proceedings of the ACM Web Conference
Publisher: Association for Computing Machinery
Pages: 4107-4116
Number of pages: 10
ISBN (Electronic): 9798400701719
State: Published - May 13 2024
Event: 33rd ACM Web Conference, WWW 2024 - Singapore, Singapore
Duration: May 13 2024 - May 17 2024

Publication series

Name: WWW 2024 - Proceedings of the ACM Web Conference

Conference

Conference: 33rd ACM Web Conference, WWW 2024
Country/Territory: Singapore
City: Singapore
Period: 5/13/24 - 5/17/24

Keywords

  • causal structure learning
  • contrastive learning
  • large language model
  • log analysis
  • microservice systems
  • multi-modal learning
  • root cause analysis
  • system diagnosis

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
