Murphy: Performance Diagnosis of Distributed Cloud Applications

Vipul Harsh, Wenxuan Zhou, Sachin Ashok, Radhika Niranjan Mysore, Brighten Godfrey, Sujata Banerjee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern cloud-based applications have complex inter-dependencies on both distributed application components and network infrastructure, making it difficult to reason about their performance. As a result, a rich body of work seeks to automate performance diagnosis of enterprise networks and such cloud applications. However, existing methods either ignore inter-dependencies, which results in poor accuracy, or require causal acyclic dependencies, which cannot model common enterprise environments. We describe the design and implementation of Murphy, an automated performance diagnosis system that can work with commonly available telemetry in practical enterprise environments while achieving high accuracy. Murphy utilizes loosely defined associations between entities obtained from commonly available monitoring data. Its learning algorithm is based on a Markov Random Field (MRF) that can take advantage of such loose associations to reason about how entities affect each other in the context of a specific incident. We evaluate Murphy in an emulated microservice environment and in real incidents from a large enterprise. Compared to past work, Murphy is able to reduce diagnosis error by ≈ 1.35× in restrictive environments supported by past work, and by ≥ 4.7× in more general environments.

Original language: English (US)
Title of host publication: SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference
Publisher: Association for Computing Machinery
Pages: 438-451
Number of pages: 14
ISBN (Electronic): 9798400702365
State: Published - Sep 10 2023
Event: ACM SIGCOMM 2023 Conference - New York, United States
Duration: Sep 10 2023 to Sep 14 2023

Publication series

Name: SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference

Conference

Conference: ACM SIGCOMM 2023 Conference
Country/Territory: United States
City: New York
Period: 9/10/23 to 9/14/23

Keywords

  • cyclic dependencies
  • enterprise networks
  • microservices
  • performance diagnosis

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Computer Networks and Communications
