Abstract
Modern cloud-based applications have complex inter-dependencies on both distributed application components and network infrastructure, making it difficult to reason about their performance. As a result, a rich body of work seeks to automate performance diagnosis of enterprise networks and such cloud applications. However, existing methods either ignore inter-dependencies, which results in poor accuracy, or require causal acyclic dependencies, which cannot model common enterprise environments. We describe the design and implementation of Murphy, an automated performance diagnosis system that can work with commonly available telemetry in practical enterprise environments while achieving high accuracy. Murphy utilizes loosely defined associations between entities obtained from commonly available monitoring data. Its learning algorithm is based on a Markov Random Field (MRF) that can take advantage of such loose associations to reason about how entities affect each other in the context of a specific incident. We evaluate Murphy in an emulated microservice environment and on real incidents from a large enterprise. Compared to past work, Murphy reduces diagnosis error by ≈ 1.35× in the restrictive environments that past work supports, and by ≥ 4.7× in more general environments.
Original language | English (US) |
---|---|
Title of host publication | SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference |
Publisher | Association for Computing Machinery |
Pages | 438-451 |
Number of pages | 14 |
ISBN (Electronic) | 9798400702365 |
State | Published - Sep 10 2023 |
Event | ACM SIGCOMM 2023 Conference, New York, United States; Duration: Sep 10 2023 → Sep 14 2023 |
Publication series
Name | SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference |
---|---|
Conference
Conference | ACM SIGCOMM 2023 Conference |
---|---|
Country/Territory | United States |
City | New York |
Period | 9/10/23 → 9/14/23 |
Keywords
- cyclic dependencies
- enterprise networks
- microservices
- performance diagnosis
ASJC Scopus subject areas
- Hardware and Architecture
- Software
- Computer Networks and Communications
Fingerprint
Dive into the research topics of 'Murphy: Performance Diagnosis of Distributed Cloud Applications'. Together they form a unique fingerprint.
Datasets
- Murphy traces
Harsh, V. (Creator), Zhou, W. (Creator), Ashok, S. (Creator), Mysore, R. N. (Creator), Godfrey, P. B. (Creator) & Banerjee, S. (Creator), University of Illinois Urbana-Champaign, Feb 26 2024
DOI: 10.13012/B2IDB-6641912_V1
Dataset