TY - GEN
T1 - Graph-based trace analysis for microservice architecture understanding and problem diagnosis
AU - Guo, Xiaofeng
AU - Peng, Xin
AU - Wang, Hanzhang
AU - Li, Wanxue
AU - Jiang, Huai
AU - Ding, Dan
AU - Xie, Tao
AU - Su, Liangfei
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/11/8
Y1 - 2020/11/8
N2 - Microservice systems are highly dynamic and complex. For such systems, operation engineers and developers rely heavily on trace analysis to understand architectures and diagnose problems such as service failures and quality degradation. However, the huge number of traces produced at runtime makes it challenging to capture the required information in real time. To address these challenges, we propose GMTA, a graph-based microservice trace analysis approach for understanding architecture and diagnosing various problems. Built on a graph-based representation, GMTA efficiently processes traces produced on the fly. It abstracts traces into different paths and further groups them into business flows. To support various analytical applications, GMTA includes an efficient storage and access mechanism that combines a graph database and a real-time analytics database and uses a carefully designed storage structure. Based on GMTA, we construct analytical applications for architecture understanding and problem diagnosis; these applications support various needs such as visualizing service dependencies, making architectural decisions, analyzing changes in service behaviors, detecting performance issues, and locating root causes. GMTA has been implemented and deployed at eBay. An experimental study based on trace data produced by eBay demonstrates GMTA's effectiveness and efficiency for architecture understanding and problem diagnosis. Case studies conducted in eBay's monitoring team and Site Reliability Engineering (SRE) team further confirm GMTA's substantial benefits in industrial-scale microservice systems.
AB - Microservice systems are highly dynamic and complex. For such systems, operation engineers and developers rely heavily on trace analysis to understand architectures and diagnose problems such as service failures and quality degradation. However, the huge number of traces produced at runtime makes it challenging to capture the required information in real time. To address these challenges, we propose GMTA, a graph-based microservice trace analysis approach for understanding architecture and diagnosing various problems. Built on a graph-based representation, GMTA efficiently processes traces produced on the fly. It abstracts traces into different paths and further groups them into business flows. To support various analytical applications, GMTA includes an efficient storage and access mechanism that combines a graph database and a real-time analytics database and uses a carefully designed storage structure. Based on GMTA, we construct analytical applications for architecture understanding and problem diagnosis; these applications support various needs such as visualizing service dependencies, making architectural decisions, analyzing changes in service behaviors, detecting performance issues, and locating root causes. GMTA has been implemented and deployed at eBay. An experimental study based on trace data produced by eBay demonstrates GMTA's effectiveness and efficiency for architecture understanding and problem diagnosis. Case studies conducted in eBay's monitoring team and Site Reliability Engineering (SRE) team further confirm GMTA's substantial benefits in industrial-scale microservice systems.
KW - Architecture
KW - Fault localization
KW - Graph
KW - Microservice
KW - Tracing
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85097146402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097146402&partnerID=8YFLogxK
U2 - 10.1145/3368089.3417066
DO - 10.1145/3368089.3417066
M3 - Conference contribution
AN - SCOPUS:85097146402
T3 - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 1387
EP - 1397
BT - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Devanbu, Prem
A2 - Cohen, Myra
A2 - Zimmermann, Thomas
PB - Association for Computing Machinery, Inc
T2 - 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Y2 - 8 November 2020 through 13 November 2020
ER -