TY - JOUR
T1 - Enjoy your observability
T2 - an industrial survey of microservice tracing and analysis
AU - Li, Bowen
AU - Peng, Xin
AU - Xiang, Qilin
AU - Wang, Hanzhang
AU - Xie, Tao
AU - Sun, Jun
AU - Liu, Xuanzhe
N1 - Funding Information:
This work was supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004803.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/1
Y1 - 2022/1
AB - Microservice systems are often deployed in complex cloud-based environments and may involve a large number of service instances being dynamically created and destroyed. It is thus essential to ensure observability to understand these microservice systems’ behaviors and troubleshoot their problems. As an important means to achieve the observability, distributed tracing and analysis is known to be challenging. While many companies have started implementing distributed tracing and analysis for microservice systems, it is not clear whether existing approaches fulfill the required observability. In this article, we present our industrial survey on microservice tracing and analysis through interviewing developers and operation engineers of microservice systems from ten companies. Our survey results offer a number of findings. For example, large microservice systems commonly adopt a tracing and analysis pipeline, and the implementations of the pipeline in different companies reflect different tradeoffs among a variety of concerns. Visualization and statistic-based metrics are the most common means for trace analysis, while more advanced analysis techniques such as machine learning and data mining are seldom used. Microservice tracing and analysis is a new big data problem for software engineering, and its practices breed new challenges and opportunities.
KW - Industrial survey
KW - Logging
KW - Microservice
KW - Tracing
UR - http://www.scopus.com/inward/record.url?scp=85120162918&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120162918&partnerID=8YFLogxK
U2 - 10.1007/s10664-021-10063-9
DO - 10.1007/s10664-021-10063-9
M3 - Article
C2 - 34867075
AN - SCOPUS:85120162918
VL - 27
JO - Empirical Software Engineering
JF - Empirical Software Engineering
SN - 1382-3256
IS - 1
M1 - 25
ER -