TY - GEN
T1 - Firm
T2 - 14th USENIX Symposium on Operating Systems Design and Implementation,OSDI 2020
AU - Qiu, Haoran
AU - Banerjee, Subho S.
AU - Jha, Saurabh
AU - Kalbarczyk, Zbigniew T.
AU - Iyer, Ravishankar K.
N1 - Publisher Copyright:
© 2020 Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - User-facing latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing of compute resources across microservices is still challenging in production because contention for shared resources can cause latency spikes that violate the service-level objectives (SLOs) of user requests. This paper presents FIRM, an intelligent fine-grained resource management framework for predictable sharing of resources across microservices to drive up overall utilization. FIRM leverages online telemetry data and machine-learning methods to adaptively (a) detect/localize microservices that cause SLO violations, (b) identify low-level resources in contention, and (c) take actions to mitigate SLO violations via dynamic reprovisioning. Experiments across four microservice benchmarks demonstrate that FIRM reduces SLO violations by up to 16× while reducing the overall requested CPU limit by up to 62%. Moreover, FIRM improves performance predictability by reducing tail latencies by up to 11×.
AB - User-facing latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing of compute resources across microservices is still challenging in production because contention for shared resources can cause latency spikes that violate the service-level objectives (SLOs) of user requests. This paper presents FIRM, an intelligent fine-grained resource management framework for predictable sharing of resources across microservices to drive up overall utilization. FIRM leverages online telemetry data and machine-learning methods to adaptively (a) detect/localize microservices that cause SLO violations, (b) identify low-level resources in contention, and (c) take actions to mitigate SLO violations via dynamic reprovisioning. Experiments across four microservice benchmarks demonstrate that FIRM reduces SLO violations by up to 16× while reducing the overall requested CPU limit by up to 62%. Moreover, FIRM improves performance predictability by reducing tail latencies by up to 11×.
UR - http://www.scopus.com/inward/record.url?scp=85096750460&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096750460&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85096750460
T3 - Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
SP - 805
EP - 825
BT - Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
PB - USENIX Association
Y2 - 4 November 2020 through 6 November 2020
ER -