On the use of ML for blackbox system performance prediction

Silvery Fu, Saurabh Gupta, Radhika Mittal, Sylvia Ratnasamy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

There is a growing body of work that reports positive results from applying ML-based performance prediction to a particular application or use-case (e.g., server configuration, capacity planning). Yet, a critical question remains unanswered: does ML make prediction simpler (i.e., allowing us to treat systems as blackboxes) and general (i.e., across a range of applications and use-cases)? After all, the potential for simplicity and generality is a key part of what makes ML-based prediction so attractive compared to the traditional approach of relying on handcrafted and specialized performance models. In this paper, we attempt to answer this broader question. We develop a methodology for systematically diagnosing whether, when, and why ML does (not) work for performance prediction, and identify steps to improve predictability. We apply our methodology to test 6 ML models in predicting the performance of 13 real-world applications. We find that 12 out of our 13 applications exhibit inherent variability in performance that fundamentally limits prediction accuracy. Our findings motivate the need for system-level modifications and/or ML-level extensions that can improve predictability, showing how ML fails to be an easy-to-use predictor. On implementing and evaluating these changes, we find that while they do improve the overall prediction accuracy, prediction error remains high for multiple realistic scenarios, showing how ML fails as a general predictor. Hence our answer is clear: ML is not a general and easy-to-use hammer for system performance prediction.
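
The claim that inherent variability limits prediction accuracy can be illustrated with a small sketch. It is not the paper's methodology; the function name, the error metric, and the measurements are hypothetical, chosen only to show the idea. If repeated runs of the same configuration produce different results, then even an oracle that already knows each configuration's mean outcome still incurs error, which gives a rough indication of what a blackbox model can hope to reach from the same inputs.

```python
from statistics import mean


def best_case_error(runs_by_config):
    """Mean absolute percentage error of a simple oracle that predicts,
    for each configuration, the mean of that configuration's own runs."""
    errors = []
    for runs in runs_by_config.values():
        center = mean(runs)
        # Error of the oracle's prediction on every individual run.
        errors.extend(abs(r - center) / r for r in runs)
    return mean(errors)


# Hypothetical completion times (seconds); not data from the paper.
measurements = {
    "config_a": [100.0, 100.0, 100.0],  # perfectly repeatable
    "config_b": [50.0, 100.0],          # identical inputs, two outcomes
}

floor = best_case_error(measurements)
print(f"rough best-case error: {floor:.0%}")
```

In this toy data, all of the oracle's error comes from `config_b`, whose runs differ although the inputs are identical; no amount of model tuning on those inputs removes that part of the error.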

Original language: English (US)
Title of host publication: Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021
Publisher: USENIX Association
Pages: 763-777
Number of pages: 15
ISBN (Electronic): 9781939133212
State: Published - 2021
Externally published: Yes
Event: 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021 - Virtual, Online
Duration: Apr 12 2021 to Apr 14 2021

Publication series

Name: Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021

Conference

Conference: 18th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2021
City: Virtual, Online
Period: 4/12/21 to 4/14/21

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering
