Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring

Shu Mei Tseng, Bogdan Nicolae, George Bosilca, Emmanuel Jeannot, Aparna Chandramowlishwaran, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Stealing network bandwidth helps a variety of HPC runtimes and services run additional operations in the background without negatively affecting the applications. A key ingredient to make this possible is an accurate prediction of future network utilization, which enables the runtime to plan background operations in advance so as to avoid competing with the application for network bandwidth. In this paper, we propose a portable deep learning predictor that uses only the information available through MPI introspection to construct a recurrent sequence-to-sequence neural network capable of forecasting network utilization. We leverage the fact that most HPC applications exhibit periodic behavior to enable predictions far into the future (at least the length of one period). Our online approach requires no initial training phase; instead, it continuously improves itself during application execution without incurring significant computational overhead. Experimental results on two representative applications show better accuracy and lower computational overhead than the state of the art.
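The periodicity assumption in the abstract can be illustrated with a far simpler predictor than the authors' recurrent sequence-to-sequence network. The Python sketch below is a hypothetical seasonal-naive baseline, not the paper's method: it assumes network utilization has already been sampled at fixed intervals (for example, bytes sent per interval as gathered through MPI-level monitoring), estimates the period as the lag with the highest autocorrelation, and forecasts by replaying the most recent period. The function names `estimate_period` and `forecast_by_period` are illustrative only.

```python
from typing import List


def estimate_period(series: List[float], min_lag: int = 2) -> int:
    """Return the lag in [min_lag, len(series) // 2] with the highest autocorrelation."""
    n = len(series)
    if n < 2 * min_lag:
        raise ValueError("need at least two candidate periods of samples")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return 1  # constant utilization: a period of one sample fits exactly
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        # Summing over n - lag terms without rescaling slightly favors
        # the fundamental period over its multiples.
        corr = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def forecast_by_period(series: List[float], horizon: int) -> List[float]:
    """Predict the next `horizon` samples by replaying the most recent period."""
    period = estimate_period(series)
    last_cycle = series[-period:]
    return [last_cycle[i % period] for i in range(horizon)]


# Hypothetical per-interval utilization trace with a period of 4 samples.
trace = [0.0, 0.0, 10.0, 10.0] * 6
print(estimate_period(trace))        # 4
print(forecast_by_period(trace, 6))  # [0.0, 0.0, 10.0, 10.0, 0.0, 0.0]
```

Because this baseline simply replays the last cycle, it can forecast arbitrarily far ahead on a strictly periodic trace, but unlike a model trained online during execution it cannot adapt when the utilization pattern drifts within or across periods.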

Original language: English (US)
Title of host publication: Euro-Par 2019
Subtitle of host publication: Parallel Processing - 25th International Conference on Parallel and Distributed Computing, Proceedings
Editors: Ramin Yahyapour
Publisher: Springer
Pages: 47-60
Number of pages: 14
ISBN (Print): 9783030293994
State: Published - 2019
Externally published: Yes
Event: 25th International European Conference on Parallel and Distributed Computing, Euro-Par 2019 - Göttingen, Germany
Duration: Aug 26 2019 to Aug 30 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11725 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International European Conference on Parallel and Distributed Computing, Euro-Par 2019
Country/Territory: Germany
City: Göttingen
Period: 8/26/19 to 8/30/19

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
