TY - GEN
T1 - Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring
AU - Tseng, Shu Mei
AU - Nicolae, Bogdan
AU - Bosilca, George
AU - Jeannot, Emmanuel
AU - Chandramowlishwaran, Aparna
AU - Cappello, Franck
N1 - Funding Information:
Acknowledgments. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This material was based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant No. #1664142. The experiments presented in this paper were carried out using the Grid’5000/ALADDIN-G5K experimental testbed, an initiative of the French Ministry of Research through the ACI GRID incentive action, INRIA, CNRS and RENATER and other contributing partners (see http://www.grid5000.fr/).
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
AB - Stealing network bandwidth helps a variety of HPC runtimes and services to run additional operations in the background without negatively affecting the applications. A key ingredient to make this possible is an accurate prediction of the future network utilization, enabling the runtime to plan the background operations in advance, so as to avoid competing with the application for network bandwidth. In this paper, we propose a portable deep learning predictor that only uses the information available through MPI introspection to construct a recurrent sequence-to-sequence neural network capable of forecasting network utilization. We leverage the fact that most HPC applications exhibit periodic behaviors to enable predictions far into the future (at least the length of a period). Our online approach does not have an initial training phase; it continuously improves itself during application execution without incurring significant computational overhead. Experimental results show better accuracy and lower computational overhead compared with the state of the art on two representative applications.
UR - http://www.scopus.com/inward/record.url?scp=85077130685&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077130685&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29400-7_4
DO - 10.1007/978-3-030-29400-7_4
M3 - Conference contribution
AN - SCOPUS:85077130685
SN - 9783030293994
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 47
EP - 60
BT - Euro-Par 2019
A2 - Yahyapour, Ramin
PB - Springer
T2 - 25th International European Conference on Parallel and Distributed Computing, Euro-Par 2019
Y2 - 26 August 2019 through 30 August 2019
ER -