Monitoring and testing for regression of large-scale systems such as the NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we came up with to perform those tasks. Our goal was to find an automated solution for running user-level regression tests to evaluate system usability and performance. Jenkins, an automation server software, was chosen for its versatility, large user base, and multitude of plugins including collecting data and plotting test results over time. We describe our Jenkins deployment to launch and monitor jobs on remote HPC system, perform authentication with one-time password, and integrate with our LDAP server for its authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user-level system-wide regression testing and monitoring framework for large supercomputer systems.
- regression testing
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics