TY - GEN
T1 - Establishing a High-Performance and Productive Ecosystem for Distributed Execution of Python Functions Using Globus Compute
AU - Ananthakrishnan, Rachana
AU - Babuji, Yadu
AU - Bryan, Josh
AU - Chard, Kyle
AU - Chard, Ryan
AU - Clifford, Ben
AU - Foster, Ian
AU - Gorenstein, Lev
AU - Kesling, Kevin Hunter
AU - Janidlo, Chris
AU - Katz, Daniel S.
AU - Mello, Reid
AU - Pauloski, J. Gregory
AU - Wang, Lei
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The research computing ecosystem is increasingly heterogeneous and diverse. Democratizing access to these essential resources is critical for accelerating research progress. However, the gap between a high-level workload, such as Python in a Jupyter notebook, and the resources and interfaces exposed by HPC systems is significant. Users must securely authenticate, manage network connections, deploy and manage software, provision and configure nodes, and manage workload execution. Globus Compute reduces these barriers by providing a managed, fire-and-forget model that enables execution of Python functions across any resource to which a user has access. However, while Globus Compute has relieved users from many of the challenges of remote computing, we have observed some inefficiencies that remain in terms of use. For example, many users wrap external applications, such as C/C++, Fortran, and even MPI applications, in Python functions and users must deploy many endpoints on a single computer to exploit different configurations. In this paper we describe enhancements to Globus Compute to address these barriers: an asynchronous, future-based executor interface for submitting and monitoring tasks, shell and MPI-based function types, and a multi-user endpoint that can be deployed by administrators and used by authorized users.
AB - The research computing ecosystem is increasingly heterogeneous and diverse. Democratizing access to these essential resources is critical for accelerating research progress. However, the gap between a high-level workload, such as Python in a Jupyter notebook, and the resources and interfaces exposed by HPC systems is significant. Users must securely authenticate, manage network connections, deploy and manage software, provision and configure nodes, and manage workload execution. Globus Compute reduces these barriers by providing a managed, fire-and-forget model that enables execution of Python functions across any resource to which a user has access. However, while Globus Compute has relieved users from many of the challenges of remote computing, we have observed some inefficiencies that remain in terms of use. For example, many users wrap external applications, such as C/C++, Fortran, and even MPI applications, in Python functions and users must deploy many endpoints on a single computer to exploit different configurations. In this paper we describe enhancements to Globus Compute to address these barriers: an asynchronous, future-based executor interface for submitting and monitoring tasks, shell and MPI-based function types, and a multi-user endpoint that can be deployed by administrators and used by authorized users.
UR - http://www.scopus.com/inward/record.url?scp=85217171057&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217171057&partnerID=8YFLogxK
U2 - 10.1109/SCW63240.2024.00083
DO - 10.1109/SCW63240.2024.00083
M3 - Conference contribution
AN - SCOPUS:85217171057
T3 - Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 597
EP - 606
BT - Proceedings of SC 2024-W
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Y2 - 17 November 2024 through 22 November 2024
ER -