Establishing a High-Performance and Productive Ecosystem for Distributed Execution of Python Functions Using Globus Compute

Rachana Ananthakrishnan, Yadu Babuji, Josh Bryan, Kyle Chard, Ryan Chard, Ben Clifford, Ian Foster, Lev Gorenstein, Kevin Hunter Kesling, Chris Janidlo, Daniel S. Katz, Reid Mello, J. Gregory Pauloski, Lei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The research computing ecosystem is increasingly heterogeneous and diverse. Democratizing access to these essential resources is critical for accelerating research progress. However, the gap between a high-level workload, such as Python in a Jupyter notebook, and the resources and interfaces exposed by HPC systems is significant. Users must securely authenticate, manage network connections, deploy and manage software, provision and configure nodes, and manage workload execution. Globus Compute reduces these barriers by providing a managed, fire-and-forget model that enables execution of Python functions across any resource to which a user has access. However, while Globus Compute has relieved users from many of the challenges of remote computing, we have observed some inefficiencies that remain in terms of use. For example, many users wrap external applications, such as C/C++, Fortran, and even MPI applications, in Python functions, and users must deploy many endpoints on a single computer to exploit different configurations. In this paper we describe enhancements to Globus Compute to address these barriers: an asynchronous, future-based executor interface for submitting and monitoring tasks, shell and MPI-based function types, and a multi-user endpoint that can be deployed by administrators and used by authorized users.
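
The future-based submission pattern and the practice of wrapping an external application in a Python function, both mentioned in the abstract, can be sketched locally. This is a minimal illustration under stated assumptions, not the Globus Compute implementation: Python's standard `concurrent.futures.ThreadPoolExecutor` stands in for a remote endpoint, and the names `run_shell`, `square`, and `demo` are hypothetical helpers introduced only for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_shell(argv):
    """Hypothetical helper: run an external command and capture its result."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return {"returncode": proc.returncode, "stdout": proc.stdout.strip()}


def square(x):
    """An ordinary Python function submitted as a task."""
    return x * x


def demo():
    # Assumption: a local thread pool stands in for a remote endpoint.
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns immediately with a Future for each task.
        square_futures = [executor.submit(square, n) for n in range(4)]
        # The external "application" here is just the Python interpreter
        # printing a word, so the example runs on any platform.
        shell_future = executor.submit(
            run_shell, [sys.executable, "-c", "print('hello')"]
        )
        # result() blocks until the corresponding task has finished.
        squares = [f.result() for f in square_futures]
        shell = shell_future.result()
    return {"squares": squares, "shell": shell}


if __name__ == "__main__":
    print(demo())
```

With a remote execution service, the executor object would dispatch each task to an endpoint instead of a local thread, while the calling code would keep the same submit-then-wait shape. The actual class names and arguments should be taken from the Globus Compute SDK documentation.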

Original language: English (US)
Title of host publication: Proceedings of SC 2024-W
Subtitle of host publication: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 597-606
Number of pages: 10
ISBN (Electronic): 9798350355543
State: Published - 2024
Event: 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 - Atlanta, United States
Duration: Nov 17 2024 to Nov 22 2024

Publication series

Name: Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 to 11/22/24

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Modeling and Simulation
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
