Components and interfaces of a process management system for parallel programs

Ralph Butler, William Gropp, Ewing Lusk

Research output: Contribution to journal › Article

Abstract

Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables fast startup and convenient runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a simple but general interface that can be used to separate any process manager from a parallel library, which we use to keep MPD separate from MPICH.

Original language: English (US)
Pages (from-to): 1417-1429
Number of pages: 13
Journal: Parallel Computing
Volume: 27
Issue number: 11
State: Published - Oct 2001

Keywords

  • Parallel job management
  • Process management

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence
