Providing efficient I/O redundancy in MPI environments

William D. Gropp, Robert Ross, Neill Miller

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Highly parallel applications often use either highly parallel file systems or large numbers of independent disks. Either approach can provide the high data rates necessary for parallel applications. However, the failure of a single disk or server can render the data useless. Conventional techniques, such as those based on applying erasure-correcting codes to each file write, are prohibitively expensive for massively parallel scientific applications because of the granularity of access at which the codes are applied. In this paper we demonstrate a scalable method for recovering from single disk failures that is optimized for typical scientific data sets. This approach exploits coarser-grained (but precise) semantics to reduce the overhead of constructing recovery data and makes use of parallel computation (proportional to the data size and independent of the number of processors) to construct the recovery data. Experiments are presented showing the efficiency of this approach on a cluster with independent disks, and a technique is described for hiding the creation of redundant data within the MPI-IO implementation.
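
As a rough illustration of the idea summarized in the abstract, the sketch below is a hypothetical C/MPI example, not the authors' implementation; the file names, block size, and simple XOR-parity scheme are assumptions made for the sketch. Each rank writes its block of a data set with MPI-IO and then participates in a bytewise XOR reduction that builds a single parity block, from which any one lost block could be reconstructed.

/*
 * Hypothetical sketch (not the paper's implementation): each rank holds a
 * contiguous block of a distributed data set, and the ranks cooperatively
 * build XOR parity sufficient to recover from the loss of any single block.
 * File names and block size are illustrative assumptions.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_BYTES (1 << 20)   /* 1 MiB per rank; assumption for the sketch */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank's local block of the data set (filled with dummy data here). */
    unsigned char *block = malloc(BLOCK_BYTES);
    memset(block, rank + 1, BLOCK_BYTES);

    /* Write the data blocks collectively with MPI-IO, one block per rank. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "dataset.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK_BYTES,
                          block, BLOCK_BYTES, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Build the recovery data in parallel: a bytewise XOR reduction across
       ranks, so the work is proportional to the data size. */
    unsigned char *parity = malloc(BLOCK_BYTES);
    MPI_Reduce(block, parity, BLOCK_BYTES, MPI_BYTE, MPI_BXOR, 0,
               MPI_COMM_WORLD);

    /* Rank 0 stores the parity block; XORing the surviving blocks with it
       reconstructs any single missing block. */
    if (rank == 0) {
        FILE *pf = fopen("dataset.parity", "wb");
        fwrite(parity, 1, BLOCK_BYTES, pf);
        fclose(pf);
    }

    free(block);
    free(parity);
    MPI_Finalize();
    return 0;
}

Because the reduction operates on the data itself, the cost of constructing the parity grows with the data size rather than with the number of processes, mirroring the scaling property the abstract describes; the paper's approach additionally hides this construction inside the MPI-IO implementation.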

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Dieter Kranzlmüller, Peter Kacsuk, Jack Dongarra
Publisher: Springer
Pages: 77-86
Number of pages: 10
ISBN (Print): 3540231633
DOIs
State: Published - 2004
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3241
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
