Toward understanding soft faults in high performance cluster networks

Jeffrey J. Evans, Seongbok Baik, Cynthia S. Hood, William Gropp

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Fault management in high performance cluster networks has been focused on the notion of hard faults (i.e., link or node failures). Network degradations that negatively impact performance but do not result in failures often go unnoticed. In this paper, we classify such degradations as soft faults. In addition, we identify consistent performance as an important requirement in cluster networks. Using this service requirement, we describe a comprehensive strategy for cluster fault management.

Original language: English (US)
Title of host publication: Integrated Network Management VIII
Subtitle of host publication: Managing It All - IFIP/IEEE 8th International Symposium on Integrated Network Management, IM 2003
Publisher: Springer
Pages: 117-120
Number of pages: 4
ISBN (Print): 9781475755213
State: Published - 2003
Externally published: Yes
Event: IFIP/IEEE 8th International Symposium on Integrated Network Management, IM 2003 - Colorado Springs, CO, United States
Duration: Mar 24 2003 to Mar 28 2003

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 118
ISSN (Print): 1868-4238

Other

Other: IFIP/IEEE 8th International Symposium on Integrated Network Management, IM 2003
Country/Territory: United States
City: Colorado Springs, CO
Period: 3/24/03 to 3/28/03

Keywords

  • Cluster
  • Fault management
  • Interconnection networks
  • Soft faults

ASJC Scopus subject areas

  • Information Systems and Management
