Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices

Shravan Gaonkar, Eric Rozier, Anthony Tong, William H. Sanders

Research output: Contribution to conference (Paper)

Abstract

Petascale computing requires I/O subsystems that can keep up with the dramatic computing power demanded by such systems. TOP500.org ranks top computers based on their peak compute performance, but there has not been adequate investigation of the current state-of-the-art and future requirements of storage area networks that support petascale computers. Dependable scaling of an I/O subsystem to support petascale computing is not as simple as adding more storage servers. In this paper, we present a stochastic activity network model that uses failure rates computed from real logs to predict the reliability and availability of the storage architecture of the Abe cluster at the National Center for Supercomputing Applications (NCSA). We then use the model to evaluate the challenges encountered as one scales the number of storage servers to support petascale computing. The results present new insights regarding the dependability challenges that will be encountered when building next-generation petabyte storage. Furthermore, we provide insight into a new design approach that will enable system designers to integrate the trace-based analysis of parameter values from real system data into their stochastic models to allow informed design choices.
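
To illustrate why dependable scaling is "not as simple as adding more storage servers", the following is a minimal sketch, not the stochastic activity network model used in the paper. It assumes independent, identical servers, a simple steady-state availability formula (MTBF / (MTBF + MTTR)), and a file system that is only fully available when every server is up. The function names and the MTBF/MTTR values are hypothetical and are not taken from the Abe cluster logs.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one server: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def all_servers_up(per_server_availability: float, n_servers: int) -> float:
    """Probability that all n servers are up at once.

    Assumes independent failures and no redundancy (an illustrative
    simplification, not a claim about any real storage architecture).
    """
    return per_server_availability ** n_servers


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    a = steady_state_availability(mtbf_hours=1000.0, mttr_hours=10.0)
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} servers: P(all up) = {all_servers_up(a, n):.4f}")
```

Under these assumptions the probability that every server is up shrinks geometrically with the server count, which is one reason redundancy and repair behavior need to be modeled explicitly rather than inferred from single-server figures.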

Original language: English (US)
Pages: 386-391
Number of pages: 6
DOI: https://doi.org/10.1109/DSN.2008.4630107
State: Published - Oct 13 2008
Event: 2008 International Conference on Dependable Systems and Networks, DSN-2008 - Anchorage, AK, United States
Duration: Jun 24 2008 to Jun 27 2008

Other

Other: 2008 International Conference on Dependable Systems and Networks, DSN-2008
Country: United States
City: Anchorage, AK
Period: 6/24/08 to 6/27/08

Fingerprint

Servers
Stochastic models
Availability

Keywords

  • Data analysis
  • Modeling techniques
  • Reliability and availability
  • Simulation
  • Storage systems

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Gaonkar, S., Rozier, E., Tong, A., & Sanders, W. H. (2008). Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. 386-391. Paper presented at 2008 International Conference on Dependable Systems and Networks, DSN-2008, Anchorage, AK, United States. https://doi.org/10.1109/DSN.2008.4630107

@conference{57d1e9a322e2408b8896152e75577edf,
title = "Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices",
keywords = "Data analysis, Modeling techniques, Reliability and availability, Simulation, Storage systems",
author = "Shravan Gaonkar and Eric Rozier and Anthony Tong and Sanders, {William H.}",
year = "2008",
month = "10",
day = "13",
doi = "10.1109/DSN.2008.4630107",
language = "English (US)",
pages = "386--391",
note = "2008 International Conference on Dependable Systems and Networks, DSN-2008 ; Conference date: 24-06-2008 Through 27-06-2008",

}

TY - CONF
T1 - Scaling file systems to support petascale clusters
T2 - A dependability analysis to support informed design choices
AU - Gaonkar, Shravan
AU - Rozier, Eric
AU - Tong, Anthony
AU - Sanders, William H.
PY - 2008/10/13
Y1 - 2008/10/13
KW - Data analysis
KW - Modeling techniques
KW - Reliability and availability
KW - Simulation
KW - Storage systems
UR - http://www.scopus.com/inward/record.url?scp=53349175680&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=53349175680&partnerID=8YFLogxK
U2 - 10.1109/DSN.2008.4630107
DO - 10.1109/DSN.2008.4630107
M3 - Paper
AN - SCOPUS:53349175680
SP - 386
EP - 391
ER -