Metadata traces and workload models for evaluating big storage systems

Cristina L. Abad, Huong Luu, Nathan Roberts, Kihwal Lee, Yi Lu, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Efficient namespace metadata management is increasingly important as next-generation file systems are designed to operate at petascale and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly available, and existing traces are a poor replacement. We studied publicly available traces and one Big Data trace from Yahoo!, and we note some of the differences and their implications for metadata management studies. We discuss the insufficiency of existing evaluation approaches and present a first step towards a statistical metadata workload model that can capture the relevant characteristics of a workload and is suitable for synthetic workload generation. We describe Mimesis, a synthetic workload generator, and evaluate its usefulness through a case study of a least recently used (LRU) metadata cache for the Hadoop Distributed File System (HDFS). Simulation results show that the traces generated by Mimesis mimic the original workload and can be used in place of the real trace while still providing accurate results.
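The case study above evaluates synthetic traces by replaying them through an LRU metadata cache and checking whether they produce results close to those of the real trace. A minimal sketch of that style of evaluation, assuming a trace is simply a sequence of accessed paths and that the metric of interest is the cache hit rate, might look like the following. The names `LRUCache`, `hit_rate`, and `compare_hit_rates` and the toy traces are hypothetical illustrations; this is not Mimesis or the authors' simulator.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Iterable, Sequence, Tuple


class LRUCache:
    """Fixed-capacity LRU cache that tracks only keys (e.g., HDFS paths)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: oldest entry first.
        self._entries: "OrderedDict[Hashable, None]" = OrderedDict()

    def access(self, key: Hashable) -> bool:
        """Record one metadata access; return True on a hit, False on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return True
        self._entries[key] = None
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return False


def hit_rate(trace: Iterable[Hashable], capacity: int) -> float:
    """Replay a trace through a fresh LRU cache and return the hit ratio."""
    cache = LRUCache(capacity)
    hits = 0
    total = 0
    for key in trace:
        if cache.access(key):
            hits += 1
        total += 1
    return hits / total if total else 0.0


def compare_hit_rates(
    real: Sequence[Hashable],
    synthetic: Sequence[Hashable],
    capacities: Iterable[int],
) -> Dict[int, Tuple[float, float]]:
    """Map each cache capacity to (real-trace hit rate, synthetic-trace hit rate)."""
    return {c: (hit_rate(real, c), hit_rate(synthetic, c)) for c in capacities}


if __name__ == "__main__":
    # Toy traces for illustration only; these are not data from the paper.
    real = ["/a", "/b", "/a", "/c", "/b", "/a"]
    synthetic = ["/x", "/y", "/x", "/z", "/y", "/x"]
    for cap, (r, s) in compare_hit_rates(real, synthetic, [2, 3]).items():
        print(f"capacity={cap}: real={r:.3f} synthetic={s:.3f}")
```

Because the toy synthetic trace is only a relabeling of the toy real trace, the two hit rates coincide at every capacity; LRU behavior depends on the pattern of re-references, not on the names themselves. With real data, the question of interest is how closely the synthetic hit-rate curve tracks the real one across cache sizes.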

Original language: English (US)
Title of host publication: Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012
Pages: 125-132
Number of pages: 8
DOIs: https://doi.org/10.1109/UCC.2012.27
State: Published - Dec 1 2012
Event: 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012 - Chicago, IL, United States
Duration: Nov 5 2012 to Nov 8 2012

Publication series

Name: Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012

Other

Other: 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012
Country: United States
City: Chicago, IL
Period: 11/5/12 to 11/8/12

Keywords

  • Big data
  • HDFS
  • MDS
  • Metadata
  • Storage

ASJC Scopus subject areas

  • Software

Cite this

Abad, C. L., Luu, H., Roberts, N., Lee, K., Lu, Y., & Campbell, R. H. (2012). Metadata traces and workload models for evaluating big storage systems. In Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012 (pp. 125-132). [6424937] (Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012). https://doi.org/10.1109/UCC.2012.27

Metadata traces and workload models for evaluating big storage systems. / Abad, Cristina L.; Luu, Huong; Roberts, Nathan; Lee, Kihwal; Lu, Yi; Campbell, Roy H.

Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012. 2012. p. 125-132 6424937 (Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012).

Abad, CL, Luu, H, Roberts, N, Lee, K, Lu, Y & Campbell, RH 2012, Metadata traces and workload models for evaluating big storage systems. in Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012, 6424937, Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012, pp. 125-132, 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012, Chicago, IL, United States, 11/5/12. https://doi.org/10.1109/UCC.2012.27

Abad CL, Luu H, Roberts N, Lee K, Lu Y, Campbell RH. Metadata traces and workload models for evaluating big storage systems. In Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012. 2012. p. 125-132. 6424937. (Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012). https://doi.org/10.1109/UCC.2012.27

Abad, Cristina L. ; Luu, Huong ; Roberts, Nathan ; Lee, Kihwal ; Lu, Yi ; Campbell, Roy H. / Metadata traces and workload models for evaluating big storage systems. Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012. 2012. pp. 125-132 (Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012).

@inproceedings{329e12fba15e474ca181ca552be09551,
title = "Metadata traces and workload models for evaluating big storage systems",
abstract = "Efficient namespace metadata management is increasingly important as next-generation file systems are designed to operate at petascale and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly available, and existing traces are a poor replacement. We studied publicly available traces and one Big Data trace from Yahoo!, and we note some of the differences and their implications for metadata management studies. We discuss the insufficiency of existing evaluation approaches and present a first step towards a statistical metadata workload model that can capture the relevant characteristics of a workload and is suitable for synthetic workload generation. We describe Mimesis, a synthetic workload generator, and evaluate its usefulness through a case study of a least recently used (LRU) metadata cache for the Hadoop Distributed File System (HDFS). Simulation results show that the traces generated by Mimesis mimic the original workload and can be used in place of the real trace while still providing accurate results.",
keywords = "Big data, HDFS, MDS, Metadata, Storage",
author = "Abad, {Cristina L.} and Huong Luu and Nathan Roberts and Kihwal Lee and Yi Lu and Campbell, {Roy H.}",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/UCC.2012.27",
language = "English (US)",
isbn = "9780769548623",
series = "Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012",
pages = "125--132",
booktitle = "Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012",

}

TY - GEN

T1 - Metadata traces and workload models for evaluating big storage systems

AU - Abad, Cristina L.

AU - Luu, Huong

AU - Roberts, Nathan

AU - Lee, Kihwal

AU - Lu, Yi

AU - Campbell, Roy H.

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Efficient namespace metadata management is increasingly important as next-generation file systems are designed to operate at petascale and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly available, and existing traces are a poor replacement. We studied publicly available traces and one Big Data trace from Yahoo!, and we note some of the differences and their implications for metadata management studies. We discuss the insufficiency of existing evaluation approaches and present a first step towards a statistical metadata workload model that can capture the relevant characteristics of a workload and is suitable for synthetic workload generation. We describe Mimesis, a synthetic workload generator, and evaluate its usefulness through a case study of a least recently used (LRU) metadata cache for the Hadoop Distributed File System (HDFS). Simulation results show that the traces generated by Mimesis mimic the original workload and can be used in place of the real trace while still providing accurate results.

AB - Efficient namespace metadata management is increasingly important as next-generation file systems are designed to operate at petascale and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly available, and existing traces are a poor replacement. We studied publicly available traces and one Big Data trace from Yahoo!, and we note some of the differences and their implications for metadata management studies. We discuss the insufficiency of existing evaluation approaches and present a first step towards a statistical metadata workload model that can capture the relevant characteristics of a workload and is suitable for synthetic workload generation. We describe Mimesis, a synthetic workload generator, and evaluate its usefulness through a case study of a least recently used (LRU) metadata cache for the Hadoop Distributed File System (HDFS). Simulation results show that the traces generated by Mimesis mimic the original workload and can be used in place of the real trace while still providing accurate results.

KW - Big data

KW - HDFS

KW - MDS

KW - Metadata

KW - Storage

UR - http://www.scopus.com/inward/record.url?scp=84874278017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874278017&partnerID=8YFLogxK

U2 - 10.1109/UCC.2012.27

DO - 10.1109/UCC.2012.27

M3 - Conference contribution

AN - SCOPUS:84874278017

SN - 9780769548623

T3 - Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012

SP - 125

EP - 132

BT - Proceedings - 2012 IEEE/ACM 5th International Conference on Utility and Cloud Computing, UCC 2012

ER -