Evaluation and analysis of GreenHDFS: A self-adaptive, energy-conserving variant of the Hadoop Distributed File System

Rini T. Kaushik, Milind Bhandarkar, Klara Nahrstedt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a detailed evaluation and sensitivity analysis of GreenHDFS, an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS). GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on insightful, data-classification-driven, energy-conserving data placement to realize guaranteed, substantially long periods (several days) of idleness in a significant subset of servers in the Cold Zone. A detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo points to the viability of GreenHDFS. Simulation results with real-world Yahoo HDFS traces show that GreenHDFS can achieve a 24% energy cost reduction by doing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates in spite of the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo (amounting to 38,000 servers), $2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis shows that one-size-fits-all energy-management policies won't suffice in a multi-tenant Hadoop cluster.
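
The Hot/Cold Zone split described in the abstract can be pictured with a minimal sketch. The abstract does not say which signal or which threshold values GreenHDFS uses, so the sketch assumes classification by time since last access. The names FileInfo, classify, place, and COLDNESS_THRESHOLD, and the 15-day value, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: the abstract mentions thresholds but gives no values.
COLDNESS_THRESHOLD = timedelta(days=15)


@dataclass
class FileInfo:
    path: str
    last_access: datetime


def classify(file: FileInfo, now: datetime,
             threshold: timedelta = COLDNESS_THRESHOLD) -> str:
    """Return "cold" if the file has gone unaccessed for at least `threshold`."""
    return "cold" if now - file.last_access >= threshold else "hot"


def place(files, now, threshold=COLDNESS_THRESHOLD):
    """Group file paths by the zone they would be assigned to."""
    zones = {"hot": [], "cold": []}
    for f in files:
        zones[classify(f, now, threshold)].append(f.path)
    return zones
```

Under these assumptions, files that land in the "cold" group would be stored on Cold Zone servers, which is what allows those servers to stay idle for long stretches. The threshold is passed as a parameter so that its effect on placement can be varied, in the spirit of the sensitivity analysis the abstract reports.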

Original language: English (US)
Title of host publication: Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010
Pages: 274-287
Number of pages: 14
DOIs: https://doi.org/10.1109/CloudCom.2010.109
State: Published - Dec 1 2010
Event: 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010 - Indianapolis, IN, United States
Duration: Nov 30 2010 – Dec 3 2010

Publication series

Name: Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010

Other

Other: 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010
Country: United States
City: Indianapolis, IN
Period: 11/30/10 – 12/3/10

Fingerprint

Distributed File System
Servers
Sensitivity analysis
Evaluation
Energy
Server
Life Span
Energy management
Cost reduction
Energy conservation
Data Placement
Power Management
Data Classification
Costs
Viability
Divides
Trace

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Kaushik, R. T., Bhandarkar, M., & Nahrstedt, K. (2010). Evaluation and analysis of GreenHDFS: A self-adaptive, energy-conserving variant of the Hadoop Distributed File System. In Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010 (pp. 274-287). [5708461] (Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010). https://doi.org/10.1109/CloudCom.2010.109

@inproceedings{80fa98dfbeee4a84b4350b1a52255502,
title = "Evaluation and analysis of GreenHDFS: A self-adaptive, energy-conserving variant of the Hadoop Distributed File System",
abstract = "We present a detailed evaluation and sensitivity analysis of GreenHDFS, an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS). GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on insightful, data-classification-driven, energy-conserving data placement to realize guaranteed, substantially long periods (several days) of idleness in a significant subset of servers in the Cold Zone. A detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo points to the viability of GreenHDFS. Simulation results with real-world Yahoo HDFS traces show that GreenHDFS can achieve a 24{\%} energy cost reduction by doing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates in spite of the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo (amounting to 38,000 servers), \$2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis shows that one-size-fits-all energy-management policies won't suffice in a multi-tenant Hadoop cluster.",
author = "Kaushik, {Rini T.} and Milind Bhandarkar and Klara Nahrstedt",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/CloudCom.2010.109",
language = "English (US)",
isbn = "9780769543024",
series = "Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010",
pages = "274--287",
booktitle = "Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010",

}

TY - GEN

T1 - Evaluation and analysis of GreenHDFS

T2 - A self-adaptive, energy-conserving variant of the Hadoop Distributed File System

AU - Kaushik, Rini T.

AU - Bhandarkar, Milind

AU - Nahrstedt, Klara

PY - 2010/12/1

Y1 - 2010/12/1

AB - We present a detailed evaluation and sensitivity analysis of GreenHDFS, an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS). GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on insightful, data-classification-driven, energy-conserving data placement to realize guaranteed, substantially long periods (several days) of idleness in a significant subset of servers in the Cold Zone. A detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo points to the viability of GreenHDFS. Simulation results with real-world Yahoo HDFS traces show that GreenHDFS can achieve a 24% energy cost reduction by doing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates in spite of the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo (amounting to 38,000 servers), $2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis shows that one-size-fits-all energy-management policies won't suffice in a multi-tenant Hadoop cluster.

UR - http://www.scopus.com/inward/record.url?scp=79952465334&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952465334&partnerID=8YFLogxK

U2 - 10.1109/CloudCom.2010.109

DO - 10.1109/CloudCom.2010.109

M3 - Conference contribution

AN - SCOPUS:79952465334

SN - 9780769543024

T3 - Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010

SP - 274

EP - 287

BT - Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010

ER -