Evaluation and analysis of GreenHDFS: A self-adaptive, energy-conserving variant of the Hadoop Distributed File System

Rini T. Kaushik, Milind Bhandarkar, Klara Nahrstedt

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We present a detailed evaluation and sensitivity analysis of GreenHDFS, an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS). GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on data-classification-driven, energy-conserving data placement to guarantee substantially long periods (several days) of idleness in a significant subset of the servers in the Cold Zone. A detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo points to the viability of GreenHDFS. Simulation results with real-world Yahoo HDFS traces show that GreenHDFS can achieve a 24% reduction in energy costs by performing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates despite the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo (amounting to 38,000 servers), $2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis shows that one-size-fits-all energy-management policies will not suffice in a multi-tenant Hadoop cluster.
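The abstract describes splitting a cluster into Hot and Cold Zones and placing data by classification so that Cold Zone servers stay idle long enough to be powered down, with energy savings governed by policy thresholds. The following is a minimal sketch of that idea, not the GreenHDFS implementation. It assumes files are classified purely by dormancy (days since last access) against a single threshold, and that a Cold Zone server becomes a power-down candidate after a fixed number of idle days. `FileRecord`, `classify`, `idle_servers`, `threshold_sweep`, and both threshold parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FileRecord:
    """Hypothetical per-file metadata: a path and the day it was last accessed."""
    path: str
    last_access_day: int


def classify(files: Iterable[FileRecord], today: int,
             coldness_threshold_days: int) -> Tuple[List[str], List[str]]:
    """Split files into Hot Zone and Cold Zone candidates by dormancy.

    Assumption: a file whose dormancy (days since last access) is at least
    the threshold is treated as cold; every other file stays hot.
    """
    hot: List[str] = []
    cold: List[str] = []
    for f in files:
        dormancy = today - f.last_access_day
        if dormancy >= coldness_threshold_days:
            cold.append(f.path)
        else:
            hot.append(f.path)
    return hot, cold


def idle_servers(last_access_by_server: Dict[str, int], today: int,
                 dormancy_threshold_days: int) -> List[str]:
    """Return Cold Zone servers idle for at least the threshold (power-down candidates)."""
    return sorted(
        server
        for server, last_access in last_access_by_server.items()
        if today - last_access >= dormancy_threshold_days
    )


def threshold_sweep(files: List[FileRecord], today: int,
                    thresholds: Iterable[int]) -> Dict[int, int]:
    """Count cold files at each threshold: a toy stand-in for a sensitivity analysis."""
    return {t: len(classify(files, today, t)[1]) for t in thresholds}


if __name__ == "__main__":
    files = [FileRecord("/data/a", 99), FileRecord("/data/b", 60), FileRecord("/data/c", 10)]
    # Dormancies are 1, 40, and 90 days.
    print(classify(files, today=100, coldness_threshold_days=30))
    # -> (['/data/a'], ['/data/b', '/data/c'])
    print(idle_servers({"cold-1": 95, "cold-2": 80, "cold-3": 50}, today=100,
                       dormancy_threshold_days=10))
    # -> ['cold-2', 'cold-3']
    print(threshold_sweep(files, today=100, thresholds=[5, 30, 60, 120]))
    # -> {5: 2, 30: 2, 60: 1, 120: 0}
```

With these toy inputs, a 30-day threshold keeps only `/data/a` in the Hot Zone, and the sweep shows how the number of cold files shrinks as the threshold grows. A real placement policy would also have to account for replication and the cost of migrating data between zones; the sketch ignores both.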

Original language: English (US)
Title of host publication: Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010
Pages: 274-287
Number of pages: 14
DOI: 10.1109/CloudCom.2010.109
State: Published - 2010
Event: 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010 - Indianapolis, IN, United States
Duration: Nov 30 2010 to Dec 3 2010

Publication series

Name: Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Kaushik, R. T., Bhandarkar, M., & Nahrstedt, K. (2010). Evaluation and analysis of GreenHDFS: A self-adaptive, energy-conserving variant of the Hadoop Distributed File System. In Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010 (pp. 274-287). [5708461] (Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010). https://doi.org/10.1109/CloudCom.2010.109