Characterizing and modeling cloud applications/jobs on a Google data center

Sheng Di, Derrick Kondo, Franck Cappello

Research output: Contribution to journal, Article, peer-review


In this paper, we characterize and model Google applications and jobs, based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, across various resource types and execution types; (2) we classify applications using a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) finally, we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that tasks simulated with our model exhibit features closely analogous to those in the Google trace: for over 95% of tasks, the simulation error is below 20%, confirming the high accuracy of our simulation model.
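
As a rough illustration of the accuracy claim above, the following minimal sketch computes a per-task relative error and the share of tasks whose error falls below 20%. The abstract does not define the error metric, so the relative-error formula, the function names, and the sample values are all assumptions made for illustration, not the authors' method or data.

```python
def relative_errors(actual, simulated):
    """Per-task relative error |simulated - actual| / |actual|.

    Assumption: the abstract does not state the error definition;
    relative error is used here purely for illustration.
    """
    if len(actual) != len(simulated):
        raise ValueError("actual and simulated must have the same length")
    # Tasks with a zero actual value are skipped to avoid division by zero.
    return [abs(s - a) / abs(a) for a, s in zip(actual, simulated) if a != 0]


def fraction_below(errors, threshold=0.20):
    """Share of tasks whose error is strictly below the threshold."""
    if not errors:
        raise ValueError("no errors to evaluate")
    return sum(e < threshold for e in errors) / len(errors)


# Hypothetical values (e.g., task durations), not taken from the Google trace.
actual = [10.0, 20.0, 40.0, 80.0]
simulated = [11.0, 19.0, 60.0, 84.0]

errors = relative_errors(actual, simulated)
share = fraction_below(errors)
print(f"{share:.0%} of tasks have a simulation error below 20%")
```

With a real trace, `actual` and `simulated` would hold the same measured quantity for each task in the trace and in the model's output, respectively.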

Original language: English (US)
Pages (from-to): 139-160
Number of pages: 22
Journal: Journal of Supercomputing
Issue number: 1
State: Published - Jul 2014
Externally published: Yes


Keywords

  • Characterization and analysis
  • Cloud task
  • Google data center
  • Large-scale system trace

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture

