TY - JOUR
T1 - Characterizing and modeling cloud applications/jobs on a Google data center
AU - Di, Sheng
AU - Kondo, Derrick
AU - Cappello, Franck
N1 - Funding Information:
Acknowledgments: We thank Google Inc., in particular Charles Reiss and John Wilkes, for making their invaluable trace data available. This work is supported by ANR project Clouds@home (ANR-09-JCJC-0056-01), also in part by the Advanced Scientific Computing Research Program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and by the INRIA-Illinois Joint Laboratory for Petascale Computing. This paper has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
PY - 2014/7
Y1 - 2014/7
N2 - In this paper, we characterize and model Google applications and jobs, based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics about task events and resource utilization for Google applications, across various types of resources and execution types; (2) we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) finally, we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that the tasks simulated with our model exhibit features closely analogous to those in the Google trace. More than 95 % of tasks have simulation errors below 20 %, confirming the high accuracy of our simulation model.
AB - In this paper, we characterize and model Google applications and jobs, based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics about task events and resource utilization for Google applications, across various types of resources and execution types; (2) we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) finally, we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that the tasks simulated with our model exhibit features closely analogous to those in the Google trace. More than 95 % of tasks have simulation errors below 20 %, confirming the high accuracy of our simulation model.
KW - Characterization and analysis
KW - Cloud task
KW - Google data center
KW - Large-scale system trace
UR - http://www.scopus.com/inward/record.url?scp=84905508406&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905508406&partnerID=8YFLogxK
U2 - 10.1007/s11227-014-1131-z
DO - 10.1007/s11227-014-1131-z
M3 - Article
AN - SCOPUS:84905508406
SN - 0920-8542
VL - 69
SP - 139
EP - 160
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 1
ER -