TY - GEN
T1 - Theius
T2 - 1st IEEE International Conference on Cloud Engineering, IC2E 2013
AU - Tedesco, Jon
AU - Dudko, Roman
AU - Sharma, Abhishek
AU - Farivar, Reza
AU - Campbell, Roy
N1 - Copyright: Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
AB - As cloud computing clusters continue to grow, maintaining the health of these clusters becomes increasingly challenging. Recent work has studied how we can efficiently monitor the status of machines in these clusters and how we can detect problems or predict them before they occur, yet little work has focused on addressing the bottleneck between when these failures occur and when they are fixed: system administrators. As monitoring and failure detection systems mature, we are able to extract tremendous amounts of information about the status of the system in real time. However, this volume of data is difficult for human beings to understand, especially those inexperienced with the particular cluster. In this paper, we introduce a web-based visualization suite called Theius to allow system administrators to quickly understand the state of the cloud system as a whole. We outline the key features of this visualization tool and show that it is more intuitive and easier to use than Ganglia, a state-of-the-art visualization tool for clusters. Likewise, we demonstrate that our tool can scale, presenting a use case with our visualization showing a 5000-node cluster. Although our tool is implemented for Hadoop clusters, our contribution is general to any cloud computing system.
KW - Cloud computing
KW - Cluster computing
KW - Failure detection
KW - Failure prediction
KW - Hadoop
KW - Monitoring
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=84881144234&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881144234&partnerID=8YFLogxK
U2 - 10.1109/IC2E.2013.36
DO - 10.1109/IC2E.2013.36
M3 - Conference contribution
AN - SCOPUS:84881144234
SN - 9780769549453
T3 - Proceedings of the IEEE International Conference on Cloud Engineering, IC2E 2013
SP - 177
EP - 182
BT - Proceedings of the IEEE International Conference on Cloud Engineering, IC2E 2013
Y2 - 25 March 2013 through 28 March 2013
ER -