TY - GEN
T1 - Failure data analysis of a LAN of Windows NT based computers
AU - Kalyanakrishnam, M.
AU - Kalbarczyk, Zbigniew T
AU - Iyer, Ravishankar K
PY - 1999
Y1 - 1999
N2 - This paper presents results of a failure data analysis of a LAN of Windows NT machines. Data for the study was obtained from event logs collected over a six-month period from the mail routing network of a commercial organization. The study focuses on characterizing causes of machine reboots. The key observations from this study are: (1) most of the problems that lead to reboots are software related, (2) rebooting the machine does not always solve the problem (in about 60% of the reboots, the rebooted machine reported problems within an hour or two of the reboot), (3) there are indications of propagated or correlated failures, and (4) though the average availability evaluates to over 99%, the machine downtime lasts (on average) two hours. Since the machines are dedicated mail servers, bringing down one or more of them can potentially disrupt storage, forwarding, reception and delivery of mail. This suggests that the average availability is not a good measure to characterize this type of network service.
UR - http://www.scopus.com/inward/record.url?scp=0033344278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033344278&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0033344278
SN - 0769502911
T3 - Proceedings of the IEEE Symposium on Reliable Distributed Systems
SP - 178
EP - 187
BT - Proceedings of the IEEE Symposium on Reliable Distributed Systems
PB - IEEE
T2 - Proceedings of the 1999 18th IEEE Symposium on Reliable Distributed Systems (SRDS'99)
Y2 - 19 October 1999 through 22 October 1999
ER -