Abstract

This paper presents results of a failure data analysis of a LAN of Windows NT machines. Data for the study was obtained from event logs collected over a six-month period from the mail routing network of a commercial organization. The study focuses on characterizing causes of machine reboots. The key observations from this study are: (1) most of the problems that lead to reboots are software related, (2) rebooting the machine does not always solve the problem (in about 60% of the reboots, the rebooted machine reported problems within an hour or two of the reboot), (3) there are indications of propagated or correlated failures, and (4) though the average availability evaluates to over 99%, the machine downtime lasts (on average) two hours. Since the machines are dedicated mail servers, bringing down one or more of them can potentially disrupt storage, forwarding, reception and delivery of mail. This suggests that the average availability is not a good measure to characterize this type of network service.
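The gap between observation (4) and the conclusion can be made concrete with a little arithmetic. The sketch below assumes the standard steady-state definition of availability, MTTF / (MTTF + MTTR); the abstract does not say that the paper computes availability this way. The inputs are round figures taken from the abstract ("over 99%", "two hours"), and the function names are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the machine is up.

    Assumes the textbook definition MTTF / (MTTF + MTTR); the paper's
    exact method is not stated in the abstract.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


def mttf_for(target_availability: float, mttr_hours: float) -> float:
    """Mean uptime between outages implied by a target availability."""
    return target_availability * mttr_hours / (1.0 - target_availability)


# Illustrative round inputs: 99% availability, two-hour mean downtime.
mttf = mttf_for(0.99, 2.0)
print(f"mean uptime between outages: {mttf:.0f} h")
print(f"availability check: {availability(mttf, 2.0):.4f}")
```

Under these assumptions, 99% availability with a two-hour mean downtime corresponds to an outage roughly every 198 hours of uptime. The single percentage hides the fact that each outage takes a dedicated mail server offline for about two hours.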

Original language: English (US)
Title of host publication: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Publisher: IEEE
Pages: 178-187
Number of pages: 10
ISBN (Print): 0769502911
State: Published - Dec 1 1999
Event: Proceedings of the 1999 18th IEEE Symposium on Reliable Distributed Systems (SRDS'99) - Lausanne, Switz
Duration: Oct 19 1999 to Oct 22 1999

Publication series

Name: Proceedings of the IEEE Symposium on Reliable Distributed Systems
ISSN (Print): 1060-9857

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications

Title: Failure data analysis of a LAN of Windows NT based computers
