Coordinated analysis of heterogeneous monitor data in enterprise clouds for incident response

Uttam Thakore, Harigovind V. Ramasamy, William H. Sanders

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

During incident analysis and response, enterprise cloud administrators want to use as much of their generated monitor data as possible. In practice, however, decisions about which data to use are often dictated by the tools available to process the monitor data automatically, rather than by an understanding of how relevant the data are for incident response. Because processing diverse cloud monitors requires significant manual effort and domain expertise, much of the monitor data remains unexamined. We propose a framework that reduces the complexity of data analysis for incident response. Our framework enables coordinated analysis of both metric (numerical) data and log (semi-structured, textual) data, and exposes salient features within those data. As a foundation for the framework, we define a taxonomy for fields within monitor data, based on insights gained from analyzing logs and metrics collected from all levels of an experimental platform-as-a-service (PaaS) cloud (EPC). Using the taxonomy, we lay out a method for semi-automated feature extraction and discovery across heterogeneous monitors. We then describe a method for feature clustering that promotes effective analysis of the data and removes redundant and uninformative features. We discuss the application of our framework to incident response within the EPC, including root cause analysis.
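The abstract does not spell out the framework's algorithms, so the following is only a rough illustration of the general idea under stated assumptions. It is not the authors' taxonomy or clustering method. The sketch assumes that log lines are reduced to crude templates by masking numbers, that both logs and metrics are bucketed into fixed 60-second windows, that "uninformative" means constant across windows, and that "redundant" means highly correlated (|Pearson r| ≥ 0.9, grouped by single linkage). All names (`WINDOW`, `log_template`, `log_features`, `metric_features`, `align`, `cluster_features`) are hypothetical.

```python
# Hypothetical sketch (not the authors' implementation): align log and metric
# data into per-window features, drop constant features, and cluster the rest
# by correlation so that redundant features can be collapsed.
import re
from collections import defaultdict
from statistics import mean, pstdev

WINDOW = 60  # assumed bucket width in seconds


def log_template(message):
    """Crude template extraction: mask hex ids, then decimal numbers."""
    masked = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    return re.sub(r"\d+", "<NUM>", masked)


def log_features(records):
    """records: (timestamp_seconds, message) pairs -> per-window template counts."""
    feats = defaultdict(lambda: defaultdict(float))
    for ts, message in records:
        feats["log:" + log_template(message)][int(ts // WINDOW)] += 1
    return feats


def metric_features(records):
    """records: (timestamp_seconds, name, value) triples -> per-window means."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for ts, name, value in records:
        window = int(ts // WINDOW)
        sums["metric:" + name][window] += value
        counts["metric:" + name][window] += 1
    return {
        name: {w: total / counts[name][w] for w, total in series.items()}
        for name, series in sums.items()
    }


def align(*feature_maps):
    """Build equal-length vectors over the union of windows (missing -> 0.0)."""
    merged = {}
    for feature_map in feature_maps:
        merged.update(feature_map)
    windows = sorted({w for series in merged.values() for w in series})
    return {name: [series.get(w, 0.0) for w in windows] for name, series in merged.items()}


def pearson(x, y):
    """Population Pearson correlation; callers must pass non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def cluster_features(vectors, threshold=0.9):
    """Drop constant features, then single-linkage cluster on |Pearson r|."""
    informative = {n: v for n, v in vectors.items() if pstdev(v) > 0}
    names = sorted(informative)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(informative[a], informative[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in names:
        clusters[find(n)].append(n)
    return sorted(sorted(members) for members in clusters.values())
```

As an example, a log template whose per-window counts rise and fall in step with an `error_rate` metric would land in the same cluster as that metric. A constant `cpu` metric would be dropped as uninformative. Keeping one representative per cluster then removes the redundancy. The window width, the zero-fill for missing windows, and the 0.9 threshold are arbitrary choices that would need tuning on real monitor data.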

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE 30th International Symposium on Software Reliability Engineering Workshops, ISSREW 2019
Editors: Katinka Wolter, Ina Schieferdecker, Barbara Gallina, Michel Cukier, Roberto Natella, Naghmeh Ivaki, Nuno Laranjeiro
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 53-58
Number of pages: 6
ISBN (Electronic): 9781728151380
State: Published - Oct 2019
Event: 30th IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW 2019 - Berlin, Germany
Duration: Oct 28 2019 to Oct 31 2019

Publication series

Name: Proceedings - 2019 IEEE 30th International Symposium on Software Reliability Engineering Workshops, ISSREW 2019

Conference

Conference: 30th IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW 2019
Country/Territory: Germany
City: Berlin
Period: 10/28/19 to 10/31/19

Keywords

  • AIOps
  • cloud computing
  • incident response
  • log analysis
  • log clustering
  • reliability

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

