Sentiment Analysis based Error Detection for Large-Scale Systems

Khalid Ayedh Alharthi, Arshad Jhumka, Sheng Di, Franck Cappello, Edward Chuah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Today's large-scale systems, such as High Performance Computing (HPC) systems, are designed and used with exascale computing in mind, and their growing design complexity inevitably reduces their reliability. HPC systems log their execution behaviour extensively. In this paper, we leverage the inherent meaning of log messages and propose a novel sentiment analysis-based approach to error detection in large-scale systems that automatically mines the sentiments in the log messages. Our contributions are four-fold. (1) We develop a machine learning (ML) based approach to automatically build a sentiment lexicon from the system log message templates. (2) Using the sentiment lexicon, we develop an algorithm to detect system errors. (3) We develop an algorithm to identify the nodes and components with erroneous behaviors, based on sentiment polarity scores. (4) We evaluate our solution against other state-of-the-art machine/deep learning algorithms on system logs from three representative supercomputers. Experiments show that our error detection algorithm identifies error messages with an average MCC score and F-score of 91% and 96%, respectively, while a state-of-the-art ML/deep learning model (LSTM) obtains only 67% and 84%. To the best of our knowledge, this is the first work to leverage the sentiments embedded in log entries of large-scale systems for system health analysis.
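
The abstract does not spell out how the lexicon is built or how polarity scores are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, based on the keywords listed for this record (logistic regression, Stochastic Gradient Descent), that per-token weights of a bag-of-words logistic regression can serve as a sentiment lexicon, and that a message is flagged as an error when its summed polarity is positive. The training templates, the function names (`train_lexicon`, `polarity`, `is_error`), the sign convention, and the hyperparameters are all hypothetical and chosen purely for illustration.

```python
import math
from collections import defaultdict

# Hypothetical, hand-labelled log templates (1 = error, 0 = normal).
# Made-up illustrations; NOT taken from the paper's datasets.
TRAIN = [
    ("machine check interrupt error detected", 1),
    ("kernel panic fatal exception", 1),
    ("link failure detected on node", 1),
    ("job started on node", 0),
    ("node boot completed successfully", 0),
    ("heartbeat received from node", 0),
]


def tokenize(message):
    """Lower-case whitespace tokenizer (an assumption; real logs need more care)."""
    return message.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lexicon(samples, epochs=200, lr=0.5):
    """Fit a bag-of-words logistic regression with plain SGD.

    The learned per-token weights act as the lexicon: in this sketch a
    positive weight means the token leans towards "error".
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for message, label in samples:
            tokens = set(tokenize(message))
            z = bias + sum(weights[t] for t in tokens)
            grad = sigmoid(z) - label  # gradient of the log-loss w.r.t. z
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return dict(weights), bias


def polarity(message, lexicon, bias):
    """Sum the lexicon weights of the tokens present in the message."""
    return bias + sum(lexicon.get(t, 0.0) for t in set(tokenize(message)))


def is_error(message, lexicon, bias):
    """Flag a message as an error when its polarity score is positive."""
    return polarity(message, lexicon, bias) > 0.0


LEXICON, BIAS = train_lexicon(TRAIN)
```

Calling `is_error("kernel panic fatal exception", LEXICON, BIAS)` applies the learned lexicon to a single message. Aggregating `polarity` scores per node or component would be one plausible way to approach contribution (3), though the abstract does not say how the authors do this.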

Original language: English (US)
Title of host publication: Proceedings - 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 237-249
Number of pages: 13
ISBN (Electronic): 9781665435727
State: Published - Jun 2021
Event: 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021 - Virtual, Online, Taiwan, Province of China
Duration: Jun 21 2021 to Jun 24 2021

Publication series

Name: Proceedings - 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021

Conference

Conference: 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021
Country/Territory: Taiwan, Province of China
City: Virtual, Online
Period: 6/21/21 to 6/24/21

Keywords

  • error detection
  • large-scale systems
  • logistic regression
  • Sentiment analysis lexicon
  • Stochastic Gradient Descent

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
