Batch and online anomaly detection for scientific applications in a Kubernetes environment

Sahand Hariri, Matias Carrasco Kind

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a cloud-based anomaly detection service framework that uses a containerized Spark cluster and ancillary user interfaces, all managed by Kubernetes. This technology stack provides a fast, reliable, resilient, and easily scalable service for either batch or streaming data. At the heart of the service, we use Extended Isolation Forest, an improved version of the Isolation Forest algorithm, for robust and efficient anomaly detection. We showcase the design and a typical workflow of our infrastructure, which is ready to deploy on any Kubernetes cluster without extra technical knowledge. With exposed APIs and simple graphical interfaces, users can load any data and detect anomalies in the loaded set or in newly presented data points, using either batch or streaming mode. In streaming mode, users can subscribe to receive notifications about the desired output. Our aim is to develop these techniques and apply them to scientific data. In particular, we are interested in finding anomalous objects within the overwhelming set of images and catalogs produced by current and future astronomical surveys, although the framework can be easily adapted to other fields.

Original language: English (US)
Title of host publication: Proceedings of the 9th Workshop on Scientific Cloud Computing, ScienceCloud 2018 - Co-located with HPDC 2018
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450358637
State: Published - Jun 11 2018
Event: 9th Workshop on Scientific Cloud Computing, ScienceCloud 2018 - Tempe, United States
Duration: Jun 11 2018 → …

Publication series

Name: Proceedings of the 9th Workshop on Scientific Cloud Computing, ScienceCloud 2018 - Co-located with HPDC 2018

Other

Other: 9th Workshop on Scientific Cloud Computing, ScienceCloud 2018
Country/Territory: United States
City: Tempe
Period: 6/11/18 → …

Keywords

  • Anomaly Detection
  • Apache Spark
  • Cloud Computing
  • Isolation Forest
  • Kubernetes

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Computational Theory and Mathematics
