Sliding window calculations on streaming data using the Kepler scientific workflow system

Sven Köhler, Supriya Gulati, Gongjing Cao, Quinn Hart, Bertram Ludäscher

Research output: Contribution to journal › Conference article › peer-review

Abstract

In many areas of science, unbounded (potentially infinite) data streams need to be processed continuously, e.g., to compute running aggregates or sliding window aggregates. One important example is the computation of Growing Degree Days (GDD) from a stream of temperature data, which provides a heuristic tool to predict plant development and the maturity of crops. The process of data acquisition, processing, storage, and presentation forms a scientific workflow, and scientific workflow systems have been developed to automate the execution of such workflows. The whole workflow is decomposed into its individual steps, represented by actors, which in turn are connected by channels that describe the flow of data. This workflow representation allows existing components to be reused in different workflows and, in principle, makes existing workflows easy to modify. In current streaming workflow designs in Kepler, data belonging to a particular time window is typically identified by counting data tokens on channels between actors. However, this token-counting approach works neither for windows of variable length nor for overlapping windows. In this paper, we address these limitations and present a new actor design with two incoming streams: a timestamp-ordered data stream and a stream of aggregation windows ordered by their start time. We present a new Chunker actor that "stream-joins" the data from the first stream with the windows presented on the second stream, where windows represent aggregation intervals that may vary in length and overlap in time. Windows containing the corresponding data are output as soon as they are complete, i.e., once timestamps in the data stream pass the end time of a window. We illustrate the approach with an improved GDD workflow based on our new Chunker actor.
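The windowing behaviour described in the abstract can be illustrated with a small Python sketch. It is a hypothetical illustration, not the Kepler Chunker actor itself: the names `chunker`, `daily_gdd`, and `gdd_per_window` are invented for this example, windows are assumed to be half-open intervals [start, end), timestamps are assumed to be day numbers, and the daily GDD formula (average of minimum and maximum temperature minus an assumed 10 °C base, floored at zero) is a common textbook variant rather than necessarily the computation used in the paper's workflow.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch (not from Kepler):
Item = Tuple[float, Any]                # (timestamp, payload), ordered by timestamp
Span = Tuple[float, float]              # (start, end) = [start, end), ordered by start
Chunk = Tuple[float, float, List[Any]]  # (start, end, payloads in the window)


def chunker(data: Iterable[Item], windows: Iterable[Span]) -> Iterator[Chunk]:
    """Stream-join a data stream with a window stream (illustrative sketch).

    Each window is emitted as soon as a data timestamp reaches its end time,
    so windows may differ in length and may overlap. When a finite data
    stream ends, windows that are still open are flushed.
    """
    win_iter = iter(windows)
    next_win = next(win_iter, None)
    open_wins: List[Chunk] = []  # kept in start-time order

    for t, payload in data:
        # Open every window that has started by time t; windows are pulled
        # lazily, so the window stream itself may be unbounded.
        while next_win is not None and next_win[0] <= t:
            open_wins.append((next_win[0], next_win[1], []))
            next_win = next(win_iter, None)

        # Emit windows whose end time the data stream has now reached.
        still_open: List[Chunk] = []
        for win in open_wins:
            if win[1] <= t:
                yield win
            else:
                still_open.append(win)
        open_wins = still_open

        # Every remaining open window satisfies start <= t < end.
        for _, _, items in open_wins:
            items.append(payload)

    # A finite data stream has ended: no further data can reach open windows.
    yield from open_wins


def daily_gdd(tmin: float, tmax: float, tbase: float = 10.0) -> float:
    """Simple averaging GDD for one day; tbase = 10 degrees C is an assumption."""
    return max(0.0, (tmin + tmax) / 2.0 - tbase)


def gdd_per_window(data: Iterable[Item], windows: Iterable[Span]):
    """Sum daily GDD over each window; payloads are (tmin, tmax) pairs."""
    for start, end, temps in chunker(data, windows):
        yield start, end, sum(daily_gdd(lo, hi) for lo, hi in temps)


# Usage with made-up temperatures (day number, (tmin, tmax)):
days = [(0, (8, 20)), (1, (10, 24)), (2, (5, 13)),
        (3, (12, 26)), (4, (11, 21)), (5, (9, 19))]
wins = [(0, 3), (1, 4), (2, 6)]  # variable-length, overlapping windows
for start, end, total in gdd_per_window(days, wins):
    print(start, end, total)
# prints:
# 0 3 11.0
# 1 4 16.0
# 2 6 19.0
```

Because windows are pulled from their stream only when the data stream reaches their start time, the window stream may itself be unbounded. Each window is held only until a data timestamp reaches its end, and overlapping windows each collect the values that fall inside them, which is exactly the case that token counting cannot express.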

Original language: English (US)
Pages (from-to): 1639-1646
Number of pages: 8
Journal: Procedia Computer Science
Volume: 9
State: Published - 2012
Externally published: Yes
Event: 12th Annual International Conference on Computational Science, ICCS 2012 - Omaha, NE, United States
Duration: Jun 4 2012 - Jun 6 2012

Keywords

  • Continuous queries
  • Data streaming
  • Scientific workflow

ASJC Scopus subject areas

  • Computer Science (all)
