OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS (TC-2020-02-0128.R1)

Zhendong Bei, Nam Sung Kim, Kai Hwang, Zhibin Yu

Research output: Contribution to journal › Article › peer-review


Big-data frameworks such as MapReduce/Hadoop and Spark have many performance-critical configuration parameters, which may interact with each other in complex ways. Their optimal values for an application on a given cluster are affected not only by the application itself but also by its input data. This makes offline auto-configuration approaches hard to use in practice, because the input data of an application may change at each run. To address this issue, we propose an Online Self-Configuring (OSC) approach that automatically determines the optimal parameter values for a given application. OSC synergistically integrates three key techniques. First, OSC leverages ensemble learning to build a precise performance model for a given application. Second, it quantifies the importance of each parameter and the intensity of the interactions among parameters, and uses these to accelerate the genetic algorithm that searches for optimal configuration parameters. Third, OSC supports an incremental modeling approach that keeps modeling overhead low enough for online use. Our implementation of OSC atop MapReduce/Hadoop 2.6 improves performance by 60% on average and up to 120% compared with the state-of-the-art approach. Lastly, the performance benefit of an application running on OSC generally increases with its input data size.
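The second technique in the abstract, a genetic algorithm guided by parameter importance, can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names, their ranges, the importance scores, and the toy `predicted_runtime` function (a stand-in for a learned performance model) are all assumptions made for illustration. The only idea it demonstrates is biasing mutation toward the parameters deemed more important.

```python
import random

# Hypothetical parameters and ranges (illustrative names, not taken from the paper).
PARAM_RANGES = {
    "io_sort_mb": (50, 800),
    "reduce_tasks": (1, 64),
    "sort_factor": (5, 100),
}

# Assumed importance scores; a real system would derive these from its model.
IMPORTANCE = {"io_sort_mb": 0.6, "reduce_tasks": 0.3, "sort_factor": 0.1}


def predicted_runtime(cfg):
    """Toy stand-in for a learned performance model (lower is better)."""
    return (
        abs(cfg["io_sort_mb"] - 400) / 10
        + abs(cfg["reduce_tasks"] - 32)
        + abs(cfg["sort_factor"] - 50) / 20
    )


def random_config(rng):
    """Draw each parameter uniformly from its range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def mutate(cfg, rng):
    """Resample one parameter, picked with probability proportional to importance."""
    names = list(PARAM_RANGES)
    weights = [IMPORTANCE[n] for n in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    lo, hi = PARAM_RANGES[name]
    child = dict(cfg)
    child[name] = rng.randint(lo, hi)
    return child


def crossover(a, b, rng):
    """Uniform crossover: take each parameter from either parent."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in PARAM_RANGES}


def search(generations=40, pop_size=20, seed=0):
    """Elitist genetic search that minimizes the predicted runtime."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=predicted_runtime)
        elite = population[: pop_size // 2]  # best half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=predicted_runtime)
```

Calling `search()` returns the best configuration found as a dictionary. Because the best half of each generation is carried over unchanged, the predicted runtime of the result is never worse than that of the best initial random configuration.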

Original language: English (US)
Journal: IEEE Transactions on Computers
State: Accepted/In press - 2021


  • Analytical models
  • Big Data
  • Big Data Systems
  • Cluster computing
  • Data models
  • Genetic algorithms
  • MapReduce/Hadoop
  • Online Self-Configuring
  • Spark
  • Tuning
  • ensemble learning
  • genetic algorithm

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

