OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS

Zhendong Bei, Nam Sung Kim, Kai Hwang, Zhibin Yu

Research output: Contribution to journal › Article › peer-review


Big-data frameworks such as MapReduce/Hadoop and Spark have many performance-critical configuration parameters that may interact with each other in complex ways. Their optimal values for an application on a given cluster are affected not only by the application itself but also by its input data. This makes offline auto-configuration approaches hard to use in practice, because the input data of an application may change at each run. To address this issue, we propose an Online Self-Configuring (OSC) approach that automatically determines the optimal parameter values for a given application. OSC synergistically integrates three key techniques. First, OSC leverages ensemble learning to build a precise performance model for a given application. Second, it quantifies the importance of the parameters and the interaction intensity between them to accelerate the genetic algorithm that searches for optimal configuration parameters. Third, OSC supports an incremental modeling approach to keep the overhead of the models low enough for online use. These techniques allow OSC to effectively learn the characteristics of an application and optimize its performance by automatically adjusting the configurations at runtime. Our implementation of OSC atop MapReduce/Hadoop 2.6 improves performance by 60 percent on average and up to 120 percent compared with the state-of-the-art approach. Lastly, the performance benefit of an application running on OSC generally increases along with its input data size.
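
The second technique described in the abstract, a genetic algorithm guided by parameter importance, can be illustrated with a minimal sketch. This is not OSC's implementation: the parameter names, their ranges, the importance weights, and the `predicted_runtime` function (a toy stand-in for a learned performance model) are all hypothetical, and using importance as a per-parameter mutation probability is only one plausible way to bias the search.

```python
import random

# Hypothetical parameters and ranges; illustrative only, not taken from the paper.
PARAM_RANGES = {
    "sort_buffer_mb": (50, 800),
    "reduce_tasks": (1, 64),
    "map_memory_mb": (512, 4096),
}

# Assumed importance weights; a real system would derive these from its model.
IMPORTANCE = {"sort_buffer_mb": 0.6, "reduce_tasks": 0.3, "map_memory_mb": 0.1}


def predicted_runtime(cfg):
    """Toy stand-in for a learned performance model (lower is better)."""
    return (
        abs(cfg["sort_buffer_mb"] - 400) / 10
        + abs(cfg["reduce_tasks"] - 32)
        + abs(cfg["map_memory_mb"] - 2048) / 500
    )


def random_config(rng):
    """Draw one configuration uniformly from the allowed ranges."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in PARAM_RANGES}


def mutate(cfg, rng):
    """Resample each parameter with probability equal to its importance."""
    child = dict(cfg)
    for name, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < IMPORTANCE[name]:
            child[name] = rng.randint(lo, hi)
    return child


def search(generations=40, pop_size=20, seed=0):
    """Elitist genetic search for the configuration with the lowest prediction."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=predicted_runtime)
        elite = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=predicted_runtime)
```

Calling `search()` returns a dictionary of parameter values. Because the best quarter of each generation is carried over unchanged, the best predicted runtime never gets worse from one generation to the next. In an online setting, the toy `predicted_runtime` would be replaced by a model that is retrained as new runs are observed.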

Original language: English (US)
Pages (from-to): 809-823
Number of pages: 15
Journal: IEEE Transactions on Computers
Issue number: 4
State: Published - Apr 1 2022


  • Big data systems
  • MapReduce/Hadoop
  • ensemble learning
  • genetic algorithm
  • online self-configuring

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

