FLASH: Fast Bayesian optimization for data analytic pipelines

Yuyu Zhang, Mohammad Taha Bahadori, Hang Su, Jimeng Sun

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such analytic pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best performance, it is often critical to select optimal algorithms and to set appropriate hyperparameters, which requires large computational efforts. Bayesian optimization provides a principled way for searching optimal hyperparameters for a single algorithm. However, many challenges remain in solving pipeline optimization problems with high-dimensional and highly conditional search space. In this work, we propose Fast LineAr SearcH (FLASH), an effecient method for tuning analytic pipelines. FLASH is a two-layer Bayesian optimization framework, which firstly uses a parametric model to select promising algorithms, then computes a nonparametric model to finetune hyperparameters of the promising algorithms. FLASH also includes an effective caching algorithm which can further accelerate the search process. Extensive experiments on a number of benchmark datasets have demonstrated that FLASH significantly outperforms previous state-of-the-art methods in both search speed and accuracy. Using 50% of the time budget, FLASH achieves up to 20% improvement on test error rate compared to the baselines. FLASH also yields state-of-the-art performance on a real-world application for healthcare predictive modeling.

Original languageEnglish (US)
Title of host publicationKDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages2065-2074
Number of pages10
ISBN (Electronic)9781450342322
DOIs
StatePublished - Aug 13 2016
Externally publishedYes
Event22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: Aug 13 2016Aug 17 2016

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume13-17-August-2016

Other

Other22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
CountryUnited States
CitySan Francisco
Period8/13/168/17/16

Keywords

  • Automated hyperparameter tuning
  • Bayesian optimization
  • Data analytic pipeline
  • Health analytics

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'FLASH: Fast Bayesian optimization for data analytic pipelines'. Together they form a unique fingerprint.

Cite this