Baechi: Fast device placement of machine learning graphs

Beomyeol Jeon, Linda Cai, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or the models are large. Today, splitting the model graph across multiple devices largely relies on learning-based approaches to generate this placement. While the resulting placements train fast on data (i.e., with low step times), learning-based model parallelism is itself time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, where we adopt an algorithmic approach to the placement problem for running machine learning training graphs on a small cluster of memory-constrained devices. We implemented Baechi so that it works modularly with TensorFlow. Our experimental results using GPUs show that Baechi generates placement plans 654×–206K× faster than today's learning-based approaches, and the placed model's step time is at most 6.2% higher than expert-based placements.
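To make the placement problem concrete, here is a minimal sketch of a greedy, memory-constrained placement of a model graph's operators onto devices. This is a hypothetical illustration of the general problem setting, not Baechi's actual algorithm; the function name, operator list, and memory figures are all assumptions for the example.

```python
def greedy_placement(ops, device_memory):
    """Illustrative greedy placer (not Baechi's algorithm).

    ops: list of (op_name, mem_bytes) pairs in topological order.
    device_memory: dict mapping device_id -> free memory in bytes.
    Returns a dict op_name -> device_id; raises MemoryError if an
    operator fits on no device.
    """
    free = dict(device_memory)
    placement = {}
    for name, mem in ops:
        # Consider only devices with enough free memory for this operator.
        candidates = [d for d, f in free.items() if f >= mem]
        if not candidates:
            raise MemoryError(f"operator {name} ({mem} B) fits on no device")
        # Place on the device with the most remaining memory (load balancing).
        dev = max(candidates, key=lambda d: free[d])
        free[dev] -= mem
        placement[name] = dev
    return placement

# Hypothetical three-operator graph on two small GPUs.
ops = [("conv1", 4_000), ("conv2", 3_000), ("fc", 2_000)]
plan = greedy_placement(ops, {"gpu:0": 5_000, "gpu:1": 5_000})
```

A real placer must also account for inter-device communication along graph edges, which this sketch ignores; that tension between memory limits and step time is exactly what the paper's algorithmic approach addresses.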

Original language: English (US)
Title of host publication: SoCC 2020 - Proceedings of the 2020 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery
Number of pages: 15
ISBN (Electronic): 9781450381376
State: Published - Oct 12, 2020
Event: 11th ACM Symposium on Cloud Computing, SoCC 2020 - Virtual, Online, United States
Duration: Oct 19, 2020 - Oct 21, 2020

Publication series

Name: SoCC 2020 - Proceedings of the 2020 ACM Symposium on Cloud Computing


Conference: 11th ACM Symposium on Cloud Computing, SoCC 2020
Country/Territory: United States
City: Virtual, Online


Keywords

  • TensorFlow
  • constrained memory
  • distributed systems
  • machine learning systems
  • placement algorithms

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Artificial Intelligence
