Reining in the outliers in map-reduce clusters using Mantri

Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, Edward Harris

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Experience from an operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory, and other resources; disk failures; varying bandwidth and congestion along network paths; and imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks, and protecting outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites the job overall. Acting based on the causes and the resource and opportunity cost of actions lets Mantri improve over prior work that only duplicates the laggards. Deployment in Bing's production clusters and trace-driven simulations show that Mantri improves job completion times by 32%.

Original language: English (US)
Title of host publication: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010
Publisher: USENIX Association
Pages: 265-278
Number of pages: 14
ISBN (Electronic): 9781931971799
State: Published - 2019
Externally published: Yes
Event: 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010 - Vancouver, Canada
Duration: Oct 4 2010 to Oct 6 2010

Publication series

Name: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010

Conference

Conference: 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010
Country/Territory: Canada
City: Vancouver
Period: 10/4/10 to 10/6/10

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
