Phurti: Application and network-aware flow scheduling for multi-tenant MapReduce clusters

Chris X. Cai, Shayan Saeed, Indranil Gupta, R H Campbell, Franck Le

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Traffic for a typical MapReduce job in a data center consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose application-aware scheduling, which can shorten the average job completion time. However, most of them treat the core network as a black box with sufficient capacity, even though a single bottleneck link in the core network can hurt application performance. We design and implement Phurti, a centralized flow-scheduling framework that aims to improve job completion time in a cluster shared among multiple Hadoop jobs (multi-tenant). Phurti communicates with the Hadoop framework to retrieve job-level network traffic information and with the OpenFlow-based switches to learn the network topology. It implements a novel heuristic called Smallest Maximum Sequential-traffic First (SMSF), which uses the collected application and network information to schedule traffic for MapReduce jobs. Our evaluation with real Hadoop workloads shows that, compared to application- and network-agnostic scheduling strategies, Phurti improves job completion time for 95% of the jobs, decreases average job completion time by 20% and tail job completion time by 13%, and scales well with cluster size and number of jobs.
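
The abstract names the SMSF heuristic but does not define it. The sketch below is one plausible reading of the name, not the authors' implementation: it assumes each job's "maximum sequential traffic" is the largest total volume that any single link must carry for that job, and that jobs are served in ascending order of that value. The flow representation, link names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flow model: (path, size), where path lists the links traversed.
Flow = Tuple[List[str], int]


def max_sequential_traffic(flows: List[Flow]) -> int:
    """Largest total volume any single link carries for one job (assumed metric)."""
    per_link: Dict[str, int] = defaultdict(int)
    for path, size in flows:
        for link in path:
            per_link[link] += size
    return max(per_link.values(), default=0)


def smsf_order(jobs: Dict[str, List[Flow]]) -> List[str]:
    """Job ids sorted by ascending maximum sequential traffic; ties broken by id."""
    return sorted(jobs, key=lambda j: (max_sequential_traffic(jobs[j]), j))


# Illustrative data only; sizes are arbitrary units.
jobs = {
    "job_a": [(["h1-s1", "s1-core"], 400), (["h2-s1", "s1-core"], 300)],
    "job_b": [(["h3-s2", "s2-core"], 500)],
    "job_c": [(["h1-s1"], 100), (["h2-s1"], 100)],
}
print(smsf_order(jobs))
```

Under this reading, job_a ranks last even though each of its flows is smaller than job_b's single flow, because both of its flows share the core-facing link. That is the kind of core-network bottleneck the abstract says topology-unaware schedulers overlook.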

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016
Subtitle of host publication: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 161-170
Number of pages: 10
ISBN (Electronic): 9781509019618
DOIs: https://doi.org/10.1109/IC2E.2016.21
State: Published - Jun 1 2016
Event: 4th IEEE Annual International Conference on Cloud Engineering, IC2E 2016 - Berlin, Germany
Duration: Apr 4 2016 to Apr 8 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016

Other

Other: 4th IEEE Annual International Conference on Cloud Engineering, IC2E 2016
Country: Germany
City: Berlin
Period: 4/4/16 to 4/8/16

Fingerprint

Scheduling
Switches
Throughput
Topology

Keywords

  • MapReduce
  • SDN
  • network
  • scheduling

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Networks and Communications

Cite this

Cai, C. X., Saeed, S., Gupta, I., Campbell, R. H., & Le, F. (2016). Phurti: Application and network-aware flow scheduling for multi-tenant MapReduce clusters. In Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016 (pp. 161-170). [7484180] (Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/IC2E.2016.21

@inproceedings{3bccd01e00384f7a8721648cfbe1be90,
title = "Phurti: Application and network-aware flow scheduling for multi-tenant MapReduce clusters",
abstract = "Traffic for a typical MapReduce job in a data center consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can shorten the average job completion time. However, most of them treat the core network as a black box with sufficient capacity. Even if only one network link in the core network becomes a bottleneck, it can hurt application performance. We design and implement a centralized flow-scheduling framework called Phurti with the goal of improving the completion time for jobs in a cluster shared among multiple Hadoop jobs (multi-tenant). Phurti communicates both with the Hadoop framework to retrieve job-level network traffic information and the OpenFlow-based switches to learn about the network topology. Phurti implements a novel heuristic called Smallest Maximum Sequential-traffic First (SMSF) that uses collected application and network information to perform traffic scheduling for MapReduce jobs. Our evaluation with real Hadoop workloads shows that compared to application and network-agnostic scheduling strategies, Phurti improves job completion time for 95{\%} of the jobs, decreases average job completion time by 20{\%}, tail job completion time by 13{\%} and scales well with the cluster size and number of jobs.",
keywords = "MapReduce, SDN, network, scheduling",
author = "Cai, {Chris X.} and Shayan Saeed and Indranil Gupta and Campbell, {R H} and Franck Le",
year = "2016",
month = "6",
day = "1",
doi = "10.1109/IC2E.2016.21",
language = "English (US)",
series = "Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "161--170",
booktitle = "Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016",
address = "United States",

}

TY - GEN

T1 - Phurti

T2 - Application and network-aware flow scheduling for multi-tenant MapReduce clusters

AU - Cai, Chris X.

AU - Saeed, Shayan

AU - Gupta, Indranil

AU - Campbell, R H

AU - Le, Franck

PY - 2016/6/1

Y1 - 2016/6/1

AB - Traffic for a typical MapReduce job in a data center consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can shorten the average job completion time. However, most of them treat the core network as a black box with sufficient capacity. Even if only one network link in the core network becomes a bottleneck, it can hurt application performance. We design and implement a centralized flow-scheduling framework called Phurti with the goal of improving the completion time for jobs in a cluster shared among multiple Hadoop jobs (multi-tenant). Phurti communicates both with the Hadoop framework to retrieve job-level network traffic information and the OpenFlow-based switches to learn about the network topology. Phurti implements a novel heuristic called Smallest Maximum Sequential-traffic First (SMSF) that uses collected application and network information to perform traffic scheduling for MapReduce jobs. Our evaluation with real Hadoop workloads shows that compared to application and network-agnostic scheduling strategies, Phurti improves job completion time for 95% of the jobs, decreases average job completion time by 20%, tail job completion time by 13% and scales well with the cluster size and number of jobs.

KW - MapReduce

KW - SDN

KW - network

KW - scheduling

UR - http://www.scopus.com/inward/record.url?scp=84978131803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978131803&partnerID=8YFLogxK

U2 - 10.1109/IC2E.2016.21

DO - 10.1109/IC2E.2016.21

M3 - Conference contribution

AN - SCOPUS:84978131803

T3 - Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016

SP - 161

EP - 170

BT - Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -