Automatic Communication Optimization of Parallel Applications in Public Clouds

Emmanuell D. Carreno, Matthias Diener, Eduardo H.M. Cruz, Philippe O.A. Navaux

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

One of the most important aspects that influences the performance of parallel applications is the speed of communication between their tasks. To optimize communication, tasks that exchange lots of data should be mapped to processing units that have a high network performance. This technique is called communication-aware task mapping and requires detailed information about the underlying network topology for an accurate mapping. Previous work on task mapping focuses on network clusters or shared memory architectures, in which the topology can be determined directly from the hardware environment. Cloud computing adds significant challenges to task mapping, since information about network topologies is not available to end users. Furthermore, the communication performance might change due to external factors, such as different usage patterns of other users. In this paper, we present a novel solution to perform communication-aware task mapping in the context of commercial cloud environments with multiple instances. Our proposal consists of a short profiling phase to discover the network topology and speed between cloud instances. The profiling can be executed before each application start as it causes only a negligible overhead. This information is then used together with the communication pattern of the parallel application to group tasks based on the amount of communication and to map groups with a lot of communication between them to cloud instances with a high network performance. In this way, application performance is increased, and data traffic between instances is reduced. We evaluated our proposal in a public cloud with a variety of MPI-based parallel benchmarks from the HPC domain, as well as a large scientific application. In the experiments, we observed substantial performance improvements (up to 11 times faster) compared to the default scheduling policies.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1-10
Number of pages10
ISBN (Electronic)9781509024520
DOIs
StatePublished - Jul 18 2016
Externally publishedYes
Event16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 - Cartagena, Colombia
Duration: May 16 2016May 19 2016

Publication series

NameProceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Conference

Conference16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Country/TerritoryColombia
CityCartagena
Period5/16/165/19/16

Keywords

  • Task mapping
  • communication
  • network performance
  • public clouds
  • scheduling

ASJC Scopus subject areas

  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Automatic Communication Optimization of Parallel Applications in Public Clouds'. Together they form a unique fingerprint.

Cite this