Improving strong-scaling of CNN training by exploiting finer-grained parallelism

Nikoli Dryden, Naoya Maruyama, Tom Benson, Tim Moon, Marc Snir, Brian Van Essen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scaling CNN training is necessary to keep up with growing datasets and to reduce training time. We also see an emerging need to handle datasets with very large samples, where the memory requirements for training are substantial. Existing training frameworks use a data-parallel approach that partitions samples within a mini-batch, but limits on scaling the mini-batch size and on memory consumption make this untenable for large samples. We describe and implement new approaches to convolution that parallelize using spatial decomposition or a combination of sample and spatial decomposition. This introduces many performance knobs for a network, so we develop a performance model for CNNs and present a method for using it to automatically determine efficient parallelization strategies. We evaluate our algorithms with microbenchmarks and with image classification using ResNet-50. Our algorithms allow us to prototype a model for a mesh-tangling dataset, where sample sizes are very large. We show that our parallelization achieves excellent strong and weak scaling and enables training for previously unreachable datasets.
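
The spatial decomposition described in the abstract can be illustrated with a small single-process sketch. The code below is not the paper's implementation (which targets distributed GPU training); it simulates in NumPy how an input is split into row strips, each strip is convolved together with a halo of rows borrowed from its neighbors, and only the rows each strip owns are kept. The function names are illustrative; in a real distributed run each strip would live on a different process and the halo rows would be exchanged over the network.

    import numpy as np

    def conv2d_same(x, k):
        # Naive 'same' 2D cross-correlation with zero padding
        # (odd kernel dimensions assumed).
        kh, kw = k.shape
        ph, pw = kh // 2, kw // 2
        xp = np.pad(x, ((ph, ph), (pw, pw)))
        out = np.empty_like(x)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
        return out

    def spatially_decomposed_conv(x, k, num_strips):
        # Split x into row strips; convolve each strip padded with a
        # halo of kh//2 rows from its neighbors; keep only owned rows.
        halo = k.shape[0] // 2
        bounds = np.linspace(0, x.shape[0], num_strips + 1, dtype=int)
        owned = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            lo_h = max(lo - halo, 0)           # halo rows from the strip above
            hi_h = min(hi + halo, x.shape[0])  # halo rows from the strip below
            tile_out = conv2d_same(x[lo_h:hi_h], k)
            owned.append(tile_out[lo - lo_h:hi - lo_h])
        return np.vstack(owned)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32))
    k = rng.standard_normal((3, 3))
    assert np.allclose(conv2d_same(x, k),
                       spatially_decomposed_conv(x, k, num_strips=4))

Because each strip computes only the rows it owns, the decomposition reproduces the serial result exactly; the halo width is fixed by the kernel radius, which is also what determines the communication volume in a distributed setting.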
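The automatic strategy selection rests on the performance model; the paper's model is considerably more detailed, but a toy version conveys the idea: predict each candidate partitioning's per-layer time as compute time plus halo-exchange time, then pick the minimum. The function `estimated_layer_time` and every machine parameter below are illustrative placeholders, not values from the paper.

    def estimated_layer_time(h, w, c_in, c_out, kh, kw, strips,
                             peak_flops=1e13, bandwidth=1e10, latency=2e-6):
        # Toy cost model for one spatially decomposed convolution layer:
        # compute is divided across strips; each interior strip exchanges
        # kh//2 rows of fp32 activations with each of its two neighbors.
        flops = 2.0 * h * w * c_in * c_out * kh * kw / strips
        halo_bytes = 4.0 * (kh // 2) * w * c_in * 2
        comm = 0.0 if strips == 1 else latency + halo_bytes / bandwidth
        return flops / peak_flops + comm

    # Choose the strip count with the lowest predicted time for a
    # 224x224, 64->64 channel, 3x3 convolution.
    best = min(range(1, 17),
               key=lambda s: estimated_layer_time(224, 224, 64, 64, 3, 3, s))
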

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 210-220
Number of pages: 11
ISBN (Electronic): 9781728112466
DOI: https://doi.org/10.1109/IPDPS.2019.00031
State: Published - May 2019
Event: 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019 - Rio de Janeiro, Brazil
Duration: May 20, 2019 - May 24, 2019

Publication series

Name: Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019

Conference

Conference: 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019
Country: Brazil
City: Rio de Janeiro
Period: 5/20/19 - 5/24/19

Keywords

  • Algorithms
  • Convolution
  • Deep learning
  • HPC
  • Performance modeling

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

  • Cite this

    Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M., & Van Essen, B. (2019). Improving strong-scaling of CNN training by exploiting finer-grained parallelism. In Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019 (pp. 210-220). [8820780] (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2019.00031