TY - GEN
T1 - A Deep Learning Dataloader with Shared Data Preparation
AU - Xie, Jian
AU - Xu, Jingwei
AU - Wang, Guochang
AU - Yao, Yuan
AU - Li, Zenan
AU - Cao, Chun
AU - Tong, Hanghang
N1 - Publisher Copyright:
© 2022 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Executing multiple training jobs in parallel on overlapped datasets is a common practice in developing deep learning models. By default, each parallel job prepares (i.e., loads and preprocesses) the data independently, causing redundant I/O and CPU consumption. Although a centralized cache component can reduce this redundancy by reusing data preparation work, each job's random data shuffling results in low sampling locality, causing heavy cache thrashing. Prior work tries to improve sampling locality by forcing all training jobs to load the same dataset in the same order and at the same pace. However, such a solution is only efficient under strong constraints: all jobs must train on the same dataset with the same starting time and training speed. In this paper, we propose a new data loading method for efficiently training parallel DNNs under much more flexible constraints. Our method remains highly efficient when training jobs use different but overlapped datasets and have different starting times and training speeds. To achieve this, we propose a dependent sampling algorithm (DSA) and a domain-specific cache policy. Moreover, we design a novel tree data structure to implement DSA efficiently. Based on these techniques, we implement a prototype named JOADER, which shares data preparation work across training jobs as long as their datasets overlap. Our evaluation shows that JOADER is more versatile and achieves superior training speedups (up to 200% on ResNet18) without affecting accuracy.
AB - Executing multiple training jobs in parallel on overlapped datasets is a common practice in developing deep learning models. By default, each parallel job prepares (i.e., loads and preprocesses) the data independently, causing redundant I/O and CPU consumption. Although a centralized cache component can reduce this redundancy by reusing data preparation work, each job's random data shuffling results in low sampling locality, causing heavy cache thrashing. Prior work tries to improve sampling locality by forcing all training jobs to load the same dataset in the same order and at the same pace. However, such a solution is only efficient under strong constraints: all jobs must train on the same dataset with the same starting time and training speed. In this paper, we propose a new data loading method for efficiently training parallel DNNs under much more flexible constraints. Our method remains highly efficient when training jobs use different but overlapped datasets and have different starting times and training speeds. To achieve this, we propose a dependent sampling algorithm (DSA) and a domain-specific cache policy. Moreover, we design a novel tree data structure to implement DSA efficiently. Based on these techniques, we implement a prototype named JOADER, which shares data preparation work across training jobs as long as their datasets overlap. Our evaluation shows that JOADER is more versatile and achieves superior training speedups (up to 200% on ResNet18) without affecting accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85153876936&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153876936&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85153876936
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural Information Processing Systems Foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -