TY - GEN
T1 - Efficient disk-to-disk sorting
T2 - International Workshop on Data-Intensive Scalable Computing Systems, DISCS 2015
AU - Eslami, Hassan
AU - Kougkas, Anthony
AU - Kotsifakou, Maria
AU - Kasampalis, Theodoros
AU - Feng, Kun
AU - Lu, Yin
AU - Gropp, William
AU - Sun, Xian He
AU - Chen, Yong
AU - Thakur, Rajeev
N1 - Publisher Copyright: Copyright © 2015 ACM.
PY - 2015/11/15
Y1 - 2015/11/15
N2 - Many applications foreseen for the exascale era must process huge amounts of data. However, the IO infrastructure of current supercomputing architectures cannot be generalized to handle this amount of data, because it requires excessive data movement from storage layers to compute nodes, which limits scalability. There have been extensive studies addressing this challenge. The Decoupled Execution Paradigm (DEP) is an attractive solution due to its unique features, such as fast storage devices close to computational units and programmable units close to the file system. In this paper, we study the effectiveness of DEP for a well-known data-intensive kernel, disk-to-disk (also known as out-of-core) sorting. We propose an optimized algorithm that uses almost all features of DEP, pushing the performance of sorting in HPC beyond that of existing solutions. The advantages of our algorithm come from exploiting programmable units close to the parallel file system to achieve higher IO throughput, compressing data before sending it over the network or to disk, storing intermediate results of computation close to compute nodes, and fully overlapping IO with computation. We also provide an analytical model for our proposed algorithm. Our algorithm achieves 30% better performance than the theoretically optimal sorting algorithm that runs on the same testbed but is not designed to exploit the DEP architecture.
AB - Many applications foreseen for the exascale era must process huge amounts of data. However, the IO infrastructure of current supercomputing architectures cannot be generalized to handle this amount of data, because it requires excessive data movement from storage layers to compute nodes, which limits scalability. There have been extensive studies addressing this challenge. The Decoupled Execution Paradigm (DEP) is an attractive solution due to its unique features, such as fast storage devices close to computational units and programmable units close to the file system. In this paper, we study the effectiveness of DEP for a well-known data-intensive kernel, disk-to-disk (also known as out-of-core) sorting. We propose an optimized algorithm that uses almost all features of DEP, pushing the performance of sorting in HPC beyond that of existing solutions. The advantages of our algorithm come from exploiting programmable units close to the parallel file system to achieve higher IO throughput, compressing data before sending it over the network or to disk, storing intermediate results of computation close to compute nodes, and fully overlapping IO with computation. We also provide an analytical model for our proposed algorithm. Our algorithm achieves 30% better performance than the theoretically optimal sorting algorithm that runs on the same testbed but is not designed to exploit the DEP architecture.
KW - Decoupled execution paradigm
KW - Disk-to-disk sorting
KW - Parallel IO
KW - Parallel file system
KW - Performance optimization
UR - http://www.scopus.com/inward/record.url?scp=85009091548&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009091548&partnerID=8YFLogxK
U2 - 10.1145/2831244.2831249
DO - 10.1145/2831244.2831249
M3 - Conference contribution
AN - SCOPUS:85009091548
T3 - Proceedings of DISCS 2015: The 2015 International Workshop on Data-Intensive Scalable Computing Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of DISCS 2015
PB - Association for Computing Machinery
Y2 - 15 November 2015
ER -