TY - GEN
T1 - AME: An Anyscale Many-Task Computing Engine
T2 - 6th Workshop on Workflows in Support of Large-Scale Science, WORKS'11, Co-located with SC'11
AU - Zhang, Zhao
AU - Katz, Daniel S.
AU - Ripeanu, Matei
AU - Wilde, Michael
AU - Foster, Ian
PY - 2011
Y1 - 2011
AB - Many-Task Computing (MTC) is a new application category that encompasses increasingly popular applications in biology, economics, and statistics. The high inter-task parallelism and data-intensive processing capabilities of these applications pose new challenges to existing supercomputer hardware-software stacks. These challenges include resource provisioning; task dispatching, dependency resolution, and load balancing; data management; and resilience. This paper examines the characteristics of MTC applications which create these challenges, and identifies related gaps in the middleware that supports these applications on extreme-scale systems. Based on this analysis, we propose AME, an Anyscale MTC Engine, which addresses the scalability aspects of these gaps. We describe the AME framework and present performance results for both synthetic benchmarks and real applications. Our results show that AME's dispatching performance linearly scales up to 14,120 tasks/second on 16,384 cores with high efficiency. The overhead of the intermediate data management scheme does not increase significantly up to 16,384 cores. AME eliminates 73% of the file transfer between compute nodes and the global filesystem for the Montage astronomy application running on 2,048 cores. Our results indicate that AME scales well on today's petascale machines, and is a strong candidate for exascale machines.
KW - Data management
KW - Load balancing
KW - Many-Task Computing
KW - Scheduling
KW - Supercomputer systems
UR - http://www.scopus.com/inward/record.url?scp=84863265497&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863265497&partnerID=8YFLogxK
U2 - 10.1145/2110497.2110513
DO - 10.1145/2110497.2110513
M3 - Conference contribution
AN - SCOPUS:84863265497
SN - 9781450311007
T3 - WORKS'11 - Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, Co-located with SC'11
SP - 137
EP - 146
BT - WORKS'11 - Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, Co-located with SC'11
Y2 - 14 November 2011
ER -