TY - GEN
T1 - What you should know about NAMD and charm++ but were hoping to ignore
AU - Phillips, James C.
N1 - Funding Information:
This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, through the Theta Early Science Program. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation TG-MCB080133. This paper draws on the author's previous work in NAMD development at the Illinois NIH Center for Macromolecular Modeling and Bioinformatics, supported by the National Institutes of Health grant P41-GM104601, where NAMD development continues in close collaboration with Charm++ development at the Illinois Parallel Programming Laboratory, led by Laxmikant V. Kale.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/7/22
Y1 - 2018/7/22
N2 - The biomolecular simulation program NAMD is used heavily at many HPC centers. Supporting NAMD users requires knowledge of the Charm++ parallel runtime system on which NAMD is built. Introduced in 1993, Charm++ supports message-driven, task-based, and other programming models and has demonstrated its portability across generations of architectures, interconnects, and operating systems. While Charm++ can use MPI as a portable communication layer, specialized high-performance layers are preferred for Cray, IBM, and InfiniBand networks, and a new OFI layer supports Omni-Path. NAMD binaries using some specialized layers can be launched directly with mpiexec or its equivalent, or mpiexec can be called by the charmrun program to leverage system job-launch mechanisms. Charm++ supports multi-threaded parallelism within each process, with a single thread dedicated to communication and the rest for computation. The optimal balance between thread and process parallelism depends on the size of the simulation, the features used, memory limitations, node count, and the core count and NUMA structure of each node. It is also important to enable the Charm++ built-in CPU affinity settings to bind worker and communication threads appropriately to processor cores. Appropriate execution configuration and CPU affinity settings are particularly non-intuitive on Intel KNL processors due to their high core counts and flat NUMA hierarchy. Rules and heuristics for default settings can provide good performance in most cases and dramatically reduce the search space when optimizing for a specific simulation on a particular machine. Upcoming Charm++ and NAMD releases will simplify and automate launch configuration and affinity settings.
AB - The biomolecular simulation program NAMD is used heavily at many HPC centers. Supporting NAMD users requires knowledge of the Charm++ parallel runtime system on which NAMD is built. Introduced in 1993, Charm++ supports message-driven, task-based, and other programming models and has demonstrated its portability across generations of architectures, interconnects, and operating systems. While Charm++ can use MPI as a portable communication layer, specialized high-performance layers are preferred for Cray, IBM, and InfiniBand networks, and a new OFI layer supports Omni-Path. NAMD binaries using some specialized layers can be launched directly with mpiexec or its equivalent, or mpiexec can be called by the charmrun program to leverage system job-launch mechanisms. Charm++ supports multi-threaded parallelism within each process, with a single thread dedicated to communication and the rest for computation. The optimal balance between thread and process parallelism depends on the size of the simulation, the features used, memory limitations, node count, and the core count and NUMA structure of each node. It is also important to enable the Charm++ built-in CPU affinity settings to bind worker and communication threads appropriately to processor cores. Appropriate execution configuration and CPU affinity settings are particularly non-intuitive on Intel KNL processors due to their high core counts and flat NUMA hierarchy. Rules and heuristics for default settings can provide good performance in most cases and dramatically reduce the search space when optimizing for a specific simulation on a particular machine. Upcoming Charm++ and NAMD releases will simplify and automate launch configuration and affinity settings.
KW - Charm++
KW - High-performance computing
KW - Molecular dynamics
KW - NAMD
KW - Scientific software tuning
KW - Structural biology
KW - User support
UR - http://www.scopus.com/inward/record.url?scp=85051420120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051420120&partnerID=8YFLogxK
U2 - 10.1145/3219104.3219134
DO - 10.1145/3219104.3219134
M3 - Conference contribution
AN - SCOPUS:85051420120
SN - 9781450364461
T3 - ACM International Conference Proceeding Series
BT - Practice and Experience in Advanced Research Computing 2018
PB - Association for Computing Machinery (ACM)
T2 - 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
Y2 - 22 July 2018 through 26 July 2018
ER -