TY - GEN
T1 - Improving the start-up time of python applications on large scale HPC systems
AU - Maclean, Colin A.
AU - Leong, Hon Wai
AU - Enos, Jeremy
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/11/12
Y1 - 2017/11/12
N2 - Interpreted programming languages (e.g. Perl [15], Python [13], R [12]) are gaining popularity in modern scientific computation. Their syntax is easy to read and learn and is flexible, and programs are portable across different HPC systems without the need to recompile. This is achieved by executing platform-independent byte code instructions in a virtual machine, in contrast to platform-dependent machine language instructions run directly on the hardware. Debugging in such a language is easier, as the program runs one statement at a time and stops whenever there is an error, reporting the location of the error almost immediately. On the other hand, a programming language that uses a compiler takes extra time for compilation and often requires additional linking to library dependencies. It is more difficult to debug, as an error is only generated after compilation. As such, writing and debugging in an interpreted language is convenient, as programmers can change the code quickly and test it without the need to recompile. Due to the nature of interpreted languages, the source files and dependencies typically exist as many different files, as is the case with Python, rather than being compiled from source to object files and linked into a smaller number of executables and libraries. This property of interpreted languages demands a large number of input/output operations per second (IOPS) for fast start-up times. When many nodes of an HPC system launch an interpreted program, this behavior places significant stress on the metadata server of parallel file systems, leading to poor start-up performance and degrading file system performance for all users.
AB - Interpreted programming languages (e.g. Perl [15], Python [13], R [12]) are gaining popularity in modern scientific computation. Their syntax is easy to read and learn and is flexible, and programs are portable across different HPC systems without the need to recompile. This is achieved by executing platform-independent byte code instructions in a virtual machine, in contrast to platform-dependent machine language instructions run directly on the hardware. Debugging in such a language is easier, as the program runs one statement at a time and stops whenever there is an error, reporting the location of the error almost immediately. On the other hand, a programming language that uses a compiler takes extra time for compilation and often requires additional linking to library dependencies. It is more difficult to debug, as an error is only generated after compilation. As such, writing and debugging in an interpreted language is convenient, as programmers can change the code quickly and test it without the need to recompile. Due to the nature of interpreted languages, the source files and dependencies typically exist as many different files, as is the case with Python, rather than being compiled from source to object files and linked into a smaller number of executables and libraries. This property of interpreted languages demands a large number of input/output operations per second (IOPS) for fast start-up times. When many nodes of an HPC system launch an interpreted program, this behavior places significant stress on the metadata server of parallel file systems, leading to poor start-up performance and degrading file system performance for all users.
UR - http://www.scopus.com/inward/record.url?scp=85040004558&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040004558&partnerID=8YFLogxK
U2 - 10.1145/3155105.3155107
DO - 10.1145/3155105.3155107
M3 - Conference contribution
AN - SCOPUS:85040004558
T3 - Proceedings of HPCSYSPROS 2017: HPC Systems Professionals Workshop, Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of HPCSYSPROS 2017
PB - Association for Computing Machinery
T2 - HPC Systems Professionals Workshop, HPCSYSPROS 2017
Y2 - 12 November 2017 through 17 November 2017
ER -