TY - JOUR
T1 - HPC-colony
T2 - Services and interfaces for very large systems
AU - Chakravorty, Sayantan
AU - Mendes, Celso L.
AU - Kalé, Laxmikant V.
AU - Jones, Terry
AU - Tauferner, Andrew
AU - Inglett, Todd
AU - Moreira, José
PY - 2006/4
Y1 - 2006/4
AB - Traditional full-featured operating systems are known to have properties that limit the scalability of distributed memory parallel programs, the most common programming paradigm utilized in high-end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To make a general purpose operating system scale to such levels, new technology is required for parallel resource management and global system management (including fault management). In this paper, we describe the shortcomings of full-featured operating systems and runtime systems and discuss an approach to scale such systems to one hundred thousand processors with both scalable parallel application performance and efficient system management.
UR - http://www.scopus.com/inward/record.url?scp=33646423852&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646423852&partnerID=8YFLogxK
U2 - 10.1145/1131322.1131334
DO - 10.1145/1131322.1131334
M3 - Article
AN - SCOPUS:33646423852
SN - 0163-5980
VL - 40
SP - 43
EP - 49
JO - Operating Systems Review (ACM)
JF - Operating Systems Review (ACM)
IS - 2
ER -