TY - GEN
T1 - Scal-Tool
T2 - 1999 ACM/IEEE Conference on Supercomputing, SC 1999
AU - Solihin, Yan
AU - Lam, Vinh
AU - Torrellas, Josep
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grants NCSA-PACI ACI 96-19019COOP, NSF Young Investigator Award MIP-9457436, ASC-9612099, and MIP-9619351, gifts from Intel and IBM, and NCSA machine time under grant AST910367N. We thank Yong Luo, Harvey Wasserman, and their colleagues for their feedback. We also thank the referees and the graduate students from the I-ACOMA group. Finally, we thank Dave Nicol for his help with the paper.
Publisher Copyright:
© 1999 IEEE.
PY - 1999
Y1 - 1999
N2 - Distributed Shared-Memory (DSM) multiprocessors provide an attractive combination of cost-effective commodity architecture and, thanks to the shared-memory abstraction, relative ease of programming. Unfortunately, it is well known that tuning applications for scalable performance in these machines is time-consuming. To address this problem, programmers use performance monitoring tools. However, these tools are often costly to run, especially if highly-processed information is desired. In addition, they usually cannot be used to experiment with hypothetical architecture organizations. In this paper, we present Scal-Tool, a tool that isolates and quantifies scalability bottlenecks in parallel applications running on DSM machines. The scalability bottlenecks currently quantified include insufficient caching space, load imbalance, and synchronization. The tool is based on an empirical model that uses as inputs measurements from hardware event counters in the processor. A major advantage of the tool is that it is quite inexpensive to run: it only needs the event counter values for the application running with a few different processor counts and data set sizes. In addition, it provides ways to analyze variations of several machine parameters.
UR - http://www.scopus.com/inward/record.url?scp=84963769243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963769243&partnerID=8YFLogxK
U2 - 10.1109/SC.1999.10066
DO - 10.1109/SC.1999.10066
M3 - Conference contribution
AN - SCOPUS:84963769243
T3 - ACM/IEEE SC 1999 Conference, SC 1999
SP - 17
BT - ACM/IEEE SC 1999 Conference, SC 1999
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 November 1999 through 19 November 1999
ER -