This study uses real system measurements to investigate the relationships between loop granularity, parallel loop distribution, and barrier wait times, and their impact on the multiprogramming performance of loop parallel applications on the CEDAR shared-memory multiprocessor. The overhead due to multiprogramming varies from 5% for applications with large loop granularity to 140% for applications with very fine-grain loops. This is because applications with fine-grain loops have unequal parallel work distribution among the clusters in multiprogrammed environments, while the parallel work in applications with large loop granularity is equally distributed. Moreover, increased barrier wait times of the main task and wait-for-work times of the helper tasks also contribute to the multiprogramming performance degradation of the fine-grain loop parallel applications. We propose and implement a self-preemption technique to address the problem of increased barrier wait times and wait-for-work times. Using this technique, the overhead due to multiprogramming is reduced by as much as 100%, and speedups of 1.1 to 1.7 are obtained.
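The abstract does not describe how self-preemption is implemented on CEDAR, so the following is only a minimal sketch of the general idea, under the assumption that a task waiting at a barrier spins for a bounded time and then voluntarily gives up its processor. The class `SelfPreemptingBarrier`, the `spin_limit` parameter, and the `run_demo` helper are hypothetical names chosen for illustration and are not taken from the paper.

```python
import threading
import time


class SelfPreemptingBarrier:
    """Illustrative barrier: spin briefly, then voluntarily yield the CPU.

    Assumption: this is a generic sketch, not the CEDAR implementation.
    """

    def __init__(self, parties, spin_limit=1000):
        self.parties = parties
        self.spin_limit = spin_limit  # hypothetical tuning knob
        self._lock = threading.Lock()
        self._count = 0
        self._generation = 0
        self.yields = 0  # number of times a waiter gave up the processor

    def wait(self):
        with self._lock:
            gen = self._generation
            self._count += 1
            if self._count == self.parties:
                # Last arrival releases all waiters by advancing the generation.
                self._count = 0
                self._generation += 1
                return
        spins = 0
        while True:
            with self._lock:
                if self._generation != gen:
                    return
            spins += 1
            if spins >= self.spin_limit:
                # Self-preemption: stop busy-waiting and let another task run.
                with self._lock:
                    self.yields += 1
                time.sleep(0)
                spins = 0


def run_demo(num_tasks=4, phases=3):
    """Run tasks with uneven work through several barrier phases."""
    barrier = SelfPreemptingBarrier(num_tasks, spin_limit=100)
    completed = []
    completed_lock = threading.Lock()

    def worker(task_id):
        for phase in range(phases):
            time.sleep(0.001 * task_id)  # uneven work per task
            barrier.wait()
            with completed_lock:
                completed.append((phase, task_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed, barrier
```

In this sketch, tasks that reach the barrier early stop consuming processor time once `spin_limit` polls have elapsed, which is the behavior one would expect to matter most when several applications share the machine and the work is unevenly distributed.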
Original language: English (US)
Journal: Proceedings of the International Conference on Parallel Processing
State: Published - 1994
Event: 23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States
Duration: Aug 15 1994 → Aug 19 1994
ASJC Scopus subject areas
- Hardware and Architecture