Multiple petaflops-class machines will appear during the coming year, and many multipetaflops machines are on the anvil. It will be a substantial challenge to make existing parallel CSE applications run efficiently on them, and even more challenging to design new applications that can effectively leverage the large computational power of these machines. Multicore chips and SMP nodes are becoming popular and pose challenges of their own. Further, a new set of challenges in productivity arises, especially if we wish to have a broader set of applications and people use these machines. Reviewed here is a set of techniques that have proved useful in multiple parallel applications that have scaled to tens of thousands of processors, on machines such as the Blue Gene/L, Blue Gene/P, Cray XT3, and XT4. New challenges and potential solutions for the performance issues are identified. Issues presented by multicore chips and SMP nodes are also addressed. Also reviewed are some new and old ideas for substantially increasing productivity in parallel programming.
ASJC Scopus subject areas
- Physics and Astronomy(all)