TY - GEN
T1 - Tradeoffs in designing accelerator architectures for visual computing
AU - Mahesri, Aqeel
AU - Johnson, Daniel
AU - Crago, Neal
AU - Patel, Sanjay J.
PY - 2008
Y1 - 2008
N2 - Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a benchmark suite, VISBench, to serve as a proxy for this application class. We use VISBench to examine some important high-level decisions for an accelerator architecture. We propose a highly parallel base architecture. We examine the need for synchronization and data communication. We also examine GPU-style SIMD execution and find that a MIMD architecture usually performs better. Given these high-level choices, we use VISBench to explore the microarchitectural design space. We analyze area versus performance tradeoffs in designing individual cores and the memory hierarchy. We find that a design made of small, simple cores achieves much higher throughput than a general-purpose uniprocessor. Further, we find that a limited amount of support for ILP within each core aids overall performance. We find that fine-grained multithreading improves performance, but only up to a point. We find that word-level (SSE-style) SIMD provides a poor performance-to-area ratio. Finally, we find that sufficient memory and cache bandwidth is essential to performance.
AB - Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a benchmark suite, VISBench, to serve as a proxy for this application class. We use VISBench to examine some important high-level decisions for an accelerator architecture. We propose a highly parallel base architecture. We examine the need for synchronization and data communication. We also examine GPU-style SIMD execution and find that a MIMD architecture usually performs better. Given these high-level choices, we use VISBench to explore the microarchitectural design space. We analyze area versus performance tradeoffs in designing individual cores and the memory hierarchy. We find that a design made of small, simple cores achieves much higher throughput than a general-purpose uniprocessor. Further, we find that a limited amount of support for ILP within each core aids overall performance. We find that fine-grained multithreading improves performance, but only up to a point. We find that word-level (SSE-style) SIMD provides a poor performance-to-area ratio. Finally, we find that sufficient memory and cache bandwidth is essential to performance.
UR - http://www.scopus.com/inward/record.url?scp=66749170578&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=66749170578&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2008.4771788
DO - 10.1109/MICRO.2008.4771788
M3 - Conference contribution
AN - SCOPUS:66749170578
SN - 9781424428366
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 164
EP - 175
BT - 2008 Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-41
T2 - 2008 - 41st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-41
Y2 - 8 November 2008 through 12 November 2008
ER -