Automatic discovery of coarse-grained parallelism in media applications

Shane Ryoo, Sain Zee Ueng, Christopher I. Rodrigues, Robert E. Kidd, Matthew I. Frank, Wen Mei W. Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, as well as inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It visits several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.

Original language: English (US)
Title of host publication: Transactions on High-Performance Embedded Architectures and Compilers I
Pages: 194-213
Number of pages: 20
DOI: https://doi.org/10.1007/978-3-540-71528-3_13
State: Published - Dec 1 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4050 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Parallelism
Hardware
Processing
Computer programming languages
Particle accelerators
Microprocessor chips
Scalability
Parallelizing Compilers
Hardware Accelerator
Productivity
MPEG-4
Necessary
Data storage equipment
Microprocessor
Encoder
C++
Compiler
Development Process
Programming Model
Programming Languages

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Ryoo, S., Ueng, S. Z., Rodrigues, C. I., Kidd, R. E., Frank, M. I., & Hwu, W. M. W. (2007). Automatic discovery of coarse-grained parallelism in media applications. In Transactions on High-Performance Embedded Architectures and Compilers I (pp. 194-213). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4050 LNCS). https://doi.org/10.1007/978-3-540-71528-3_13

@inproceedings{bb21b945a0254d6cb9901ccfa1a0672b,
title = "Automatic discovery of coarse-grained parallelism in media applications",
author = "Shane Ryoo and Ueng, {Sain Zee} and Rodrigues, {Christopher I.} and Kidd, {Robert E.} and Frank, {Matthew I.} and Hwu, {Wen Mei W.}",
year = "2007",
month = "12",
day = "1",
doi = "10.1007/978-3-540-71528-3_13",
language = "English (US)",
isbn = "3540715274",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "194--213",
booktitle = "Transactions on High-Performance Embedded Architectures and Compilers I",

}

TY - GEN

T1 - Automatic discovery of coarse-grained parallelism in media applications

AU - Ryoo, Shane

AU - Ueng, Sain Zee

AU - Rodrigues, Christopher I.

AU - Kidd, Robert E.

AU - Frank, Matthew I.

AU - Hwu, Wen Mei W.

PY - 2007/12/1

Y1 - 2007/12/1

AB - With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, as well as inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It visits several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.

UR - http://www.scopus.com/inward/record.url?scp=34547311216&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547311216&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-71528-3_13

DO - 10.1007/978-3-540-71528-3_13

M3 - Conference contribution

AN - SCOPUS:34547311216

SN - 3540715274

SN - 9783540715276

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 194

EP - 213

BT - Transactions on High-Performance Embedded Architectures and Compilers I

ER -