Wen-Mei W Hwu

1984-2019

Research Output 1984-2019

2012

Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors

Baghsorkhi, S. S., Gelado, I., Delahaye, M. & Hwu, W. M. W., Aug 1 2012, In: ACM SIGPLAN Notices. 47, 8, p. 23-33 11 p.

Research output: Contribution to journal > Article

Data storage equipment
Sampling
Monitoring
Hardware
Graphics processing unit

Floating-point considerations

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 151-171 21 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

GPU Computing Gems Jade Edition

Hwu, W-M. W., Jan 1 2012, Elsevier Inc.

Research output: Book/Report > Book

Gems
Finance
Graphics processing unit
Environmental engineering
Computer systems programming

High-speed interferometric synthetic aperture microscopy on a graphics processing unit

Ahmad, A., Shemonski, N., Adie, S. G., Kim, H., Hwu, W. M. W., Carney, P. S. & Boppart, S. A., 2012, Frontiers in Optics, FIO 2012.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

synthetic apertures
high speed
microscopy
tomography
imaging techniques

History of GPU computing

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 23-39 17 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Graphics processing unit

Implementing a GPU programming model on a non-GPU accelerator architecture

Kofsky, S. M., Johnson, D. R., Stratton, J. A., Hwu, W. M. W., Patel, S. J. & Lumetta, S. S., Mar 8 2012, Computer Architecture - ISCA 2010 International Workshops, A4MMC, AMAS-BT, EAMA, WEED, WIOSCA, Revised Selected Papers. p. 40-51 12 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 6161 LNCS).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Accelerator
Programming Model
Particle accelerators
Parallel architectures
Degradation

Interferometric synthetic aperture microscopy with computational adaptive optics for high-resolution tomography of scattering tissue

Adie, S. G., Ahmad, A., Shemonski, N., Graf, B. W., Kim, H., Hwu, W. M. W., Carney, P. S. & Boppart, S. A., 2012, Biomedical Optics, BIOMED 2012.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

synthetic apertures
adaptive optics
Microscopy
tomography

Introduction

Hwu, W. M. W., Dec 1 2012, GPU Computing Gems Jade Edition. Elsevier Inc., p. xv-xvi

Research output: Chapter in Book/Report/Conference proceeding > Foreword/postscript

Introduction

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 1-21 21 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Introduction to data parallelism and CUDA C

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 41-62 22 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Optimization and architecture effects on GPU computing workload performance

Stratton, J. A., Anssari, N., Rodrigues, C., Sung, I. J., Obeid, N., Chang, L., Liu, G. D. & Hwu, W-M. W., Dec 12 2012, 2012 Innovative Parallel Computing, InPar 2012. 6339605. (2012 Innovative Parallel Computing, InPar 2012).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Hardware
Bandwidth
Dynamic random access storage
Coarsening
Throughput

Parallel patterns: Sparse matrix-vector multiplication: An introduction to compaction and regularization in parallel algorithms

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 217-234 18 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Parallel algorithms
Compaction

Parallel patterns: Prefix sum: An introduction to work efficiency in parallel algorithms

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 197-216 20 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Parallel algorithms

Parallel patterns: Convolution: With an introduction to constant memory and caches

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 173-196 24 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Convolution
Data storage equipment

Parallel programming and computational thinking

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 281-295 15 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Parallel programming

Performance analysis and tuning for general purpose graphics processing units (GPGPU)

Kim, H., Vuduc, R., Baghsorkhi, S., Hwu, W-M. W. & Jee Choi, C., Nov 21 2012, Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU). p. 1-94 94 p. (Synthesis Lectures on Computer Architecture; vol. 20).

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Tuning
Data storage equipment
Hardware
Cache memory
Memory architecture

Performance considerations

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 123-149 27 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Preface

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. xiii-xviii

Research output: Chapter in Book/Report/Conference proceeding > Foreword/postscript

TIGER: tiled iterative genome assembler.

Wu, X. L., Heo, Y., El Hajj, I., Hwu, W. M., Chen, D. & Ma, J., 2012, In: BMC Bioinformatics. 13 Suppl 19

Research output: Contribution to journal > Article

Tigers
Genome
Genes
Data storage equipment
Sequencing

2013

clMPI: An OpenCL extension for interoperation with the message passing interface

Takizawa, H., Sugawara, M., Hirasawa, S., Gelado, I., Kobayashi, H. & Hwu, W. M. W., Jan 1 2013, Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013. IEEE Computer Society, p. 1138-1148 11 p. 6651000. (Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Message Passing Interface
Message passing
Data transfer
Program processors

Comparison based sorting for systems with multiple GPUs

Tanasic, I., Vilanova, L., Jordà, M., Cabezas, J., Gelado, I., Navarro, N. & Hwu, W-M. W., Apr 15 2013, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. p. 1-11 11 p. (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Sorting
Graphics processing unit

Efficient compilation of CUDA kernels for high-performance computing on FPGAs

Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J. & Hwu, W-M. W., Oct 21 2013, In: Transactions on Embedded Computing Systems. 13, 2, 25.

Research output: Contribution to journal > Article

Field programmable gate arrays (FPGA)
Particle accelerators
Imaging techniques
Processing

More IMPATIENT: A gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs

Gai, J., Obeid, N., Holtrop, J. L., Wu, X. L., Lam, F., Fu, M., Haldar, J. P., Hwu, W. M. W., Liang, Z. P. & Sutton, B. P., May 2013, In: Journal of Parallel and Distributed Computing. 73, 5, p. 686-697 12 p.

Research output: Contribution to journal > Article

Otto Toeplitz
Magnetic resonance imaging
High Resolution
Image reconstruction

Programming massively parallel processors: A hands-on approach, second edition

Kirk, D. B. & Hwu, W-M. W., Jan 1 2013, Elsevier Science. 496 p.

Research output: Book/Report > Book

Parallel programming
Program processors
Parallel processing systems
Magnetic resonance imaging
Sales

Rapid computation of sodium bioscales using GPU-accelerated image reconstruction

Atkinson, I. C., Liu, G., Obeid, N., Thulborn, K. R. & Hwu, W. M., Mar 1 2013, In: International Journal of Imaging Systems and Technology. 23, 1, p. 29-35 7 p.

Research output: Contribution to journal > Article

Image reconstruction
Sodium
Tissue
Imaging techniques
Program processors

Real-time in vivo computed optical interferometric tomography

Ahmad, A., Shemonski, N. D., Adie, S. G., Kim, H. S., Hwu, W. M. W., Carney, P. S. & Boppart, S. A., Jun 1 2013, In: Nature Photonics. 7, 6, p. 444-448 5 p.

Research output: Contribution to journal > Article

Optical tomography
Tomography
high resolution
Tissue

Scalable SIMD-parallel memory allocation for many-core machines

Huang, X., Rodrigues, C. I., Jones, S., Buck, I. & Hwu, W-M. W., Jun 1 2013, In: Journal of Supercomputing. 64, 3, p. 1008-1020 13 p.

Research output: Contribution to journal > Article

Storage allocation (computer)
Many-core
Throughput
Data storage equipment
Computer systems programming

Throughput-oriented kernel porting onto FPGAs

Papakonstantinou, A., Chen, D., Hwu, W-M. W., Cong, J. & Yun, L., Jul 12 2013, Proceedings of the 50th Annual Design Automation Conference, DAC 2013. 11. (Proceedings - Design Automation Conference).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Field programmable gate arrays (FPGA)
Throughput
kernel
Coloring

2014

Adaptive cache bypass and insertion for many-core accelerators

Chen, X., Wu, S., Chang, L. W., Huang, W. S., Pearson, C., Wang, Z. & Hwu, W. M. W., Jan 1 2014, 2nd ACM International Workshop on Many-Core Embedded Systems, MES 2014 - In Conjunction with the 41st International Symposium on Computer Architecture, ISCA 2014. Association for Computing Machinery, p. 1-8 8 p. (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Particle accelerators
Data storage equipment
Energy efficiency
Graphics processing unit

A guide for implementing tridiagonal solvers on GPUs

Chang, L. W. & Hwu, W-M. W., Jan 1 2014, Numerical Computations with GPUs. Springer International Publishing, p. 29-44 16 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Graphics processing unit

Automatic execution of single-GPU computations across multiple GPUs

Cabezas, J., Vilanova, L., Gelado, I., Jablin, T. B., Navarro, N. & Hwu, W-M. W., Jan 1 2014, PACT 2014 - Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques. Institute of Electrical and Electronics Engineers Inc., p. 467-468 2 p. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

kernel
Decompose
Runtime Systems
Data Distribution
Interconnect

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads

Heo, Y., Wu, X. L., Chen, D., Ma, J. & Hwu, W. M., May 15 2014, In: Bioinformatics. 30, 10, p. 1354-1362 9 p.

Research output: Contribution to journal > Article

Bloom Filter
Error correction
Sequencing
High Throughput

In-place transposition of rectangular matrices on accelerators

Sung, I. J., Gómez-Luna, J., González-Linares, J. M., Guil, N. & Hwu, W-M. W., Mar 10 2014, PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. p. 207-218 12 p. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Particle accelerators
Program processors
Throughput
Data storage equipment
Data transfer

In-place transposition of rectangular matrices on accelerators

Sung, I. J., Gómez-Luna, J., González-Linares, J. M., Guil, N. & Hwu, W-M. W., Aug 2014, In: ACM SIGPLAN Notices. 49, 8, p. 207-218 12 p.

Research output: Contribution to journal > Article

Particle accelerators
Program processors
Throughput
Data storage equipment
Data transfer

Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

Rodrigues, C., Jablin, T., Dakkak, A. & Hwu, W-M. W., Mar 10 2014, PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. p. 247-258 12 p. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Cluster computing
Computer systems programming
Data storage equipment
Parallel programming
Electric fuses

Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

Rodrigues, C., Jablin, T., Dakkak, A. & Hwu, W. M., Aug 2014, In: ACM SIGPLAN Notices. 49, 8, p. 247-258 12 p.

Research output: Contribution to journal > Article

Cluster computing
Computer systems programming
Data storage equipment
Parallel programming
Electric fuses

What is ahead for parallel computing

Hwu, W-M. W., Jul 2014, In: Journal of Parallel and Distributed Computing. 74, 7, p. 2574-2581 8 p.

Research output: Contribution to journal > Article

Parallel processing systems
Parallel Computing
Parallel algorithms
Many-core

2015

Adaptive Cache Management for Energy-Efficient GPU Computing

Chen, X., Chang, L. W., Rodrigues, C. I., Lv, J., Wang, Z. & Hwu, W-M. W., Jan 15 2015, Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014. January ed. IEEE Computer Society, p. 343-355 13 p. 7011400. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. 2015-January, no. January).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Data storage equipment
Energy efficiency
Throughput
Graphics processing unit

Automatic parallelization of kernels in shared-memory multi-GPU nodes

Cabezas, J., Vilanova, L., Gelado, I., Jablin, T. B., Navarro, N. & Hwu, W-M. W., Jun 8 2015, ICS 2015 - Proceedings of the 29th ACM International Conference on Supercomputing. Association for Computing Machinery, p. 3-13 11 p. (Proceedings of the International Conference on Supercomputing; vol. 2015-June).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Data storage equipment
Graphics processing unit
Scheduling
Costs

Compiler Technology

Chung, W. H. J., Lyu, Y. H., Sung, I. J. R., Lee, Y. W. & Hwu, W-M. W., Dec 4 2015, Heterogeneous System Architecture: A New Compute Platform Infrastructure. Elsevier Inc., p. 97-129 33 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Data storage equipment
Parallel programming
Computer programming languages
Program processors
Code generation

Enhancing the Usability and Utilization of Accelerated Architectures via Docker

Haydel, N., Gesing, S., Taylor, I., Madey, G., Dakkak, A., De Gonzalo, S. G. & Hwu, W-M. W., Jan 1 2015, Proceedings - 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing, UCC 2015. Rana, O., Buyya, R. & Raicu, I. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 361-367 7 p. 7431432. (Proceedings - 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing, UCC 2015).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Containers
Application programming interfaces (API)
Particle accelerators
Interfaces (computer)
Program processors

FPGA accelerated DNA error correction

Ramachandran, A., Heo, Y., Hwu, W-M. W., Ma, J. & Chen, D., Apr 22 2015, Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition, DATE 2015. Institute of Electrical and Electronics Engineers Inc., Vol. 2015-April. p. 1371-1376 6 p. 7092605

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Error correction
Field programmable gate arrays (FPGA)
DNA
Genes
Throughput

GPU-SM: Shared memory multi-GPU programming

Cabezas, J., Jordà, M., Gelado, I., Navarro, N. & Hwu, W-M. W., Feb 7 2015, ACM International Conference Proceeding Series. Gong, X. (ed.). Association for Computing Machinery, p. 13-24 12 p. (ACM International Conference Proceeding Series; vol. 2015-February).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Computer programming
Data storage equipment
Graphics processing unit
Computer systems
Data structures
Specifications
Program processors
Sanders
Hardware

In-place data sliding algorithms for many-core architectures

Gómez-Luna, J., Chang, L. W., Sung, I. J., Hwu, W-M. W. & Guil, N., Dec 8 2015, Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc., p. 210-219 10 p. 7349576. (Proceedings of the International Conference on Parallel Processing; vol. 2015-December).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Many-core
Data storage equipment
Irregular
Benchmark
Algebra

Introduction

Hwu, W-M. W., Dec 4 2015, Heterogeneous System Architecture: A New Compute Platform Infrastructure. Elsevier Inc., p. 1-5 5 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Specifications

Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures

Kim, H. S., El Hajj, I., Stratton, J., Lumetta, S. S. & Hwu, W-M. W., Mar 3 2015, Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015. Institute of Electrical and Electronics Engineers Inc., p. 257-268 12 p. 7054205. (Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Computer programming
Locality
Thread
Programming Model
Program processors

Mapping high-level programming languages to OpenCL 2.0: A compiler writer's perspective

Sung, I. J., Chung, W. H., Lee, Y. W. & Hwu, W-M. W., May 18 2015, Heterogeneous Computing with OpenCL 2.0: Third Edition. Elsevier Inc., p. 249-272 24 p.

Research output: Chapter in Book/Report/Conference proceeding > Chapter

High level languages
Application programming interfaces (API)
Computer programming languages
Object oriented programming
Data transfer

Optimized Data Transfers Based on the OpenCL Event Management Mechanism

Takizawa, H., Hirasawa, S., Sugawara, M., Gelado, I., Kobayashi, H. & Hwu, W. M. W., Jan 1 2015, In: Scientific Programming. 2015, 576498.

Research output: Contribution to journal > Article

Data transfer
Communication

Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications

Cabezas, J., Gelado, I., Stone, J. E., Navarro, N., Kirk, D. B. & Hwu, W-M. W., May 1 2015, In: IEEE Transactions on Parallel and Distributed Systems. 26, 5, p. 1405-1418 14 p., 6803940.

Research output: Contribution to journal > Article

Electronic data interchange
Particle accelerators
Hardware
Parallel programming
Maintainability