Wen-Mei W Hwu

If you made any changes in Pure these will be visible here soon.

Research Output

Filter
Conference contribution
2010

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

Stratton, J. A., Grover, V., Marathe, J., Aarts, B., Murphy, M., Hu, Z. & Hwu, W-M. W., Jul 1 2010, Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization. p. 111-119 9 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Exploiting more parallelism from applications having generalized reductions on GPU architectures

Wu, X. L., Obeid, N. & Hwu, W-M. W., Nov 19 2010, Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. p. 1175-1180 6 p. 5577899. (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Sparse regularization in MRI iterative reconstruction using GPUs

Zhuo, Y., Sutton, B., Wu, X. L., Haldar, J., Hwu, W. M. & Liang, Z. P., Dec 1 2010, Proceedings - 2010 3rd International Conference on Biomedical Engineering and Informatics, BMEI 2010. p. 578-582 5 p. 5640008. (Proceedings - 2010 3rd International Conference on Biomedical Engineering and Informatics, BMEI 2010; vol. 2).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

XMalloc: A scalable lock-free dynamic memory allocator for many-core machines

Huang, X., Rodrigues, C. I., Jones, S., Buck, I. & Hwu, W. M., Nov 19 2010, Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. p. 1134-1139 6 p. 5577907. (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2009

Accelerating mr image reconstruction on GPUs

Hwu, W. M. W., Nandakumar, D., Haldar, J., Atkinson, I. C., Sutton, B., Liang, Z. P. & Thulborn, K. R., Nov 17 2009, Proceedings - 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI 2009. p. 1283-1286 4 p. 5193297. (Proceedings - 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI 2009).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J. & Hwu, W. M. W., Nov 11 2009, 2009 IEEE 7th Symposium on Application Specific Processors, SASP 2009. p. 35-42 8 p. 5226333. (2009 IEEE 7th Symposium on Application Specific Processors, SASP 2009).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

GPU clusters for high-performance computing

Kindratenko, V., Enos, J. J., Shi, G., Showerman, M. T., Arnold, G. W., Stone, J. E., Phillips, J. C. & Hwu, W-M. W., Dec 21 2009, 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09. 5289128. (Proceedings - IEEE International Conference on Cluster Computing, ICCC).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

High performance computation and display of molecular orbitals on and multi-core cpus

Stone, J. E., Saam, J., Hardy, D. J., Vandivort, K. L., Hwu, W-M. W. & Schulten, K. J., Jul 23 2009, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2. 1 p. (Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

High-performance CUDA kernel execution on FPGAs

Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J. & Hwu, W-M. W., Nov 24 2009, ICS'09 - Proceedings of the 23rd International Conference on Supercomputing. p. 515-516 2 p. 1542357. (Proceedings of the International Conference on Supercomputing).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Long time-scale simulations of in vivo diffusion using GPU hardware

Roberts, E., Stone, J. E., Sepúlveda, L., Hwu, W-M. W. & Luthey-Schulten, Z. A., Nov 25 2009, IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. 5160930. (IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Optimization of tele-immersion codes

Sidelnik, A., Sung, I. J., Wu, W., Garzarán, M. J., Hwu, W. M., Nahrstedt, K., Padua, D. & Patel, S. J., Jul 23 2009, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2. 1 p. (Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2008

Accelerating advanced MRI reconstructions on GPUs

Stone, S. S., Haldar, J. P., Tsao, S. C., Hwu, W. M. W., Liang, Z. P. & Sutton, B. P., 2008, Conference on Computing Frontiers - Proceedings of the 2008 Conference on Computing Frontiers, CF'08. Association for Computing Machinery, p. 261-272 12 p. (Conference on Computing Frontiers - Proceedings of the 2008 Conference on Computing Frontiers, CF'08).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

CUBA: An architecture for efficient CPU/Co-processor data communication

Gelado, I., Kelm, J. H., Ryoo, S., Lumetta, S. S., Navarro, N. & Hwu, W. M. W., 2008, ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing. p. 299-308 10 p. (Proceedings of the International Conference on Supercomputing).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

CUDA-Lite: Reducing GPU programming complexity

Ueng, S. Z., Lathara, M., Baghsorkhi, S. S. & Hwu, W. M. W., 2008, Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers. p. 1-15 15 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 5335 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

GPU acceleration of cutoff pair potentials for molecular modeling applications

Rodrigues, C. I., Hardy, D. J., Stone, J. E., Schulten, K. & Hwu, W. M. W., Dec 1 2008, Conference on Computing Frontiers - Proceedings of the 2008 Conference on Computing Frontiers, CF'08. p. 273-282 10 p. (Conference on Computing Frontiers - Proceedings of the 2008 Conference on Computing Frontiers, CF'08).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Iteration disambiguation for parallelism identification in time-sliced applications

Ryoo, S., Rodrigues, C. I. & Hwu, W. M. W., Oct 27 2008, Languages and Compilers for Parallel Computing - 20th International Workshop, LCPC 2007, Revised Selected Papers. p. 110-124 15 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 5234 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs

Stratton, J. A., Stone, S. S. & Hwu, W-M. W., Dec 1 2008, Languages and Compilers for Parallel Computing - 21st International Workshop, LCPC 2008, Revised Selected Papers. p. 16-30 15 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 5335 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., Stone, S. S., Kirk, D. B. & Hwu, W. M. W., Dec 1 2008, PPoPP'08 - Proceedings of the 2008 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. p. 73-82 10 p. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Program optimization space pruning for a multithreaded GPU

Ryoo, S., Rodrigues, C. I., Stone, S. S., Baghsorkhi, S. S., Ueng, S. Z., Stratton, J. A. & Hwu, W. M. W., May 19 2008, Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization. p. 195-204 10 p. (Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Visualization and analysis of GPU summer school applicants and participants

Wah, E., Johnson, E., Auvil, L., Thakkar, U., Hwu, W. M., Kirk, D., Dunning, T. H. & Glotzer, S. C., Dec 1 2008, Proceedings - 4th IEEE International Conference on eScience, eScience 2008. p. 362-363 2 p. 4736797. (Proceedings - 4th IEEE International Conference on eScience, eScience 2008).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2007

Automatic discovery of coarse-grained parallelism in media applications

Ryoo, S., Ueng, S. Z., Rodrigues, C. I., Kidd, R. E., Frank, M. I. & Hwu, W. M. W., Dec 1 2007, Transactions on High-Performance Embedded Architectures and Compilers I. p. 194-213 20 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 4050 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

CIGAR: Application partitioning for a CPU/coprocessor architecture

Kelm, J. H., Gelado, I., Murphy, M. J., Navarro, N., Lumetta, S. S. & Hwu, W-M. W., Dec 1 2007, 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007. p. 317-326 10 p. 4336222. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Corezilla: Build and tame the multicore beast?

Sarno, L., Hwu, W. M. W., Lund, C., Levy, M., Larus, J. R., Reinders, J., Cameron, G., Lennard, C. & Corporation, T., Aug 2 2007, 2007 44th ACM/IEEE Design Automation Conference, DAC'07. p. 632-633 2 p. 4261259. (Proceedings - Design Automation Conference).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Implicitly parallel programming models for thousand-core microprocessors

Hwu, W-M. W., Ryoo, S., Ueng, S. Z., Keim, J. H., Gelado, I., Stone, S. S., Kidd, R. E., Baghsorkhi, S. S., Mahesri, A. A., Tsao, S. C., Navarro, N., Lumetta, S. S., Frank, M. I. & Patel, S. J., Aug 2 2007, 2007 44th ACM/IEEE Design Automation Conference, DAC'07. p. 754-759 6 p. 4261284. (Proceedings - Design Automation Conference).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2006

Improved Superblock optimization in GCC

Kidd, R. & Hwu, W-M. W., 2006, Proceedings of the GCC Developers' Summit 2006. p. 85-96 12 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2005

"Flea-flicker" Multipass pipelining: An alternative to the high-power out-of-order offense

Barnes, R. D., Ryoo, S. & Hwu, W-M. W., Dec 1 2005, MICRO-38: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture. p. 319-330 12 p. 1540970. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

The future of computer architecture research: An industrial perspective

Hwu, W. M. & Patel, S., Dec 12 2005, Proceedings - 11th International Symposium on High-Performance Computer Architecture, HPCA-11 2005. 1 p. (Proceedings - International Symposium on High-Performance Computer Architecture).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2003

Beating in-order stalls with "flea-flicker" two-pass pipelining

Barnes, R. D., Patel, S. J., Nystrom, E. M., Navarro, N., Sias, J. W. & Hwu, W-M. W., Jan 1 2003, Proceedings - 36th International Symposium on Microarchitecture, MICRO 2003. IEEE Computer Society, p. 387-398 12 p. 1253243. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. 2003-January).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2002

Code coverage and input variability: Effects on architecture and compiler research

Hunter, H. C. & Hwu, W. M. W., Dec 1 2002, Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '02. p. 79-87 9 p. (Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '02).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Vacuum packing: Extracting hardware-detected program phases for post-link optimization

Barnes, R. D., Nystrom, E. M., Merten, M. C. & Hwu, W-M. W., Jan 1 2002, Proceedings - 35th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2002. IEEE Computer Society, p. 233-244 12 p. 1176253. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. 2002-January).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2000

An empirical study of function pointers using SPEC benchmarks

Cheng, B. C. & Hwu, W. M. W., Jan 1 2000, Languages and Compilers for Parallel Computing - 12th International Workshop, LCPC 1999, Proceedings. Carter, L. & Ferrante, J. (eds.). Springer-Verlag Berlin Heidelberg, p. 490-493 4 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 1863).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1999

An architecture framework for introducing predicated execution into embedded microprocessors

Connors, D. A., Puiatti, J. M., August, D. I., Crozier, K. M. & Hwu, W. M. W., 1999, Euro-Par 1999 - Parallel Processing: 5th International Conference, Proceedings. Springer-Verlag Berlin Heidelberg, p. 1301-1311 11 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 1685 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1998

A study of code reuse and sharing characteristics of Java applications

Conte, M. T., Trick, A. R., Gyllenhaal, J. C. & Hwu, W. W., Jan 1 1998, Workload Characterization: Methodology and Case Studies - Based on the 1st Workshop on Workload Characterization. Maynard, A. M. G. & John, L. K. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 27-35 9 p. 809356. (Workload Characterization: Methodology and Case Studies - Based on the 1st Workshop on Workload Characterization; vol. 1998-November).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Improving static branch prediction in a compiler

Deitrich, B. L., Chen, B. C. & Hwu, W. M. W., Jan 1 1998, Proceedings - 1998 International Conference on Parallel Architectures and Compilation Techniques, PACT 1998. Institute of Electrical and Electronics Engineers Inc., p. 214-221 8 p. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1997

Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results

August, D. I., Connors, D. A., Gyllenhaal, J. C. & Hwu, W-M. W., 1997, IEEE High-Performance Computer Architecture Symposium Proceedings. Anon (ed.). IEEE, p. 84-93 10 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Framework for balancing control flow and predication

August, D. I., Hwu, W. M. W. & Mahlke, S. A., Dec 1 1997, Proceedings of the Annual International Symposium on Microarchitecture. p. 92-103 12 p. (Proceedings of the Annual International Symposium on Microarchitecture).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1995

Application of compiler-assisted multiple-instruction retry to VLIW architectures

Chen, S. K., Fuchs, W. K. & Hwu, W-M. W., 1995, Proceedings of the Conference on Fault-Tolerant Parallel and Distributed Systems. IEEE, p. 51-58 8 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

A study of the effects of compiler-controlled speculation on instruction and data caches

Bringmann, R. A., Mahlke, S. A. & Hwu, W. M. W., Jan 1 1995, Proceedings of the 28th Annual Hawaii International Conference on System Sciences, HICSS 1995. IEEE Computer Society, p. 211-220 10 p. 375392. (Proceedings of the Annual Hawaii International Conference on System Sciences; vol. 1).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Comparison of full and partial predicated execution support for ILP processors

Mahlke, S. A., Hank, R. E., McCormick, J. E., August, D. I. & Hwu, W-M. W., 1995, ACM SIGARCH (Association for Computing Nachinery Special Interest Group on Computer Architecture) - Conference Proceedings. ACM, p. 138-149 12 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Comparison of full and partial predicated execution support for ILP processors

Mahlke, S. A., Hank, R. E., McCormick, J. E., August, D. I. & Hwu, W. M. W., Jan 1 1995, Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA. p. 138-149 12 p. (Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1994

An analytical approach to scheduling code for superscalar and VLIW architectures

Chen, S. K., Fuchs, W. & Hwu, W-M. W., Jan 1 1994, Proceedings of the 1994 International Conference on Parallel Processing, ICPP 1994. Institute of Electrical and Electronics Engineers Inc., 4115732. (Proceedings of the International Conference on Parallel Processing; vol. 1).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Characterizing the impact of predicated execution on branch prediction

Mahlke, S. A., Hank, R. E., Bringmann, R. A., Gyllenhaal, J. C., Gallagher, D. M. & Hwu, W-M. W., Nov 30 1994, Proceedings of the 27th Annual International Symposium on Microarchitecture, MICRO 1994. IEEE Computer Society, p. 217-227 11 p. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. Part F129425).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Data relocation and prefetching for programs with large data sets

Yamada, Y., Gyllenhall, J., Haab, G. & Hwu, W-M. W., Nov 30 1994, Proceedings of the 27th Annual International Symposium on Microarchitecture, MICRO 1994. IEEE Computer Society, p. 118-127 10 p. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. Part F129425).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Dynamic memory disambiguation using the memory conflict buffer

Gallagher, D. M., Chen, W. Y., Mahlke, S. A., Gyllenhaal, J. C. & Hwu, W. M. W., Nov 1 1994, Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 1994. Association for Computing Machinery, p. 183-193 11 p. (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS; vol. Part F129531).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Speculative execution exception recovery using write-back suppression

Bringmann, R. A., Mahlke, S. A., Hank, R. E., Gyllenhaal, J. C. & Hwu, W-M. W., 1994, Proceedings of the Annual International Symposium on Microarchitecture. Anon (ed.). Publ by IEEE, p. 214-223 10 p.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Superblock formation using static program analysis

Hank, R. E., Mahlke, S. A., Bringmann, R. A., Gyllenhaal, J. C. & Hwu, W-M. W., Jan 1 1994, Proceedings of the Annual International Symposium on Microarchitecture. Anon (ed.). Publ by IEEE, p. 247-255 9 p. (Proceedings of the Annual International Symposium on Microarchitecture).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

The application of compiler-assisted multiple-instruction retry to VLIW architectures

Chen, S. K., Fuchs, W. K. & Hwu, W. M. W., Jan 1 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, FTPDS 1994. Pradhan, D. & Avresky, D. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 51-58 8 p. 494474. (Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, FTPDS 1994).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1993

Reverse if-conversion

Warter, N. J., Mahlke, S. A., Hwu, W. M. W. & Rau, B. R., Dec 1 1993, Proc ACM SIGPLAN 93 Conf Program Lang Des Implementation. Anon (ed.). Publ by ACM, p. 290-299 10 p. (Proc ACM SIGPLAN 93 Conf Program Lang Des Implementation).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

The benefit of predicated execution for software pipelining

Warter, N. J., Lavery, D. R. & Hwu, W. M. W., Jan 1 1993, Proceedings of the 26th Hawaii International Conference on System Sciences, HICSS 1993. IEEE Computer Society, p. 497-506 10 p. 1198122. (Proceedings of the Annual Hawaii International Conference on System Sciences; vol. 1).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Using profile information to assist advanced compiler optimization and scheduling

Chen, W., Bringmann, R., Mahlke, S., Anik, S., Kiyohara, T., Warter, N., Lavery, D., Hwu, W. M., Hank, R. & Gyllenhaal, J., Jan 1 1993, Languages and Compilers for Parallel Computing - 5th International Workshop, Proceedings. Padua, D., Nicolau, A., Gelernter, D. & Banerjee, U. (eds.). Springer-Verlag, p. 31-48 18 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 757 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution