Wen-Mei W Hwu

Research Output 1984-2019

A study of the energy saving and capacity improvement potential of power control in multi-hop wireless networks

Monks, J. P., Ebert, J. P., Wolisz, A. & Hwu, W-M. W., Jan 1 2001, In: Conference on Local Computer Networks. p. 550-559 10 p., 72.

Research output: Contribution to journal › Article

Power control
Wireless networks
Energy conservation
Topology
Packet networks

A tiling-scheme Viterbi decoder in Software Defined Radio for GPUs

Lin, C. S., Liu, W. L., Yeh, W. T., Chang, L. W., Hwu, W-M. W., Chen, S. J. & Hsiung, P. A., Oct 31 2011, 7th International Conference on Wireless Communications, Networking and Mobile Computing, WiCOM 2011. 6036680. (7th International Conference on Wireless Communications, Networking and Mobile Computing, WiCOM 2011).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Decoding
radio
Hamming distance
Merging
Program processors

Automatic discovery of coarse-grained parallelism in media applications

Ryoo, S., Ueng, S. Z., Rodrigues, C. I., Kidd, R. E., Frank, M. I. & Hwu, W-M. W., Dec 1 2007, Transactions on High-Performance Embedded Architectures and Compilers I. p. 194-213 20 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 4050 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Parallelism
Hardware
Processing
Computer programming languages
Particle accelerators

Automatic execution of single-GPU computations across multiple GPUs

Cabezas, J., Vilanova, L., Gelado, I., Jablin, T. B., Navarro, N. & Hwu, W-M. W., Jan 1 2014, PACT 2014 - Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques. Institute of Electrical and Electronics Engineers Inc., p. 467-468 2 p. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

kernel
Decompose
Runtime Systems
Data Distribution
Interconnect

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs

Gonzalo, S. G. D., Huang, S., Gomez-Luna, J., Hammond, S., Mutlu, O. & Hwu, W-M. W., Mar 5 2019, CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. Moseley, T., Jimborean, A. & Kandemir, M. T. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 73-84 12 p. 8661187. (CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Portability
Shuffle
Hardware
Domain-specific Languages
Programming

Automatic parallelization of kernels in shared-memory multi-GPU nodes

Cabezas, J., Vilanova, L., Gelado, I., Jablin, T. B., Navarro, N. & Hwu, W-M. W., Jun 8 2015, ICS 2015 - Proceedings of the 29th ACM International Conference on Supercomputing. Association for Computing Machinery, p. 3-13 11 p. (Proceedings of the International Conference on Supercomputing; vol. 2015-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Data storage equipment
Graphics processing unit
Scheduling
Costs

Beating in-order stalls with "Flea-Flicker" two-pass pipelining

Barnes, R. D., Sias, J. W., Nystrom, E. M., Patel, S. J., Navarro, J. & Hwu, W-M. W., Jan 1 2006, In: IEEE Transactions on Computers. 55, 1, p. 18-33 16 p.

Research output: Contribution to journal › Article

Pipelining
Pipelines
Latency
Compiler
Percent

Benchmark characterization

Conte, T. M. & Hwu, W-M. W., Jan 1 1991, In: Proceedings of the Annual Hawaii International Conference on System Sciences. 1, p. 364-372 9 p., 183907.

Research output: Contribution to journal › Conference article

Computer operating systems
Computer systems
Data storage equipment
Costs

Benchmark Characterization

Conte, T. M. & Hwu, W-M. W., Jan 1991, Computer, 24, 1, p. 48-56 9 p.

Research output: Contribution to specialist publication › Article

Digital storage
Computer operating systems

Benchmark characterization for experimental system evaluation

Conte, T. M. & Hwu, W-M. W., 1990, Proceedings of the Hawaii International Conference on System Science. Hoevel, L. W., Shriver, B. D., Nunamaker, J. F. J., Sprague, R. H. J. & Milutinovic, V. (eds.). Publ by Western Periodicals Co, Vol. 1. p. 6-18 13 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Data storage equipment
Benchmarking
Systems analysis

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads

Heo, Y., Wu, X. L., Chen, D., Ma, J. & Hwu, W-M. W., May 15 2014, In: Bioinformatics. 30, 10, p. 1354-1362 9 p.

Research output: Contribution to journal › Article

Bloom Filter
Error correction
Sequencing
High Throughput

BLESS 2: Accurate, memory-efficient and fast error correction method

Heo, Y., Ramachandran, A., Hwu, W-M. W., Ma, J. & Chen, D., Aug 1 2016, In: Bioinformatics. 32, 15, p. 2369-2371 3 p.

Research output: Contribution to journal › Article

Error correction
Data storage equipment
Genome
Efficiency
Bottom-up
Scalability
Context
Subgraph
Benchmark

Branch recovery with compiler-assisted multiple instruction retry

Alewine, N. J., Chen, S. K., Li, C. C., Fuchs, W. K. & Hwu, W-M. W., Jan 1 1992, FTCS 1992 - 22nd Annual International Symposium on Fault-Tolerant Computing. Institute of Electrical and Electronics Engineers Inc., p. 66-73 8 p. 243614. (FTCS 1992 - 22nd Annual International Symposium on Fault-Tolerant Computing).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Hardware
Recovery
Hazards

C COMPILER FOR HPS I, A HIGHLY PARALLEL EXECUTION ENGINE.

Shebanow, M. C., Patt, Y. N., Hwu, W-M. W. & Melvin, S., 1986, In: Proceedings of the Hawaii International Conference on System Science. 2 a, p. 520-528 9 p.

Research output: Contribution to journal › Article

Engines

Chai: Collaborative heterogeneous applications for integrated-Architectures

Gómez-Luna, J., Hajj, I. E., Chang, L. W., García-Flores, V., De Gonzalo, S. G., Jablin, T. B., Peña, A. J. & Hwu, W-M. W., Jul 11 2017, ISPASS 2017 - IEEE International Symposium on Performance Analysis of Systems and Software. Institute of Electrical and Electronics Engineers Inc., p. 43-54 12 p. 7975269. (ISPASS 2017 - IEEE International Symposium on Performance Analysis of Systems and Software).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Computer programming languages
Program processors
Data storage equipment
Specifications
Experiments

Characterizing the impact of predicated execution on branch prediction

Mahlke, S. A., Hank, R. E., Bringmann, R. A., Gyllenhaal, J. C., Gallagher, D. M. & Hwu, W-M. W., Nov 30 1994, Proceedings of the 27th Annual International Symposium on Microarchitecture, MICRO 1994. IEEE Computer Society, p. 217-227 11 p. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; vol. Part F129425).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Characterizing the impact of predicated execution on branch prediction

Mahlke, S. A., Hank, R. E., Bringmann, R. A., Gyllenhaal, J. C., Gallagher, D. M. & Hwu, W-M. W., Dec 7 1994, Professional Engineering, 7, 21, p. 217-227 11 p.

Research output: Contribution to specialist publication › Article

Checkpoint Repair for High-Performance Out-of-Order Execution Machines

Hwu, W-M. W. & Patt, Y. N., Dec 1987, In: IEEE Transactions on Computers. C-36, 12, p. 1496-1514 19 p.

Research output: Contribution to journal › Article

Checkpoint
Repair
High Performance
Branch Prediction
Cache memory

CHECKPOINT REPAIR FOR OUT-OF-ORDER EXECUTION MACHINES.

Hwu, W-M. W. & Patt, Y. N., 1987, Conference Proceedings - Annual Symposium on Computer Architecture. IEEE, p. 18-26 9 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Repair
Cache memory
Supercomputers
Engines
Hardware

CIGAR: Application partitioning for a CPU/coprocessor architecture

Kelm, J. H., Gelado, I., Murphy, M. J., Navarro, N., Lumetta, S. S. & Hwu, W-M. W., Dec 1 2007, 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007. p. 317-326 10 p. 4336222. (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Program processors
Partitioning
Prototyping
Embedded Processor
Methodology

ClMPI: An opencl extension for interoperation with the message passing interface

Takizawa, H., Sugawara, M., Hirasawa, S., Gelado, I., Kobayashi, H. & Hwu, W-M. W., Jan 1 2013, Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013. IEEE Computer Society, p. 1138-1148 11 p. 6651000. (Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Message Passing Interface
Message passing
Data transfer
Program processors

Code coverage and input variability: Effects on architecture and compiler research

Hunter, H. C. & Hwu, W-M. W., Dec 1 2002, Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '02. p. 79-87 9 p. (Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '02).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Telecommunication
Benchmarking
Experiments
Compliance

Code reordering and speculation support for dynamic optimization systems

Nystrom, E. M., Barnes, R. D., Merten, M. C. & Hwu, W-M. W., Jan 1 2001, In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. p. 163-174 12 p.

Research output: Contribution to journal › Conference article

Speculation
Dynamic Optimization
Reordering
Exception
Optimization

Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition

Mailthody, V. S., Date, K., Qureshi, Z., Pearson, C., Nagi, R., Xiong, J. & Hwu, W-M. W., Nov 26 2018, 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018. Institute of Electrical and Electronics Engineers Inc., 8547517. (2018 IEEE High Performance Extreme Computing Conference, HPEC 2018).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Program processors
Decomposition
Hardware
Graphics processing unit

Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism

Date, K., Feng, K., Nagi, R., Xiong, J., Kim, N. S. & Hwu, W-M. W., Oct 30 2017, 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017. Institute of Electrical and Electronics Engineers Inc., 8091042. (2017 IEEE High Performance Extreme Computing Conference, HPEC 2017).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Program processors
Decomposition
Data storage equipment
Benchmarking
Graphics processing unit

Collaborative computing for heterogeneous integrated systems

Chang, L. W., Gómez-Luna, J., El Hajj, I., Huang, S., Chen, D. & Hwu, W-M. W., Apr 17 2017, ICPE 2017 - Proceedings of the 2017 ACM/SPEC International Conference on Performance Engineering. Association for Computing Machinery, Inc, p. 385-388 4 p. (ICPE 2017 - Proceedings of the 2017 ACM/SPEC International Conference on Performance Engineering).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Computer supported cooperative work
Program processors
Field programmable gate arrays (FPGA)
Computer systems
Data storage equipment

Combining trace sampling with single pass methods for efficient cache simulation

Conte, T. M., Hirsch, M. A. & Hwu, W-M. W., Dec 1 1998, In: IEEE Transactions on Computers. 47, 6, p. 714-719 6 p.

Research output: Contribution to journal › Article

Memory Hierarchy
Cache
Trace
Sampling
Data storage equipment

Comparative performance evaluation of multi-GPU MLFMA implementation for 2-D VIE problems

Pearson, C., Hidayetoglu, M., Ren, W., Chew, W. C. & Hwu, W-M. W., Jul 25 2017, CEM 2017 - 2017 Computing and Electromagnetics International Workshop. Gurel, L. (ed.). Institute of Electrical and Electronics Engineers Inc., p. 63-64 2 p. 7991888. (CEM 2017 - 2017 Computing and Electromagnetics International Workshop).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

multipoles
evaluation
two dimensional bodies
water
supercomputers

Comparing software and hardware schemes for reducing the cost of branches.

Hwu, W-M. W., Conte, T. M. & Chang, P. P., May 1989, In: Conference Proceedings - Annual Symposium on Computer Architecture. 16, p. 224-233 10 p.

Research output: Contribution to journal › Article

Hardware
Pipelines
Costs
Throughput

Comparing static and dynamic code scheduling for multiple-instruction-issue processors

Chang, P. P., Chen, W. Y., Mahlke, S. A. & Hwu, W-M. W., Sep 1 1991, MICRO 1991 - Proceedings of the 24th Annual International Symposium on Microarchitecture. IEEE Computer Society, p. 25-33 9 p. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Scheduling
Hardware
Experiments

Comparison based sorting for systems with multiple GPUs

Tanasic, I., Vilanova, L., Jordà, M., Cabezas, J., Gelado, I., Navarro, N. & Hwu, W-M. W., Apr 15 2013, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. p. 1-11 11 p. (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Sorting
Graphics processing unit

Comparison of full and partial predicated execution support for ILP processors

Mahlke, S. A., Hank, R. E., McCormick, J. E., August, D. I. & Hwu, W-M. W., 1995, Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA. p. 138-149 12 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Inductive logic programming (ILP)
Code generation

Comparison of full and partial predicated execution support for ILP processors

Mahlke, S. A., Hank, R. E., McCormick, J. E., August, D. I. & Hwu, W-M. W., 1995, ACM SIGARCH (Association for Computing Machinery Special Interest Group on Computer Architecture) - Conference Proceedings. ACM, p. 138-149 12 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Inductive logic programming (ILP)
Code generation

COMPARISON OF SEVERAL EVOLVING (UNIVERSITY) SUPERCOMPUTER ARCHITECTURES.

Patt, Y. N., Sheldon, R. G., Shebanow, M., Ponder, C. & Hwu, W-M. W., 1984, Unknown Host Publication Title. IEEE, p. 15-26 12 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Supercomputers

Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer

Alewine, N. J., Chen, S. K., Fuchs, W. K. & Hwu, W-M. W., Sep 1995, In: IEEE Transactions on Computers. 44, 9, p. 1096-1107 12 p.

Research output: Contribution to journal › Article

Rollback Recovery
Compiler
Buffer
Hazards
Hardware

Compiler-Based Multiple Instruction Retry

Li, C. C. J., Chen, S. K., Fuchs, W. K. & Hwu, W-M. W., Jan 1995, In: IEEE Transactions on Computers. 44, 1, p. 35-46 12 p.

Research output: Contribution to journal › Article

Compiler
Experiments
Compilation
Termination
Interference

Compiler code transformations for superscalar-based high-performance systems

Mahlke, S. A., Chen, W. Y., Gyllenhaal, J. C., Hwu, W-M. W., Chang, P. P. & Kiyohara, T., Dec 1 1992, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992. Werner, R. (ed.). Association for Computing Machinery, p. 808-817 10 p. (Proceedings of the International Conference on Supercomputing; vol. Part F129723).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Supercomputers

Compiler-directed dynamic computation reuse: Rationale and initial results

Connors, D. A. & Hwu, W-M. W., Dec 1 1999, In: Proceedings of the Annual International Symposium on Microarchitecture. p. 158-169 12 p.

Research output: Contribution to journal › Conference article

Hardware

Compiler-directed early load-address generation

Cheng, B. C., Connors, D. A. & Hwu, W-M. W., Dec 1 1998, In: Proceedings of the Annual International Symposium on Microarchitecture. p. 138-147 10 p.

Research output: Contribution to journal › Conference article

Hardware
Pipelines

Compiler Technology

Chung, W. H. J., Lyu, Y. H., Sung, I. J. R., Lee, Y. W. & Hwu, W-M. W., Dec 4 2015, Heterogeneous System Architecture: A New Compute Platform Infrastructure. Elsevier Inc., p. 97-129 33 p.

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Data storage equipment
Parallel programming
Computer programming languages
Program processors
Code generation

Compiler Technology for Future Microprocessors

Hwu, W-M. W., Hank, R. E., Lavery, D. M., Haab, G. E., Gyllenhaal, J. C., August, D. I., Gallagher, D. M. & Mahlke, S. A., Dec 1995, In: Proceedings of the IEEE. 83, 12, p. 1625-1640 16 p.

Research output: Contribution to journal › Article

Microprocessor chips
Hardware
Processing

Compute unified device architecture application suitability

Hwu, W-M. W., Rodrigues, C., Ryoo, S. & Stratton, J., May 1 2009, In: Computing in Science and Engineering. 11, 3, p. 16-26 11 p., 4814979.

Research output: Contribution to journal › Article

Graphics processing unit

Conclusion and future outlook

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 459-469 11 p.

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Control flow optimization for supercomputer scalar processing

Chang, P. P. & Hwu, W-M. W., Jun 1 1989, Proceedings of the 3rd International Conference on Supercomputing, ICS 1989. Association for Computing Machinery, p. 145-153 9 p. (Proceedings of the International Conference on Supercomputing; vol. Part F130180).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Supercomputers
Flow control
Processing
Pipelines
Hardware

Corezilla: Build and tame the multicore beast?

Sarno, L., Hwu, W-M. W., Lund, C., Levy, M., Larus, J. R., Reinders, J., Cameron, G., Lennard, C. & Corporation, T., Aug 2 2007, 2007 44th ACM/IEEE Design Automation Conference, DAC'07. p. 632-633 2 p. 4261259. (Proceedings - Design Automation Conference).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Software engineering
Systems analysis
Hardware

CRITICAL ISSUES REGARDING HPS, A HIGH PERFORMANCE MICROARCHITECTURE.

Patt, Y. N., Melvin, S. W., Hwu, W. M. & Shebanow, M. C., Dec 1 1985, MICRO: Annual Microprogramming Workshop. ACM, p. 109-116 8 p. (MICRO: Annual Microprogramming Workshop).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

CUBA: An architecture for efficient CPU/Co-processor data communication

Gelado, I., Kelm, J. H., Ryoo, S., Lumetta, S. S., Navarro, N. & Hwu, W-M. W., Dec 15 2008, ICS'08 - Proceedings of the 2008 ACM International Conference on Supercomputing. p. 299-308 10 p. (Proceedings of the International Conference on Supercomputing).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Program processors
Communication
Data structures
Data storage equipment
Coprocessor

CUDA application development

Hwu, W-M. W., May 20 2016, 2008 IEEE Hot Chips 20 Symposium, HCS 2008. Institute of Electrical and Electronics Engineers Inc., 7476522

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

CUDA dynamic parallelism

Kirk, D. B. & Hwu, W-M. W., Jan 1 2012, Programming Massively Parallel Processors: A Hands-on Approach, Second Edition. Elsevier Science, p. 435-457 23 p.

Research output: Chapter in Book/Report/Conference proceeding › Chapter