Wen-Mei W Hwu

1984-2019

Research Output 1984-2019

2019

Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures

Huang, S., De Gonzalo, S. G., El-Hadedy, M., Chang, L. W., Gómez-Luna, J., Milojicic, D., El Hajj, I., Chalamalasetti, S. R., Mutlu, O., Chen, D. & Hwu, W. M., Apr 4 2019, ICPE 2019 - Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering. Association for Computing Machinery, Inc, p. 79-90 12 p. (ICPE 2019 - Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Program processors
Field programmable gate arrays (FPGA)
Particle accelerators
Data storage equipment
Computer programming languages

An efficient GPU implementation technique for higher-order 3D stencils

Anjum, O., Simon, G. D. G., Hidayetoglu, M. & Hwu, W. M., Aug 2019, Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019. Xiao, Z., Yang, L. T., Balaji, P., Li, T., Li, K. & Zomaya, A. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 552-561 10 p. 8855722. (Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Data storage equipment
Bandwidth
Graphics processing unit
Grid
Scaling

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs

Gonzalo, S. G. D., Huang, S., Gomez-Luna, J., Hammond, S., Mutlu, O. & Hwu, W. M., Mar 5 2019, CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. Moseley, T., Jimborean, A. & Kandemir, M. T. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 73-84 12 p. 8661187. (CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Portability
Shuffle
Hardware
Domain-specific Languages
Programming

DeepStore: In-storage acceleration for intelligent queries

Mailthody, V. S., Qureshi, Z., Liang, W., Feng, Z., Gonzalo, S. G. D., Li, Y., Franke, H., Xiong, J., Huang, J. & Hwu, W. M., Oct 12 2019, MICRO 2019 - 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings. IEEE Computer Society, p. 224-238 15 p. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Particle accelerators
Energy efficiency
Texturing
Image retrieval
Simulators

Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects

Pearson, C., Dakkak, A., Hashash, S., Li, C., Chung, I. H., Xiong, J. & Hwu, W-M. W., Apr 4 2019, ICPE 2019 - Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering. Association for Computing Machinery, Inc, p. 209-218 10 p. (ICPE 2019 - Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Open Access
Data transfer
Bandwidth
Program processors
Communication
Data storage equipment

FlatFlash: Exploiting the Byte-Accessibility of SSDs within A Unified Memory-Storage Hierarchy

Abulila, A., Mailthody, V. S., Qureshi, Z., Huang, J., Kim, N. S., Xiong, J. & Hwu, W. M., Apr 4 2019, ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems. Association for Computing Machinery, p. 971-985 15 p. (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Open Access
Data storage equipment
Dynamic random access storage
Flash-based SSDs
Cost effectiveness
Metadata

FPGA/DNN co-design: An efficient design methodology for IoT intelligence on the edge

Hao, C., Zhang, X., Li, Y., Huang, S., Xiong, J., Rupnow, K., Hwu, W-M. W. & Chen, D., Jun 2 2019, Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019. Institute of Electrical and Electronics Engineers Inc., a206. (Proceedings - Design Automation Conference).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Co-design
Field Programmable Gate Array
Design Methodology
Field programmable gate arrays (FPGA)
Accelerator

Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning

Ambrosi, J., Ankit, A., Antunes, R., Chalamalasetti, S. R., Chatterjee, S., El Hajj, I., Fachini, G., Faraboschi, P., Foltin, M., Huang, S., Hwu, W-M. W., Knuppe, G., Lakshminarasimha, S. V., Milojicic, D., Parthasarathy, M., Ribeiro, F., Rosa, L., Roy, K., Silveira, P. & Strachan, J. P., Feb 8 2019, 2018 IEEE International Conference on Rebooting Computing, ICRC 2018. Institute of Electrical and Electronics Engineers Inc., 8638612. (2018 IEEE International Conference on Rebooting Computing, ICRC 2018).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Particle accelerators
Learning systems
Hardware
Memristors
Neural networks

Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS

Li, Q., Zhang, X., Xiong, J. J., Hwu, W-M. W. & Chen, D., Jan 21 2019, ASP-DAC 2019 - 24th Asia and South Pacific Design Automation Conference. Institute of Electrical and Electronics Engineers Inc., p. 693-698 6 p. (Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Field programmable gate arrays (FPGA)
Data storage equipment
Computer hardware
Energy efficiency
Statistics

MLModelScope: Evaluate and introspect cognitive pipelines

Li, C., Dakkak, A., Xiong, J. & Hwu, W-M. W., Jul 2019, Proceedings - 2019 IEEE World Congress on Services, SERVICES 2019. Chang, C. K., Chen, P., Goul, M., Oyama, K., Reiff-Marganiec, S., Sun, Y., Wang, S. & Wang, Z. (eds.). Institute of Electrical and Electronics Engineers Inc., p. 335-338 4 p. 8817116. (Proceedings - 2019 IEEE World Congress on Services, SERVICES 2019).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Pipelines
Learning systems
Innovation
Hardware
Deep learning

Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads

Dhar, A., Huang, S., Xiong, J., Jamsek, D., Mesnet, B., Huang, J., Kim, N. S., Hwu, W. M. & Chen, D., Jul 2019, Proceedings - 2019 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2019. IEEE Computer Society, p. 68-75 8 p. 8839401. (Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI; vol. 2019-July).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Field programmable gate arrays (FPGA)
Particle accelerators
Data storage equipment
Bandwidth
Data transfer

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

Ankit, A., El Hajj, I., Rahul Chalamalasetti, S., Ndu, G., Foltin, M., Williams, R. S., Faraboschi, P., Hwu, W-M. W., Paul Strachan, J., Roy, K. & Milojicic, D. S., Apr 4 2019, ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems. Association for Computing Machinery, p. 715-731 17 p. (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Open Access
Memristors
Particle accelerators
Learning systems
Image recognition
Energy efficiency

TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service

Dakkak, A., Li, C., De Gonzalo, S. G., Xiong, J. & Hwu, W-M. W., Jul 2019, Proceedings - 2019 IEEE International Conference on Cloud Computing, CLOUD 2019 - Part of the 2019 IEEE World Congress on Services. Bertino, E., Chang, C. K., Chen, P., Damiani, E., Goul, M. & Oyama, K. (eds.). IEEE Computer Society, p. 372-382 11 p. 8814494. (IEEE International Conference on Cloud Computing, CLOUD; vol. 2019-July).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Containers
Pipelines
Image classification
Cloud computing
Program processors