Abstract
Heterogeneous systems with hardware accelerators are increasingly common, and various optimized implementations and algorithms exist for computation kernels. However, no single combination of code version and device (C&D) outperforms all others across every input case, which calls for a method that selects the best C&D pair based on the input. We present a machine learning-based code version and device selection method, named MLCD, that uses input data characteristics to select the best C&D pair dynamically. We also apply active learning to reduce the number of samples needed to construct the model. Evaluated on two different CPU-GPU systems, MLCD achieves near-optimal speed-up on both. Concretely, on the first system, which uses mid-range hardware, it achieves 99.9%, 95.6%, 99.9%, and 98.6% of the optimal acceleration attainable through the ideal choice of C&D pairs in General Matrix Multiply, PageRank, N-body Simulation, and K-Motif Counting, respectively. MLCD achieves speed-ups of 2.57×, 1.58×, 2.68×, and 1.09× compared to baselines without MLCD. Additionally, MLCD handles end-to-end applications, achieving up to 10% and 46% speed-up over GPU-only and CPU-only solutions, respectively, with Graph Neural Networks. Furthermore, it achieves a 7.28× average speed-up in execution latency over the state-of-the-art approach and determines suitable code versions for unseen inputs 10⁸ to 10¹⁰ times faster.
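To make the idea of input-aware C&D selection concrete, the following is a minimal sketch and not the MLCD implementation. It assumes a 1-nearest-neighbour lookup as a stand-in for the learned model, hypothetical C&D pair names (`cpu_naive`, `cpu_blocked`, `gpu_tiled`), made-up runtimes, and two illustrative input features (log2 of the matrix size and density). It also assumes that "percentage of optimal acceleration" can be read as the achieved speed-up divided by the speed-up of an oracle that always picks the fastest pair.

```python
from math import dist

# Hypothetical profiling samples: (input features, runtime per C&D pair).
# Features are (log2 matrix size, density); every number is invented for illustration.
SAMPLES = [
    ((6.0, 1.0), {"cpu_naive": 1.0, "cpu_blocked": 0.8, "gpu_tiled": 2.5}),
    ((8.0, 1.0), {"cpu_naive": 9.0, "cpu_blocked": 5.0, "gpu_tiled": 4.0}),
    ((12.0, 1.0), {"cpu_naive": 900.0, "cpu_blocked": 400.0, "gpu_tiled": 30.0}),
    ((12.0, 0.01), {"cpu_naive": 60.0, "cpu_blocked": 20.0, "gpu_tiled": 25.0}),
]


def best_pair(runtimes):
    """Return the C&D pair with the lowest measured runtime."""
    return min(runtimes, key=runtimes.get)


def select_pair(features, samples=SAMPLES):
    """Pick a C&D pair for unseen input via 1-nearest-neighbour lookup.

    This is an assumed stand-in for a trained model, not the paper's method.
    """
    nearest_features, nearest_runtimes = min(
        samples, key=lambda sample: dist(sample[0], features)
    )
    return best_pair(nearest_runtimes)


def fraction_of_optimal(chosen_runtime, optimal_runtime, baseline_runtime):
    """Achieved speed-up over the baseline divided by the oracle speed-up."""
    achieved = baseline_runtime / chosen_runtime
    oracle = baseline_runtime / optimal_runtime
    return achieved / oracle


if __name__ == "__main__":
    print(select_pair((11.5, 1.0)))  # large dense input
    print(select_pair((6.5, 1.0)))   # small dense input
    print(fraction_of_optimal(5.0, 4.0, 9.0))
```

In this toy data the small dense input maps to a CPU version and the large dense input to the GPU version, which mirrors the abstract's point that no single pair is best for every input.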
Original language | English (US) |
---|---|
Journal | IEEE Transactions on Computers |
State | Accepted/In press - 2025 |
Keywords
- Active Learning
- Heterogeneous Systems
- Input Data-aware
- Machine Learning
- Performance Optimizations
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics