Abstract
It is well known that graphics processing units (GPUs) can accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto the GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same strategy can be applied to accelerate discontinuous Galerkin time-domain (DGTD) algorithms.
Original language | English (US) |
---|---|
Article number | 6832499 |
Pages (from-to) | 4706-4715 |
Number of pages | 10 |
Journal | IEEE Transactions on Antennas and Propagation |
Volume | 62 |
Issue number | 9 |
State | Published - Sep 1 2014 |
Keywords
- Circuit analysis
- Compute unified device architecture (CUDA)
- Finite-element analysis
- GPU cluster
- Graphics processing unit (GPU)
- High-performance computing
- Message-passing interface (MPI)
- Multi-GPU
- Parallel programming
- Radar cross section
- Time-domain analysis
ASJC Scopus subject areas
- Electrical and Electronic Engineering