Acceleration of the dual-field domain decomposition algorithm using MPI-CUDA on large-scale computing systems

Huan Ting Meng, Jian Ming Jin

Research output: Contribution to journal › Article › peer-review


It is well known that graphics processing units (GPUs) can deliver high speedups for highly parallelizable algorithms. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto the GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With careful GPU memory arrangement and thread allocation, we achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to discontinuous Galerkin time-domain (DGTD) algorithms.
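The mapping described in the abstract can be illustrated with a toy sketch. The code below is not the DFDD-ELD formulation: the 1D chain of elements, the update rule `update_element`, the coefficient `c`, and the function names are all hypothetical, chosen only to show why an update that reads nothing but previous-step neighbor values lets each element be handled by its own thread, and why splitting elements across ranks then needs only a boundary ("halo") exchange. Plain Python lists stand in for GPU threads and MPI ranks.

```python
# Toy illustration only: a 1D chain of "elements" with an explicit,
# neighbor-only update. This is NOT the DFDD-ELD method from the paper.

def update_element(left, center, right, c=0.25):
    """Hypothetical per-element update using only previous-step values."""
    return center + c * (left - 2.0 * center + right)

def step_global(u):
    """Reference step: update every element from the previous-step list."""
    n = len(u)
    # Each iteration reads only old values, so the iterations are independent;
    # on a GPU each one could be assigned to a separate thread.
    return [update_element(u[i - 1] if i > 0 else 0.0,
                           u[i],
                           u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def step_partitioned(u, num_ranks):
    """Same step with elements split into contiguous chunks (stand-ins for MPI ranks)."""
    n = len(u)
    if not 1 <= num_ranks <= n:
        raise ValueError("need at least one element per rank")
    bounds = [n * r // num_ranks for r in range(num_ranks + 1)]
    chunks = [u[bounds[r]:bounds[r + 1]] for r in range(num_ranks)]
    new_chunks = []
    for r, chunk in enumerate(chunks):
        # "Halo exchange": one boundary value from each neighboring rank.
        halo_left = chunks[r - 1][-1] if r > 0 else 0.0
        halo_right = chunks[r + 1][0] if r < num_ranks - 1 else 0.0
        padded = [halo_left] + chunk + [halo_right]
        new_chunks.append([update_element(padded[j - 1], padded[j], padded[j + 1])
                           for j in range(1, len(padded) - 1)])
    # Concatenate the per-rank results back into one list.
    return [v for chunk in new_chunks for v in chunk]
```

Under these assumptions, `step_partitioned(u, k)` returns the same values as `step_global(u)` for any valid `k`, which is the property a rank-level partition with halo exchange is meant to preserve.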

Original language: English (US)
Article number: 6832499
Pages (from-to): 4706-4715
Number of pages: 10
Journal: IEEE Transactions on Antennas and Propagation
Issue number: 9
State: Published - Sep 1 2014


  • Circuit analysis
  • Compute unified device architecture (CUDA)
  • Finite-element analysis
  • GPU cluster
  • Graphics processing unit (GPU)
  • High-performance computing
  • Message-passing interface (MPI)
  • Multi-GPU
  • Parallel programming
  • Radar cross section
  • Time-domain analysis

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

