Acceleration of the dual-field domain decomposition algorithm using MPI-CUDA on large-scale computing systems

Huan Ting Meng, Jianming Jin

Research output: Contribution to journal › Article

Abstract

It is well known that graphics processing units (GPUs) are able to accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to the acceleration of the discontinuous Galerkin time-domain (DGTD) algorithms.

Original language: English (US)
Article number: 6832499
Pages (from-to): 4706-4715
Number of pages: 10
Journal: IEEE Transactions on Antennas and Propagation
Volume: 62
Issue number: 9
DOI: 10.1109/TAP.2014.2330608
State: Published - Sep 1 2014

Fingerprint

Message passing
Decomposition
Domain decomposition methods
Graphics processing unit
Scalability
Finite element method
Data storage equipment

Keywords

  • Circuit analysis
  • Compute unified device architecture (CUDA)
  • Finite-element analysis
  • GPU cluster
  • Graphics processing unit (GPU)
  • High-performance computing
  • Message-passing interface (MPI)
  • Multi-GPU
  • Parallel programming
  • Radar cross section
  • Time-domain analysis

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Acceleration of the dual-field domain decomposition algorithm using MPI-CUDA on large-scale computing systems. / Meng, Huan Ting; Jin, Jianming.

In: IEEE Transactions on Antennas and Propagation, Vol. 62, No. 9, 6832499, 01.09.2014, p. 4706-4715.

@article{0e138e3fb7534b18a61b730279197a3a,
title = "Acceleration of the dual-field domain decomposition algorithm using MPI-CUDA on large-scale computing systems",
abstract = "It is well known that graphics processing units (GPUs) are able to accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to the acceleration of the discontinuous Galerkin time-domain (DGTD) algorithms.",
keywords = "Circuit analysis, Compute unified device architecture (CUDA), Finite-element analysis, GPU cluster, Graphics processing unit (GPU), High-performance computing, Message-passing interface (MPI), Multi-GPU, Parallel programming, Radar cross section, Time-domain analysis",
author = "Meng, {Huan Ting} and Jin, Jianming",
year = "2014",
month = "9",
day = "1",
doi = "10.1109/TAP.2014.2330608",
language = "English (US)",
volume = "62",
pages = "4706--4715",
journal = "IEEE Transactions on Antennas and Propagation",
issn = "0018-926X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",
}

TY - JOUR

T1 - Acceleration of the dual-field domain decomposition algorithm using MPI-CUDA on large-scale computing systems

AU - Meng, Huan Ting

AU - Jin, Jianming

PY - 2014/9/1

Y1 - 2014/9/1

N2 - It is well known that graphics processing units (GPUs) are able to accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to the acceleration of the discontinuous Galerkin time-domain (DGTD) algorithms.

KW - Circuit analysis

KW - Compute unified device architecture (CUDA)

KW - Finite-element analysis

KW - GPU cluster

KW - Graphics processing unit (GPU)

KW - High-performance computing

KW - Message-passing interface (MPI)

KW - Multi-GPU

KW - Parallel programming

KW - Radar cross section

KW - Time-domain analysis

UR - http://www.scopus.com/inward/record.url?scp=84913554871&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84913554871&partnerID=8YFLogxK

U2 - 10.1109/TAP.2014.2330608

DO - 10.1109/TAP.2014.2330608

M3 - Article

AN - SCOPUS:84913554871

VL - 62

SP - 4706

EP - 4715

JO - IEEE Transactions on Antennas and Propagation

JF - IEEE Transactions on Antennas and Propagation

SN - 0018-926X

IS - 9

M1 - 6832499

ER -