High-throughput Ant Colony Optimization on graphics processing units

José M. Cecilia, Antonio Llanes, José L. Abellán, Juan Gómez-Luna, Li Wen Chang, Wen-Mei W Hwu

Research output: Contribution to journalArticle

Abstract

Nowadays, computer researchers can face ever-complex scientific problems by using a hardware and software co-design. One successful approach is exploring novel massively-parallel Natural-inspired algorithms, such as the Ant Colony Optimization (ACO) algorithm, through the exploitation of high-throughput accelerators such as GPUs, which are designed to provide high levels of parallelism and low Energy per instruction (EP) cost through heavy vectorization. In this paper, we demonstrate how to take advantage of contemporary hardware-based CUDA vectorization to optimize the ACO algorithm when applied to the Traveling Salesman Problem (TSP). Several parallel designs are proposed and analyzed on two different CUDA architectures. Our results reveal that our vectorization approaches can obtain good performance on these architectures. Moreover, atomic operations are under study showing good benefits on latest generations of CUDA architectures. This work lays the groundwork for future developments of ACO algorithm on high-performance platforms.

Original languageEnglish (US)
Pages (from-to)261-274
Number of pages14
JournalJournal of Parallel and Distributed Computing
Volume113
DOIs
StatePublished - Mar 2018

Fingerprint

Vectorization
Ant colony optimization
Graphics Processing Unit
High Throughput
Optimization Algorithm
Throughput
Hardware
Co-design
Travelling salesman problems
Accelerator
Exploitation
Parallelism
Traveling salesman problem
High Performance
Optimise
Particle accelerators
Software
Costs
Energy
Demonstrate

Keywords

  • ACO
  • Agnostic vectorization
  • Atomic operations
  • GPUs
  • TSP

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

High-throughput Ant Colony Optimization on graphics processing units. / Cecilia, José M.; Llanes, Antonio; Abellán, José L.; Gómez-Luna, Juan; Chang, Li Wen; Hwu, Wen-Mei W.

In: Journal of Parallel and Distributed Computing, Vol. 113, 03.2018, p. 261-274.

Research output: Contribution to journalArticle

Cecilia, José M. ; Llanes, Antonio ; Abellán, José L. ; Gómez-Luna, Juan ; Chang, Li Wen ; Hwu, Wen-Mei W. / High-throughput Ant Colony Optimization on graphics processing units. In: Journal of Parallel and Distributed Computing. 2018 ; Vol. 113. pp. 261-274.
@article{93958de864e24a3eaf7fe29994994943,
title = "High-throughput Ant Colony Optimization on graphics processing units",
abstract = "Nowadays, computer researchers can face ever-complex scientific problems by using a hardware and software co-design. One successful approach is exploring novel massively-parallel Natural-inspired algorithms, such as the Ant Colony Optimization (ACO) algorithm, through the exploitation of high-throughput accelerators such as GPUs, which are designed to provide high levels of parallelism and low Energy per instruction (EP) cost through heavy vectorization. In this paper, we demonstrate how to take advantage of contemporary hardware-based CUDA vectorization to optimize the ACO algorithm when applied to the Traveling Salesman Problem (TSP). Several parallel designs are proposed and analyzed on two different CUDA architectures. Our results reveal that our vectorization approaches can obtain good performance on these architectures. Moreover, atomic operations are under study showing good benefits on latest generations of CUDA architectures. This work lays the groundwork for future developments of ACO algorithm on high-performance platforms.",
keywords = "ACO, Agnostic vectorization, Atomic operations, GPUs, TSP",
author = "Cecilia, {Jos{\'e} M.} and Antonio Llanes and Abell{\'a}n, {Jos{\'e} L.} and Juan G{\'o}mez-Luna and Chang, {Li Wen} and Hwu, {Wen-Mei W}",
year = "2018",
month = "3",
doi = "10.1016/j.jpdc.2017.12.002",
language = "English (US)",
volume = "113",
pages = "261--274",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - High-throughput Ant Colony Optimization on graphics processing units

AU - Cecilia, José M.

AU - Llanes, Antonio

AU - Abellán, José L.

AU - Gómez-Luna, Juan

AU - Chang, Li Wen

AU - Hwu, Wen-Mei W

PY - 2018/3

Y1 - 2018/3

N2 - Nowadays, computer researchers can face ever-complex scientific problems by using a hardware and software co-design. One successful approach is exploring novel massively-parallel Natural-inspired algorithms, such as the Ant Colony Optimization (ACO) algorithm, through the exploitation of high-throughput accelerators such as GPUs, which are designed to provide high levels of parallelism and low Energy per instruction (EP) cost through heavy vectorization. In this paper, we demonstrate how to take advantage of contemporary hardware-based CUDA vectorization to optimize the ACO algorithm when applied to the Traveling Salesman Problem (TSP). Several parallel designs are proposed and analyzed on two different CUDA architectures. Our results reveal that our vectorization approaches can obtain good performance on these architectures. Moreover, atomic operations are under study showing good benefits on latest generations of CUDA architectures. This work lays the groundwork for future developments of ACO algorithm on high-performance platforms.

AB - Nowadays, computer researchers can face ever-complex scientific problems by using a hardware and software co-design. One successful approach is exploring novel massively-parallel Natural-inspired algorithms, such as the Ant Colony Optimization (ACO) algorithm, through the exploitation of high-throughput accelerators such as GPUs, which are designed to provide high levels of parallelism and low Energy per instruction (EP) cost through heavy vectorization. In this paper, we demonstrate how to take advantage of contemporary hardware-based CUDA vectorization to optimize the ACO algorithm when applied to the Traveling Salesman Problem (TSP). Several parallel designs are proposed and analyzed on two different CUDA architectures. Our results reveal that our vectorization approaches can obtain good performance on these architectures. Moreover, atomic operations are under study showing good benefits on latest generations of CUDA architectures. This work lays the groundwork for future developments of ACO algorithm on high-performance platforms.

KW - ACO

KW - Agnostic vectorization

KW - Atomic operations

KW - GPUs

KW - TSP

UR - http://www.scopus.com/inward/record.url?scp=85039856989&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039856989&partnerID=8YFLogxK

U2 - 10.1016/j.jpdc.2017.12.002

DO - 10.1016/j.jpdc.2017.12.002

M3 - Article

AN - SCOPUS:85039856989

VL - 113

SP - 261

EP - 274

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

ER -