Automatically mapping code on an intelligent memory architecture

Jaejin Lee, Yan Solihin, Josep Torrellas

Research output: Contribution to conference (Paper)

Abstract

This paper presents an algorithm to automatically map code on a generic intelligent memory system that consists of a host processor and a simpler memory processor. To achieve high performance with this type of architecture, code needs to be partitioned and scheduled such that each section is assigned to the processor on which it runs most efficiently. In addition, the two processors should overlap their execution as much as possible. With our algorithm, applications are mapped fully automatically using both static and dynamic information. Using a set of standard applications and a simulated architecture, we show average speedups of 1.7 for numerical applications and 1.2 for non-numerical applications over a single host with plain memory. The speedups are very close to, and often higher than, the ideal speedups on a more expensive multiprocessor system composed of two identical host processors. Our work shows that heterogeneity can be cost-effectively exploited and represents one step toward effectively mapping code on intelligent memory systems.
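
The partitioning idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the section names, the per-processor time estimates, the greedy "pick the faster processor" rule, and the assumption that all sections are independent (so the two processors overlap perfectly) are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    host_time: float    # hypothetical estimated cost on the host processor
    memory_time: float  # hypothetical estimated cost on the memory processor


def map_sections(sections):
    """Greedily assign each section to the processor with the lower estimate."""
    return {
        s.name: "host" if s.host_time <= s.memory_time else "memory"
        for s in sections
    }


def overlapped_time(sections, mapping):
    """Total time if both processors run their sections fully in parallel.

    Assumes sections are independent, which real code rarely is.
    """
    host = sum(s.host_time for s in sections if mapping[s.name] == "host")
    mem = sum(s.memory_time for s in sections if mapping[s.name] == "memory")
    return max(host, mem)


def speedup(sections, mapping):
    """Speedup over running every section on the host alone."""
    baseline = sum(s.host_time for s in sections)
    return baseline / overlapped_time(sections, mapping)


# Made-up sections and costs, purely for illustration.
sections = [
    Section("compute_loop", host_time=4.0, memory_time=10.0),
    Section("pointer_chase", host_time=6.0, memory_time=3.0),
    Section("reduction", host_time=2.0, memory_time=2.5),
]
mapping = map_sections(sections)
print(mapping)
print(speedup(sections, mapping))
```

A real mapper would also have to respect dependences between sections and account for communication cost between the two processors, which this sketch ignores.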

Original language: English (US)
Pages: 121-132
Number of pages: 12
State: Published - Jan 1 2001
Event: 7th International Symposium on High-Performance Computer Architecture - Nuevo Leon, Mex
Duration: Oct 20 2000 to Oct 24 2000

Fingerprint

Memory architecture
Data storage equipment
Costs

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Lee, J., Solihin, Y., & Torrellas, J. (2001). Automatically mapping code on an intelligent memory architecture. 121-132. Paper presented at 7th International Symposium on High-Performance Computer Architecture, Nuevo Leon, Mex.

@conference{fdf303956116418199341401a573c06b,
title = "Automatically mapping code on an intelligent memory architecture",
author = "Jaejin Lee and Yan Solihin and Josep Torrellas",
year = "2001",
month = "1",
day = "1",
language = "English (US)",
pages = "121--132",
note = "7th International Symposium on High-Performance Computer Architecture ; Conference date: 20-10-2000 Through 24-10-2000",

}

TY - CONF

T1 - Automatically mapping code on an intelligent memory architecture

AU - Lee, Jaejin

AU - Solihin, Yan

AU - Torrellas, Josep

PY - 2001/1/1

Y1 - 2001/1/1

UR - http://www.scopus.com/inward/record.url?scp=0034818341&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034818341&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034818341

SP - 121

EP - 132

ER -