Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching

Zheng Zhang, Josep Torrellas

Research output: Contribution to conference (Paper)

Abstract

While many parallel applications exhibit good spatial locality, other important codes in areas like graph problem-solving or CAD do not. Often, these irregular codes contain small records accessed via pointers. Consequently, while the former applications benefit from long cache lines, the latter prefer short lines. One good solution is to combine short lines with prefetching. In this way, each application can exploit the amount of spatial locality that it has. However, prefetching, if provided, should also work for the irregular codes. This paper presents a new prefetching scheme that, while usable by regular applications, is specifically targeted to irregular ones: Memory Binding and Group Prefetching. The idea is to hardware-bind and prefetch together groups of data that the programmer suggests are strongly related to each other. Examples are the different fields in a record or two records linked by a permanent pointer. This prefetching scheme, combined with short cache lines, results in a memory hierarchy design that can be exploited by both regular and irregular applications. Overall, it is better to use a system with short lines (16-32 bytes) and our prefetching than a system with long lines (128 bytes) with or without our prefetching. The former system runs 6 out of 7 Splash-class applications faster. In particular, some of the most irregular applications run 25-40% faster.
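The binding idea described in the abstract can be illustrated with a small software model. The following is only a toy sketch, not the paper's hardware design: the class `ToyCache`, the methods `bind` and `access`, the node addresses, and the rule that bound lines are fetched only on a miss are all assumptions made for illustration. The cache is unbounded and no latency is modeled, so the sketch only counts misses for a pointer-linked list of small records, each on its own 16-byte line.

```python
LINE_SIZE = 16  # bytes; the short end of the 16-32 byte range in the abstract


class ToyCache:
    """Unbounded toy cache that counts misses at line granularity (hypothetical model)."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.resident = set()  # line numbers currently cached
        self.bound_to = {}     # line number -> lines to fetch together with it
        self.misses = 0

    def _line(self, address):
        return address // self.line_size

    def bind(self, address, related_address):
        """Record a programmer hint: related_address should travel with address."""
        src = self._line(address)
        self.bound_to.setdefault(src, set()).add(self._line(related_address))

    def access(self, address):
        """Return True on a hit. On a miss, fetch the line and its bound group."""
        line = self._line(address)
        if line in self.resident:
            return True
        self.misses += 1
        self.resident.add(line)
        # Assumption: bound lines are prefetched only when the source line misses.
        self.resident.update(self.bound_to.get(line, ()))
        return False


def traverse(node_addresses, use_binding):
    """Walk a linked list once and return the number of cache misses."""
    cache = ToyCache()
    if use_binding:
        # Bind each record to its successor, as for a permanent pointer.
        for addr, nxt in zip(node_addresses, node_addresses[1:]):
            cache.bind(addr, nxt)
    for addr in node_addresses:
        cache.access(addr)
    return cache.misses


# Eight small records scattered in memory, each on a distinct 16-byte line.
NODE_ADDRESSES = [0x1000, 0x8040, 0x2300, 0x9A10, 0x4420, 0x0C80, 0x7770, 0x3B50]

if __name__ == "__main__":
    print("misses without binding:", traverse(NODE_ADDRESSES, use_binding=False))
    print("misses with binding:   ", traverse(NODE_ADDRESSES, use_binding=True))
```

In this model, a long cache line would not help the traversal because consecutive records are not adjacent in memory, whereas binding each record to its successor lets one miss bring in the next record as well. The miss counts say nothing about the speedups reported in the abstract, which come from the authors' own evaluation.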

Original language: English (US)
Pages: 188-199
Number of pages: 12
State: Published - Jan 1 1995
Event: Proceedings of the 22nd Annual International Symposium on Computer Architecture - Santa Margherita Ligure, Italy
Duration: Jun 22 1995 to Jun 24 1995


Fingerprint

Data storage equipment
Computer aided design
Hardware

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Zhang, Z., & Torrellas, J. (1995). Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching. 188-199. Paper presented at Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy.

@conference{7fc09749db614beba2016588ecb1908c,
title = "Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching",
author = "Zheng Zhang and Josep Torrellas",
year = "1995",
month = "1",
day = "1",
language = "English (US)",
pages = "188--199",
note = "Proceedings of the 22nd Annual International Symposium on Computer Architecture ; Conference date: 22-06-1995 Through 24-06-1995",

}

TY - CONF

T1 - Speeding up irregular applications in shared-memory multiprocessors

T2 - memory binding and group prefetching

AU - Zhang, Zheng

AU - Torrellas, Josep

PY - 1995/1/1

Y1 - 1995/1/1

UR - http://www.scopus.com/inward/record.url?scp=0029179453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029179453&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0029179453

SP - 188

EP - 199

ER -