Application-Transparent Near-Memory Processing Architecture with Memory Channel Network

Mohammad Alian, Seung Won Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-Mei W Hwu, Nam Sung Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The physical memory capacity of servers is expected to increase drastically with the deployment of forthcoming non-volatile memory technologies. This is a welcome improvement for emerging data-intensive applications. For such servers to be cost-effective, however, we must cost-effectively increase compute throughput and memory bandwidth commensurate with the increase in memory capacity without compromising application readiness. Tackling this challenge, we present the Memory Channel Network (MCN) architecture in this paper. Specifically, first, we propose the MCN DIMM, an extension of a buffered DIMM in which a small but capable processor, called the MCN processor, is integrated with a buffer device on the DIMM for near-memory processing. Second, we implement device drivers that give the host and MCN processors in a server the illusion that they are independent heterogeneous nodes connected through an Ethernet link. These drivers allow the host and MCN processors in a server to run a given data-intensive application together using popular distributed computing frameworks such as MPI and Spark, without any change to the host processor hardware or its application software, while offering the benefits of high-bandwidth, low-latency communication between the host and the MCN processors over memory channels. As such, MCN can serve as an application-transparent framework that seamlessly unifies near-memory processing within a server and distributed computing across such servers for data-intensive applications. Our simulation running the full software stack shows that a server with 8 MCN DIMMs offers 4.56X higher throughput and consumes 47.5% less energy than a cluster of 9 conventional nodes connected through Ethernet links, as it facilitates up to 8.17X higher aggregate DRAM bandwidth utilization. Lastly, we demonstrate the feasibility of MCN with an IBM POWER8 system and an experimental buffered DIMM.
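The abstract describes device drivers that make the host and MCN processors appear as network nodes connected by an Ethernet link, with frames actually carried over the memory channel. As a rough illustration of that idea only (this sketch is not from the paper, and all names here are hypothetical), the core data structure such a driver might place in the shared DIMM window is a single-producer, single-consumer frame ring over a plain byte buffer:

```python
# Illustrative sketch: a single-producer/single-consumer ring buffer that
# exchanges length-prefixed "Ethernet frames" through one shared byte buffer,
# standing in for the memory-channel window an MCN-style driver might use.
import struct


class MemChannelRing:
    HDR = struct.Struct("<I")  # 4-byte little-endian frame-length prefix

    def __init__(self, size=4096):
        self.buf = bytearray(size)  # stands in for the shared DIMM window
        self.size = size
        self.head = 0  # producer (e.g., host driver) write counter
        self.tail = 0  # consumer (e.g., MCN processor) read counter

    def _free(self):
        # Bytes available for writing; one slot is kept unused so that
        # head == tail unambiguously means "empty".
        return self.size - (self.head - self.tail) - 1

    def send(self, frame: bytes) -> bool:
        """Copy a length-prefixed frame into the ring; False if no room."""
        need = self.HDR.size + len(frame)
        if need > self._free():
            return False
        for b in self.HDR.pack(len(frame)) + frame:
            self.buf[self.head % self.size] = b
            self.head += 1
        return True

    def recv(self):
        """Pop the next frame from the ring, or None if it is empty."""
        if self.head == self.tail:
            return None
        hdr = bytes(self.buf[(self.tail + i) % self.size]
                    for i in range(self.HDR.size))
        (length,) = self.HDR.unpack(hdr)
        self.tail += self.HDR.size
        frame = bytes(self.buf[(self.tail + i) % self.size]
                      for i in range(length))
        self.tail += length
        return frame
```

In a real driver the two counters would live in the shared window itself and the consumer would run on the DIMM-side processor; here both sides share one Python object purely to show the frame layout and flow control.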

Original language: English (US)
Title of host publication: Proceedings - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
Publisher: IEEE Computer Society
Pages: 802-814
Number of pages: 13
ISBN (Electronic): 9781538662403
DOI: 10.1109/MICRO.2018.00070
State: Published - Dec 12 2018
Event: 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018 - Fukuoka, Japan
Duration: Oct 20 2018 - Oct 24 2018

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 2018-October
ISSN (Print): 1072-4451



Keywords

  • Application Transparent
  • Buffer Device
  • DRAM
  • Distributed Systems
  • Ethernet
  • Memory Channel
  • Mobile Processors
  • Near Memory Processing
  • Processing In Memory
  • TCP/IP

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Alian, M., Min, S. W., Asgharimoghaddam, H., Dhar, A., Wang, D. K., Roewer, T., ... Kim, N. S. (2018). Application-Transparent near-memory processing architecture with memory channel network. In Proceedings - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018 (pp. 802-814). [8574587] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 2018-October). IEEE Computer Society. https://doi.org/10.1109/MICRO.2018.00070

