TY - GEN
T1 - Google workloads for consumer devices
T2 - 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018
AU - Boroumand, Amirali
AU - Ghose, Saugata
AU - Kim, Youngsok
AU - Ausavarungnirun, Rachata
AU - Shiu, Eric
AU - Thakur, Rahul
AU - Kim, Daehyun
AU - Kuusela, Aki
AU - Knies, Allan
AU - Ranganathan, Parthasarathy
AU - Mutlu, Onur
N1 - Funding Information:
This research started at Google, during Amirali Boroumand’s extended internship and Onur Mutlu’s stay as a visiting researcher. We thank our industrial partners, especially Google, Huawei, Intel, and VMware, for their generous support. This research was partially supported by the Semiconductor Research Corporation.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/3/19
Y1 - 2018/3/19
N2 - We are experiencing an explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices. The energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we comprehensively analyze the energy and performance impact of data movement for several widely-used Google consumer workloads: (1) the Chrome web browser; (2) TensorFlow Mobile, Google's machine learning framework; (3) video playback, and (4) video capture, both of which are used in many video services such as YouTube and Google Hangouts. We find that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads, by performing part of the computation close to memory. Each workload contains simple primitives and functions that contribute to a significant amount of the overall data movement. We investigate whether these primitives and functions are feasible to implement using PIM, given the limited area and power constraints of consumer devices. Our analysis shows that offloading these primitives to PIM logic, consisting of either simple cores or specialized accelerators, eliminates a large amount of data movement, and significantly reduces total system energy (by an average of 55.4% across the workloads) and execution time (by an average of 54.2%).
KW - Consumer workloads
KW - Data movement
KW - Energy efficiency
KW - Memory systems
KW - Processing-in-memory
UR - http://www.scopus.com/inward/record.url?scp=85060114163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060114163&partnerID=8YFLogxK
U2 - 10.1145/3173162.3173177
DO - 10.1145/3173162.3173177
M3 - Conference contribution
AN - SCOPUS:85060114163
VL - 53
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 316
EP - 331
BT - Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018
PB - Association for Computing Machinery
Y2 - 24 March 2018 through 28 March 2018
ER -