TY - GEN
T1 - TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service
T2 - 12th IEEE International Conference on Cloud Computing, CLOUD 2019
AU - Dakkak, Abdul
AU - Li, Cheng
AU - Garcia de Gonzalo, Simon
AU - Xiong, Jinjun
AU - Hwu, Wen-Mei
N1 - Funding Information:
This work is supported by IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) - a research collaboration as part of the IBM Cognitive Horizon Network.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure, must handle user-defined FaaS pipelines containing diverse DNN inference workloads while maintaining isolation and latency guarantees with minimal resource waste. The current solution for guaranteeing isolation and latency within FaaS is inefficient; a major cause of the inefficiency is the need to move large amounts of data within and across servers. We propose TrIMS as a novel solution to address this issue. TrIMS is a generic memory-sharing technique that enables constant data to be shared across processes or containers while still maintaining isolation between users. TrIMS consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of abstractions, application APIs, and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24× speedup in latency for image classification models, up to 210× speedup for large models, and up to 8× improvement in system throughput.
AB - Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure, must handle user-defined FaaS pipelines containing diverse DNN inference workloads while maintaining isolation and latency guarantees with minimal resource waste. The current solution for guaranteeing isolation and latency within FaaS is inefficient; a major cause of the inefficiency is the need to move large amounts of data within and across servers. We propose TrIMS as a novel solution to address this issue. TrIMS is a generic memory-sharing technique that enables constant data to be shared across processes or containers while still maintaining isolation between users. TrIMS consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of abstractions, application APIs, and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24× speedup in latency for image classification models, up to 210× speedup for large models, and up to 8× improvement in system throughput.
KW - Cloud
KW - Inference
KW - Machine Learning
KW - Memory
UR - http://www.scopus.com/inward/record.url?scp=85072329545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072329545&partnerID=8YFLogxK
U2 - 10.1109/CLOUD.2019.00067
DO - 10.1109/CLOUD.2019.00067
M3 - Conference contribution
AN - SCOPUS:85072329545
T3 - IEEE International Conference on Cloud Computing, CLOUD
SP - 372
EP - 382
BT - Proceedings - 2019 IEEE International Conference on Cloud Computing, CLOUD 2019 - Part of the 2019 IEEE World Congress on Services
A2 - Bertino, Elisa
A2 - Chang, Carl K.
A2 - Chen, Peter
A2 - Damiani, Ernesto
A2 - Goul, Michael
A2 - Oyama, Katsunori
PB - IEEE Computer Society
Y2 - 8 July 2019 through 13 July 2019
ER -