Accelerating large scale deep learning inference through DeepCPU at Microsoft

Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, Yuxiong He

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The application of deep learning models presents significant improvement to many Microsoft services and products. In this paper, we introduce our experience and methodology of developing and applying the DeepCPU library for serving DL models in production at large scale with remarkable latency improvement and infrastructure cost reduction. We describe two ways to use the library, through customized optimization or framework integration, targeting different scenarios.

Original language: English (US)
Title of host publication: Proceedings of the 2019 USENIX Conference on Operational Machine Learning, OpML 2019
Publisher: USENIX Association
Pages: 5-7
Number of pages: 3
ISBN (Electronic): 9781939133007
State: Published - 2019
Externally published: Yes
Event: 2019 USENIX Conference on Operational Machine Learning, OpML 2019 - Santa Clara, United States
Duration: May 20 2019 → …

Publication series

Name: Proceedings of the 2019 USENIX Conference on Operational Machine Learning, OpML 2019

Conference

Conference: 2019 USENIX Conference on Operational Machine Learning, OpML 2019
Country/Territory: United States
City: Santa Clara
Period: 5/20/19 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
