TY - GEN
T1 - AcMC2
T2 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019
AU - Banerjee, Subho S.
AU - Kalbarczyk, Zbigniew T.
AU - Iyer, Ravishankar K.
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/4/4
Y1 - 2019/4/4
N2 - Probabilistic models (PMs) are ubiquitously used across a variety of machine learning applications. They have been shown to successfully integrate structural prior information about data and effectively quantify uncertainty to enable the development of more powerful, interpretable, and efficient learning algorithms. This paper presents AcMC2, a compiler that transforms PMs into optimized hardware accelerators (for use in FPGAs or ASICs) that utilize Markov chain Monte Carlo methods to infer and query a distribution of posterior samples from the model. The compiler analyzes statistical dependencies in the PM to drive several optimizations to maximally exploit the parallelism and data locality available in the problem. We demonstrate the use of AcMC2 to implement several learning and inference tasks on a Xilinx Virtex-7 FPGA. AcMC2-generated accelerators provide a 47-100× improvement in runtime performance over a 6-core IBM Power8 CPU and an 8-18× improvement over an NVIDIA K80 GPU. This corresponds to a 753-1600× improvement over the CPU and 248-463× over the GPU in performance-per-watt terms.
AB - Probabilistic models (PMs) are ubiquitously used across a variety of machine learning applications. They have been shown to successfully integrate structural prior information about data and effectively quantify uncertainty to enable the development of more powerful, interpretable, and efficient learning algorithms. This paper presents AcMC2, a compiler that transforms PMs into optimized hardware accelerators (for use in FPGAs or ASICs) that utilize Markov chain Monte Carlo methods to infer and query a distribution of posterior samples from the model. The compiler analyzes statistical dependencies in the PM to drive several optimizations to maximally exploit the parallelism and data locality available in the problem. We demonstrate the use of AcMC2 to implement several learning and inference tasks on a Xilinx Virtex-7 FPGA. AcMC2-generated accelerators provide a 47-100× improvement in runtime performance over a 6-core IBM Power8 CPU and an 8-18× improvement over an NVIDIA K80 GPU. This corresponds to a 753-1600× improvement over the CPU and 248-463× over the GPU in performance-per-watt terms.
KW - Accelerator
KW - Markov Chain Monte Carlo
KW - Probabilistic Graphical Models
KW - Probabilistic Programming
UR - http://www.scopus.com/inward/record.url?scp=85064637402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064637402&partnerID=8YFLogxK
U2 - 10.1145/3297858.3304019
DO - 10.1145/3297858.3304019
M3 - Conference contribution
AN - SCOPUS:85064637402
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 515
EP - 528
BT - ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems
PB - Association for Computing Machinery
Y2 - 13 April 2019 through 17 April 2019
ER -