TY - GEN
T1 - GANAX
T2 - 45th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2018
AU - Yazdanbakhsh, Amir
AU - Samadi, Kambiz
AU - Kim, Nam Sung
AU - Esmaeilzadeh, Hadi
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/19
Y1 - 2018/7/19
N2 - Generative Adversarial Networks (GANs) are one of the most recent deep learning models that generate synthetic data from limited genuine datasets. GANs are on the frontier as further extension of deep learning into many domains (e.g., medicine, robotics, content synthesis) requires massive sets of labeled data that is generally either unavailable or prohibitively costly to collect. Although GANs are gaining prominence in various fields, there are no accelerators for these new models. In fact, GANs leverage a new operator, called transposed convolution, that exposes unique challenges for hardware acceleration. This operator first inserts zeros within the multidimensional input, then convolves a kernel over this expanded array to add information to the embedded zeros. Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed. We propose the GANAX architecture to alleviate the sources of inefficiency associated with the acceleration of GANs using conventional convolution accelerators, making the first GAN accelerator design possible. We propose a reorganization of the output computations to allocate compute rows with similar patterns of zeros to adjacent processing engines, which also avoids inconsequential multiply-adds on the zeros. This compulsory adjacency reclaims data reuse across these neighboring processing engines, which had otherwise diminished due to the inserted zeros. The reordering breaks the full SIMD execution model, which is prominent in convolution accelerators. Therefore, we propose a unified MIMD-SIMD design for GANAX that leverages repeated patterns in the computation to create distinct microprograms that execute concurrently in SIMD mode. The interleaving of MIMD and SIMD modes is performed at the granularity of single microprogrammed operation. To amortize the cost of MIMD execution, we propose a decoupling of data access from data processing in GANAX. This decoupling leads to a new design that breaks each processing engine to an access micro-engine and an execute micro-engine. The proposed architecture extends the concept of access-execute architectures to the finest granularity of computation for each individual operand. Evaluations with six GAN models shows, on average, 3.6× speedup and 3.1× energy savings over EYERISS without compromising the efficiency of conventional convolution accelerators. These benefits come with a mere ≈7.8% area increase. These results suggest that GANAX is an effective initial step that paves the way for accelerating the next generation of deep neural models.
AB - Generative Adversarial Networks (GANs) are one of the most recent deep learning models that generate synthetic data from limited genuine datasets. GANs are on the frontier as further extension of deep learning into many domains (e.g., medicine, robotics, content synthesis) requires massive sets of labeled data that is generally either unavailable or prohibitively costly to collect. Although GANs are gaining prominence in various fields, there are no accelerators for these new models. In fact, GANs leverage a new operator, called transposed convolution, that exposes unique challenges for hardware acceleration. This operator first inserts zeros within the multidimensional input, then convolves a kernel over this expanded array to add information to the embedded zeros. Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed. We propose the GANAX architecture to alleviate the sources of inefficiency associated with the acceleration of GANs using conventional convolution accelerators, making the first GAN accelerator design possible. We propose a reorganization of the output computations to allocate compute rows with similar patterns of zeros to adjacent processing engines, which also avoids inconsequential multiply-adds on the zeros. This compulsory adjacency reclaims data reuse across these neighboring processing engines, which had otherwise diminished due to the inserted zeros. The reordering breaks the full SIMD execution model, which is prominent in convolution accelerators. Therefore, we propose a unified MIMD-SIMD design for GANAX that leverages repeated patterns in the computation to create distinct microprograms that execute concurrently in SIMD mode. The interleaving of MIMD and SIMD modes is performed at the granularity of single microprogrammed operation. To amortize the cost of MIMD execution, we propose a decoupling of data access from data processing in GANAX. This decoupling leads to a new design that breaks each processing engine to an access micro-engine and an execute micro-engine. The proposed architecture extends the concept of access-execute architectures to the finest granularity of computation for each individual operand. Evaluations with six GAN models shows, on average, 3.6× speedup and 3.1× energy savings over EYERISS without compromising the efficiency of conventional convolution accelerators. These benefits come with a mere ≈7.8% area increase. These results suggest that GANAX is an effective initial step that paves the way for accelerating the next generation of deep neural models.
KW - Accelerators
KW - Access-execute Architecture
KW - CNN
KW - Convolution neural Networks
KW - DNN
KW - Dataflow
KW - Deep neural Networks
KW - GAN
KW - Generative adversarial Networks
KW - SIMD-MIMD
KW - Transpoed convolution
UR - http://www.scopus.com/inward/record.url?scp=85051425307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051425307&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2018.00060
DO - 10.1109/ISCA.2018.00060
M3 - Conference contribution
AN - SCOPUS:85051425307
T3 - Proceedings - International Symposium on Computer Architecture
SP - 650
EP - 661
BT - Proceedings - 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture, ISCA 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 June 2018 through 6 June 2018
ER -