TY - GEN
T1 - Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks
T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
AU - Huang, Lei
AU - Liu, Xianglong
AU - Lang, Bo
AU - Yu, Adams Wei
AU - Wang, Yongliang
AU - Li, Bo
N1 - Publisher Copyright:
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2018
Y1 - 2018
N2 - Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that orthogonal rectangular matrices can stabilize the distribution of network activations and regularize FNNs. We propose a novel orthogonal weight normalization method to solve OMDSM. In particular, it constructs an orthogonal transformation over proxy parameters to ensure that the weight matrix is orthogonal. To guarantee stability, we minimize the distortions between the proxy parameters and the canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to the standard linear module. Extensive experiments demonstrate that, by simply substituting OLM for the standard linear module without revising any experimental protocols, our method improves the performance of state-of-the-art networks, including Inception and residual networks, on the CIFAR and ImageNet datasets.
AB - Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that orthogonal rectangular matrices can stabilize the distribution of network activations and regularize FNNs. We propose a novel orthogonal weight normalization method to solve OMDSM. In particular, it constructs an orthogonal transformation over proxy parameters to ensure that the weight matrix is orthogonal. To guarantee stability, we minimize the distortions between the proxy parameters and the canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to the standard linear module. Extensive experiments demonstrate that, by simply substituting OLM for the standard linear module without revising any experimental protocols, our method improves the performance of state-of-the-art networks, including Inception and residual networks, on the CIFAR and ImageNet datasets.
UR - http://www.scopus.com/inward/record.url?scp=85060467656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060467656&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85060467656
T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
SP - 3271
EP - 3278
BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
PB - AAAI Press
Y2 - 2 February 2018 through 7 February 2018
ER -