TY - GEN
T1 - Rethinking Controllable Variational Autoencoders
AU - Shao, Huajie
AU - Yang, Yifei
AU - Lin, Haohong
AU - Lin, Longzhong
AU - Chen, Yizhuo
AU - Yang, Qinmin
AU - Zhao, Han
N1 - This paper aims to develop a deeper understanding of ControlVAE for disentangled representation learning. From an information bottleneck perspective, we explain why it performs well on disentanglement learning by stabilizing the output KL-divergence at different set points. We then theoretically derive a lower bound on the set point for the target KL-divergence, which is further validated through extensive experiments. To evolve the output KL-divergence smoothly along a good trajectory, we propose a novel model, DynamicVAE, for better disentanglement learning. Specifically, we leverage an incremental PI controller, a moving average, and a hybrid annealing method to stabilize the KL-divergence and separate disentanglement learning from reconstruction optimization. We further theoretically prove the stability of the proposed method. Evaluation results demonstrate that DynamicVAE significantly improves reconstruction accuracy while achieving better disentanglement than ControlVAE and the other baselines. Acknowledgement: This work was supported by a 2022 Pre-Tenure Faculty Summer Grant at William and Mary.
PY - 2022
Y1 - 2022
N2 - The Controllable Variational Autoencoder (ControlVAE) combines automatic control theory with the basic VAE model to manipulate the KL-divergence, overcoming posterior collapse and learning disentangled representations. It has shown success in a variety of applications, such as image generation, disentangled representation learning, and language modeling. However, when it comes to disentangled representation learning, the rationale behind ControlVAE's success remains unexplained. The goal of this paper is to develop a deeper understanding of ControlVAE in learning disentangled representations, including the choice of a desired KL-divergence (i.e., set point) and its stability during training. We first explain its ability to disentangle latent variables from an information bottleneck perspective: the KL-divergence is an upper bound of the variational information bottleneck, so by controlling the KL-divergence gradually from a small value to a target value, ControlVAE can disentangle the latent factors one by one. Based on this finding, we propose a new DynamicVAE that leverages a modified incremental PI (proportional-integral) controller, a variant of the proportional-integral-derivative (PID) algorithm, and employs a moving average as well as a hybrid annealing method to evolve the value of the KL-divergence smoothly in a tightly controlled fashion. In addition, we analytically derive a lower bound of the set point for disentangling. We then theoretically prove the stability of the proposed approach. Evaluation results on multiple benchmark datasets demonstrate that DynamicVAE achieves a good trade-off between disentanglement and reconstruction quality. We also discover that it can separate disentangled representation learning from reconstruction by manipulating the desired KL-divergence.
AB - The Controllable Variational Autoencoder (ControlVAE) combines automatic control theory with the basic VAE model to manipulate the KL-divergence, overcoming posterior collapse and learning disentangled representations. It has shown success in a variety of applications, such as image generation, disentangled representation learning, and language modeling. However, when it comes to disentangled representation learning, the rationale behind ControlVAE's success remains unexplained. The goal of this paper is to develop a deeper understanding of ControlVAE in learning disentangled representations, including the choice of a desired KL-divergence (i.e., set point) and its stability during training. We first explain its ability to disentangle latent variables from an information bottleneck perspective: the KL-divergence is an upper bound of the variational information bottleneck, so by controlling the KL-divergence gradually from a small value to a target value, ControlVAE can disentangle the latent factors one by one. Based on this finding, we propose a new DynamicVAE that leverages a modified incremental PI (proportional-integral) controller, a variant of the proportional-integral-derivative (PID) algorithm, and employs a moving average as well as a hybrid annealing method to evolve the value of the KL-divergence smoothly in a tightly controlled fashion. In addition, we analytically derive a lower bound of the set point for disentangling. We then theoretically prove the stability of the proposed approach. Evaluation results on multiple benchmark datasets demonstrate that DynamicVAE achieves a good trade-off between disentanglement and reconstruction quality. We also discover that it can separate disentangled representation learning from reconstruction by manipulating the desired KL-divergence.
KW - Explainable computer vision
KW - Machine learning
KW - Representation learning
UR - http://www.scopus.com/inward/record.url?scp=85141773017&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141773017&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.01865
DO - 10.1109/CVPR52688.2022.01865
M3 - Conference contribution
AN - SCOPUS:85141773017
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 19228
EP - 19237
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -