TY - CONF
T1 - On the Scalability and Memory Efficiency of Semidefinite Programs for Lipschitz Constant Estimation of Neural Networks
T2 - 12th International Conference on Learning Representations, ICLR 2024
AU - Wang, Zi
AU - Hu, Bin
AU - Havens, Aaron J.
AU - Araujo, Alexandre
AU - Zheng, Yang
AU - Chen, Yudong
AU - Jha, Somesh
N1 - Z. Wang and S. Jha are partially supported by DARPA under agreement number 885000, NSF CCF-FMitF-1836978 and ONR N00014-21-1-2492. A. Havens and B. Hu are generously supported by the NSF award CAREER-2048168 and the AFOSR award FA9550-23-1-0732. Y. Zheng is partially supported by NSF ECCS-2154650 and NSF CMMI-2320697. Y. Chen is partially supported by NSF CCF-1704828 and NSF CCF-2233152.
PY - 2024
Y1 - 2024
N2 - Lipschitz constant estimation plays an important role in understanding generalization, robustness, and fairness in deep learning. Unlike naive bounds based on the product of network weight norms, semidefinite programs (SDPs) have shown great promise in providing less conservative Lipschitz bounds with polynomial-time complexity guarantees. However, due to their high memory consumption and slow running speed, standard SDP algorithms cannot scale to modern neural network architectures. In this paper, we transform the SDPs for Lipschitz constant estimation into an eigenvalue optimization problem, which aligns with modern large-scale optimization paradigms based on first-order methods. The resulting formulation is amenable to autodiff frameworks such as PyTorch and TensorFlow and requires significantly less memory than standard SDP algorithms. The transformation also allows us to leverage various existing numerical techniques for eigenvalue optimization, opening the way for further memory improvements and computational speedups. The essential technique of our eigenvalue-problem transformation is to introduce redundant quadratic constraints and then utilize both Lagrangian and Shor's SDP relaxations under a certain trace constraint. Notably, our numerical study successfully scales SDP-based Lipschitz constant estimation to large neural networks on ImageNet. Our numerical examples on CIFAR10 and ImageNet demonstrate that our technique is more scalable than existing approaches. Our code is available at https://github.com/z1w/LipDiff.
AB - Lipschitz constant estimation plays an important role in understanding generalization, robustness, and fairness in deep learning. Unlike naive bounds based on the product of network weight norms, semidefinite programs (SDPs) have shown great promise in providing less conservative Lipschitz bounds with polynomial-time complexity guarantees. However, due to their high memory consumption and slow running speed, standard SDP algorithms cannot scale to modern neural network architectures. In this paper, we transform the SDPs for Lipschitz constant estimation into an eigenvalue optimization problem, which aligns with modern large-scale optimization paradigms based on first-order methods. The resulting formulation is amenable to autodiff frameworks such as PyTorch and TensorFlow and requires significantly less memory than standard SDP algorithms. The transformation also allows us to leverage various existing numerical techniques for eigenvalue optimization, opening the way for further memory improvements and computational speedups. The essential technique of our eigenvalue-problem transformation is to introduce redundant quadratic constraints and then utilize both Lagrangian and Shor's SDP relaxations under a certain trace constraint. Notably, our numerical study successfully scales SDP-based Lipschitz constant estimation to large neural networks on ImageNet. Our numerical examples on CIFAR10 and ImageNet demonstrate that our technique is more scalable than existing approaches. Our code is available at https://github.com/z1w/LipDiff.
UR - http://www.scopus.com/inward/record.url?scp=85200610435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200610435&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85200610435
Y2 - 7 May 2024 through 11 May 2024
ER -