TY - JOUR
T1 - Optimization for Neural Operators can Benefit from Width
AU - Cisneros-Velarde, Pedro
AU - Shrimali, Bhavesh
AU - Banerjee, Arindam
N1 - Part of this work was done while Pedro Cisneros-Velarde was affiliated with the University of Illinois Urbana-Champaign and was completed at his current affiliation. The work was supported in part by the National Science Foundation (NSF) through awards IIS 21-31335, OAC 21-30835, and DBI 20-21898, as well as a C3.ai research award.
PY - 2025
Y1 - 2025
N2 - Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions, restricted strong convexity (RSC) and smoothness, that guarantee a decrease in their loss values under GD. Remarkably, these two conditions are satisfied for each neural operator for different reasons tied to the architectural differences of the respective models. One takeaway from the theory is that wider networks improve the optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical findings and observe that larger widths indeed benefit training.
AB - Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions, restricted strong convexity (RSC) and smoothness, that guarantee a decrease in their loss values under GD. Remarkably, these two conditions are satisfied for each neural operator for different reasons tied to the architectural differences of the respective models. One takeaway from the theory is that wider networks improve the optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical findings and observe that larger widths indeed benefit training.
UR - https://www.scopus.com/pages/publications/105023640918
M3 - Conference article
AN - SCOPUS:105023640918
SN - 2640-3498
VL - 267
SP - 10994
EP - 11041
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 42nd International Conference on Machine Learning, ICML 2025
Y2 - 13 July 2025 through 19 July 2025
ER -