Optimization for Neural Operators can Benefit from Width

Research output: Contribution to journal › Conference article › peer-review

Abstract

Neural Operators that directly learn mappings between function spaces, such as Deep Operator Networks (DONs) and Fourier Neural Operators (FNOs), have received considerable attention. Despite the universal approximation guarantees for DONs and FNOs, there is currently no optimization convergence guarantee for learning such networks using gradient descent (GD). In this paper, we address this open problem by presenting a unified framework for optimization based on GD and applying it to establish convergence guarantees for both DONs and FNOs. In particular, we show that the losses associated with both of these neural operators satisfy two conditions, restricted strong convexity (RSC) and smoothness, which guarantee a decrease in their loss values under GD. Remarkably, these two conditions are satisfied for each neural operator for different reasons, tied to the architectural differences of the respective models. One takeaway that emerges from the theory is that wider networks benefit optimization convergence guarantees for both DONs and FNOs. We present empirical results on canonical operator learning problems to support our theoretical results and find that larger widths benefit training.
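For context, the RSC and smoothness conditions named in the abstract typically take the following standard form (a sketch in our own notation; the paper's exact constants, restricted set, and step size may differ). For a loss $\mathcal{L}$ over parameters $\theta$, $\beta$-smoothness and $\alpha$-restricted strong convexity over a set $\mathcal{S}$ read

\begin{align*}
\mathcal{L}(\theta') &\le \mathcal{L}(\theta) + \langle \nabla \mathcal{L}(\theta),\, \theta' - \theta \rangle + \tfrac{\beta}{2}\,\|\theta' - \theta\|^2, \\
\mathcal{L}(\theta') &\ge \mathcal{L}(\theta) + \langle \nabla \mathcal{L}(\theta),\, \theta' - \theta \rangle + \tfrac{\alpha}{2}\,\|\theta' - \theta\|^2 \quad \text{for } \theta, \theta' \in \mathcal{S}.
\end{align*}

In the standard analysis, a GD step $\theta_{t+1} = \theta_t - \eta\,\nabla \mathcal{L}(\theta_t)$ with $\eta = 1/\beta$ then yields $\mathcal{L}(\theta_{t+1}) - \mathcal{L}^* \le (1 - \alpha/\beta)\,(\mathcal{L}(\theta_t) - \mathcal{L}^*)$, i.e., a geometric decrease in the loss as long as the iterates remain in $\mathcal{S}$; the paper's contribution is establishing such conditions for DON and FNO losses, with the constants improving as width grows.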

Original language: English (US)
Pages (from-to): 10994-11041
Number of pages: 48
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: Jul 13 2025 - Jul 19 2025

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
