Improving Neural ODE Training with Temporal Adaptive Batch Normalization

Su Zheng, Zhengqi Gao, Fan Keng Sun, Duane S. Boning, Bei Yu, Martin Wong

Research output: Contribution to journal › Conference article › peer-review

Abstract

Neural ordinary differential equations (Neural ODEs) are a family of continuous-depth neural networks in which the evolution of hidden states is governed by learnable temporal derivatives. We identify a significant limitation in applying traditional Batch Normalization (BN) to Neural ODEs, arising from a fundamental mismatch: BN was designed for discrete neural networks with no temporal dimension, whereas Neural ODEs operate continuously over time. To bridge this gap, we introduce temporal adaptive Batch Normalization (TA-BN), a novel technique that acts as the continuous-time analog of traditional BN. Our empirical findings reveal that TA-BN enables the stacking of more layers within Neural ODEs, enhancing their performance. Moreover, when confined to a model architecture consisting of a single Neural ODE followed by a linear layer, TA-BN achieves 91.1% test accuracy on CIFAR-10 with 2.2 million parameters, making it the first unmixed Neural ODE architecture to approach MobileNetV2-level parameter efficiency. Extensive numerical experiments on image classification and physical system modeling substantiate the superiority of TA-BN over baseline methods.
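The abstract does not include implementation details, but the core idea of making BN time-dependent inside a Neural ODE can be illustrated with a minimal sketch. The code below is one plausible realization, not the authors' method: it keeps a small bank of BatchNorm layers anchored at fixed time points and linearly interpolates their outputs as the ODE time variable t varies. All names here (TemporalBN, ODEFunc, num_anchors) are hypothetical, and the paper's actual TA-BN formulation may differ.

```python
# Hypothetical sketch of a time-dependent BatchNorm for a Neural ODE.
# Not the paper's implementation; illustrative only.
import torch
import torch.nn as nn

class TemporalBN(nn.Module):
    def __init__(self, num_features, num_anchors=4, t_max=1.0):
        super().__init__()
        # One BatchNorm2d per anchor time: statistics and affine
        # parameters are maintained separately at each anchor.
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_anchors)
        )
        self.num_anchors = num_anchors
        self.t_max = t_max

    def forward(self, x, t):
        # Map t in [0, t_max] onto the anchor grid and linearly
        # interpolate between the two nearest anchors.
        pos = float(t) / self.t_max * (self.num_anchors - 1)
        lo = min(int(pos), self.num_anchors - 2)
        w = pos - lo
        return (1.0 - w) * self.bns[lo](x) + w * self.bns[lo + 1](x)

class ODEFunc(nn.Module):
    """Derivative network f(t, h) with time-dependent BN between convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.tbn1 = TemporalBN(channels)
        self.tbn2 = TemporalBN(channels)

    def forward(self, t, h):
        h = torch.relu(self.tbn1(self.conv1(h), t))
        return self.tbn2(self.conv2(h), t)
```

Under this sketch, ODEFunc would be passed to an ODE solver (e.g., odeint from the torchdiffeq package), which calls forward(t, h) at solver-chosen times; the interpolation lets the normalization statistics adapt across the continuous depth rather than being shared at all times as in standard BN.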

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Externally published: Yes
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: Dec 9, 2024 - Dec 15, 2024

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
