A Formal Characterization of Activation Functions in Deep Neural Networks

Massi Amrouche, Dusan M. Stipanovic

Research output: Contribution to journal (peer-reviewed article)

Abstract

This article provides a mathematical formulation for describing and designing activation functions in deep neural networks. The methodology rests on a precise characterization of the desired activation functions, namely those that satisfy particular criteria, such as avoiding vanishing or exploding gradients during training. Finding such activation functions is formulated as an infinite-dimensional optimization problem, which is subsequently relaxed to the problem of solving a partial differential equation. Furthermore, bounds that guarantee the optimality of the designed activation function are provided. Examples involving several state-of-the-art activation functions illustrate the methodology.
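The gradient criterion in the abstract can be made concrete with a toy computation. The sketch below is not the authors' PDE-based formulation. It assumes a hypothetical scalar chain h_{k+1} = σ(w·h_k) with one shared weight w, where backpropagation multiplies one factor w·σ'(z_k) per layer. The helper name `chain_gradient` and the chosen inputs are illustrative only.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    # sigma'(z) = sigma(z) * (1 - sigma(z)), which never exceeds 0.25
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z: float) -> float:
    return max(0.0, z)

def relu_grad(z: float) -> float:
    return 1.0 if z > 0 else 0.0

def chain_gradient(act, act_grad, h0: float, w: float, depth: int) -> float:
    """Hypothetical helper: d h_depth / d h_0 for the chain h_{k+1} = act(w * h_k)."""
    h, grad = h0, 1.0
    for _ in range(depth):
        z = w * h
        grad *= w * act_grad(z)  # chain rule: dh_{k+1}/dh_k = w * act'(z_k)
        h = act(z)
    return grad

if __name__ == "__main__":
    for depth in (5, 10, 20):
        print(depth,
              chain_gradient(sigmoid, sigmoid_grad, 0.5, 1.0, depth),  # shrinks geometrically
              chain_gradient(relu, relu_grad, 0.5, 2.0, depth))        # grows as 2**depth
```

With sigmoid, every per-layer factor is at most 0.25, so the gradient vanishes geometrically with depth. With ReLU on a positive path, the factor equals w exactly, so w > 1 makes it explode. An activation designed under the abstract's criteria would aim to keep such products bounded away from both extremes.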

Original language: English (US)
Pages (from-to): 2153-2166
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 35
Issue number: 2
State: Published, Feb 1 2024

Keywords

  • Artificial neural networks
  • deep learning
  • feed-forward neural networks
  • partial differential equations (PDEs)

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
