Abstract
Recently there has been much interest in understanding why deep neural networks are preferred to shallow networks. We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation. First, we consider univariate functions on a bounded interval and require a neural network to achieve an approximation error of e uniformly over the interval. We show that shallow networks (i.e., networks whose depth does not depend on ε) require Ω(poly(1/e)) neurons while deep networks (i.e., networks whose depth grows with 1/e) require O(polylog(1/e)) neurons. We then extend these results to certain classes of important multivariate functions. Our results are derived for neural networks which use a combination of rectifier linear units (ReLUs) and binary step units, two of the most popular type of activation functions. Our analysis builds on a simple observation: the multiplication of two bits can be represented by a ReLU.
Original language | English (US) |
---|---|
State | Published - 2017 |
Event | 5th International Conference on Learning Representations, ICLR 2017 - Toulon, France Duration: Apr 24 2017 → Apr 26 2017 |
Conference
Conference | 5th International Conference on Learning Representations, ICLR 2017 |
---|---|
Country/Territory | France |
City | Toulon |
Period | 4/24/17 → 4/26/17 |
ASJC Scopus subject areas
- Education
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics