### Abstract

This paper is concerned with the problem of representing and learning a linear transformation using a linear neural network. In recent years, there has been growing interest in the study of such networks, in part due to the successes of deep learning. The main question of this body of research (and also of our paper) concerns the existence and optimality properties of the critical points of the mean-squared loss function. An additional primary concern of our paper is the robustness of these critical points in the face of (a small amount of) regularization. An optimal control model is introduced for this purpose, and a learning algorithm (backprop with weight decay) is derived for it using Hamilton's formulation of optimal control. The formulation is used to provide a complete characterization of the critical points in terms of the solutions of a nonlinear matrix-valued equation, referred to as the characteristic equation. Analytical and numerical tools from bifurcation theory are used to compute the critical points via the solutions of the characteristic equation.
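The setting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's optimal control formulation: it assumes a discrete stack of square weight matrices, whitened inputs (so the mean-squared loss depends only on the product of the layers), and plain gradient descent. The names `chain`, `loss`, `grads`, and `train`, the target `R`, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def chain(Ws, d):
    """Return W_k ... W_1 for Ws = [W_1, ..., W_k]; identity if Ws is empty."""
    P = np.eye(d)
    for W in Ws:
        P = W @ P
    return P

def loss(Ws, R, lam):
    """0.5*||W_N...W_1 - R||_F^2 plus weight decay 0.5*lam*sum_i ||W_i||_F^2."""
    E = chain(Ws, R.shape[0]) - R
    return 0.5 * np.sum(E**2) + 0.5 * lam * sum(np.sum(W**2) for W in Ws)

def grads(Ws, R, lam):
    """Backprop through the linear chain; lam * W is the weight-decay term."""
    d = R.shape[0]
    E = chain(Ws, d) - R
    out = []
    for i, W in enumerate(Ws):
        above = chain(Ws[i + 1:], d)  # product of the layers after layer i
        below = chain(Ws[:i], d)      # product of the layers before layer i
        out.append(above.T @ E @ below.T + lam * W)
    return out

def train(R, n_layers=3, lam=1e-3, lr=0.05, steps=2000, seed=0):
    """Gradient descent from a near-identity initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    d = R.shape[0]
    Ws = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n_layers)]
    history = [loss(Ws, R, lam)]
    for _ in range(steps):
        Ws = [W - lr * g for W, g in zip(Ws, grads(Ws, R, lam))]
        history.append(loss(Ws, R, lam))
    return Ws, history

# Hypothetical target transformation to be represented by the network.
R = np.diag([2.0, 0.5])
Ws, history = train(R)

# A critical point of the regularized loss is a point where every gradient vanishes.
grad_norm = np.sqrt(sum(np.sum(g**2) for g in grads(Ws, R, 1e-3)))
print("final loss:", history[-1], "gradient norm:", grad_norm)
```

Rerunning `train` with different values of `lam` gives a rough, purely numerical view of how the point reached by gradient descent shifts as the regularization strength changes.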

Original language | English (US) |
---|---|
Pages (from-to) | 2503-2513 |
Number of pages | 11 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2017-December |
State | Published - Jan 1 2017 |
Event | 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States. Duration: Dec 4 2017 → Dec 9 2017 |


### ASJC Scopus subject areas

- Computer Networks and Communications
- Information Systems
- Signal Processing

### Cite this


**How regularization affects the critical points in linear networks.** / Taghvaei, Amirhossein; Kim, Jin W.; Mehta, Prashant Girdharilal.

Research output: Contribution to journal › Conference article

*Advances in Neural Information Processing Systems*, vol. 2017-December, pp. 2503-2513.


TY - JOUR

T1 - How regularization affects the critical points in linear networks

AU - Taghvaei, Amirhossein

AU - Kim, Jin W.

AU - Mehta, Prashant Girdharilal

PY - 2017/1/1

Y1 - 2017/1/1


UR - http://www.scopus.com/inward/record.url?scp=85047002841&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047002841&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85047002841

VL - 2017-December

SP - 2503

EP - 2513

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -