TY - GEN
T1 - Understanding the loss surface of neural networks for binary classification
AU - Liang, Shiyu
AU - Sun, Ruoyu
AU - Li, Yixuan
AU - Srikant, R.
N1 - Publisher Copyright:
© CURRAN-CONFERENCE. All rights reserved.
PY - 2018
Y1 - 2018
N2 - It is widely conjectured that training algorithms for neural networks are successful because all local minima lead to similar performance; for example, see (LeCun et al., 2015; Choromanska et al., 2015; Dauphin et al., 2014). Performance is typically measured in terms of two metrics: training performance and generalization performance. Here we focus on the training performance of neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of appropriately chosen surrogate loss functions. Our conditions are roughly of the following form: the neurons have to be increasing and strictly convex, the neural network should either be single-layered or multi-layered with a shortcut-like connection, and the surrogate loss function should be a smooth version of the hinge loss. We also provide counterexamples to show that, when these conditions are relaxed, the result may not hold.
AB - It is widely conjectured that training algorithms for neural networks are successful because all local minima lead to similar performance; for example, see (LeCun et al., 2015; Choromanska et al., 2015; Dauphin et al., 2014). Performance is typically measured in terms of two metrics: training performance and generalization performance. Here we focus on the training performance of neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of appropriately chosen surrogate loss functions. Our conditions are roughly of the following form: the neurons have to be increasing and strictly convex, the neural network should either be single-layered or multi-layered with a shortcut-like connection, and the surrogate loss function should be a smooth version of the hinge loss. We also provide counterexamples to show that, when these conditions are relaxed, the result may not hold.
UR - http://www.scopus.com/inward/record.url?scp=85057286525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057286525&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057286525
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 4420
EP - 4429
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -