Carlo Baldassi — Università Bocconi, Milano; INFN, Sezione di Torino

# Neural networks optimization and dense states

The problem of training neural networks is in general non-convex and in principle computationally hard. In practice, however, this does not seem to be an obstacle: fairly greedy heuristics based on variants of Stochastic Gradient Descent are routinely employed by the machine learning community, with surprisingly good results in real-life scenarios. Starting from a large-deviation analysis of the simplest non-convex neural network, the discrete perceptron, we developed a series of analytical and numerical results revealing the existence of rare, dense regions of the optimization landscape with a number of highly desirable properties: in particular, they are wide, easily accessible minima with good generalization properties. This analysis allowed us to develop a large number of algorithms that exploit the existence of these states in a variety of models, including state-of-the-art neural networks trained on real data, stochastic processes, and quantum annealing devices. Overall, these results appear to be rather general with respect to the details of the underlying model and of the data; they may be relevant both biologically and technologically, and may also apply to other inference and constraint satisfaction problems.
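As a concrete illustration of the non-convex setting mentioned above, the sketch below sets up a binary (discrete) perceptron, whose ±1 weights make the training landscape discrete and rugged, and runs a naive greedy local search on it. The sizes `N` and `P`, the random data, and the single-flip search are illustrative assumptions, not details taken from the abstract.

```python
# Minimal sketch of the binary (discrete) perceptron: weights constrained to
# {-1, +1}, energy = number of misclassified random patterns. Sizes and data
# are hypothetical; this is not the abstract's own analysis or algorithm.
import numpy as np

rng = np.random.default_rng(0)

N, P = 101, 40                        # N weights (odd, so preactivations never vanish), P patterns
X = rng.choice([-1, 1], size=(P, N))  # random binary input patterns
y = rng.choice([-1, 1], size=P)       # random binary labels

def energy(w):
    """Number of patterns misclassified by the sign of the preactivation."""
    return int(np.sum(np.sign(X @ w) != y))

# Greedy single-flip local search: flip the weight that lowers the energy most.
w = rng.choice([-1, 1], size=N)
for _ in range(2000):
    e = energy(w)
    if e == 0:
        break
    gains = []
    for i in range(N):                # try flipping each weight in turn
        w[i] *= -1
        gains.append(energy(w))
        w[i] *= -1
    best = int(np.argmin(gains))
    if gains[best] >= e:
        break                         # stuck in a local minimum of the discrete landscape
    w[best] *= -1

print("final training errors:", energy(w))
```

A plain greedy search like this one typically gets trapped; the point of the abstract is that the landscape nevertheless contains rare dense regions of solutions that suitably designed algorithms can reach.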
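The algorithms alluded to above bias the search toward configurations surrounded by many other solutions. The following is only a generic, hedged sketch of one such idea, elastically coupling several SGD replicas toward their common center on a made-up logistic-loss model; the loss, the coupling strength `gamma`, and its schedule are illustrative assumptions, not the author's actual algorithm.

```python
# Generic sketch: several weight replicas are trained in parallel and pulled
# toward their center of mass, so that surviving configurations tend to lie
# in wide regions dense with other solutions. All model details are invented
# for illustration.
import numpy as np

rng = np.random.default_rng(1)
N, P, R = 50, 200, 5                  # dimensions, samples, number of replicas (hypothetical)
X = rng.normal(size=(P, N))
w_true = rng.normal(size=N)
y = np.sign(X @ w_true)

def grad(w, idx):
    """Stochastic gradient of a logistic loss on a minibatch (illustrative)."""
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(np.clip(yb * (Xb @ w), -30, 30)))
    return -(Xb.T @ (yb * p)) / len(idx)

W = rng.normal(size=(R, N))           # one weight vector per replica
lr, gamma = 0.1, 0.01
for step in range(500):
    center = W.mean(axis=0)
    idx = rng.choice(P, size=20, replace=False)
    for a in range(R):
        # SGD step plus an elastic pull toward the replicas' center of mass
        W[a] -= lr * (grad(W[a], idx) + gamma * (W[a] - center))
    gamma *= 1.005                    # slowly increase the coupling (illustrative schedule)

errors = np.mean(np.sign(X @ W.mean(axis=0)) != y)
print("training error of the replica center:", errors)
```

Gradually increasing the coupling concentrates the replicas, so the final center tends to sit in a wide, easily accessible minimum rather than an isolated narrow one, in the spirit of the dense states described in the abstract.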