Carlo Baldassi — Università Bocconi, Milano; INFN, Sezione di Torino

# Neural networks optimization and dense states

The problem of training neural networks is in general non-convex and in principle computationally hard. In practice, however, this does not seem to be an obstacle: fairly greedy heuristics based on variants of Stochastic Gradient Descent are routinely employed by the machine learning community, with surprisingly good results in real-life scenarios. Starting from a large-deviation analysis of the simplest non-convex neural network, the discrete perceptron, we developed a series of analytical and numerical results revealing the existence of rare, dense regions of the optimization landscape with a number of highly desirable properties: in particular, they are wide, easily accessible minima with good generalization properties. This analysis allowed us to develop a large number of algorithms that exploit the existence of these states in a variety of models, including state-of-the-art neural networks trained on real data, stochastic processes, and quantum annealing devices. Overall, these results appear to be rather general with respect to the details of the underlying model and of the data; they may be relevant both biologically and technologically, and may also apply to other inference and constraint satisfaction problems.
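As a concrete illustration of the non-convex setting mentioned above, the sketch below sets up a binary (discrete) perceptron, whose ±1 weights make the training landscape discrete and rugged, and runs a naive greedy local search on it. The sizes `N` and `P`, the random data, and the single-flip search are illustrative assumptions, not details taken from the abstract.

```python
# Minimal sketch of the binary (discrete) perceptron: weights constrained to
# {-1, +1}, energy = number of misclassified random patterns. Sizes and data
# are hypothetical; this is not the abstract's own analysis or algorithm.
import numpy as np

rng = np.random.default_rng(0)

N, P = 101, 40                        # N weights (odd, so preactivations never vanish), P patterns
X = rng.choice([-1, 1], size=(P, N))  # random binary input patterns
y = rng.choice([-1, 1], size=P)       # random binary labels

def energy(w):
    """Number of patterns misclassified by the sign of the preactivation."""
    return int(np.sum(np.sign(X @ w) != y))

# Greedy single-flip local search: flip the weight that lowers the energy most.
w = rng.choice([-1, 1], size=N)
for _ in range(2000):
    e = energy(w)
    if e == 0:
        break
    gains = []
    for i in range(N):                # try flipping each weight in turn
        w[i] *= -1
        gains.append(energy(w))
        w[i] *= -1
    best = int(np.argmin(gains))
    if gains[best] >= e:
        break                         # stuck in a local minimum of the discrete landscape
    w[best] *= -1

print("final training errors:", energy(w))
```

A plain greedy search like this one typically gets trapped; the point of the abstract is that the landscape nevertheless contains rare dense regions of solutions that suitably designed algorithms can reach.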
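The algorithms alluded to above bias the search toward configurations surrounded by many other solutions. The following is only a generic, hedged sketch of one such idea, elastically coupling several SGD replicas toward their common center on a made-up logistic-loss model; the loss, the coupling strength `gamma`, and its schedule are illustrative assumptions, not the author's actual algorithm.

```python
# Generic sketch: several weight replicas are trained in parallel and pulled
# toward their center of mass, so that surviving configurations tend to lie
# in wide regions dense with other solutions. All model details are invented
# for illustration.
import numpy as np

rng = np.random.default_rng(1)
N, P, R = 50, 200, 5                  # dimensions, samples, number of replicas (hypothetical)
X = rng.normal(size=(P, N))
w_true = rng.normal(size=N)
y = np.sign(X @ w_true)

def grad(w, idx):
    """Stochastic gradient of a logistic loss on a minibatch (illustrative)."""
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(np.clip(yb * (Xb @ w), -30, 30)))
    return -(Xb.T @ (yb * p)) / len(idx)

W = rng.normal(size=(R, N))           # one weight vector per replica
lr, gamma = 0.1, 0.01
for step in range(500):
    center = W.mean(axis=0)
    idx = rng.choice(P, size=20, replace=False)
    for a in range(R):
        # SGD step plus an elastic pull toward the replicas' center of mass
        W[a] -= lr * (grad(W[a], idx) + gamma * (W[a] - center))
    gamma *= 1.005                    # slowly increase the coupling (illustrative schedule)

errors = np.mean(np.sign(X @ W.mean(axis=0)) != y)
print("training error of the replica center:", errors)
```

Gradually increasing the coupling concentrates the replicas, so the final center tends to sit in a wide, easily accessible minimum rather than an isolated narrow one, in the spirit of the dense states described in the abstract.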