Federico Milanesio — PhD student, University of Turin

# Understanding Geometric Compression in Neural Network Dynamics

Neural networks are the most widely used models in modern machine learning and have achieved remarkable performance across many tasks. However, their development relies not on a deep theoretical understanding but on trial and error. One of the main difficulties in understanding NNs lies in their training dynamics, which search for optima in a very high-dimensional, rough loss landscape. At the same time, the models avoid becoming trapped in suboptimal minima and converge to points with good generalization, thus escaping the so-called curse of dimensionality. Intuitively, in regression tasks, the optimal representation of the data would be a low-dimensional manifold on which the points align, allowing for easy linear regression. We introduce an entropy-based measure of geometric compression and investigate this prediction. Tracking this measure during training reveals a non-monotonic behavior: an initial compression phase, followed by a decompression phase in which the measure increases again. This result is remarkable and has not previously been observed in regression NNs. We demonstrate that this behavior is a property of feature learning and that it is robust to changes in hyperparameters and across different datasets. We hypothesize that the inversion epoch marks the point at which the network has learned the best possible linear predictor, and that the subsequent decompression phase may indicate a phase of generalization, in which the model becomes more flexible and needs less compressed representations to accommodate more complex functions. This behavior aligns with current literature suggesting that neural networks learn the moments of the data distribution sequentially. We theorize that the inversion happens when the network has learned the first two moments and starts learning from higher-order moments.
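
The abstract does not specify the entropy-based measure. As a hedged illustration only, the sketch below computes one common proxy for geometric compression of a layer's hidden representations: the Shannon entropy of the normalized covariance spectrum (spectral entropy), which decreases when the representation concentrates on fewer directions. The function name, shapes, and the toy loop are hypothetical and are not the measure or training setup used in this work.

```python
# Minimal sketch, assuming geometric compression is tracked via the spectral
# entropy of hidden activations (NOT the authors' stated measure).
import numpy as np


def spectral_entropy(hidden: np.ndarray) -> float:
    """Shannon entropy of the normalized covariance spectrum of `hidden`
    (shape: n_samples x n_features). Lower values suggest the representation
    occupies fewer effective directions, i.e. is more compressed."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Singular values of the centered activations give the covariance spectrum.
    s = np.linalg.svd(centered, compute_uv=False)
    p = s**2 / np.sum(s**2)          # normalized eigenvalue distribution
    p = p[p > 0]                     # drop zero modes to avoid log(0)
    return float(-np.sum(p * np.log(p)))


if __name__ == "__main__":
    # Hypothetical usage: in practice `acts` would be the hidden activations of
    # a regression network evaluated on a fixed batch at each training epoch,
    # so that compression/decompression shows up as a dip and rise of the curve.
    rng = np.random.default_rng(0)
    for epoch in (0, 10, 100):
        acts = rng.normal(size=(512, 64))   # stand-in for hidden activations
        print(f"epoch {epoch:>3d}  spectral entropy = {spectral_entropy(acts):.3f}")
```

Under this proxy, a non-monotonic training curve would appear as the entropy first decreasing (compression) and later increasing again (decompression), mirroring the inversion described in the abstract.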