Theory

Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of …

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Predicting the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes …

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Review summary. Variational inference: a proof that SGD minimizes a potential along with an entropic regularization term. However, this potential differs from the loss used to compute backpropagation gradients. The two are equal only if the gradient noise is isotropic (i.

A Variational Analysis of Stochastic Gradient Algorithms

Three Factors Influencing Minima in SGD

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay