# Theory

## Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Predicting the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes …

## Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Review Summary Variational inference Proof that SGD minimizes a potential along with an entropic regularization term. However, this potential differs from the loss used to compute backpropagation gradients. The two are equal only if the gradient noise is isotropic (i.

## A Variational Analysis of Stochastic Gradient Algorithms

Review Summary The authors expand on their previous work on the continuous-time limit of SGD. They show how SGD with a constant LR can be modelled as an SDE that reaches a stationary distribution.
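
As a rough illustration of the stationary-distribution claim, the sketch below runs constant-LR SGD on a toy 1D quadratic loss $L(\theta) = \theta^2 / 2$ with additive Gaussian gradient noise. The toy loss, the noise model, and the names `simulate_sgd`, `sigma` and `burn_in` are assumptions made for this example and are not taken from the paper. Under these assumptions the continuous-time limit is an Ornstein-Uhlenbeck process whose stationary variance is $\eta \sigma^2 / (2B)$, for learning rate $\eta$ and batch size $B$.

```python
import numpy as np

def simulate_sgd(lr, batch_size, sigma=1.0, steps=200_000, burn_in=20_000, seed=0):
    """Constant-LR SGD on the toy loss L(theta) = theta^2 / 2.

    The minibatch gradient is modelled as theta + noise, where
    noise ~ N(0, sigma^2 / batch_size). This is an assumed toy model.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma / np.sqrt(batch_size), size=steps)
    theta = 0.0
    samples = np.empty(steps)
    for t in range(steps):
        grad = theta + noise[t]      # noisy gradient of theta^2 / 2
        theta -= lr * grad           # plain SGD step with constant LR
        samples[t] = theta
    return samples[burn_in:]         # drop the transient before stationarity

lr, batch_size, sigma = 0.05, 4, 1.0
empirical_var = simulate_sgd(lr, batch_size, sigma).var()

# Stationary variance of the matching Ornstein-Uhlenbeck SDE.
predicted_var = lr * sigma**2 / (2 * batch_size)

print(f"empirical: {empirical_var:.5f}  SDE prediction: {predicted_var:.5f}")
```

The iterates do not converge to the minimum. They fluctuate around it with a variance close to the SDE prediction, and the match improves as the learning rate shrinks.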

## Three Factors Influencing Minima in SGD

Review Summary SGD performs similarly across batch sizes as long as the LR/BS ratio is held constant. The authors note that SGD runs with the same LR/BS ratio are different discretizations of the same stochastic differential equation.
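
A small worked example may make the role of the LR/BS ratio concrete. It assumes a toy 1D quadratic loss with Gaussian gradient noise of per-sample variance `sigma**2`, which is an illustrative model and not the paper's setting. For the update $\theta \leftarrow \theta - \eta(\theta + \epsilon)$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 / B)$, the stationary variance has the closed form $\eta \sigma^2 / (B (2 - \eta))$, which for small $\eta$ depends on $\eta$ and $B$ almost only through $\eta / B$. The function name `stationary_variance` is hypothetical.

```python
def stationary_variance(lr, batch_size, sigma=1.0):
    """Exact stationary variance of theta <- theta - lr * (theta + noise),
    with noise ~ N(0, sigma^2 / batch_size). Valid for 0 < lr < 2.
    """
    return lr * sigma**2 / (batch_size * (2.0 - lr))

small = stationary_variance(lr=0.01, batch_size=8)    # LR/BS = 0.00125
large = stationary_variance(lr=0.04, batch_size=32)   # same LR/BS ratio
other = stationary_variance(lr=0.04, batch_size=8)    # 4x larger LR/BS ratio

print(f"same ratio:      {small:.6f} vs {large:.6f}")
print(f"different ratio: {other:.6f}")
```

In this toy model the two runs that share an LR/BS ratio settle to nearly the same noise level, while raising the ratio by a factor of four raises the stationary variance by roughly the same factor.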

## Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay

Review Summary Batch normalization induces scale invariance of the loss with respect to the weights (i.e. $L(x; \theta) = L(x; k \theta)$ for parameters $\theta$ with BN; this expression is not mathematically precise).
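
The scale invariance can be checked numerically with a minimal sketch. The setup below is assumed for illustration only: a single linear layer followed by batch normalization without learnable affine parameters, `eps` set to zero so the invariance is exact, a squared-error loss, and a positive scale factor `k`. The helper names `batch_norm` and `loss` are hypothetical.

```python
import numpy as np

def batch_norm(z, eps=0.0):
    """Normalize each unit over the batch axis (no learnable scale or shift)."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def loss(theta, x, y):
    """Squared-error loss of a linear layer followed by batch normalization."""
    z = x @ theta                    # pre-activations, shape (batch, units)
    h = batch_norm(z)                # rescaling theta cancels out here
    return float(np.mean((h.sum(axis=1) - y) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))         # toy inputs
y = rng.normal(size=64)              # toy targets
theta = rng.normal(size=(5, 3))      # weights feeding into BN

k = 3.7                              # any positive scale factor
base = loss(theta, x, y)
scaled = loss(k * theta, x, y)

print(f"L(theta) = {base:.6f}, L(k * theta) = {scaled:.6f}")
```

With a nonzero `eps`, as used in practice for numerical stability, the two values agree only approximately.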