Javier Sagastuy-Brena

PhD Candidate at Stanford University




I am a PhD Candidate at the Institute for Computational and Mathematical Engineering at Stanford University. I joined the Stanford Neuroscience and Artificial Intelligence Laboratory, led by P.I. Dan Yamins, in September 2018, broadly interested in understanding how the human brain works using computational models. I’ve worked on projects on recurrent models of the visual system, biologically inspired learning rules, and deep learning theory. More recently, I’ve been working on understanding how to measure similarity between individuals’ neural responses and how to map between them. I am also interested in the application and safe deployment of AI systems in the real world.

Before starting grad school, I spent two years working at a Mexican FinTech startup, teaching Computer Science, and doing research in machine learning for text mining. My non-academic interests include alpine skiing, cycling, hiking, cooking, and an ever-increasing obsession with coffee.

sagas [at] hey [dot] com


  • Artificial Intelligence
  • Computational Neuroscience


  • PhD in Computational and Mathematical Engineering, 2023

    Stanford University

  • MSc in Computational and Mathematical Engineering, 2019

    Stanford University

  • BSc in Computer Engineering, 2015

    Instituto Tecnológico Autónomo de México

  • BSc in Applied Mathematics, 2015

    Instituto Tecnológico Autónomo de México

Recent Publications

Inter-animal transforms as a guide to model-brain comparison

Accurately measuring similarity between different animals’ neural responses is a crucial step towards evaluating deep neural network …

Recurrent Connections in the Primate Ventral Visual Stream Mediate a Trade-Off Between Task Performance and Network Size During Core Object Recognition

The computational role of the abundant feedback connections in the ventral visual stream is unclear, enabling humans and nonhuman …

Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed …

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Predicting the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation …

Two Routes to Scalable Credit Assignment without Weight Symmetry

The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport - the …

I’ve just read

Mini-reviews to help me track and remember papers I read

I got this idea from a lab mate’s website and thought I’d do something similar, though perhaps not as nice. His project eventually evolved into Paper a Week, which is just awesome.

Generalized Shape Metrics on Neural Representations

TL;DR This paper highlights issues that can occur when representation similarity metrics are not *metrics* in the mathematical sense. The authors formulate novel metrics based on previous approaches by modifying them to satisfy the triangle inequality, including one geared specifically toward CNNs. They demonstrate that their methods are effective and scale to large numbers of specimens.
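To make the "metric in the mathematical sense" point concrete, here is a minimal sketch of one distance of the kind discussed in this line of work, the orthogonal Procrustes distance. This is my own NumPy illustration, not the authors' code, and the function name is hypothetical. It mean-centers both response matrices and minimizes the Frobenius distance over orthogonal alignments, using the closed form min_Q ||X - YQ||_F^2 = ||X||_F^2 + ||Y||_F^2 - 2||YᵀX||_*, where ||·||_* is the nuclear norm.

```python
import numpy as np


def procrustes_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """Orthogonal Procrustes distance between two representations (illustrative).

    X and Y are (n_stimuli, n_neurons) response matrices of the same shape.
    Each is mean-centered per column, then we compute
        min over orthogonal Q of ||X - Y Q||_F
    via the identity
        min_Q ||X - YQ||_F^2 = ||X||_F^2 + ||Y||_F^2 - 2 * ||Y^T X||_*,
    where ||.||_* is the nuclear norm (sum of singular values).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # The best orthogonal alignment attains tr(X^T Y Q) = nuclear norm of Y^T X.
    nuclear = np.linalg.svd(Y.T @ X, compute_uv=False).sum()
    sq = np.sum(X**2) + np.sum(Y**2) - 2.0 * nuclear
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(sq, 0.0)))
```

A quick sanity check is to draw random matrices A, B, C and confirm that procrustes_distance(A, C) never exceeds procrustes_distance(A, B) + procrustes_distance(B, C), and that a rotated copy A @ Q sits at distance (numerically) zero from A.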

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

TL;DR The authors present two main results: a thorough mathematical analysis of how SGD performs variational inference, and a characterization of what its steady-state behavior looks like (limit cycles). They also present empirical quantities similar to the ones we have measured and compare them against a Brownian motion null.
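One simple way to set up a comparison against a Brownian motion null is a mean squared displacement (MSD) check: for Brownian motion the MSD grows linearly in time, so the fitted exponent c in MSD(t) ~ t^c is about 1, while c < 1 or c > 1 signals sub- or super-diffusive behavior. The sketch below is my own illustration under that assumption, not the authors' procedure; the function names are hypothetical, and in practice the trajectories would be flattened network parameters recorded during training rather than synthetic walks.

```python
import numpy as np


def msd_exponent(traj: np.ndarray) -> float:
    """Fit the exponent c in MSD(t) ~ t^c for an ensemble of trajectories.

    traj has shape (n_walkers, n_steps, dim). MSD(t) is the mean over walkers
    of ||x_t - x_0||^2, and c is the slope of log MSD against log t.
    """
    disp = traj[:, 1:, :] - traj[:, :1, :]            # displacement from the first point
    msd = np.mean(np.sum(disp**2, axis=-1), axis=0)   # average over walkers
    lags = np.arange(1, traj.shape[1])                # lag of each displacement
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return float(slope)


def brownian_null(n_walkers: int, n_steps: int, dim: int, seed: int = 0) -> np.ndarray:
    """Discrete Brownian motion: cumulative sums of i.i.d. Gaussian steps."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((n_walkers, n_steps, dim))
    return np.cumsum(steps, axis=1)
```

For example, msd_exponent(brownian_null(500, 1000, 10)) comes out close to 1, whereas a trajectory moving at constant velocity gives an exponent of 2.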

A Variational Analysis of Stochastic Gradient Algorithms

TL;DR Rethinking SGD in the continuous-time limit yields valuable insight, particularly into hyperparameter tuning. This paper introduces the SDE derivation used in the previously reviewed 'Three Factors' paper, and elaborates on minimizing the KL divergence between the stationary distribution of the underlying Ornstein-Uhlenbeck (OU) process and the target posterior (and as such relies on the Bayesian view of ML algorithms rather than the optimization view).
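The central computation can be sketched as follows, assuming the usual local setup: near a minimum the loss is quadratic with Hessian A, the gradient noise has a constant factor B, and η and S denote the learning rate and minibatch size, giving the OU process dθ = -ηAθ dt + (η/√S) B dW. Its stationary covariance Σ solves the Lyapunov equation AΣ + ΣAᵀ = (η/S) BBᵀ. The code below is my own NumPy sketch of that step (the function name is hypothetical), solving the equation by vectorization.

```python
import numpy as np


def ou_stationary_cov(A: np.ndarray, B: np.ndarray, lr: float, batch_size: int) -> np.ndarray:
    """Stationary covariance of d(theta) = -lr*A*theta dt + (lr/sqrt(S)) B dW.

    Solves the Lyapunov equation A @ Sigma + Sigma @ A.T = (lr / S) * B @ B.T
    using the column-major vec identity
        (I kron A + A kron I) vec(Sigma) = vec(Q).
    Assumes A has eigenvalues with positive real part (a stable minimum).
    """
    d = A.shape[0]
    Q = (lr / batch_size) * (B @ B.T)   # diffusion term scaled by lr / S
    I = np.eye(d)
    K = np.kron(I, A) + np.kron(A, I)   # linear operator Sigma -> A Sigma + Sigma A^T
    vec_sigma = np.linalg.solve(K, Q.reshape(-1, order="F"))
    return vec_sigma.reshape(d, d, order="F")
```

Because the right-hand side scales with η/S, the stationary covariance is proportional to the ratio of learning rate to batch size; in one dimension it reduces to ηb²/(2aS). Tuning η and S then amounts to shaping this Gaussian so it is as close as possible, in KL, to the target posterior.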



PhD Intern. Autonomous Vehicle Perception.


Jun 2021 – Dec 2021 California

SWE Intern


Jun 2018 – Sep 2018 California

Computer Science Teacher

Modern American School

Jul 2016 – Aug 2017 CDMX, Mexico
Taught Object-Oriented Programming in C#.

Tech Lead. Full stack developer.


Aug 2015 – Aug 2017 CDMX, Mexico

Research Intern


Mar 2015 – Aug 2015 Böblingen, Germany