Javier Sagastuy-Brena

PhD Candidate at Stanford University

Bio

I am a PhD Candidate at the Institute for Computational and Mathematical Engineering at Stanford University. I joined the Stanford Neuroscience and Artificial Intelligence Laboratory with P.I. Dan Yamins in September 2018 widely interested in understanding how the human brain works using computational models. I’ve worked in projects on recurrent models of the visual system, biologically-inspired learning rules and deep learning theory. More recently I’ve been working on understanding how to measure similarity betweeen individuals’ neural responses and how to map between them. I am also interested in the application and safe deployment of AI systems in the real world.

Before starting grad school, I spent two years working at a Mexican FinTech startup, teaching Computer Science, and doing research in machine learning for text mining. My non-academic interests include alpine skiing, cycling, hiking, cooking, and an ever-increasing obsession with coffee.

sagas [at] hey [dot] com

Interests

Artificial Intelligence
Computational Neuroscience

Education

PhD in Computational and Mathematical Engineering, 2023

Stanford University
MSc in Computational and Mathematical Engineering, 2019

Stanford University
BSc in Computer Engineering, 2015

Instituto Tecnológico Autónomo de México
BSc in Applied Mathematics, 2015

Instituto Tecnológico Autónomo de México

Recent Publications

Inter-animal transforms as a guide to model-brain comparison

Accurately measuring similarity between different animals’ neural responses is a crucial step towards evaluating deep neural network …

Javier Sagastuy-Brena, Imran Thobani, Aran Nayebi, Rosa Cao, Daniel LK Yamins

CCN 2022 Abstract CCN 2022 Manuscript Cosyne 2023 Abstract Cosyne 2023 Poster

Recurrent Connections in the Primate Ventral Visual Stream Mediate a Trade-Off Between Task Performance and Network Size During Core Object Recognition

The computational role of the abundant feedback connections in the ventral visual stream is unclear, enabling humans and nonhuman …

Aran Nayebi, Javier Sagastuy-Brena, Daniel M Bear, Kohitij Kar, Jonas Kubilius, Surya Ganguli, David Sussillo, James J DiCarlo, Daniel LK Yamins

Preprint PDF

Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed …

Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel LK Yamins

Preprint PDF

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Predicting the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation …

Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel LK Yamins, Hidenori Tanaka

Preprint PDF Code

Two Routes to Scalable Credit Assignment without Weight Symmetry

The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport - the …

Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jon Bloom, Daniel LK Yamins

Preprint PDF Code Slides Video Blog Post

I’ve just read

Mini-reviews to help me track and remember papers I read

I got this idea from a lab mate’s website and thought I’d do something similar, though perhaps not as nice. His project eventually evolved into Paper a Week, which is just awesome.

Generalized Shape Metrics on Neural Representations

Alex Williams, Erin Kunz, Simon Kornblith, Scott Linderman

Published October 2021
Read on Feb 7, 2021

TL;DR This paper highlights issues that can occur when represtation similarity metrics are not *metrics* in the mathematical sense. The authors formulate novel metrics based on previous approaches, by getting them to satisfy the triangle inequality, including one geared specifically for CNNs. They demonstrate theis methods are effective and scalable to large numbers of specimens.

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Pratik Chaudhari, Stefano Soatto

Published October 2017
Read on Aug 28, 2020

TL;DR The authors present two main results: a thorough mathematical analysis on how SGD performs variational inference and what its steady state behavior looks like: limit cycles. They present empirical quantities similar to the ones we have measured and analyze those compared to the null of Brownian motion.

Learning Dynamics Theory SDE

A Variational Analysis of Stochastic Gradient Algorithms

Stephan Mandt, Matthew D. Hoffman, David M. Blei

Published February 2016
Read on Aug 18, 2020

TL;DR Rethinking SGD in the limit of continuous time yields valuable insight, particularly on hyperparameter tuning. This paper introduces the SDE derivation in the previously reviewed 'Three Factors' paper, and elaborates on the minimization of the KL divergence of the stationary distribution of the underlying OU process and the target posterior (and as such relies on the Bayesian view on ML algorithms, rather than the optimization view).

Learning Dynamics Theory

See all

Experience

PhD Intern. Autonomous Vehicle Perception.

Nvidia

Jun 2021 – Dec 2021 California

SWE Intern

Google

Jun 2018 – Sep 2018 California

Computer Science Teacher

Modern American School

Jul 2016 – Aug 2017 CDMX, Mexico

Taught Object Oriented Programming in C#.

Tech Lead. Full stack developer.

Nexu.mx

Aug 2015 – Aug 2017 CDMX, Mexico

Research Intern

IBM R&D

Mar 2015 – Aug 2015 Böblingen, Germany