Blog posts

2025

🔥 Scaling Predictive Coding to 100+ Layer Networks

5 minute read

Published: May 20, 2025

📖 TL;DR: We introduce \(\mu\)PC, a reparameterisation of predictive coding networks that enables stable training of 100+ layer ResNets with zero-shot hyperparameter transfer.

♾️ Infinite Widths (& Depths) Part III: The Maximal Update Parameterisation (\(\mu\)P)

8 minute read

Published: April 09, 2025

This is the third and last post of a short series on the infinite-width limits of deep neural networks (DNNs). In Part I, we showed that the output of a random network becomes Gaussian distributed in the infinite-width limit. Part II went beyond initialisation and showed that infinitely wide nets trained with GD are basically kernel methods.

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published: February 20, 2025

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

2024

♾️ Infinite Widths Part I: Neural Networks as Gaussian Processes

6 minute read

Published: November 16, 2024

This is the first post of a short series on the infinite-width limits of deep neural networks (DNNs). We start by reviewing the correspondence between neural networks and Gaussian Processes (GPs).

KANs Made Simple

2 minute read

Published: October 09, 2024

🤔 Confused about the recent paper KAN: Kolmogorov-Arnold Networks? I was too, so here’s a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

⛰️ The Energy Landscape of Predictive Coding Networks

9 minute read

Published: October 01, 2024

📖 TL;DR: Predictive coding makes the loss landscape of feedforward neural networks more benign and robust to vanishing gradients.

💥 Thermodynamic Natural Gradient Descent

7 minute read

Published: July 19, 2024

I recently came across this paper Thermodynamic Natural Gradient Descent by Normal Computing. I found it very interesting, so below is my brief take on it.

💭 My experience as an Applied Scientist Intern at Amazon

7 minute read

Published: April 27, 2024

2023

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published: August 10, 2023

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

Francesco Innocenti
