Posts by Tags

Amazon

Bayesian inference

Bayesian neural networks

Fisher information

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.
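To make the mechanism in that TL;DR concrete, here is a minimal numpy sketch of a predictive-coding step on a toy two-layer linear network with a single training pair. This is my own illustration under simplifying assumptions, not code from the post, and it only shows the mechanics (1st-order inference on neuron activities, then local weight updates at the activity equilibrium), not the 2nd-order analysis itself.

```python
# Minimal predictive-coding sketch (toy linear network, single training pair).
# Inference phase: gradient descent on the neuron activities x1 to minimise the energy;
# learning phase: local weight updates computed from the equilibrium prediction errors.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)) * 0.1, rng.normal(size=(2, 4)) * 0.1
x_in, y = rng.normal(size=3), rng.normal(size=2)       # one (input, target) pair

def pc_step(W1, W2, x_in, y, n_inf=100, lr_x=0.1, lr_w=0.05):
    x1 = W1 @ x_in                                      # initialise activities at the feedforward values
    for _ in range(n_inf):                              # inference: 1st-order updates on neurons
        e1 = x1 - W1 @ x_in                             # prediction error at the hidden layer
        e2 = y - W2 @ x1                                # prediction error at the clamped output
        x1 -= lr_x * (e1 - W2.T @ e2)                   # gradient of the energy w.r.t. x1
    e1, e2 = x1 - W1 @ x_in, y - W2 @ x1                # equilibrium errors drive the weight updates
    W1 += lr_w * np.outer(e1, x_in)
    W2 += lr_w * np.outer(e2, x1)
    return W1, W2

for _ in range(200):
    W1, W2 = pc_step(W1, W2, x_in, y)
print("loss:", 0.5 * np.sum((y - W2 @ (W1 @ x_in)) ** 2))
```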

Gaussian processes

KAN

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).
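For a rough picture of the difference the post explains, here is a small numpy sketch of one layer of each model. It is my own toy illustration, not the post's code: the MLP layer uses fixed activations on nodes and learnable weights on edges, while the KAN layer puts a learnable univariate function on each edge (here a small combination of Gaussian bumps, a stand-in for the splines used in the KAN paper) and simply sums at each node.

```python
# Toy contrast between one MLP layer and one KAN layer (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_basis = 3, 2, 5

def mlp_layer(x, W, b):
    return np.tanh(W @ x + b)                            # fixed activation on each node

def kan_layer(x, C, centers, width=1.0):
    # C[i, j, k]: coefficient of basis function k on the edge from input j to output i
    basis = np.exp(-((x[None, :] - centers[:, None]) / width) ** 2)   # (n_basis, d_in)
    edge_values = np.einsum("ijk,kj->ij", C, basis)      # learnable function evaluated per edge
    return edge_values.sum(axis=1)                       # nodes just sum their incoming edges

x = rng.normal(size=d_in)
W, b = rng.normal(size=(d_out, d_in)), np.zeros(d_out)
C = rng.normal(size=(d_out, d_in, n_basis)) * 0.1
centers = np.linspace(-2, 2, n_basis)

print("MLP layer output:", mlp_layer(x, W, b))
print("KAN layer output:", kan_layer(x, C, centers))
```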

Kolmogorov-Arnold networks

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

Kolmogorov-Arnold representation theorem

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

Normal Computing

PhD

applied scientist

backpropagation

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

central limit theorem

deep information propagation

deep neural networks

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.
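The Gaussian-output claim is easy to check numerically. Below is a quick sketch (my own, under the assumption of a one-hidden-layer ReLU network with a 1/sqrt(width) readout): over random draws of the weights, the scalar output on a fixed input looks increasingly Gaussian as the hidden layer widens, with the excess kurtosis heading toward zero.

```python
# Numerical illustration: random one-hidden-layer ReLU nets on a fixed input.
# As width grows, the output distribution over weight draws approaches a Gaussian.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)                                  # one fixed input

def random_net_output(width, n_samples=2000):
    W1 = rng.normal(size=(n_samples, width, x.size))     # hidden weights, one network per sample
    w2 = rng.normal(size=(n_samples, width))             # readout weights
    h = np.maximum(W1 @ x, 0.0)                          # ReLU hidden activations, (n_samples, width)
    return (w2 * h).sum(axis=1) / np.sqrt(width)         # CLT-style scaling of the readout sum

for width in (2, 20, 500):
    out = random_net_output(width)
    kurt = np.mean((out - out.mean()) ** 4) / out.var() ** 2
    print(f"width={width:4d}  excess kurtosis={kurt - 3:+.3f}")   # ~0 for a Gaussian
```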

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

depth-mup

dynamical mean field theory

feature learning

gradient descent

hyperparameter transfer

industry

inference learning

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

infinite width limit

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

internship

interpretability

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

kernel methods

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

lazy learning

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

linear regime

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

local learning

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

loss landscape

machine learning

maximal update parameterisation

multi-layer perceptrons

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

mup

natural gradient descent

neural scaling laws

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

neural tangent kernel

♾️ Infinite Widths Part II: The Neural Tangent Kernel

7 minute read

Published:

This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.

optimisation theory

predictive coding

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

rich regime

saddle points

saddles

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

second-order method

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

second-order methods

splines

KANs Made Simple

2 minute read

Published:

🤔 Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).

tensor programs

thermodynamic AI

trust region

🧠 Predictive Coding as a 2nd-Order Method

10 minute read

Published:

📖 TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.

vanishing gradients