My experience as an Applied Scientist Intern at Amazon
Published:
This is the first post of a short series on the infinite-width limits of deep neural networks (DNNs). We start by reviewing the correspondence between neural networks and Gaussian Processes (GPs).
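(Not from the post itself: a minimal NumPy sketch of the claim, assuming a one-hidden-layer tanh network with 1/fan-in weight variance. Sampling many random networks at a fixed input, the output distribution looks increasingly Gaussian as the hidden width grows.)

```python
# Minimal sketch: empirically check that the output of a random one-hidden-layer
# tanh network looks increasingly Gaussian as the hidden width grows.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)          # a fixed input of dimension 5
n_draws = 10_000                    # number of random networks to sample

def random_net_output(width):
    # NNGP-style scaling: weights drawn with variance 1/fan_in
    W1 = rng.standard_normal((width, x.size)) / np.sqrt(x.size)
    w2 = rng.standard_normal(width) / np.sqrt(width)
    return w2 @ np.tanh(W1 @ x)

for width in (1, 10, 1000):
    outs = np.array([random_net_output(width) for _ in range(n_draws)])
    # Excess kurtosis of a Gaussian is 0; it should shrink as the width grows.
    kurt = np.mean((outs - outs.mean()) ** 4) / outs.var() ** 2 - 3
    print(f"width={width:5d}  mean={outs.mean():+.3f}  var={outs.var():.3f}  excess kurtosis={kurt:+.3f}")
```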
Published:
I recently came across the paper Thermodynamic Natural Gradient Descent by Normal Computing. I found it very interesting, so below is my brief take on it.
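(Background sketch, not the paper's algorithm: natural gradient descent preconditions the gradient with a damped Fisher/Gauss-Newton matrix, and the linear solve below is the expensive step that the paper proposes to offload to thermodynamic hardware. The sizes, gradient and Jacobian here are random stand-ins.)

```python
# Generic natural gradient descent step (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_params, n_samples = 50, 200
theta = rng.standard_normal(n_params)           # stand-in parameters
grad = rng.standard_normal(n_params)            # stand-in loss gradient
J = rng.standard_normal((n_samples, n_params))  # stand-in per-sample Jacobian

# Gauss-Newton / empirical-Fisher-style curvature estimate with damping.
F = J.T @ J / n_samples + 1e-3 * np.eye(n_params)

direction = np.linalg.solve(F, grad)            # natural-gradient direction F^{-1} g
theta = theta - 0.1 * direction                 # one update step
print("step norm:", np.linalg.norm(0.1 * direction))
```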
Published:
TL;DR: Predictive coding implicitly performs a 2nd-order weight update via 1st-order (gradient) updates on neurons, which in some cases allows it to converge faster than backpropagation with standard stochastic gradient descent.
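(To make the "gradient updates on neurons" part concrete, here is a generic predictive coding sketch for a toy two-layer network; the architecture and step sizes are illustrative, not the exact setup analysed in the paper. Neuron activities are first relaxed by gradient descent on an energy, and the weights are then updated at the relaxed activities.)

```python
# Generic predictive coding sketch: neurons are relaxed by gradient descent on
# an energy, then the weights take a gradient step at the relaxed activities.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W1 = rng.standard_normal((n_hid, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1

def energy(z1, x, y, W1, W2):
    e1 = z1 - np.tanh(W1 @ x)   # prediction error at the hidden layer
    e2 = y - np.tanh(W2 @ z1)   # prediction error at the output (clamped to the label)
    return 0.5 * (e1 @ e1 + e2 @ e2)

x, y = rng.standard_normal(n_in), rng.standard_normal(n_out)

# 1) Inference: 1st-order (gradient) updates on the hidden neurons z1.
z1 = np.tanh(W1 @ x)            # initialise at the feedforward pass
for _ in range(50):
    e1 = z1 - np.tanh(W1 @ x)
    e2 = y - np.tanh(W2 @ z1)
    dz1 = e1 - W2.T @ (e2 * (1 - np.tanh(W2 @ z1) ** 2))
    z1 -= 0.1 * dz1             # gradient descent on the energy w.r.t. z1

# 2) Learning: gradient step on the weights at the relaxed neuron values.
e1 = z1 - np.tanh(W1 @ x)
e2 = y - np.tanh(W2 @ z1)
print("energy after inference:", energy(z1, x, y, W1, W2))
W1 += 0.01 * np.outer(e1 * (1 - np.tanh(W1 @ x) ** 2), x)
W2 += 0.01 * np.outer(e2 * (1 - np.tanh(W2 @ z1) ** 2), z1)
```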
Published:
Confused about the recent KAN: Kolmogorov-Arnold Networks? I was too, so here's a minimal explanation that makes it easy to see the difference between KANs and multi-layer perceptrons (MLPs).
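(A simplified sketch of that difference, not the paper's implementation, which uses B-splines plus a base activation: an MLP layer puts learnable scalar weights on the edges and a fixed nonlinearity on the nodes, whereas a KAN-style layer puts a learnable univariate function on every edge. The polynomial/sine basis below is only an illustrative stand-in.)

```python
# One MLP layer vs one KAN-style layer (simplified illustration).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
x = rng.standard_normal(n_in)

# --- MLP layer: y = sigma(W x), fixed nonlinearity on the nodes -------------
W = rng.standard_normal((n_out, n_in))
mlp_out = np.tanh(W @ x)

# --- KAN-style layer: y_j = sum_i phi_ji(x_i), learnable 1D function per edge
basis = lambda t: np.array([t, t**2, np.sin(t)])   # illustrative basis, not B-splines
C = rng.standard_normal((n_out, n_in, 3))          # learnable coefficients per edge

kan_out = np.array([
    sum(C[j, i] @ basis(x[i]) for i in range(n_in))
    for j in range(n_out)
])

print("MLP output:", mlp_out)
print("KAN-style output:", kan_out)
```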
Published:
TL;DR: We introduce \(\mu\)PC, a reparameterisation of predictive coding networks that enables stable training of 100+ layer ResNets with zero-shot hyperparameter transfer.
Published:
TL;DR: Predictive coding makes the loss landscape of feedforward neural networks more benign and robust to vanishing gradients.
Published:
This is the third and last post of a short series on the infinite-width limits of deep neural networks (DNNs). In Part I, we showed that the output of a random network becomes Gaussian distributed in the infinite-width limit. Part II went beyond initialisation and showed that infinitely wide networks trained with gradient descent (GD) are effectively kernel methods.
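(A rough illustration of the kernel regime, not taken from the post, with arbitrary widths and learning rate: the wider the network, the less its first-layer weights tend to move relative to initialisation when trained by gradient descent on a small task, which is what makes the linearised, kernel-style description accurate at infinite width.)

```python
# Wider networks stay closer to their initialisation under gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))    # tiny regression task
y = rng.standard_normal(20)

def relative_weight_change(width, steps=500, lr=0.1):
    W1 = rng.standard_normal((width, 5)) / np.sqrt(5)
    w2 = rng.standard_normal(width) / np.sqrt(width)
    W1_0 = W1.copy()
    for _ in range(steps):
        h = np.tanh(X @ W1.T)                         # hidden activations, (20, width)
        err = h @ w2 - y                              # residuals, (20,)
        # gradient of 0.5 * mean squared error w.r.t. W1 (w2 kept fixed for simplicity)
        dW1 = ((np.outer(err, w2) * (1 - h**2)).T @ X) / len(y)
        W1 -= lr * dW1
    return np.linalg.norm(W1 - W1_0) / np.linalg.norm(W1_0)

for width in (10, 100, 10_000):
    print(f"width={width:6d}  relative change in W1: {relative_weight_change(width):.4f}")
```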
Published:
This is the second post of a short series on the infinite-width limits of deep neural networks (DNNs). Previously, we reviewed the correspondence between neural networks and Gaussian Processes (NNGP), showing that, as the number of neurons in the hidden layers grows to infinity, the output of a random network becomes Gaussian distributed.