Training Networks

This is part of my “journal club for credit” series. You can see the other computational neuroscience papers in this post.

Unit: Deep Learning

  1. Perceptron
  2. Energy Based Neural Networks
  3. Training Networks
  4. Deep Learning


A Learning Algorithm for Boltzmann Machines. By Ackley, Hinton, and Sejnowski in 1985.

Learning representations by back-propagating errors. By Rumelhart, Hinton, and Williams in 1986.

Other Useful References


How do you actually train neural networks?

Hopefully the past few posts have piqued your interest in neural networks. Maybe you even want to unleash a neural network on some data. How do you actually train the neural network?

I’m actually going to keep this brief for two reasons. First, detailed derivations can already be found elsewhere (for Boltzmann see Appendix of the original paper as well as MacKay, for backpropagation see Nielsen). Second, I firmly believe that algorithms are best learned by actually stepping through the updates, so any explanation I attempt will not be sufficient for you to truly learn the algorithm. I will provide some general context as well as some questions you should be able to answer, but please go do it yourself!

There are three general classes of machine learning based on the information received:

  • Unsupervised – data only. Boltzmann machine.
  • Supervised – data with labels. MLP with backpropagation.
  • Reinforcement – data, actions, and scores associated with each action. Deserves its own detailed post, but check out papers by DeepMind for cool applications.

The Boltzmann machine learning rule is an example of maximum likelihood. In practice, the original learning rule is too computationally expensive, so a modified algorithm called contrastive divergence (or variants such as persistent contrastive divergence) is utilized instead. See the Hinton guide to RBMs for more details.

Backpropagation is a computationally-efficient writing of the chain rule from calculus, so besides the above paper which popularized it, there is actually a long history of this algorithm being discovered and rediscovered.

Fundamental Questions

  • What is maximum likelihood?
  • Why can one interpret the learning terms in the BM algorithm as “waking” and “sleeping”?
  • Why are BM hidden layers so important?
  • Why are restricted Boltzmann machines, RBMs, much easier to train?
  • Why is backpropagation more computationally efficient than the finite difference method?
  • Derive the 4 backpropagation equations!

Advanced Questions

  • Follow Hinton’s RBM guide and implement your own Boltzmann machine
  • Use Nielsen’s code to train your own MLP