Deep Learning: 0-60 in a few hours?

Here, I will try to outline the fastest possible path from zero understanding of deep learning to a grasp of the basic ideas. In a follow-up post, I’ll outline some deep learning packages where you can actually implement these ideas.

I think by far the best introduction to deep learning is Michael Nielsen’s ebook. Before you get started with it, the minimum required mathematics is an understanding of the following:

  • Vector and Matrix multiplication – especially when written in summation notation
  • Exponents and Logarithms
  • Derivatives and Partial Derivatives
  • Probability, mainly Bayes’ Theorem (not actually needed for Michael Nielsen’s book, but it is essential for later topics)

I really think that if you understand those mathematical topics, you can start reading the ebook.
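As a quick self-check on the first item in that list, here is a minimal sketch of a matrix-vector product written the way summation notation reads: a_j = sum over k of W[j][k] * x[k]. The function name `matvec` and the numbers are made up for illustration. If the two nested loops feel natural, the weighted sums in the ebook should too.

```python
def matvec(W, x):
    """Compute a_j = sum_k W[j][k] * x[k] with explicit loops."""
    result = []
    for j in range(len(W)):          # one output entry per row j
        total = 0.0
        for k in range(len(x)):      # the sum over the shared index k
            total += W[j][k] * x[k]
        result.append(total)
    return result


# Hypothetical 3x2 matrix and length-2 vector, chosen so the sums are easy to check by hand.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [10.0, 1.0]
a = matvec(W, x)
print(a)  # [12.0, 34.0, 56.0]
```

In practice you would let Numpy do this in one call, but writing the loops once makes the index bookkeeping concrete.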

Here is my proposed learning strategy. Iterate between reading the ebook (Chapters 1-5 only) and playing with this cool interactive neural network every time a new idea is mentioned. For a first pass, just read the ebook and don’t do the exercises or worry about actual code implementation. Chapter 6 introduces convolutional neural networks, which are a more advanced topic that can be saved for later.

Once you have some intuition about neural networks, I recommend reading this review by several of the big names in deep learning. This will give you a flavor of the current status of the field.

Now you are ready to start coding!

PS. If you want to get into more advanced deep learning topics, check out my previous Deep Learning Unit. And to really get up to speed on research, there is a deep learning book that should be published soon.

 

Learning Python for Science

Here I outline how to learn Python on your own with emphasis on solving science problems. The first section applies to anyone, but the end is specialized towards computational problems that arise in science.

Python Basics

I recommend the following two tutorials:

Some additional resources that may be helpful include:

My suggested workflow:

  1. Do Codecademy and Python the Hard Way at the same time.
  2. If Codecademy/Python the Hard Way is too difficult, also read A Byte of Python.
  3. If Codecademy/Python the Hard Way is easy, use Think Python as an additional resource.
  4. If you are confused about a specific chunk of code, put it into Python Tutor which will walk you step by step through the program.
  5. Additionally, Google and Stack Overflow are extremely useful for coding questions, and you can always go to the original Python documentation.

The essential things one needs to learn about Python include:

  • data types: int, float, string
  • data structures: lists, dictionaries, tuples, sets
  • control statements: for, while, if else
  • print function
  • open / write to a text file
  • custom functions and objects
  • list comprehensions – comes up less often in numerical code, but still good to know
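To make that checklist concrete, here is a small sketch that touches each item once. All the names (`Sample`, `readings`, the file name `readings.txt`) are invented for illustration, and the file is written to the system temp directory so nothing of yours gets overwritten.

```python
import os
import tempfile


class Sample(object):
    """A tiny custom object holding a label and some measurements."""

    def __init__(self, label, readings):
        self.label = label
        self.readings = readings

    def mean(self):
        # float() keeps the division exact-looking on Python 2 as well
        return sum(self.readings) / float(len(self.readings))


# Data types: int, float, string
count = 3
threshold = 21.0
label = "sample A"

# Data structures: list, dictionary, tuple, set
readings = [20.0, 21.5, 23.0]
metadata = {"label": label, "count": count}
point = (1.0, 2.0)
unique_ids = set([1, 2, 2, 3])      # duplicates collapse

# Control statements: for and if/else
warm = []
for value in readings:
    if value > threshold:
        warm.append(value)
    else:
        continue

# List comprehension: the same kind of loop in one line
squares = [value ** 2 for value in readings]

# Write to a text file, then read it back
path = os.path.join(tempfile.gettempdir(), "readings.txt")
with open(path, "w") as handle:
    for value in readings:
        handle.write("{}\n".format(value))
with open(path) as handle:
    lines = handle.read().split()

sample = Sample(label, readings)
print(sample.label, sample.mean(), warm, lines)
```

If any line here is unclear, that is a good candidate to paste into Python Tutor.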

 

Numpy and Scipy

Numpy is the essential mathematics module in Python and is part of the larger Scipy project. All standard numerical needs are covered in Numpy, while more advanced functions are in Scipy.

I recommend the following tutorials:

  • Numpy’s tutorial
  • This fun tutorial that programs the Game of Life (GoL). I only recommend the section that implements GoL in Numpy; the rest of it is not essential. It also has a useful quick reference guide.
  • This Numpy tutorial from Scipy Lecture Notes
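To give a flavor of what array-style code looks like, here is my own minimal sketch of one Game of Life step in Numpy. It is not taken from the tutorial above: the function name `life_step` is hypothetical, and I assume wrap-around edges (via `np.roll`) to keep it short.

```python
import numpy as np


def life_step(grid):
    """Advance a 0/1 grid by one generation, assuming wrap-around edges."""
    # Count the eight neighbors of every cell by summing shifted copies.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Alive next step: exactly 3 neighbors, or currently alive with 2.
    alive = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive.astype(int)


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1

after_one = life_step(blinker)
after_two = life_step(after_one)
print(after_one)
```

Note there is no loop over individual cells: the whole grid is updated with a handful of array operations, which is the habit the Numpy tutorials are trying to build.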

 

Matplotlib

Visualizing data is essential to understanding and communicating science ideas. Matplotlib is the standard plotting module. While it has its limitations, I still personally use it for my everyday plots. For more advanced plot types, check out Plotly, Seaborn, Mayavi, ggplot, and Bokeh.

And assuming you are new to making scientific figures, there are some good habits you should get into. First, read these tips from PLOS. Second, never ever use rainbow colors, aka jet. Color-challenged people like myself will hate you. Please stick with a color map that uses shading sensibly. Besides making me happy, it is also easier to print in grayscale.
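Here is a minimal sketch of picking a color map explicitly instead of accepting a rainbow one. The choice of "viridis" is my own suggestion (its lightness increases steadily, so it survives grayscale printing), and the Gaussian bump is just made-up data. The "Agg" backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: a 2D Gaussian bump on a square grid.
x = np.linspace(-3, 3, 200)
xx, yy = np.meshgrid(x, x)
zz = np.exp(-(xx ** 2 + yy ** 2))

fig, ax = plt.subplots()
# Name the color map explicitly rather than relying on a default.
image = ax.imshow(zz, cmap="viridis", extent=[-3, 3, -3, 3], origin="lower")
fig.colorbar(image, ax=ax, label="amplitude")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

From here, `fig.savefig("figure.png")` writes the plot to disk. Swapping the `cmap` string is all it takes to compare color maps on your own data.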

 

Python on a Mac

I personally do most of my coding on my laptop, which is a Mac. Eventually that code gets run on a Linux server, but all initial coding, exploratory data analysis, etc. is done on my laptop. And since I advocate for Python, I thought I would lay out all the steps I needed to set up my Mac in the easiest manner. (Note: the steps are probably similar on Windows, but I haven’t used a Windows computer in so long that I don’t know the potential differences.)

Unfortunately, the Python 2.x vs 3.x divide exists and so far, I have yet to be able to completely commit to 3.x due to a few packages with legacy issues. Luckily, there is a pretty easy solution below. Note, your Mac has Python preinstalled (go to terminal and type python to start coding…). However, if you want to update any packages, you can quickly run into issues. So it is easiest to install your own version of Python.

  1. Install Anaconda (I advocate version 2.7, Anaconda will call this environment root)
  2. I recommend using Anaconda Navigator and using Spyder for an IDE
  3. Install version 3.5 and make an environment (in Anaconda Navigator or terminal commands below):
    $ conda create -n python3.5 python=3.5 anaconda
  4. You can switch between python environments  {root, python3.5}
    $ source activate {insert environment name here}
  5. To add new python packages use conda or pip (anaconda has made its own pip the default)
  6. WARNING: always close Spyder before using conda update or pip. I got stuck in some weird place where Spyder would no longer launch. Apparently it can happen if Spyder is open and underlying packages get changed.

To get around the 2.x vs 3.x issue, go to your terminal and use pip install for the following packages: future, importlib, unittest2, and argparse. See each package’s website for details of any differences. Then, start your Python code with the following two lines:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

from builtins import *

For nearly all scientific computing applications, you are essentially writing Python 3 code. So make sure to read the correct documentation!
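As a small illustration of what those `__future__` imports buy you, here is a sketch that behaves the same under Python 2.7 and Python 3. The variable names are made up. I only import `division` and `print_function` here because they are the two whose effect is easiest to see.

```python
from __future__ import division, print_function

ratio = 1 / 2      # true division: 0.5 (plain Python 2 would give 0)
floored = 1 // 2   # floor division is still available when you want it: 0

# print is a function, so extra arguments and keywords work as in Python 3.
print("ratio:", ratio, "floored:", floored, sep=" ")
```

Integer division silently truncating is a classic source of bugs in numerical code, so this one import is worth having even if you never plan to move to 3.x.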

Personally, I found Anaconda to be a lifesaver. Otherwise, I got stuck in some weird infinite update loop to install all required packages for machine learning (specifically Theano).

Now you are ready to code! If you aren’t familiar with Python, my recommended tutorials will be in a future post.

 

The one ring to rule them all: Python

This lays out why I think all scientists should learn Python first and use it as their primary programming language. I think many of the reasons why scientists should learn Python first are equally applicable to everyone, but computer scientists and others probably have different demands for their primary language.

First, a quick history of the languages I have programmed in. My first programming project was as a freshman in college, when I used Fortran for simulations of water molecules. Then as a sophomore I used MATLAB for a summer research project testing components for a high energy experiment. The following summer I used C for a summer research project on stochastic simulations. This was followed by my first programming course, where I used Java.

At this point I had used a slew of programming languages, but MATLAB was my primary language. I rarely had to deal with strings or statistics, so MATLAB had everything I needed. This continued into graduate school. Eventually, I ended up doing some bioinformatics, so I had to learn R. And finally, just for the hell of it, I decided to write some code in Python. Additionally, I have used Mathematica, but I wouldn’t count it as a true language.

So this is a really long-winded explanation of why I have some credibility when I say Python is the best (compared to Fortran, C, MATLAB, Java, and R, with which I have personal experience). When I started my postdoc, it seemed like the perfect time to make the complete switch to only using Python.

So to start with, why would I recommend it as a first language?

  1. Correct level of difficulty
  2. Pythonic – simple expressions usually work as you would guess
  3. Versatile – can do everything one needs
  4. Open source
  5. Community – lots of great packages

And why should scientists use it?

  1. Versatile – handles all data types easily (unlike MATLAB)
  2. Fast enough – Cython, Theano, etc can be used when speed matters
  3. Plenty of scientific packages – Numpy, Scipy, Matplotlib, Scikit-Learn, etc
  4. Large science community – new packages all the time

Additionally, it’s a language that is popular enough (see here and here) to lead to a job in industry and will safely be around for years to come. So please drink the Kool-Aid and join the Python cult!