Deep Learning in Python

So maybe after reading some of my past posts, you are fired up to start programming a deep neural network in Python. How should you get started?

If you want to run anything but the simplest neural networks on easy problems, you will find that pure Python, being an interpreted language, is too slow. Does that mean we have to give up and write our own C++ code? Luckily, GPUs and the programmers who built GPU-enabled libraries come to the rescue, offering roughly a 5-100X speedup (I would estimate my average speedup at 10X, but it varies by task).

There are two main Python packages, Theano and TensorFlow, that let you write Python code that can run on either a CPU or a GPU. In essence, each is its own mini-language with the following differences from standard Python:

  • Tensors (generalizations of matrices) are the primary variable type and are treated as abstract mathematical objects (you don't need to specify actual values immediately).
  • Computational graphs are utilized to organize operations on the tensors.
  • When you want to actually evaluate the graph on some data, the data is stored in a shared variable that, when possible, gets sent to the GPU. The graph then processes this data in place of the original tensor placeholders.
  • Automatic differentiation (i.e., they understand derivatives symbolically).
  • Built-in numerical optimizations.
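
To make the first three points concrete, here is a toy sketch in plain Python. It is not Theano or TensorFlow code, and the class names (Node, Var, Add, Mul) and the gradients helper are hypothetical. Expressions are built symbolically from placeholders, evaluated later by feeding in values, and differentiated automatically with reverse-mode accumulation.

```python
class Node:
    """Base class for a node in a tiny computational graph (illustrative only)."""
    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Var(Node):
    """A symbolic placeholder: it has a name but no value until evaluation."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]

    def backward(self, feed, upstream, grads):
        # Accumulate d(output)/d(this variable); a variable may appear more than once.
        grads[self.name] = grads.get(self.name, 0.0) + upstream


class Add(Node):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, feed):
        return self.a.evaluate(feed) + self.b.evaluate(feed)

    def backward(self, feed, upstream, grads):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        self.a.backward(feed, upstream, grads)
        self.b.backward(feed, upstream, grads)


class Mul(Node):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, feed):
        return self.a.evaluate(feed) * self.b.evaluate(feed)

    def backward(self, feed, upstream, grads):
        # Product rule: d(a * b)/da = b and d(a * b)/db = a
        a_val, b_val = self.a.evaluate(feed), self.b.evaluate(feed)
        self.a.backward(feed, upstream * b_val, grads)
        self.b.backward(feed, upstream * a_val, grads)


def gradients(output, feed):
    """Return d(output)/d(variable) for every variable reachable from output."""
    grads = {}
    output.backward(feed, 1.0, grads)
    return grads


# Build the graph symbolically: no numbers are involved yet.
x, w, b = Var("x"), Var("w"), Var("b")
y = w * x + b

# Evaluate later by feeding in actual data.
feed = {"x": 3.0, "w": 2.0, "b": 1.0}
print(y.evaluate(feed))    # 7.0
print(gradients(y, feed))  # {'w': 3.0, 'x': 2.0, 'b': 1.0}
```

Real libraries do far more (shared subexpressions are computed once, graphs are optimized and compiled for the GPU), but the separation between building the graph and evaluating it on data is the same idea.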

So to get started, you will want to install Theano (pip install theano), TensorFlow (see its official installation instructions), or both. I have personally only used Theano, but if Google keeps up TensorFlow's pace of development, I may end up switching to it.

At the end of the day, this means that if you want to implement neural networks directly in Theano or TensorFlow, you will essentially be learning another language. However, people have built various libraries that are abstractions on top of these mini-languages. Lasagne is one example: it organizes Theano code so that you interact with Theano less, but you still need to understand Theano. I started with Theano and Lasagne, but I am now a convert to Keras.

I advocate for Keras (pip install keras) for two major reasons:

  1. High-level abstraction. You can write standard Python code and get a deep neural network up and running very quickly.
  2. Back-end agnostic. Keras can run on either Theano or TensorFlow.

So it seems like a slam dunk, right? Unfortunately, life is never that simple; there are two catches:

  1. Mediocre documentation (compared with NumPy as the gold standard, or even with Lasagne). You can get the standard things up and running from their docs, but if you want to do anything advanced, you will find yourself reading their source code on GitHub, which has some hidden but useful comments.
  2. Back-end agnostic. If you want to add a modification at the back-end level and have it always work in Keras, you need to implement it in both Theano and TensorFlow. In practice this isn't too bad, since Keras has done a good job of wrapping low-level operations in its own backend module.

Fortunately, the pros definitely outweigh the cons for Keras and I highly endorse it. Here are a few tips I have learned from my experience with Keras:

  • Become familiar with the Keras documentation.
  • I recommend using only the functional API, which allows you to implement more complicated networks. The sequential API lets you write simple models in fewer lines of code, but you lose flexibility (for example, you can't access intermediate layers) and the code won't generalize to complex models. So just embrace the functional API.
  • Explore the examples (here and here).
  • Check out the Keras GitHub.
  • Names for layers are optional keywords, but definitely use them! It will significantly help you when you are debugging.

Now start coding your own deep neural networks!

General Programming Tips

I thought I would put together some useful programming tips that I have learned over the years. Most of these are general tips, but they are tailored towards Python.

  1. Zen of Python (run import this in a Python interpreter to read it). Even if you don't use Python, these are good ideas to internalize.
  2. The language documentation (Python’s standard library), StackOverflow, and Google searches are your best friends.
  3. Utilize modern IDEs (like Spyder for Python) and tab-completion to reduce the number of basic errors.
  4. Comments are not optional. The general logic of functions and objects should be understandable from the comments. Every block of code logic should have a short comment to aid future changes. If you find a chunk of code confusing now, it will be just as confusing if not worse in the future!
  5. Use sensible variable names. This cuts down on the number/length of comments.
  6. Try to adhere to the language standards (PEP 8 for Python), but don't obsess over them.
  7. Set your own consistent standards (Do variable names end in s or not? Do boolean variables have similarly styled names? etc.).
  8. When starting a project, do your best to quickly get to a basic working prototype. Working but incomplete code is always better than non-working code. Quick coding is aided by the next point…
  9. Outline your code before starting. My tips for outlining in Python are detailed after this list.
  10. Write modular code. Common tasks should be made into functions or objects.
  11. Avoid magic numbers and hard-coded values. It is better to include a set of named parameters in one section of your code where the basic logic of these variables is explained.
  12. Avoid multiple inheritance (check out this fun explanation of why this is bad).
  13. A program should have a standard interface (I like to call it main) and a way to run that interface with some default values. In Python, use if __name__ == '__main__': to define standard parameters and then call main(parameters). This supports the goal of always having working code and makes it easier for different programs to interact.
  14. Check out these Python tricks (1-23 are the best, rest are more advanced).
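
Tips 11 and 13 combine naturally. Here is a minimal sketch of that layout. The parameter names (learning_rate, n_epochs, output_file), DEFAULT_PARAMETERS, and the body of main are hypothetical placeholders.

```python
# Named parameters collected in one place instead of magic numbers.
DEFAULT_PARAMETERS = {
    "learning_rate": 0.01,         # step size for a hypothetical optimizer
    "n_epochs": 10,                # number of passes over the data
    "output_file": "results.txt",  # where a real program might save results
}


def main(parameters):
    """Standard interface: the program's behavior is driven entirely by `parameters`."""
    # Placeholder logic; a real program would do its work here.
    summary = "Training for {n_epochs} epochs at rate {learning_rate}".format(**parameters)
    print(summary)
    return summary


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported.
    main(DEFAULT_PARAMETERS)
```

Another script can then import this file as a module and call main with a modified copy of the parameters, for example main(dict(DEFAULT_PARAMETERS, n_epochs=2)), without triggering the default run.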

Here are details on how I outline code in Python. I try my best to have running code at all times, even if it does absolutely nothing. If it isn’t real code, I leave it as a comment. Therefore, my programming tends to proceed as follows.

  1. Outline the general logic of the code in comments. Define the functions you need, but at first have them take no actual arguments (use pass to keep the file valid Python). In the comments inside each function, list the data type you think it should take in, what it should do, and what it should return.
  2. If you start to code a function or series of logic, you can safely leave it incomplete by having it raise NotImplementedError.
  3. Use assert to check any of your assumptions. An assert with a custom error message will save you lots of time later.
  4. While the Pythonic way is to use duck typing, I still prefer to do some type checking where there is potential for confusion, using things like isinstance or checks on attributes (e.g., hasattr).
  5. Take advantage of your IDE's additional formatting options. For example, Spyder specially highlights comments that start with TODO: with a little checkmark. It also supports code cells, which are delimited by #%%. This lets you quickly run small chunks of a larger script.
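
Put together, an outline in progress might look like the sketch below. The function names (load_data, normalize, report) and what they do are hypothetical. The point is the mix of pass, NotImplementedError, asserts with messages, isinstance checks, TODO comments, and #%% cells, all in a file that still runs.

```python
#%% Setup (Spyder treats each #%% line as the start of a runnable cell)
import numbers

#%% Helper functions


def load_data(path):
    """Take a file path (str); return a list of numbers read from the file."""
    # TODO: open the file and parse one number per line
    raise NotImplementedError("load_data is not written yet")


def normalize(values):
    """Take a list of numbers; return them scaled so they sum to 1."""
    # Explicit type checks where confusion is likely (e.g. a bare number passed by mistake).
    assert isinstance(values, list), "normalize expects a list, got %s" % type(values).__name__
    assert all(isinstance(v, numbers.Number) for v in values), "all values must be numbers"
    total = sum(values)
    assert total != 0, "cannot normalize values that sum to zero"
    return [v / total for v in values]


def report(result):
    """Take the normalized list; eventually print a nicely formatted summary."""
    pass  # placeholder keeps the file valid, runnable Python


#%% Main logic
if __name__ == '__main__':
    report(normalize([1, 2, 5]))
```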

What are the major advantages of coding like this?

  1. If your code always runs, you can quickly find syntax errors and typos.
  2. You avoid implementing unused code. It sucks to work hard on a section of code only to realize later that you didn't actually need it.
  3. You spend your time on the standard case and can add certain options or take care of edge cases when the appropriate time arises. Because sometimes that time will never arise…

Anyways, enough advice, start coding!

Basic Bash

Basically, these are the only things I know about Bash :). Not all of these are truly Bash commands; rather, they are common commands that everyone should know when using the Linux or Mac command line.

Notes

First, the command follows the $, the == is just a spacer, and everything after it is a description of the command. The <> symbols mark where you should substitute some other name (like a file, folder, username, etc.). The * symbol designates a wildcard and can be combined with a partial search term. This means *.txt matches file.txt, file1.txt, etc.
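
Python's standard fnmatch module implements the same shell-style wildcards, which makes it easy to check what a pattern will match. One caveat: unlike Bash, fnmatch does not treat names beginning with a dot (hidden files) specially. The file names below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, as ls might list them.
names = ["file.txt", "file1.txt", "notes.md", "file.txt.bak", "data.TXT"]

# '*' matches any run of characters (including none), as in Bash.
# fnmatchcase is case-sensitive, so data.TXT does not match *.txt.
txt_files = [name for name in names if fnmatchcase(name, "*.txt")]
print(txt_files)  # ['file.txt', 'file1.txt']

# Wildcard combined with a partial search term.
file_prefixed = [name for name in names if fnmatchcase(name, "file*")]
print(file_prefixed)  # ['file.txt', 'file1.txt', 'file.txt.bak']
```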

Very Basic Commands

$ Ctrl + C == Kill whatever is running in the foreground

$ Tab == autocomplete what you are currently typing (command, file, or folder name)

$ <command> --help == list the options of a given command (most commands support this)

$ Ctrl + A == Go to the beginning of the line you are currently typing on

$ Ctrl + E == Go to the end of the line you are currently typing on

$ Ctrl + U == Clears the line before the cursor position. If you are at the end of the line, clears the entire line.

Where Am I and How Do I Move Elsewhere?

$ pwd == print working directory

$ cd <folder> == change directory (cd with no argument goes to your home directory)

$ cd / == go to root

$ cd .. == go up one level

$ ls == list files and folders

$ ls -a == list all files and folders (including hidden)

Basic File/Folder Manipulation

$ mkdir <folder name> == create new folder

$ cp <old file name> <new file name> == copy a file under a new name

$ mv <file> <folder> == move a file into a folder (mv <old name> <new name> renames it)

$ rm <file> == delete file

$ rm -r <folder> == delete folder

$ cat <file> == show content of file

$ head -n <number of lines> <file> == show top n lines of file

$ tail -n <number of lines> <file> == show last n lines of file

Nano

Nano is a basic text editor.

$ nano <existing file> == opens up a file

$ nano <non-existing file> == creates and opens up a file

Within nano, the needed commands are listed at the bottom where the ^ symbol stands for Ctrl.

What is currently running? Can I stop it?

$ top == show a live list of the processes running on the computer (press q to quit)

$ top -u <username> == lists processes being run by username

$ kill <pid> == kill a process identified by pid, which can be found by using top

$ killall -u <username> == kills all processes being run by username

How to run stuff in the background

$ <command> & == runs command in background

If you want to run jobs on a remote server, there may be a job queueing system; check out commands like qsub and qstat. However, if you just want to run multiple processes on a single computer (and keep them running after you log off), screen and tmux are the tools you need. I personally use screen, and below are some useful commands.

Screen

$ screen -S <name> == new screen with name

$ Ctrl-a d == detach from screen

$ screen -r <name> == reattach to screen

$ screen -ls == list of screens

$ exit == ends the shell in the currently attached screen, closing that screen session

Example: Running Code in the Background

Here is an example set of commands that will run a Python script in the background.

$ screen -S test

$ python test.py > out.txt 2>&1 &

$ Ctrl-a d

You can now safely log out, and the computer will do all the tough work for you!

I did add one new piece of syntax in there: the redirect (>). Python runs test.py, and anything that would normally be printed to the terminal is redirected and saved into the file out.txt; the 2>&1 sends error messages (stderr) to the same file. Without the redirect, when you reattach to the screen named test you will only see the most recent terminal output (screen keeps only a limited amount of scrollback).
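
If you ever want to launch a background job from Python itself, the standard subprocess module can reproduce what > out.txt 2>&1 & does. This is a minimal sketch, not a replacement for screen. A tiny inline script stands in for test.py (an assumption for illustration), and the output file name out.txt matches the example above.

```python
import subprocess
import sys

# Stand-in for test.py: prints one line to stdout and one to stderr.
script = "import sys; print('hello from stdout'); print('oops', file=sys.stderr)"

with open("out.txt", "w") as out:
    # stdout=out plays the role of '> out.txt'; stderr=subprocess.STDOUT is '2>&1'.
    # Popen returns immediately, much like a trailing '&'.
    job = subprocess.Popen([sys.executable, "-c", script],
                           stdout=out, stderr=subprocess.STDOUT)
    # Other work could happen here while the job runs.
    job.wait()  # block until the job finishes before closing the file

with open("out.txt") as f:
    content = f.read()
print(content)
```

Because stdout and stderr are buffered differently, the two lines may appear in either order in out.txt; both will be there.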