Research

Posted on January 7, 2019 Experts, Research

PointPillars: Fast Encoders for Object Detection from Point Clouds

I’m excited to finally be able to share some of the stuff I have been working on since joining nuTonomy: an Aptiv company. We recently released our paper on PointPillars (with code), a cutting edge method for object detection using point clouds.

First, what problem are we trying to solve? Our company goal is to create the software stack to run a self driving taxi. Specifically, the machine learning team’s charter is to tackle the problems that are too tough to model explicitly. Therefore, our method of choice is deep learning, and we usually work closely with the raw sensor data. Our cars have 360 degree coverage through multiple lidars, cameras, and radars (check out nuScenes for our actual data!), but of these, lidar is the most important sensor. Lidar is a laser ranging sensor that provides sparse, yet accurate, points in the 3D world. These point clouds are the key inputs for 3D object detection since they allow precise localization in the real world.

The ideal deep learning model would incorporate all sensor modalities (lidar, cameras, and radar), but a first step is to separately model each sensor. Images are relatively easy since a multitude of methods exist in the literature, so our research has focused on how to do lidar and radar. I’m going to keep focusing on lidar since it is the main sensor, but everything that follows about PointPillars could equally well be used on radar after a few minor changes as I’ll explain later.

So, what was the state of the art for lidar only object detection when we started our research? There were two main schools of thought that are best represented by PIXOR and VoxelNet. The fundamental difference is how to represent the sparse lidar point cloud. One school of thought (PIXOR, MV3D, …) is to create a set of fixed, hand crafted features. The other school (PointNet, Frustum PointNet, VoxelNet, SECOND) believes in end to end learning and just lets the network learn directly from the point cloud. From a performance and engineering perspective, end to end learning is always better because (1) the network should always be able to match (and usually far exceed) fixed encodings and (2) we let the network do the hard work of finding the encoder, rather than having to devote engineers’ time to discovering the right encoding. So we should all do end to end learning!

But there is always a catch. The issue with VoxelNet is that it is too slow to run in realtime. The central problem is that they chose to do end to end learning on voxels. This forces them to use 3D convolutions which are extremely slow. In contrast, PIXOR can just use 2D convolutions which are well optimized for GPU computing. If only there was a way to blend the performance of end to end learning with the speed of fixed encoders.

It turns out, we found a method to do so: PointPillars. The fundamental realization (courtesy of Oscar Beijbom) was that pillars are the best representation. A pillar is a vertical column that can extend infinitely up and down. By learning end to end on pillars, we achieved state of the art detection performance on the KITTI leaderboard at blazing fast speeds (60 to >100 Hz), for a 2-4 fold improvement in runtime.

A few more details on why we are so fast. First, by using pillars, we eliminate 3D convolutions since we immediately learn a 2D representation. Second, we sped up the network by eliminating parameters in the encoder and network. Third, while our initial model for training is in PyTorch, we convert that model to an NVIDIA TensorRT planfile which allows additional optimizations for GPUs.
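To make the first point concrete, here is a minimal sketch of how points might be grouped into pillars. The function name, the 0.5 m cell size, and the sample points are illustrative assumptions, not the released PointPillars code; the only idea taken from the text is that a pillar is a vertical column, so a point's height does not affect which pillar it lands in.

```python
import math
from collections import defaultdict


def group_into_pillars(points, cell_size=0.5):
    """Group (x, y, z, reflectance) points by their x-y grid cell.

    Every point in the same x-y cell belongs to the same pillar,
    whatever its height z, so the result is already indexed in 2D.
    """
    pillars = defaultdict(list)
    for x, y, z, reflectance in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        pillars[key].append((x, y, z, reflectance))
    return dict(pillars)


# Hypothetical lidar returns: x, y, z in meters plus reflectance.
points = [
    (0.1, 0.2, -1.0, 0.3),
    (0.3, 0.4, 2.5, 0.9),  # same x-y cell as the first point, different height
    (1.2, 0.1, 0.0, 0.5),
]
pillars = group_into_pillars(points)
```

Because the keys are 2D grid indices, whatever features are learned per pillar can be laid out as an image and passed to ordinary 2D convolutions.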

So where do we go from here? The next step is to work on sensor fusion. First, we need a radar network, which at first glance looks like it might require more work. But it turns out, radar is also a sparse point cloud of range returns. While lidar points return the x, y, z position and reflectance of an object, radar returns the range, radial velocity, and a host of other features. So we can just plug in radar point clouds to PointPillars and go! Since the radar returns have worse spatial localization than lidar, it turns out the radar only network doesn’t give great performance. Finally, now that we have separate networks for lidar, images, and radar, it is time to fuse them together! We are actively working on this now and hopefully I can share some of our tricks soon.

Posted on July 13, 2017 General science, Research

Deep Learning Tips

I thought I would write up some general tips and tricks that I have learned by experimenting with neural networks. My focus is on tips that apply to any problem and any neural network architecture, and in fact, some of these tips apply more generally to any machine learning algorithm. So what have I learned over the years?

Data Splits

Before doing anything else, you need to split the dataset into training and testing. But how much data should go into each split? This depends on your number of samples and the number of classes. For example, MNIST has only 10 digits with little variation in each digit, so the standard split is around 80% train and 20% test. ImageNet has over a million samples of 1000 diverse classes, so they use around 50% train and 50% test. So if you have an easy problem and/or a small dataset, I would suggest 80% train and 20% test. If you have a very tough problem and/or a large dataset, I would suggest 50% train and 50% test.

The test data should now be put in a lock box and only used on your final model.

Next you also should set aside some of the training data for validation which is used to determine generalization results when tuning hyperparameters. I would suggest around 20% of the training data to be used as a validation.
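Here is a minimal sketch of that three-way split using only the standard library. The function name, the fixed seed, and the 1000-sample toy dataset are illustrative assumptions; the 20% fractions are the ones suggested above.

```python
import random


def split_dataset(samples, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Shuffle once, then split into train, validation, and test lists.

    val_fraction is taken from the data that remains after the test
    set has been removed, not from the full dataset.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]
    n_val = round(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
```

With 1000 samples this gives 640 for training, 160 for validation, and 200 for the lock box. Fixing the seed keeps the test set identical between runs, which matters if the lock box is going to mean anything.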

Finally, I do a little bit of cheating and I data snoop. I usually take a very tiny amount of the data, maybe 1-5% and play around with it. I will inspect the data to make sure that it looks good, and use the small number of samples to debug my initial code and very roughly tune the hyperparameters. This saves you the headache of doing a long training session only to find out that you had a bug in your code or grossly misunderstood where to start your hyperparameter search.

Data Preprocessing

As a general rule, the data should be standardized by preprocessing. I’ll discuss some specific standardizations below, but a general issue is whether to standardize by the whole dataset, per sample, or per feature. I tend to default to per sample, but I don’t have a good scientific reason why that is the best. If you standardize by the whole dataset or per feature, you need to make sure you only use the training data to set the scales. If you standardize per feature, make sure that all of your features have significant variation before doing so (see MNIST for an example where per feature standardization can lead to weird results since many features have a standard deviation of zero).

Mean

All numerical data should be mean centered, no questions asked. If your classes can be robustly classified just by the mean difference, then you don’t need a neural network. You have a very simple problem and should just use a simple threshold discriminator.

Scaling

I highly recommend scaling the data so that it is all order 1. This can speed up training because most initialization schemes of weights assume that the data is mean centered and has values around the size of 1. But there are two possible ways to scale your data: by the standard deviation or by the range. If your data looks normally distributed, then standard deviation makes sense. Otherwise I just divide by the maximum of the absolute value.
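As a rough sketch of mean centering and scaling for a single feature, the snippet below fits the mean and standard deviation on the training values only and then reuses them on the test values. The function names and the toy numbers are illustrative assumptions.

```python
import statistics


def fit_scaler(train_values):
    """Compute the mean and standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        # A constant feature cannot be scaled by its standard deviation.
        raise ValueError("feature has zero standard deviation")
    return mean, std


def standardize(values, mean, std):
    """Mean center and scale using previously fitted statistics."""
    return [(v - mean) / std for v in values]


def scale_by_max_abs(values, max_abs):
    """Alternative for non-normal data: divide by the largest absolute value."""
    return [v / max_abs for v in values]


train_values = [2.0, 4.0, 6.0, 8.0]
test_values = [5.0, 10.0]

mean, std = fit_scaler(train_values)
train_scaled = standardize(train_values, mean, std)
test_scaled = standardize(test_values, mean, std)  # reuses the training statistics
```

The zero check is there for the situation mentioned earlier, where a feature has no variation and per feature standardization breaks down.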

Correlations

In theory, it can also be helpful to remove correlations between features by using PCA or ZCA whitening. However, in practice you may run into numerical stability issues since you will need to invert a matrix. So this is worth considering, but takes some more careful application.

Data Augmentation

More training data is always better, but obtaining that data can be expensive. So I always try hard to find a way to do data augmentation. However, the correct data augmentation is usually problem specific, so I won’t go into details here.

Early Stopping

The no free lunch theorem of machine learning states that there is no general learning algorithm that will solve all problems. However, Geoff Hinton has pointed out that early stopping is as close to a free lunch as we can get. Early stopping is the easiest way for any machine learning algorithm to avoid overfitting, and you can read more about the technical justifications for it at Distill’s momentum article.
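A minimal sketch of early stopping, assuming you record one validation loss per epoch and keep a checkpoint of the best weights. The function name, the patience value, and the loss numbers are illustrative assumptions, not taken from any particular library.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch


# Hypothetical validation losses, one per epoch.
val_losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.6]
best_epoch = early_stopping_epoch(val_losses)
```

With a patience of 2 the loop stops after the second bad epoch and keeps epoch 2, so it never sees the later, lower loss. That trade-off is the reason patience is worth tuning a little.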

Optimizer

SGD vs Adam

In practice, all optimizers for neural networks involve some form of stochastic gradient descent (SGD). The only question is whether you need to manually tune the learning rate and other parameters, or whether you use an adaptive version of SGD that automatically adjusts the learning rates. I think the best adaptive method is Adam (and Nadam when possible, see later subsection on momentum). So for me the choice is simple: either plain SGD or Adam/Nadam. For a more complete comparison of SGD variants, I highly recommend this blog post.

Learning Rate

If you are using Adam, you will rarely need to tune the learning rate. But for SGD, the learning rate is by far the most important parameter to tune. A nice tip from Yoshua Bengio is this: the optimal learning rate is often an order of magnitude lower than the smallest learning rate that blows up the loss. So this means, start with a high learning rate and work your way down a half order of magnitude at a time (for example: 1, 0.3, 0.1, …). Then start your fine grained learning rate search about an order of magnitude below the last time the loss blew up.
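The coarse search can be sketched as follows. The function names are hypothetical, and the pattern of which rates diverged is made up for illustration; in practice you would fill it in from short training runs.

```python
def coarse_learning_rates(start=1.0, steps=7):
    """Learning rates spaced half an order of magnitude apart, high to low."""
    return [start * 10 ** (-0.5 * i) for i in range(steps)]


def fine_search_center(blew_up):
    """Return one order of magnitude below the smallest rate that diverged.

    `blew_up` maps each tried learning rate to True if the loss diverged.
    """
    diverged = [lr for lr, bad in blew_up.items() if bad]
    if not diverged:
        raise ValueError("no learning rate diverged; try larger values")
    return min(diverged) / 10.0


rates = coarse_learning_rates()  # 1, ~0.3, 0.1, ~0.03, ...

# Hypothetical outcome of the coarse search: only the two largest rates diverged.
blew_up = {lr: i < 2 for i, lr in enumerate(rates)}
center = fine_search_center(blew_up)
```

Here the smallest diverging rate is about 0.3, so the fine grained search would start around 0.03.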

Another useful tweak on the learning rate is to have it decay over the course of training. I find that this slightly improves the final performance, but more importantly leads to consistent training results. There are a variety of ways to implement the decay, but I’m not sure they make that much of a difference. My standard implementation is

$l_{batch} = \frac{l_{start}}{1 + decay \cdot N_{batches}}$

where $N_{batches}$ is the number of minibatches seen so far during training. I then set decay so that the final learning rate at the end of all the epochs is 1/10th the starting learning rate.
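Solving the formula above for the decay constant: if the final rate should be 1/10th of the starting rate after $N$ total minibatches, then $1 + decay \cdot N = 10$, so $decay = 9/N$. A small sketch, where the function names and the 50 epochs of 200 minibatches are illustrative assumptions:

```python
def decay_for_final_fraction(total_batches, final_fraction=0.1):
    """Decay constant so the last learning rate is final_fraction * start."""
    return (1.0 / final_fraction - 1.0) / total_batches


def learning_rate(l_start, decay, n_batches):
    """Learning rate after n_batches minibatches, following the formula above."""
    return l_start / (1.0 + decay * n_batches)


total_batches = 50 * 200  # hypothetical: 50 epochs of 200 minibatches each
decay = decay_for_final_fraction(total_batches)
final_lr = learning_rate(0.1, decay, total_batches)
```

Here the decay comes out to 9/10000, and a starting rate of 0.1 ends at 0.01.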

Momentum

Momentum is very useful for neural networks, but in practice I spend minimal time tuning the momentum rate because I have a few default settings that I strongly recommend.

First, I really only consider three possible momentum values: 0.5, 0.9, and 0.99. Since the maximum effect of momentum is $\frac{1}{1-momentum}$ , my default values are roughly spaced by an order of magnitude. I always start with 0.9 and go from there.
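To see where $\frac{1}{1-momentum}$ comes from, the sketch below applies the classical momentum update to a constant gradient and lets the velocity settle. This is a toy illustration with a hypothetical function name, not a full optimizer.

```python
def terminal_velocity(momentum, gradient=1.0, steps=5000):
    """Repeatedly apply velocity = momentum * velocity + gradient."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + gradient
    return velocity


amplification = {m: terminal_velocity(m) for m in (0.5, 0.9, 0.99)}
```

The velocity is a geometric series in the momentum, so it converges to the gradient times $\frac{1}{1-momentum}$: roughly 2, 10, and 100 for the three default values.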

Also, I always choose Nesterov momentum whenever possible. Most packages, like Keras, have Nesterov as an option for SGD, and Keras also has Nadam, which is Adam with Nesterov momentum. For more details on Nesterov, see here. The short explanation is that it leads to the same maximum effect of $\frac{1}{1-momentum}$ , but it does so in a more gradual manner. In practice, this means that while standard momentum gets very unstable above 0.9, Nesterov momentum can be safely set to 0.99.

Another useful tip is to set the momentum to a smaller value (say half your standard value) for the final few epochs (maybe the last 5-10% of epochs). The intuition for why this is helpful is that hopefully by the end of training, the neural network is close to good weights, but it might be rocking back and forth around the optimal weights. Since the neural network weight space is highly non-convex, by tuning down the momentum, you force the neural network to settle down into these non-convex “valleys” that may contain the best weights.

The final tip, originally suggested here, is to exponentially ramp up and down the momentum anytime you want to change the momentum rate during training. This gives the weight updates time to adjust to the new momentum rates. I personally have found this gives a very slight improvement in performance, but more importantly it leads to consistent training results.

Summary of my momentum tips:

Peak momentum values of: 0.5, 0.9, or 0.99
Always choose Nesterov momentum if possible
Start momentum initially at half the desired peak value and exponentially ramp up
Towards the end of training, exponentially ramp down momentum to half the desired peak value.
Train for 5-10% of epochs at the desired smaller momentum.
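One way to turn the summary above into a per epoch schedule is sketched below. The function name, the 5 epoch ramp length, and the choice of a geometric interpolation as the "exponential" ramp are all assumptions made for illustration; other ramp shapes would fit the description equally well.

```python
def momentum_schedule(epoch, total_epochs, peak=0.9, ramp_epochs=5,
                      settle_fraction=0.1):
    """Momentum for a given epoch: ramp up, hold at peak, ramp down, settle."""
    low = peak / 2.0
    settle_start = total_epochs - max(1, round(total_epochs * settle_fraction))
    ramp_down_start = settle_start - ramp_epochs
    if epoch < ramp_epochs:
        # Geometric ramp from low (epoch 0) up toward peak.
        return low * (peak / low) ** (epoch / ramp_epochs)
    if epoch < ramp_down_start:
        return peak
    if epoch < settle_start:
        # Geometric ramp from peak back down toward low.
        return peak * (low / peak) ** ((epoch - ramp_down_start) / ramp_epochs)
    # Final stretch of training at the smaller momentum.
    return low


schedule = [momentum_schedule(epoch, 100) for epoch in range(100)]
```

For 100 epochs with a peak of 0.9, this starts at 0.45, reaches 0.9 by epoch 5, begins ramping down at epoch 85, and spends the last 10 epochs at 0.45.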

Initialization

All weights should be initialized to an orthogonal matrix. This is extremely important for recurrent neural networks (as explained here), but I have also found it to be useful for all neural networks.
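For intuition about what an orthogonal initialization produces, here is a minimal pure Python sketch that orthonormalizes random Gaussian rows with Gram-Schmidt. The function name and seed are illustrative assumptions, and this only handles square matrices; in practice you would use whatever orthogonal initializer your framework provides.

```python
import math
import random


def orthogonal_matrix(n, seed=0):
    """Build an n x n orthogonal matrix by Gram-Schmidt on random Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the rows we already have.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


W = orthogonal_matrix(4)
```

Every row has unit length and is perpendicular to every other row, so multiplying by `W` preserves the length of a vector.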

Activation Function

The standard is that all hidden layers are ReLUs unless you need the hidden layers to be a valid probability, in which case you should use a sigmoid.

Loss

Choosing the right loss function is very problem dependent, so I will leave that for another day. However, whatever loss function you do choose, make sure the output layer activation function is complementary to that loss; see Michael Nielsen’s book for details on why sigmoid outputs and crossentropy losses are complementary.

Regularization

Weights

Weight regularization is almost always a requirement to prevent overfitting and to get good generalization. The two main choices are L1 or L2 regularization. L1 will ensure that small weights are set to zero, and hence will lead to a sparser set of weights. L2 prevents weights from becoming too large, but does not sparsify the weights. Personally, rather than choosing between the two, I tend to default to both. I set L1 to be very small so that I at least get slightly sparser weights, but then I mainly focus on tuning L2 to control overfitting.
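A small sketch of the combined penalty that gets added to the loss. The function name and the coefficient values are illustrative assumptions; the only point carried over from the text is that the L1 coefficient is kept much smaller than the L2 coefficient.

```python
def l1_l2_penalty(weights, l1=1e-5, l2=1e-3):
    """Combined penalty: l1 * sum(|w|) + l2 * sum(w^2)."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


weights = [0.5, -2.0, 0.0, 1.5]
penalty = l1_l2_penalty(weights)
```

With these toy weights the L1 term contributes 0.00004 and the L2 term 0.0065, so L2 dominates and remains the main knob for controlling overfitting.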

Activity

Dropout and batch normalization are not regularizers in the traditional sense, but in practice they help reduce overfitting by controlling the activation outputs. Additionally, it is extremely difficult to train very deep neural networks without using either dropout or batchnorm. Dropout was the standard for several years, but now it is usually replaced by batchnorm.

Parameter Tuning

Neural networks have a lot of interdependent hyperparameters to tune, so picking which ones to tune first is kind of a chicken and the egg problem. Personally, I start off with an adaptive optimizer (like Adam or Nadam) and then tune the architecture. Next I will roughly tune the regularization. Once that leads to acceptable results, I will switch the optimizer to SGD and only focus on tuning the learning rate. If SGD seems promising, I will then tune other parameters like decay and momentum. Hopefully by this point, you are achieving pretty good results. I will then use this neural network as the starting point for a systematic hyperparameter search to truly find the best results.

Final Tips

Don’t take my word for anything, try it out yourself! I strongly recommend experimenting with every option you can find in Keras and seeing for yourself what actually works. I also suggest getting opinions from as many people as possible (see Yoshua Bengio’s tips). I think that about 90% of the advice will overlap, but everyone has their own bias. So hopefully by reading enough independent sources, you can average out all our mistakes. Good luck!

Posted on April 16, 2017 Academia, General science, Research

Research Experience for Undergrads (REU)

This National Science Foundation program is designed to give undergraduates, especially those from smaller schools, a chance to gain real research experience for a summer. Personally I participated in one official REU and one program modeled on REUs. I learned a lot (and they were tons of fun!). The best part is not the specific topic you research, but the opportunity to learn how to be a researcher.

Most of the applications are due in February. Check out the official NSF REU website for the latest details.

When you are ready to apply, go here to search for programs of REUs in various subjects. Also, search the internet for other research opportunities; Harvard has a nice list of research programs for undergrads. For more detailed tips on applications, I recommend this site.

If you want to get an idea of what an REU is like, here are some interviews of past Math REU participants. And also keep in mind these research tips for undergrads if you do get an REU.

Posted on April 13, 2017 Experts, Research

QFT Resources

Quantum Field Theory is a notoriously difficult subject to learn, but I found the following resources to be extremely helpful when I took the course a few years ago. I just learned about a few resources that I wish I had then, so here are my current tips for learning QFT.

Books:

Tony Zee’s book QFT in a Nutshell provides great intuition for what QFT is all about. If you actually want to do calculations, then Peskin and Schroeder’s book is a nice complement. These two books were the heart of my studies into QFT.

David Tong’s Notes:

Great set of lecture notes that provides a different perspective.

Sidney Coleman’s Lectures:

Apparently, all modern QFT books are based on Coleman (since all the authors learned QFT from him or his students), and you can still see the original videos. For years there was a set of hand-written notes that served as a transcript of the videos, but these were recently LaTeXed and shared on the arXiv.

Posted on February 23, 2017 Experts, Research

Deep Learning Seminar Course

This semester Terry Sejnowski is teaching a graduate seminar course that is focused on Deep Learning. The course meets weekly for two hours to discuss papers. Here I’ll just outline the course and in later posts I’ll add some thoughts on each specific week.

Week 1: Perceptrons

Week 2: Hopfield Nets and Boltzmann Machines

Week 3: Backprop

Week 4: Independent Component Analysis (ICA)

Week 5: Convolutional Neural Networks (CNN)

Week 6: Recurrent Neural Networks (RNN)

Week 7: Reinforcement Learning

Week 8: Information and Control Theory

Posted on August 22, 2016 (updated October 7, 2018) General science, Research, Teaching

NSF GRFP 2016-2017

For a couple of years now, I have had a website with my thoughts on the National Science Foundation Graduate Research Fellowship (NSF GRFP) and examples of successful essays. The popularity of the site in the past few years has grown well beyond what I expected, so this year I’m going to use this blog to try out a few new things.

Questions from You

I end up getting lots of emails asking for advice. While sometimes the advice really does merit an individualized response, many of the questions are applicable to everyone. So in the interest of efficiently answering questions, here is my plan this year.

Before asking me, make sure you’ve read my advice, checked out the NSF GRFP FAQ, skimmed GradCafe, read my FAQ (next section), and checked out the comments for this blog post.
I will not answer any questions about eligibility due to gaps in graduate school because I am honestly clueless on it.
If you feel comfortable asking the question publicly, post it by commenting below.
If you want to ask me privately, send me an email (my full name at gmail.com, include NSF GRFP Question in subject line). I will try and answer you and also work with you on a public question/answer that I can include here.

FAQ

Here are some past questions I have been asked and/or questions I anticipate being asked this year.

My research is closely related to medicine. Am I still eligible?
- I think the best test for this is to ask your advisor if they would apply to NSF or NIH for grants on this topic. If NSF you are definitely good, but if NIH, you will need to reframe the research to fit into NSF.
I am a first year graduate student. Should I apply this year or wait until my second year? (New issue this year since incoming graduate students can only apply once).
- This is the toughest question for me since no one has had to make this choice yet. However, here is how I would personally decide. The important thing to remember is that undergrads and graduate students are each separately graded. So you really need to decide how you currently rank relative to your peers versus how you will rank next year. If you did a bunch of undergrad research, have papers, etc, definitely apply as a first year. If you didn’t, it might pay off to wait, but only if your program lets you get right into research. If you will just be taking classes, I’m less confident your relative standing will improve. Good luck to everyone with this tough choice!

Requests for Essay Reading

Unfortunately, I now get more requests to read essays than I can reasonably accomplish. But I am still willing to read over a few and here is how I will decide on the essays to read.

If you are in San Diego, and you think I am a better fit for you than the other local people on the experienced resource list, send me an email with the subject NSF GRFP Experienced Resource List.
If you are not in San Diego, first check out the experienced resource list and also ask around your school for other resources.
If you can’t find anyone to read your essays, fill out this form. I will semi-randomly select essays to read.

What do I mean by semi-randomly? Well, in the interest of supporting the NSF GRFP’s goal of increasing the diversity of graduate school, I will give priority to undergrads who are without a local person on the experienced resource list and/or are from underrepresented groups. The NSF GRFP specifically “encourages women, members of underrepresented minority groups, persons with disabilities, and veterans to apply”, and I am willing to extremely loosely define minority group by race, ethnicity, sexual orientation, family socio-economic status, geography, colleges that traditionally send few students to graduate school, etc. The form is fill in the blank, so feel free to justify your inclusion in any other underrepresented group that I did not explicitly list.

I’ll then take the prioritized list and make some random selection. The number of people I select this way will depend on the number of local people I end up advising, but I will definitely read at least 2 non-local applications.

Here is my timeline for essay reading:

Sept 16th – Random drawing number 1
~~Sept 30th~~ Extended to Oct 5th – Random drawing number 2 (I’ll include everyone again, so early birds get double the chances of being selected)
Oct 21st – Last day I will help people (sorry I’m traveling near the deadline)

Posted on August 20, 2016 General science, Research, Teaching

Best Machine Learning Resources

Machine learning is a rapidly evolving field that is generating an intense interest from a wide audience. So how can you get started?

For now, I’m going to assume that you already have the basic programming (ie general introduction to programming and experience with matrices) and mathematical skills (calculus and some probability and linear algebra).

These are the best current books on machine learning:

Murphy. This is a comprehensive introduction to the whole field.
Learning From Data. This is a brief introduction to a subset of topics.
Deep Learning. Also check out my previous post.

These are some out of date books that still contain some useful sections (for example, Murphy several times refers you to Bishop or MacKay for more details).

Bishop. Predecessor to Murphy.
MacKay. Free pdf!
Hastie, Tibshirani, and Friedman. Free pdf!

Here is a list of other potential resources:

Posted on June 15, 2016 Experts, Research

Deep Learning in Python

So maybe after reading some of my past posts, you are fired up to start programming a deep neural network in Python. How should you get started?

If you want to be able to run anything but the simplest neural networks on easy problems, you will find that since pure Python is an interpreted language, it is too slow. Does that mean we have to give up and write our own C++ code? Luckily GPUs and other programmers come to your rescue by offering between 5-100X speedup (I would estimate my average speedup at 10X, but it varies for specific tasks).

There are two main Python packages, Theano and TensorFlow, that are designed to let you write Python code that can either run on a CPU or a GPU. In essence, they are each their own mini-language with the following changes from standard Python:

Tensors (generalizations of matrices) are the primary variable type and treated as abstract mathematical objects (don’t need to specify actual values immediately).
Computational graphs are utilized to organize operations on the tensors.
When one wants to actually evaluate the graph on some data, it is stored in a shared variable that when possible gets sent to the GPU. This data is then processed by the graph (in place of the original tensor placeholders).
Automatic differentiation (ie it understands derivatives symbolically).
Built in numerical optimizations.

So to get started you will want to install either Theano (pip install theano), TensorFlow (details here), or both. I personally have only used Theano, but if Google keeps up the developmental progress of TensorFlow, I may end up switching to it.

At the end of the day, that means that if one wants to actually implement neural networks in Theano or TensorFlow, you essentially will learn another language. However, people have built various libraries that are abstractions on top of these mini-languages. Lasagne is one example that basically organizes Theano code so that you have to interact less with Theano, but you will still need to understand Theano. I initially started with Theano and Lasagne, but I am now a convert to Keras.

I advocate for Keras (pip install keras) for two major reasons:

High level abstraction. You can write standard Python code and get a deep neural network up and running very quickly.
Back-end agnostic. Keras can run on either Theano or TensorFlow.

So it seems like a slam dunk right? Unfortunately life is never that simple, instead there are two catches:

Mediocre documentation (using Numpy as a gold standard, or even comparing to Lasagne). You can get the standard things up and running based on their docs. But if you want to do anything advanced, you will find yourself looking into their source code on GitHub, which has some hidden, but useful, comments.
Back-end agnostic. This means if you do want to introduce a modification to the back-end, and you want it to always work in Keras, you need to implement it in both Theano and TensorFlow. In practice this isn’t too bad since Keras has done a good job of implementing low-level operations.

Fortunately, the pros definitely outweigh the cons for Keras and I highly endorse it. Here are a few tips I have learned from my experience with Keras:

Become familiar with the Keras documentation.
I recommend only using the functional API which allows you to implement more complicated networks. The sequential API allows you to write simple models in fewer lines of code, but you lose flexibility (for example, you can’t access intermediate layers) and the code won’t generalize to complex models. So just embrace the functional API.
Explore the examples (here and here).
Check out the Keras GitHub.
Names for layers are optional keywords, but definitely use them! It will significantly help you when you are debugging.

Now start coding your own deep neural networks!

Posted on June 6, 2016 General science, Research

General Programming Tips

I thought I would put together some useful programming tips that I have learned over the years. Most of these are general tips, but they are tailored towards Python.

Zen of Python. Even if you don’t use Python, these are good ideas to internalize.
The language documentation (Python’s standard library), StackOverflow, and Google searches are your best friends.
Utilize modern IDEs (like Spyder for Python) and tab-completion to reduce the number of basic errors.
Comments are not optional. The general logic of functions and objects should be understandable from the comments. Every block of code logic should have a short comment to aid future changes. If you find a chunk of code confusing now, it will be just as confusing if not worse in the future!
Use sensible variable names. This cuts down on the number/length of comments.
Try to adhere to the language standards (Python’s), but don’t obsess over it.
Set your own consistent standards (Do variable names end in s or not? Do boolean variables have similar style names? Etc).
When starting a project, do your best to quickly get up to a basic working prototype. Working but incomplete code is always better than non-working code. Quick coding is aided by the next point…
Outline your code before starting. My tips for outlining in Python are detailed after this list.
Write modular code. Common tasks should be made into functions or objects.
Avoid magic numbers and hard coded values. Better to include a set of named parameters in one section of your code where the basic logic of these variables is explained.
Avoid multiple inheritance (check out this fun explanation of why this is bad).
A program should have a standard interface, I like to call it main, and a way to run the standard interface with some default values. In Python, utilize if __name__ == '__main__': to define standard parameters and then call main(parameters). This aids the goal of always having working code, as well as making it easier to interact with different programs.
Check out these Python tricks (1-23 are the best, rest are more advanced).
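As a minimal sketch of the standard interface tip from the list above: the parameter names and the toy computation are illustrative assumptions, while the `if __name__ == "__main__":` guard is the standard Python idiom.

```python
def main(parameters):
    """Standard interface: scale each value and return the results."""
    return [value * parameters["scale"] for value in parameters["values"]]


# Named defaults live in one place instead of being scattered as magic numbers.
DEFAULT_PARAMETERS = {"scale": 2.0, "values": [1.0, 2.0, 3.0]}

if __name__ == "__main__":
    # Running the file directly exercises main with the defaults.
    print(main(DEFAULT_PARAMETERS))
```

Another program can import this file and call `main` with its own parameters without triggering the default run.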

Here are details on how I outline code in Python. I try my best to have running code at all times, even if it does absolutely nothing. If it isn’t real code, I leave it as a comment. Therefore, my programming tends to proceed as follows.

Outline the general logic of the code in comments. Define needed functions, but at first have it take no actual variables (utilize pass to keep it as functioning Python code). In the comments inside a function, list the data type you think it should take in, what it should do, and what it should return.
If you start to code a function or series of logic, you can safely leave it incomplete by having it raise NotImplementedError.
Use assert to check any of your assumptions. A custom assert statement will save you lots of time later.
While the Pythonic way is to utilize duck typing, I still prefer to do some type checking if there is potential for confusion. So I like to utilize things like isinstance or implement checks on attributes.
Take advantage of your IDE’s additional formatting options. For example, Spyder specially highlights comments that start with TODO: with a little checkmark. Additionally, it supports code blocks and defines them by #%%. This lets you quickly run small chunks of a larger code.
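Put together, an early outline following the steps above might look like this sketch. All function names and the toy data are hypothetical; the point is that the file runs from day one even though most of it is unwritten.

```python
#%% Outline: load the data, clean it, then summarize and plot it.

def load_data():
    # Takes: nothing yet. Returns: a list of floats.
    # TODO: read from a real file once the pipeline runs end to end.
    return [1.0, 2.0, 3.0]


def clean_data(values):
    # Takes: a list of numbers. Returns: a list of numbers.
    # Type check with a custom message, since a tuple or string would be confusing here.
    assert isinstance(values, list), "clean_data expects a list"
    return values


def summarize(values):
    # Takes: a list of floats. Should return: summary statistics.
    raise NotImplementedError("summarize is not written yet")


def plot_results():
    # Takes: undecided. Does: nothing yet, but keeps the outline valid Python.
    pass


#%% Run the parts that exist so far.
values = clean_data(load_data())
```

Calling `summarize` now fails loudly with `NotImplementedError` instead of silently returning nothing, which makes the unfinished parts easy to find.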

What are the major advantages of coding like this?

If your code always runs, it allows you to quickly find syntax errors and typos.
You avoid implementing unused code. It sucks to really work on a code section to only realize later that you didn’t actually need it.
You spend your time on the standard case and can add certain options or take care of edge cases when the appropriate time arises. Because sometimes that time will never arise…

Anyways, enough advice, start coding!

Posted on June 1, 2016 (updated March 18, 2017) General science, Research

Basic Bash

Basically these are the only things I know about Bash :). These aren’t all truly Bash commands, but instead these are common commands that everyone should know when using the Linux or Mac command line.

Notes

First, the command follows the $ and is listed in bold, the == is just a spacer, and everything else is a description of the command. The <> symbols designate where some other name should go there (like a file, folder, username, etc). The * symbol designates a wildcard, and can be used in conjunction with a partial search term. This means *.txt matches file.txt, file1.txt, etc.

Very Basic Commands

$ Ctrl + C == Kill whatever is running in the foreground

$ tab == complete current typing

$ <command> --help == lists options of a given command

$ Ctrl + A == Go to the beginning of the line you are currently typing on

$ Ctrl + E == Go to the end of the line you are currently typing on

$ Ctrl + U == Clears the line before the cursor position. If you are at the end of the line, clears the entire line.

Where Am I and How Do I Move Elsewhere?

$ pwd == print working directory

$ cd <folder> == change directory to folder

$ cd / == go to root

$ cd .. == go up one level

$ ls == list files and folders

$ ls -a == list all files and folders (including hidden)

Basic File/Folder Manipulation

$ mkdir <folder name> == create new folder

$ cp <old file name> <new file name> == copy and rename a file

$ mv <file> <folder> == move file to a folder

$ rm <file> == delete file

$ rm -r <folder> == delete folder and everything inside it

$ cat <file> == show content of file

$ head -n <number of lines> <file> == show top n lines of file

$ tail -n <number of lines> <file> == show last n lines of file

Nano

Nano is a basic text editor.

$ nano <existing file> == opens up a file

$ nano <non-existing file> == creates and opens up a file

Within nano, the needed commands are listed at the bottom where the ^ symbol stands for Ctrl.

What is currently running? Can I stop it?

$ top == list all processes running on a computer

$ top -u <username> == lists processes being run by username

$ kill <pid> == kill a process identified by pid, which can be found by using top

$ killall -u <username> == kills all processes being run by username

How to run stuff in the background

$ <command> & == runs command in background

If you want to run jobs on a remote server, there will be some queueing system. Check out useful commands like qsub, qstat, etc. However, if you just want to run multiple processes on a single computer (even after logging off), Screen and Tmux are the tools you need. I personally use screen, and below are some useful commands.

Screen

$ screen -S <name> == new screen with name

$ Ctrl-a d == detach from screen

$ screen -r <name> == reattach to screen

$ screen -ls == list of screens

$ exit == kills currently attached screen

Example Running Code in the Background

Here is an example set of commands that will run a Python script in the background.

$ screen -S test

$ python test.py > out.txt 2>&1 &

$ Ctrl-a d

You now can safely exit and the computer will do all the tough work for you!

I did add a couple of new things in there. The redirect (>) means that anything test.py would print to the terminal is instead saved into the file out.txt, and the 2>&1 sends error messages to that same file. The trailing & runs the command in the background, as described above. Without the redirect, when you reattach to the screen named test you will only get the recent terminal output (usually there is some display length limit).
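If you think more easily in Python than in shell syntax, the effect of > out.txt 2>&1 can be sketched with the standard subprocess module. The helper name run_and_log and the one-line stand-in for test.py are made up for illustration, and this sketch waits for the command to finish, so it does not reproduce the trailing &.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(args, log_path):
    """Run a command, sending stdout and stderr to one log file."""
    with open(log_path, "w") as log_file:
        # stdout=log_file plays the role of "> out.txt";
        # stderr=subprocess.STDOUT plays the role of "2>&1".
        completed = subprocess.run(args, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


# Stand-in for test.py: a one-line script that writes to both streams.
child_code = "import sys; print('normal output'); print('error output', file=sys.stderr)"

with tempfile.TemporaryDirectory() as folder:
    log_path = Path(folder) / "out.txt"
    exit_code = run_and_log([sys.executable, "-c", child_code], log_path)
    log_text = log_path.read_text()

print(exit_code)
print(log_text)
```

Both messages end up in the log file, although their order can differ from what you would see on a terminal because the two streams are buffered differently.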

Data Splits

Data Preprocessing

Mean

Scaling

Correlations

Data Augmentation

Early Stopping

Optimizer

SGD vs Adam

Learning Rate

Momentum

Initialization

Activation Function

Loss

Regularization

Weights

Activity

Parameter Tuning

Final Tips

Week 1: Perceptrons

Week 2: Hopfield Nets and Boltzmann Machines

Week 3: Backprop

Week 4: Independent Component Analysis (ICA)

Week 5: Convolutional Neural Networks (CNN)

Week 6: Recurrent Neural Networks (RNN)

Week 7: Reinforcement Learning

Week 8: Information and Control Theory

Questions from You

FAQ

Requests for Essay Reading
