Temporal Difference Learning

How can humans or machines interact with an environment and learn a strategy for selecting actions that are beneficial to their goals? Answers to this question fall under the artificial intelligence category of reinforcement learning. Here I am going to provide an introduction to temporal difference (TD) learning, which is the algorithm at the heart of reinforcement learning.

I will be presenting TD learning from a computational neuroscience background. My post has been heavily influenced by Dayan and Abbott Ch 9, but I have added some additional points. The ultimate reference for reinforcement learning is the book by Sutton and Barto, and their chapter 6 dives into TD learning.

Conditioning

To start, let’s review conditioning. The most famous example of conditioning is Pavlov’s dogs. The dogs naturally learned to salivate upon the delivery of food, but Pavlov realized that he could condition dogs to associate the ringing of a bell with the delivery of food. Eventually, the ringing of the bell on its own was enough to cause the dogs to salivate.

The specific example of Pavlov’s dogs is an example of classical conditioning. In classical conditioning, no action needs to be taken. However, animals can also learn to associate actions with rewards and this is called operant conditioning.

Before I introduce some specific conditioning paradigms, here are the important definitions:

• $s$ = stimulus
• $r$ = reward
• $x$ = no reward
• $v$ = value, or expected reward (generally a function of $r$, $x$)
• $u$ = binary indicator variable for the stimulus (1 if stimulus present, 0 otherwise)

Here are the conditioning paradigms I want to discuss:

• Pavlovian
• Extinction
• Blocking
• Inhibitory
• Secondary

For each of these paradigms, I will introduce the necessary training stages and the final result. The statement, $a \rightarrow b$, means that $a$ becomes associated ($\rightarrow$) with $b$.

Pavlovian

Training: $s \rightarrow r$. The stimulus is trained with a reward.

Results: $s \rightarrow v[r]$. The stimulus is associated with the expectation of a reward.

Extinction

Training 1: $s \rightarrow r$. The stimulus is trained with a reward. This eventually leads to successful Pavlovian training.

Training 2: $s \rightarrow x$. The stimulus is trained with no reward.

Results: $s \rightarrow v[x]$. The stimulus is associated with the expectation of no reward. Extinction of the previous Pavlovian training.

Blocking

Training 1: $s_1 \rightarrow r$. The first stimulus is trained with a reward. This eventually leads to successful Pavlovian training.

Training 2: $s_1 + s_2 \rightarrow r$. The first stimulus and a second stimulus are trained together with a reward.

Results: $s_1 \rightarrow v[r]$, and $s_2 \rightarrow v[x]$. The first stimulus completely explains the reward and hence “blocks” the second stimulus from being associated with the reward.

Inhibitory

Training: $s_1+s_2 \rightarrow x$, and $s_1 \rightarrow r$. The combination of two stimuli leads to no reward, but the first stimulus is trained with a reward.

Results: $s_1 \rightarrow v[r]$, and $s_2 \rightarrow -v[r]$. The first stimulus is associated with the expectation of the reward while the second stimulus is associated with the negative of the reward.

Secondary

Training 1: $s_1 \rightarrow r$. The first stimulus is trained with a reward. This eventually leads to successful Pavlovian training.

Training 2: $s_2 \rightarrow s_1$. The second stimulus is trained with the first stimulus.

Results: $s_2 \rightarrow v[r]$. Eventually the second stimulus is associated with the reward despite never being directly associated with the reward.

Rescorla-Wagner Rule

How do we turn the various conditioning paradigms into a mathematical framework of learning? The Rescorla-Wagner (RW) rule is a very simple model that can explain many, but not all, of the above paradigms.

The RW rule is a linear prediction model that requires these three equations:

1. $v=w \cdot u$
2. $\delta = r-v$
3. $w_{new} = w_{old}+\epsilon \delta u$

and introduces the following new terms:

• $w$ = weights associated with stimuli state
• $\epsilon$ = learning rate, with $0 \le \epsilon \le 1$

What do each of these equations actually mean?

1. The expected reward, $v$, is the dot product of a vector of weights, $w$, with the vector of stimulus indicators, $u$.
2. But there may be a mismatch, or error, between the actual reward, $r$, and the expected reward, $v$.
3. Therefore we should update the weight of each stimulus. We do this by adding a term that is proportional to the learning rate $\epsilon$, the error $\delta$, and the stimulus indicator $u$.

During a Pavlovian pairing of stimulus with reward, the RW rule predicts an exponential approach of the weight to $w = \langle ru\rangle$ over the course of several trials for most values of $\epsilon$ (if $\epsilon=1$ it would instantly update to the final value. Why is this usually bad?). Then if the reward stops being paired with the stimulus, the weight will decay exponentially over the course of the next trials.

The RW rule will also continue to work when the reward/stimulus pairing is stochastic instead of deterministic, and the weight will still approach the final value of $w = \langle ru\rangle$.
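As a rough illustration of the acquisition and extinction curves described above, here is a minimal sketch of the three RW equations in Python. The function name `rescorla_wagner`, the trial counts, and the value $\epsilon = 0.1$ are my own choices for illustration and do not come from any reference.

```python
import numpy as np

def rescorla_wagner(stimuli, rewards, epsilon=0.1):
    """Apply the RW rule trial by trial and return the weights after each trial.

    stimuli: array of shape (n_trials, n_stimuli) holding the indicator u
    rewards: array of shape (n_trials,) holding the reward r
    """
    stimuli = np.asarray(stimuli, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    w = np.zeros(stimuli.shape[1])
    history = []
    for u, r in zip(stimuli, rewards):
        v = w @ u                      # 1. expected reward
        delta = r - v                  # 2. prediction error
        w = w + epsilon * delta * u    # 3. weight update
        history.append(w.copy())
    return np.array(history)

# Assumed schedule: 100 rewarded trials (Pavlovian), then 100 unrewarded (extinction).
n = 100
u = np.ones((2 * n, 1))
r = np.concatenate([np.ones(n), np.zeros(n)])
w_hist = rescorla_wagner(u, r, epsilon=0.1)

print(w_hist[0, 0])      # weight after the first trial
print(w_hist[n - 1, 0])  # close to 1 at the end of acquisition
print(w_hist[-1, 0])     # close to 0 at the end of extinction
```

Plotting `w_hist[:, 0]` against the trial number should show the exponential rise followed by the exponential decay.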

How does blocking fit into this framework? Well the RW rule says that after the first stage of training, the weights are $w_1 = r$ and $w_2 = 0$ (since we have not presented stimulus two). When we start the second stage of training and try and associate stimulus two with the reward, we find that we cannot learn that association. The reason is that there is no error (hence $\delta = 0$) and therefore $w_2 = 0$ forever. If instead we had only imperfectly learned the weight of the first stimulus, then there is still some error and hence some learning is possible.
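The blocking argument can be checked numerically. The sketch below is only an illustration: the helper `rw_update`, the learning rate of 0.2, and the trial counts are assumptions I picked, not values from an experiment.

```python
import numpy as np

def rw_update(w, u, r, epsilon):
    """One RW step: prediction error times stimulus indicator."""
    delta = r - w @ u
    return w + epsilon * delta * u

s1_only = np.array([1.0, 0.0])
s1_and_s2 = np.array([1.0, 1.0])

# Case 1: stimulus one is fully learned before stimulus two appears.
w = np.zeros(2)
for _ in range(200):
    w = rw_update(w, s1_only, 1.0, 0.2)
for _ in range(200):
    w = rw_update(w, s1_and_s2, 1.0, 0.2)
w_blocked = w.copy()

# Case 2: stimulus one is only partially learned (3 trials) first.
w = np.zeros(2)
for _ in range(3):
    w = rw_update(w, s1_only, 1.0, 0.2)
for _ in range(200):
    w = rw_update(w, s1_and_s2, 1.0, 0.2)
w_partial = w.copy()

print(w_blocked)   # w_2 stays at essentially zero
print(w_partial)   # w_2 picks up half of the remaining error
```

In the second case the leftover error after stage one is shared equally between the two weights, because both stimuli are present on every stage-two trial.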

One thing that the RW rule incorrectly predicts is secondary conditioning. In this case, during the learning of the first stimulus, $s_1$, the learned weight becomes $w_1 >0$. The RW rule predicts that the weight of the second stimulus, $s_2$, will become $w_2 <0$. This is because this paradigm is exactly the same as inhibitory conditioning, according to the RW rule. Therefore, a more complicated rule is required to successfully reproduce secondary conditioning.
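To see the wrong prediction concretely, here is a small sketch. Since the RW rule has no notion of time, I am assuming that the second training stage ($s_2$ followed by $s_1$, with no reward delivered) looks to the rule like both stimuli being present on an unrewarded trial. The learning rate and trial counts are arbitrary.

```python
import numpy as np

def rw_update(w, u, r, epsilon=0.1):
    """One RW step: prediction error times stimulus indicator."""
    delta = r - w @ u
    return w + epsilon * delta * u

w = np.zeros(2)

# Stage 1: s_1 paired with reward.
for _ in range(200):
    w = rw_update(w, np.array([1.0, 0.0]), 1.0)
w_stage1 = w.copy()

# Stage 2: s_2 and s_1 presented, no reward (as the timeless RW rule sees it).
for _ in range(200):
    w = rw_update(w, np.array([1.0, 1.0]), 0.0)
w_stage2 = w.copy()

print(w_stage1)  # w_1 near 1, w_2 exactly 0
print(w_stage2)  # w_2 has gone negative instead of positive
```

Animals in this paradigm come to treat $s_2$ as a predictor of reward, so the negative $w_2$ is the mismatch that motivates adding time to the model.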

One final note. The RW rule can provide an even better match to biology by assuming a non-linear relationship between $v$ and the animal behavior. This function is often something that exponentially saturates at the maximal reward (ie an animal is much more motivated to go from 10% to 20% of the max reward rather than from 80% to 90% of the max reward). While this provides a better fit to many biological experiments, it still cannot explain the secondary conditioning paradigm.

Temporal Difference Learning

To properly model secondary conditioning, we need to explicitly add in time to our equations. For ease, one can assume that time, $t$, is discrete and that a trial lasts for total time $T$ and therefore $0 \le t \le T$.

The straightforward (but wrong) extension of the RW rule to time is:

1. $v[t]=w[t-1] \cdot u[t]$
2. $\delta[t] = r[t]-v[t]$
3. $w[t] = w[t-1]+\epsilon \delta[t] u[t]$

where we will say that it takes one time unit to update the weights.

Why is this naive RW with time wrong? Well, psychology and biology experiments show that an animal’s expected reward does NOT reflect the past history of rewards, nor does it reflect just the next time step; instead it reflects the expected rewards during the WHOLE REMAINDER of the trial. Therefore a better match to biology is:

1. $v[t]=w[t-1] \cdot u[t]$
2. $R[t]= \langle \sum_{\tau=0}^{T-t} r[t+\tau] \rangle$
3. $\delta[t] = R[t]-v[t]$
4. $w[t] = w[t-1]+\epsilon \delta[t] u[t]$

where $R[t]$ is the full reward expected over the remainder of the trial while $r[t]$ remains the reward at a single time step. This is closer to biology, but we are still missing a key component. Not all future rewards are treated equally. Instead, rewards that happen sooner are valued higher than rewards in the distant future (this is called discounting). So the best match to biology is the following:

1. $v[t]=w[t-1] \cdot u[t]$
2. $R[t]= \langle \sum_{\tau=0}^{T-t} \gamma^\tau r[t+\tau] \rangle$
3. $\delta[t] = R[t]-v[t]$
4. $w[t] = w[t-1]+\epsilon \delta[t] u[t]$

where $0 \le \gamma \le 1$ is the discounting factor for future rewards. A small discounting factor implies we prefer rewards now while a large discounting factor means we are patient for our rewards.
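To get a feel for the discounting factor, here is a small sketch that computes the discounted sum of future rewards for a single trial (so without the average over trials). The function name `discounted_return` and the example reward sequence are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Return R[t] = sum over tau of gamma**tau * r[t + tau], for every t.

    Works backwards through the trial so each step reuses the next one.
    """
    R = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        R[t] = running
    return R

rewards = [0, 0, 0, 1]  # a single reward at the final time step

print(discounted_return(rewards, 1.0))  # [1.0, 1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.5))  # [0.125, 0.25, 0.5, 1.0]
```

With $\gamma = 1$ the reward is worth the same no matter how far away it is, while with $\gamma = 0.5$ its value halves for every step of delay.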

We have managed to write down a set of equations that accurately summarize biological reinforcement. But how can we actually learn with this system? As currently written, we would need to know the average reward over the remainder of the whole trial. Temporal difference learning makes the following assumptions in order to solve for the expected future rewards:

1. Future rewards are Markovian
2. Current observed estimate of reward is close enough to the typical trial

A Markov process is memoryless in that the next future step only depends on the current state of the system and has no other history dependence. By assuming rewards follow this structure, we can make the following approximation:

• $R[t]= \langle r[t+1] \rangle + \gamma \langle \sum_{\tau=1}^{T-t} \gamma^{\tau-1} r[t+\tau] \rangle$
• $R[t]= \langle r[t+1] \rangle + \gamma R[t+1]$

The second approximation is called bootstrapping. We will use the currently observed values rather than the full estimate for future rewards. So finally we end up at the temporal difference learning equations:

1. $v[t]=w[t-1] \cdot u[t]$
2. $R[t] = r[t+1] + \gamma v[t+1]$
3. $\delta[t] =r[t+1] + \gamma v[t+1]-v[t]$
4. $w[t] = w[t-1]+\epsilon \delta[t] u[t]$

Dayan and Abbott, Figure 9.2. This illustrates TD learning in action.

I have included an image from Dayan and Abbott about how TD learning evolves over consecutive trials; please read their Chapter 9 for full details.
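Here is a minimal sketch of the four TD equations in Python. It makes a simplifying assumption that is mine and not necessarily the setup used for the figure: the stimulus $u[t]$ is a one-hot vector marking the time step since stimulus onset, so $v[t]$ is simply the weight stored for step $t$. The trial length, reward time, $\epsilon$, and $\gamma$ are also arbitrary illustrative values.

```python
import numpy as np

def td_trial(w, r, epsilon=0.2, gamma=1.0):
    """Run one trial of TD learning; return updated weights and TD errors.

    With a one-hot stimulus per time step, v[t] = w . u[t] reduces to w[t].
    """
    T = len(r) - 1
    w = w.copy()
    deltas = np.zeros(T)
    for t in range(T):
        # delta[t] = r[t+1] + gamma * v[t+1] - v[t]
        deltas[t] = r[t + 1] + gamma * w[t + 1] - w[t]
        # weight update; only the active (one-hot) entry changes
        w[t] += epsilon * deltas[t]
    return w, deltas

T, t_reward = 10, 8          # assumed trial length and reward time
r = np.zeros(T + 1)
r[t_reward] = 1.0

w = np.zeros(T + 1)
w, first_deltas = td_trial(w, r)
for _ in range(499):
    w, last_deltas = td_trial(w, r)

print(np.argmax(first_deltas))     # 7, the step just before the reward
print(np.round(w, 3))              # learned values at each time step
print(np.abs(last_deltas).max())   # errors within the trial have vanished
```

On the first trial the only error sits right at the reward. Over trials the value spreads backwards in time until every step from stimulus onset up to the reward predicts it, and the within-trial error disappears.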

Finally, I should mention that in practice, people often use the TD-Lambda algorithm. This version introduces a new parameter, lambda, which controls how far back in time one can make adjustments. Lambda 0 implies one time step only, while lambda 1 implies all past time steps. This allows TD learning to excel even if the full system is not Markovian.
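One common way to implement the lambda parameter is with eligibility traces, as described in Sutton and Barto. The sketch below uses accumulating traces and the same one-hot-per-time-step stimulus assumption as a simple tabular model. The function name and all parameter values are illustrative choices of mine.

```python
import numpy as np

def td_lambda_trial(w, r, lam, epsilon=0.2, gamma=1.0):
    """One trial of TD(lambda) with accumulating eligibility traces."""
    T = len(r) - 1
    w = w.copy()
    trace = np.zeros_like(w)
    for t in range(T):
        u = np.zeros_like(w)
        u[t] = 1.0                           # one-hot stimulus for this step
        trace = gamma * lam * trace + u      # decay old traces, mark current step
        delta = r[t + 1] + gamma * w[t + 1] - w[t]
        w += epsilon * delta * trace         # credit every step still "eligible"
    return w

T, t_reward = 10, 8
r = np.zeros(T + 1)
r[t_reward] = 1.0

w_td0 = td_lambda_trial(np.zeros(T + 1), r, lam=0.0)
w_td1 = td_lambda_trial(np.zeros(T + 1), r, lam=1.0)

print(np.count_nonzero(w_td0))  # 1
print(np.count_nonzero(w_td1))  # 8
```

After a single trial, lambda 0 has only adjusted the step immediately before the reward, while lambda 1 has adjusted every step leading up to it.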

Dopamine and Biology’s TD system

So does biology actually implement TD learning? Animals definitely utilize reinforcement learning and there is strong evidence that temporal difference learning plays an essential role. The leading contender for the reward signal is dopamine. This is a widely used neurotransmitter that evolved in early animals and remains widely conserved. There are a relatively small number of dopamine neurons (in the basal ganglia and VTA in humans) that project widely throughout the brain. These dopamine neurons can produce an intense sensation of pleasure (and in fact the “high” of drugs often comes about either through stimulating dopamine production or preventing its reuptake).

There are two great computational neuroscience papers that highlight the important connection between TD learning and dopamine that analyze two different biological systems:

Both of these papers deserve to be read in detail, but I’ll give a brief summary of the bee foraging paper here. Experiments were done that tracked bees in a controlled environment consisting of “yellow flowers” and “blue flowers” (which were basically just different colored cups). These flowers had the same amount of nectar on average, but were either consistent or highly variable. The bees quickly learned to only target the consistent flowers. These experimental results were very well modeled by assuming the bee was performing TD learning with a relatively small discount factor (driving it to value recent rewards).

TD Learning and Games

Playing games is the perfect test bed for TD learning. A game has a final objective (win), but throughout play it can be difficult to determine your probability of winning. TD learning provides a systematic framework to associate the value of a given game state with the eventual probability of winning. Below I highlight the games that have most significantly showcased the usefulness of reinforcement learning.

Backgammon

Backgammon is a two person game of perfect information (neither player has hidden knowledge) with an element of chance (rolling dice to determine one’s possible moves). Gerald Tesauro’s TD-Gammon was the first program to showcase the value of TD learning, so I will go through it in more detail.

Before getting into specifics, I need to point out that there are actually two (often competing) branches in artificial intelligence:

• Symbolic logic
• Connectionist

Symbolic logic tends to be a set of formal rules that a system needs to follow. These rules need to be designed by humans. The connectionist approach uses artificial neural networks and other approaches like TD learning that attempt to mimic biological neural networks. The idea is that humans set up the overall architecture and model of the neural network, but the specific connections between “neurons” is determined by the learning algorithm as it is fed real data examples.

Tesauro actually created two versions of a backgammon program. The first was called Neurogammon. It was trained using supervised learning where it was given expert games as well as games Tesauro played against himself and told to learn to mimic the human moves. Neurogammon was able to play at an intermediate human level.

Tesauro’s next backgammon program was called TD-Gammon since it used the TD learning rule. Instead of trying to mimic the human moves, TD-Gammon used the TD learning rule to assign a score to each move throughout a game. The additional innovation is that the TD-Gammon program was trained by playing games against itself. This initial version of TD-Gammon soon matched Neurogammon (ie intermediate human level). TD-Gammon was able to beat experts by using both a supervised phase on expert games and a reinforcement phase.

Despite being able to beat experts, TD-Gammon still had a weakness in the endgame. Since it only looked two moves ahead, it could miss key moves that would have been found by a more thorough analytical approach. This is where symbolic logic excels and hence TD-Gammon was a great demonstration of the complementary strengths and weaknesses of symbolic vs connectionist logic.

Go

Go is a two person game of perfect information with no element of chance. Despite this perfect knowledge, the game is complex enough that there are around $10^{170}$ legal board positions (for reference, there are only about $10^{80}$ atoms in the observable universe). So despite the perfect information, there are just too many possible games to determine the optimal move.

Recently AlphaGo made a huge splash by beating one of the world’s top players of Go. Most Go players, and even many artificial intelligence researchers, thought an expert level Go program was years away. So the win was just as surprising as when DeepBlue beat Kasparov in chess. AlphaGo is a large program with many different parts, but at the heart of it is a reinforcement learning module that utilizes TD learning (see here or here for details).

Poker

The final frontier in gaming is poker, specifically multi-person No-Limit Texas Hold’em. The reason this is the toughest game left is that it is a multi-player game with imperfect information and an element of chance.

Last winter the computer systems won against professionals for the first time in a series of heads up matches (computer vs only one human). Further improvements are needed to actually beat the best professionals at a multi-person table, but these results seem encouraging for future successes. The interesting thing to me is that both AI systems seem to have used only a limited amount of reinforcement learning. I think that fully embracing reinforcement and TD learning should be the top priority for these research teams and might provide the necessary leap in ability. And they should hurry since others might beat them to it!

NSF GRFP 2016-2017

For a couple of years now, I have had a website with my thoughts on the National Science Foundation Graduate Research Fellowship (NSF GRFP) and examples of successful essays. The popularity of the site in the past few years has grown well beyond what I expected, so this year I’m going to use this blog to try out a few new things.

Questions from You

I end up getting lots of emails asking for advice. While sometimes the advice really does merit an individualized response, many of the questions are applicable to everyone. So in the interest of efficiently answering questions, here is my plan this year.

2. I will not answer any questions about eligibility due to gaps in graduate school because I am honestly clueless on it.
3. If you feel comfortable asking the question publicly, post it by commenting below.
4. If you want to ask me privately, send me an email (my full name at gmail.com, include NSF GRFP Question in subject line). I will try and answer you and also work with you on a public question/answer that I can include here.

FAQ

Here are some past questions I have been asked and/or questions I anticipate being asked this year.

• My research is closely related to medicine. Am I still eligible?
• I think the best test for this is to ask your advisor if they would apply to NSF or NIH for grants on this topic. If NSF you are definitely good, but if NIH, you will need to reframe the research to fit into NSF.
• I am a first year graduate student. Should I apply this year or wait until my second year? (New issue this year since incoming graduate students can only apply once).
• This is the toughest question for me since no one has had to make this choice yet. However, here is how I would personally decide. The important thing to remember is that undergrads and graduate students are each separately graded. So you really need to decide how you currently rank relative to your peers versus how you will rank next year. If you did a bunch of undergrad research, have papers, etc, definitely apply as a first year. If you didn’t, it might pay off to wait, but only if your program lets you get right into research. If you will just be taking classes, I’m less confident your relative standing will improve. Good luck to everyone with this tough choice!

Unfortunately, I now get more requests to read essays than I can reasonably accomplish. But I am still willing to read over a few and here is how I will decide on the essays to read.

1. If you are in San Diego, and you think I am a better fit for you than the other local people on the experienced resource list, send me an email with the subject NSF GRFP Experienced Resource List.
2. If you are not in San Diego, first check out the experienced resource list and also ask around your school for other resources.
3. If you can’t find anyone to read your essays, fill out this form. I will semi-randomly select essays to read.

What do I mean by semi-randomly? Well, in the interest of supporting the NSF GRFP’s goal of increasing the diversity of graduate school, I will give priority to undergrads who are without a local person on the experienced resource list and/or are from underrepresented groups. The NSF GRFP specifically “encourages women, members of underrepresented minority groups, persons with disabilities, and veterans to apply”, and I am willing to extremely loosely define minority group by race, ethnicity, sexual orientation, family socio-economic status, geography, colleges that traditionally send few students to graduate school, etc. The form is fill in the blank, so feel free to justify your inclusion in any other underrepresented group that I did not explicitly list.

I’ll then take the prioritized list and make some random selection. The number of people I select this way will depend on the number of local people I end up advising, but I will definitely read at least 2 non-local applications.

Here is my timeline for essay reading:

• Sept 16th – Random drawing number 1
• Sept 30th Extended to Oct 5th – Random drawing number 2 (I’ll include everyone again, so early birds get double the chances of being selected)
• Oct 21st – Last day I will help people (sorry I’m traveling near the deadline)

Best Machine Learning Resources

Machine learning is a rapidly evolving field that is generating an intense interest from a wide audience. So how can you get started?

For now, I’m going to assume that you already have the basic programming (ie general introduction to programming and experience with matrices) and mathematical skills (calculus and some probability and linear algebra).

These are the best current books on machine learning:

These are some out of date books that still contain some useful sections (for example, Murphy several times refers you to Bishop or MacKay for more details).

Here is a list of other potential resources:

I3: International Institute for Intelligence

While I was previously discussing my opinion of Open AI, I mentioned that I would do something different if I was in charge. Here is my dream.

What OpenAI is Missing

Helping everyday people throughout the whole world.

OpenAI’s stated goal is:

OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.

In the short term, we’re building on recent advances in AI research and working towards the next set of breakthroughs.

However, based on their actions so far, this interview with Ilya Sutskever, and popular press articles, the main focus of OpenAI appears to be advanced research in artificial intelligence with an emphasis on open source, as well as thinking longterm about the impacts of letting advanced artificial intelligence systems control large aspects of our life. While I strongly support these goals, in reality, these will not benefit all of humanity. Instead, it only benefits those with either the necessary training (which is a minimum of a bachelors, but usually means a masters or PhD) or money (to hire top people, buy the required computing resources, etc) to take advantage of the advanced research. So this leaves out the developing world as well as the poor in developed countries, ie contrary to their stated goal, OpenAI is missing the vast majority of humanity.

One can argue that by making OpenAI’s research open source, eventually it will trickle down and help a wider swath of humanity. However, the current trend suggests that large corporations are poised to benefit the most from the next revolution (I mean, who is more likely to invent a self driving car, Google, or someone in a developing country?). Additionally, these innovations focus on first world problems (since these are the highest paying customers). And finally, each round of innovation ends up creating fewer and fewer jobs (so the number of unemployed in developed countries may expand). I firmly believe that unless there is a global educational effort (and probably an implementation of basic income), the benefits of AI will be directed towards a tiny sliver of the world’s population.

My Proposal: I3

Here I lay out my proposal for a new institute that would actually expand the benefits of recent and future advances in machine learning / artificial intelligence to a wider swath of humanity. I don’t claim that it would truly benefit all of humanity (again, see basic income), but it is a way for research advances to reach a larger proportion of it.

I propose a new education and research institute focused on artificial intelligence, machine learning, and computational neuroscience which I’ll call the International Institute for Intelligence. I like alliterations, and since I think it should focus on three types of intelligence, I especially like the idea of calling it I3 or I-Cubed for short.

Why these three research areas? Well, machine learning is currently revolutionizing how companies use data and is facilitating new technological advances every day. Designing artificial intelligence systems on top of these machine learning algorithms seems like a realistic possibility in the near future. The less conventional choice is computational neuroscience. I think it is important to include for two reasons. First, the brain is the best example we have of an intelligent system, so until we actually design an artificial intelligence, it seems best to understand and mimic the best example (this is the philosophy of Deep Mind according to Demis Hassabis). Second, the US Brain Initiative and similar international efforts are injecting significant resources into neuroscience, with the hopes of sparking a revolution similar in spirit and magnitude to the widespread effect the Human Genome Project had on biotechnology and genomics. So I figure we might as well prepare everyone for this future.

So what would be the actual purpose of I3? Sticking with the theme of threes, I propose three initiatives that I will list in my order of importance as well as some bonus points.

1. International PhD Education

The central goal is to create a program similar to ICTP (International Centre for Theoretical Physics) but with a different research emphasis. So what is ICTP? It was founded by Nobel Prize Winner Abdus Salam and it has several programs to promote research in developing countries, including:

• Predoctoral program – students get a 1 year course to prep them for PhDs
• Visiting PhD program – students in a developing nation PhD program get to spend a couple of months each year for 3 years at ICTP to participate in their research
• Conferences
• Regional offices (currently São Paulo, Brazil, but more in the planning)

So the idea is to implement a similar program but with the research emphasis now focused on machine learning, artificial intelligence, and computational neuroscience. While I think the main thing is to get the predoctoral program and visiting PhD program started, eventually it would be great to have 5 regional offices spread throughout the developing world. For example, I think one is needed in South America (Lima, Peru?), one in Africa (Nairobi, Kenya?), and 2 in Asia (India, and China, but not in a traditional technological center). And assuming I3 is based in the US (see my case for San Diego below), it would be great to have an affiliate office in Europe, maybe in Trieste next to ICTP.

One additional initiative that I think could be useful would be paying people to not leave their country and instead help them establish a research center at their local universities. This could also wait until later because it might be easiest to convince some of the future alumni of the predoctoral or visiting PhD programs to return/stay in their home country.

A second additional initiative would be to encourage professors from developed and developing countries to take their sabbatical at I3. This would provide a fresh stream of mentors and set up potential future collaborations. This is a blend of two programs at KITP (this and that).

2. US Primary School Education

The science pipeline analogy is overused, but I don’t have a better one yet. So currently, the researchers in I3 focused areas are predominately male, white or Asian, and middle to upper class. So not a very representative sample of the US (or world) population. Therefore, the best longterm solution is to get a more diverse set of students interested in the research at a young age.

Technically this should have a higher priority over the next initiative (US College Education), but since there are other non-profits interested in this (for example, CodeNow), maybe I3 does not need to be a leader in this and instead can play a supporting role.

3. US College Education

And again back to the science pipeline analogy, if we are to have a more diverse set of researchers, we need to encourage a diverse set of undergrads to pursue relevant majors and continue on into graduate programs. This won’t be solved by any single program, but here are some potential ideas.

• US underrepresented students could apply for the same 1 year program that is offered to international students.
• Assist universities in establishing bridge programs that partner research universities with colleges that have significant minority populations. A great example of this is the Vanderbilt-Fisk Physics program.
• US colleges would also benefit from the proposed sabbatical program offered to international researchers. I also like the KITP idea of extending it to undergraduate only institutes (especially those with large minority populations) as a way to get more undergrads interested in research.
• Establish a complete set of free college curriculum for machine learning, artificial intelligence, and computational neuroscience. While there are many useful MOOCs on these topics, I still don’t think they beat an actual course.

Bonus #1 : Research

ICTP has proven that it is possible to further global educational goals and still succeed at research. I would argue that the people working at I3 should mainly be evaluated for tenure based on their mentorship and teaching of students. Research of course will play a role (otherwise it would be poor mentorship of future researchers), but I think there shouldn’t be huge pressure to bring in grants, high-profile publications, etc. But even without that emphasis, there is no way that a group of smart people with motivated students will not lead to great research.

Bonus #2: International Primary and College Education

This is longer term, but if there are successful programs in improving the US primary and college education, international regional offices, and PhD alumni who are in their home countries, it seems like it should be possible to leverage those connections into a global initiative to improve primary and college education.

Final Thoughts

So Elon Musk, Peter Thiel, and friends, if you have another billion you want to donate (or Open AI funds to redirect), here is my proposal. In reality, implementing all of my ideas would probably cost several billions, but once you got the center founded, I think that it would be easy to get tech companies, the US government, and even UNESCO to help provide funding.

My final point is that I think San Diego would be a perfect location. I know I’m biased since I live here now, but there are many legitimate reasons San Diego is great for this institute.

1. UCSD already partners with outside research institutes (Salk, Scripps, etc)
2. UCSD (and Salk, etc) are leaders in all of these research areas
3. It is extremely easy to convince people to take a sabbatical in San Diego

While there are many other great potential locations, I strongly suggest that I3 is not in the Bay Area, Seattle, Boston, or New York City. These cities already have plenty of tech jobs; please spread the wealth to other parts of the US.

Anyways, I’ll keep dreaming that someday I’ll get to work at a place like the one I just described.

Life at Low Reynolds Number

This is part of my “journal club for credit” series. You can see the other computational neuroscience papers in this post.

Unit: Diffusion

Organized by Ben Regner

1. Standard Diffusion
2. Anomalous Diffusion
3. Life at Low Reynolds Number

Papers

Life at Low Reynolds Number. By Purcell in 1977.

Introduction

This is one of my favorite papers. The presentation style is extremely fun and readable without sacrificing any scientific integrity. I think it serves as a great introduction to fluid mechanics at low Reynolds number. I don’t have too many comments since I think the paper explains it the best, but I will provide a few supplementary details for a more in depth exploration of the ideas from the paper.

And just to get you excited about fluid dynamics, I present an example of laminar flow:

Basics of Fluid Mechanics

The fundamental equation of fluid mechanics is Navier-Stokes. The relevant version for this paper is the incompressible flow equations with pressure but no other external fields:

$\frac{\partial \vec{u}}{\partial t}+ \vec{u}\cdot\nabla\vec{u} +\frac{1}{\rho}\nabla p -\nu\nabla^2\vec{u}=0$

where $\vec{u}$ is the velocity vector, $\vec{x}$ is position, $\rho$ is density, $p$ is pressure, and $\nu$ is the kinematic viscosity. This equation can be made non-dimensional by introducing a characteristic velocity $U$, a characteristic length $L$, and the dynamic viscosity $\eta=\rho\nu$. This gives the following dimensionless variables:

$u^* = \frac{u}{U}$

$x^* = \frac{x}{L}$

$p^* = \frac{pL}{\eta U}$

$t^* = \frac{tU}{L}$

Substituting in these dimensionless variables and doing some algebra, one arrives at the simplified equations:

$R\frac{\partial \vec{u^*}}{\partial t^*}+ R\vec{u^*}\cdot\nabla^*\vec{u^*} +\nabla^* p^*-(\nabla^*)^2\vec{u^*}=0$

with only one dimensionless constant, the Reynolds number, defined as:

$R = \frac{UL\rho}{\eta} = \frac{UL}{\nu}$

As explained in the paper, the Reynolds number is one of the essential constants describing a flow. High Reynolds number leads to turbulent (chaotic) flow, while low Reynolds number leads to laminar (smooth) flow. For extremely small Reynolds number, Navier-Stokes simplifies to:

$\nabla^* p^* = (\nabla^*)^2\vec{u^*}$

which is also just called the Stokes equation.
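To get a feel for the scales involved, here is a minimal Python sketch that evaluates $R = UL/\nu$ for two cases. The function name `reynolds_number` and all numerical values (the viscosity of water, the sizes and speeds of the swimmer and the bacterium) are round-number assumptions chosen for illustration, not values taken from the paper.

```python
def reynolds_number(speed, length, kinematic_viscosity):
    """Return the dimensionless Reynolds number R = U * L / nu (SI inputs)."""
    return speed * length / kinematic_viscosity


# Assumed round-number value for water near room temperature, in m^2/s.
NU_WATER = 1.0e-6

# Assumed order-of-magnitude scales, chosen only for illustration.
swimmer = reynolds_number(speed=1.0, length=1.0, kinematic_viscosity=NU_WATER)
bacterium = reynolds_number(speed=30e-6, length=1e-6, kinematic_viscosity=NU_WATER)

print(f"human swimmer: R ~ {swimmer:.0e}")
print(f"bacterium:     R ~ {bacterium:.0e}")
```

With these assumed inputs the two cases differ by many orders of magnitude, which is why the inertial terms multiplied by $R$ can be dropped for the bacterium but not for the swimmer.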

At the end of the paper, Purcell describes another dimensionless number which he calls $S$ and in a footnote identifies as the Sherwood number. However, Ben Regner pointed out that Purcell’s $S$ would actually be called the Péclet number today.

Basics of E. coli Chemotaxis

Chemotaxis and cellular sensing really deserve their own series of papers. But in the meantime, I recommend the following resources

Video Proof of Purcell’s Scallop Theorem

Reversible kicking does fine in water (high Reynolds number)…

… but the same motion has issues in corn syrup (low Reynolds number).

Here is a solution similar to what E. coli and other bacteria employ.

Fundamental Questions

• Purcell does an amazing job, so I have nothing to add.

• What are some other strategies that are employed in biology to get around the issue of mobility at low Reynolds number? Hint: I already linked to a video of one strategy. There are at least two other strategies, but to find these you will need to think about the assumptions leading to the basic Navier-Stokes equations.

Anomalous Diffusion

This is part of my “journal club for credit” series. You can see the other computational neuroscience papers in this post.

Unit: Diffusion

Organized by Ben Regner

1. Standard Diffusion
2. Anomalous Diffusion
3. Life at Low Reynolds Number

What is anomalous diffusion?

If one measures the mean square displacement vs time, it can be parameterized as

$\langle x^2 \rangle \propto t^\alpha$

where $\alpha=1$ is Brownian (standard diffusion), $0<\alpha<1$ is subdiffusive, $1<\alpha<2$ is superdiffusive, and ballistic is $\alpha=2$. So the technical definition of anomalous diffusion is $0<\alpha<1$ or $1<\alpha<2$.
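In practice, $\alpha$ is usually read off as the slope of the mean square displacement on a log-log plot, since any constant prefactor only shifts the intercept. The sketch below shows one way to do that with an ordinary least-squares fit. The function names `fit_msd_exponent` and `classify`, the tolerance `tol`, and the synthetic data are all assumptions made for illustration.

```python
import math


def fit_msd_exponent(times, msd):
    """Estimate alpha as the least-squares slope of log(msd) against log(t)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def classify(alpha, tol=0.05):
    """Label an exponent using the ranges above; tol is an assumed tolerance."""
    if abs(alpha - 1.0) <= tol:
        return "Brownian"
    if abs(alpha - 2.0) <= tol:
        return "ballistic"
    if 0.0 < alpha < 1.0:
        return "subdiffusive"
    if 1.0 < alpha < 2.0:
        return "superdiffusive"
    return "outside the ranges listed above"


# Synthetic, noise-free data with a known exponent (not a measurement).
times = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
msd = [3.0 * t ** 0.5 for t in times]
alpha = fit_msd_exponent(times, msd)
print(round(alpha, 3), classify(alpha))
```

Real data are noisy and often show different exponents at short and long times, so a single fitted slope should be treated as a summary rather than a definitive classification.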

How to describe anomalous diffusion?

Currently, there is no “best” or “simple” description of anomalous diffusion in the general case. However, continuous-time random walks (CTRW) are one paradigm that I find helpful as a conceptual and simulation framework.

In the simplest discrete random walk (DRW), a particle makes a jump of fixed size at every time step; the only question is the direction. The next generalization has the particle make a jump at every time step, but now it draws the jump size from a distribution.

The idea of a CTRW is that there is now a distribution for both the waiting time between jumps and the jump size. If the waiting time follows the exponential distribution and the jump size follows the normal distribution, one ends up with the Wiener process, aka standard diffusion and Brownian motion.
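The exponential-wait, normal-jump case is easy to simulate. The following is a minimal sketch using only the Python standard library. The names `ctrw_position` and `mean_square_displacement`, the unit jump rate, the unit jump width, the number of walkers, and the seed are all assumed values. With these choices the mean square displacement should come out close to the elapsed time, i.e. linear in $t$.

```python
import random


def ctrw_position(t_max, rng, rate=1.0, sigma=1.0):
    """Position at time t_max of one walker with exponential waits and normal jumps."""
    t = rng.expovariate(rate)  # time of the first jump
    x = 0.0
    while t <= t_max:
        x += rng.gauss(0.0, sigma)   # jump size drawn from a normal distribution
        t += rng.expovariate(rate)   # waiting time until the next jump
    return x


def mean_square_displacement(t_max, n_walkers=2000, seed=0):
    """Average x^2 at time t_max over many independent walkers."""
    rng = random.Random(seed)
    return sum(ctrw_position(t_max, rng) ** 2 for _ in range(n_walkers)) / n_walkers


for t_max in (10, 20, 40):
    print(t_max, round(mean_square_displacement(t_max), 1))
```

Swapping either sampler for a different distribution is the natural way to experiment with the anomalous cases.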

What causes anomalous diffusion?

Just as a reminder, there are three conditions that need to be satisfied for Brownian motion (standard diffusion):
1. Increments are independent
2. Increments are wide sense stationary. 1st moment and autocovariance don’t depend on time (this is a weaker condition than complete stationarity)
3. Zero mean

The third condition is often ignored by examining the motion relative to the mean displacement (i.e., the actual displacement is not Brownian, but fluctuations in the displacement could be Brownian). So really, the first two are the more important conditions. Therefore, anomalous diffusion arises due to non-independent increments and/or a mean or standard deviation of the increments that depends on time.

The CTRW allows one to think more precisely about different mechanisms that can give rise to anomalous diffusion. There is not one single way to get sub- or superdiffusion in a CTRW, since there are two, potentially dependent, distributions (waiting time and jump size). However, there are a few common situations that seem to arise often in biology and elsewhere (see Random walk models in biology, Box 2 for the original idea). Subdiffusion in biology is often caused by longer waiting time distributions (compared to exponential) or by molecular crowding, while superdiffusion occurs when jump sizes are drawn from a Lévy flight or other alpha-stable distributions.
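The waiting-time mechanism can be illustrated without tracking positions at all. If the jumps are independent of the waits and have zero mean and variance $\sigma^2$, the mean square displacement at time $t$ is $\sigma^2$ times the mean number of jumps completed by $t$. The sketch below therefore just counts jumps. The function names, the Pareto tail exponent of 0.5, the number of walkers, and the seed are assumptions chosen for illustration.

```python
import random

rng = random.Random(1)


def exponential_wait():
    """Waiting time with a finite mean (the standard-diffusion case)."""
    return rng.expovariate(1.0)


def heavy_tailed_wait():
    """Pareto waiting time with tail exponent 0.5 (assumed); its mean is infinite."""
    return rng.paretovariate(0.5)


def count_jumps(t_max, draw_wait):
    """Number of jumps completed by time t_max for waits drawn from draw_wait()."""
    t, n = draw_wait(), 0
    while t <= t_max:
        n += 1
        t += draw_wait()
    return n


def mean_jumps(t_max, draw_wait, n_walkers=1000):
    """Average jump count over many independent walkers."""
    return sum(count_jumps(t_max, draw_wait) for _ in range(n_walkers)) / n_walkers


for t_max in (100, 400):
    print(t_max, mean_jumps(t_max, exponential_wait), mean_jumps(t_max, heavy_tailed_wait))
```

Quadrupling the time should roughly quadruple the jump count for exponential waits, but only about double it for this heavy-tailed choice, which corresponds to an exponent near $\alpha = 0.5$, i.e. subdiffusion.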

Examples

For further exploration of anomalous diffusion in biology, I recommend these papers

• This is an interesting paper that introduces a renormalization group approach to classifying diffusion processes

Standard Diffusion

This is part of my “journal club for credit” series. You can see the other computational neuroscience papers in this post.

Unit: Diffusion

Organized by Ben Regner

1. Standard Diffusion
2. Anomalous Diffusion
3. Life at Low Reynolds Number

Papers

Brownian Motion. By Einstein in 1905.

Brownian Motion. By Langevin in 1908.

An Introduction to Fractional Diffusion. By Henry, Langlands, and Straka in 2010.

What is diffusion?

Diffusion is the general process by which small particles move from regions of high concentration to low concentration. Check out the link to the Wikipedia articles above for some cool videos and animations. Diffusion is ubiquitous and plays an essential role in biology. For example, oxygen diffuses from your lungs into deoxygenated blood, which then delivers it to the rest of your body where it diffuses out of your blood and into your cells. Additionally, signals between neurons are transmitted by several different diffusing molecules.

Mathematically, standard diffusion is described by two fundamental equations.

Fick’s First Law: Particles move from high-to-low concentration.

$j=-D\frac{\partial n}{\partial x}$

where $n$ is the number density (concentration) of particles, $x$ is position, $D$ is the diffusion constant, and $j$ is the flux of particles.

Fick’s Second Law: Conservation of particles combined with Fick’s First Law leads to the diffusion equation.

If particles cannot be created or destroyed, they follow a conservation law:

$\frac{\partial n}{\partial t} = -\frac{\partial j}{\partial x}$

Combining the conservation law with Fick’s First Law gives us the diffusion equation:

$\frac{\partial n}{\partial t} = D \frac{\partial^2 n}{\partial x^2}$
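The two laws translate almost line by line into a simple numerical scheme. The sketch below is an explicit finite-difference update, assuming a one-dimensional box with no-flux walls. The function names, grid size, time step, and $D$ are assumed values, and the time step must satisfy $D\,\Delta t/\Delta x^2 \le 1/2$ for this scheme to be stable. It checks two things: the total number of particles is conserved, and the variance of the profile grows as $2Dt$, the standard one-dimensional result.

```python
def diffuse(n, D, dx, dt, steps):
    """Explicit finite-difference update of dn/dt = D d2n/dx2 with no-flux walls."""
    n = list(n)
    for _ in range(steps):
        # Fick's first law: flux across the face between cells i and i+1.
        flux = [-D * (n[i + 1] - n[i]) / dx for i in range(len(n) - 1)]
        # No particles cross the two outer walls.
        flux = [0.0] + flux + [0.0]
        # Conservation law: dn/dt = -dj/dx.
        n = [n[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(len(n))]
    return n


def spatial_variance(n, dx):
    """Variance of position, weighting each cell by its particle content."""
    total = sum(n)
    mean = sum(i * dx * n_i for i, n_i in enumerate(n)) / total
    return sum((i * dx - mean) ** 2 * n_i for i, n_i in enumerate(n)) / total


# Assumed parameters; D * dt / dx**2 = 0.2 keeps the scheme stable.
D, dx, dt, steps = 1.0, 1.0, 0.2, 100
start = [0.0] * 201
start[100] = 1.0  # all particles start in the middle cell
final = diffuse(start, D, dx, dt, steps)
print(sum(final), spatial_variance(final, dx), 2 * D * dt * steps)
```

The last two printed numbers should agree closely as long as the profile has not yet spread to the walls.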

Brownian Motion

In 1827 Robert Brown looked at pollen in water under a microscope (see the Wikipedia page for simulations of the observations). Much to his surprise, the pollen acted as if it were alive! Brown verified that the pollen was not alive and that any small, inorganic particle followed similar motion. In 1905, his miracle year, Einstein wrote a paper giving an atomistic description of Brownian motion. In 1908 Langevin used a different approach (that is “infinitely simpler” in his words) to describe Brownian motion. The general explanations are outlined below.

1. Einstein’s Derivation

Einstein’s goal was a probability based description of Brownian motion that connects to Fick’s law. Einstein makes several assumptions about the particles, including

In the end, Einstein finds a solution that is Gaussian, implying that the mean square displacement is linear in time for Brownian motion:

$\langle x^2 \rangle \propto t$

More generally, the mean square displacement could depend on some power of time, usually parameterized as

$\langle x^2 \rangle \propto t^\alpha$

where $\alpha=1$ is Brownian, $0<\alpha<1$ is subdiffusive, $1<\alpha<2$ is superdiffusive, and ballistic is $\alpha=2$. Note, one can get up to $\alpha=3$ in certain turbulent regimes.

2. Langevin’s Derivation
The Langevin approach is to start with a particle-based description. The first assumption is the equipartition theorem, which determines the average kinetic energy (KE)
$\langle KE \rangle = \frac{1}{2} m \left\langle \left(\frac{dx}{dt}\right)^2 \right\rangle = \frac{k_B T}{2}$

Then, one looks at the actual forces on the particle:

mass × acceleration = Stokes drag + stochastic variable
$m \frac{d^2 x}{dt^2} = -6 \pi \eta r \frac{dx}{dt} + X$
where $\eta$ is the viscosity of the fluid, $r$ is the radius of the particle, and $X$ is a stochastic variable. It is assumed to be zero mean, unit variance, and no time correlations, aka white noise.

After multiplying both sides of the equation by $x$, doing some algebra, and then taking the average, one arrives at the same result as Einstein (after ignoring a short-time transient).
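The algebra can be sketched as follows. This is an outline of the standard textbook argument, not a quotation of Langevin’s paper, and the shorthand $\gamma = 6\pi\eta r$ is introduced here only for convenience. Start from Newton’s law with Stokes drag and a random force, $m \frac{d^2 x}{dt^2} = -\gamma \frac{dx}{dt} + X$, and multiply by $x$. The identities

$x\frac{dx}{dt} = \frac{1}{2}\frac{d(x^2)}{dt}$

$x\frac{d^2 x}{dt^2} = \frac{1}{2}\frac{d^2(x^2)}{dt^2} - \left(\frac{dx}{dt}\right)^2$

turn the equation into

$\frac{m}{2}\frac{d^2(x^2)}{dt^2} - m\left(\frac{dx}{dt}\right)^2 = -\frac{\gamma}{2}\frac{d(x^2)}{dt} + xX$

Now average over many particles, assuming $\langle xX \rangle = 0$ and using equipartition in the form $m\left\langle \left(\frac{dx}{dt}\right)^2 \right\rangle = k_B T$:

$\frac{m}{2}\frac{d^2\langle x^2 \rangle}{dt^2} + \frac{\gamma}{2}\frac{d\langle x^2 \rangle}{dt} = k_B T$

This is a first-order equation for the slope $\frac{d\langle x^2 \rangle}{dt}$, with solution

$\frac{d\langle x^2 \rangle}{dt} = \frac{2 k_B T}{\gamma} + C e^{-\gamma t/m}$

where $C$ is a constant set by the initial conditions. The exponential is the short-time transient, which decays on the time scale $m/\gamma$. Once it is gone,

$\langle x^2 \rangle = \frac{2 k_B T}{\gamma} t = \frac{k_B T}{3 \pi \eta r} t$

which is linear in time.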

3. Random Walk Derivation.

There is a third way to derive Brownian motion that is laid out in the book chapter above. The idea is to look at a single particle and do a microscopic random walk. One can set up a recursive definition that defines a binomial probability solution. After a large number of steps, the central limit theorem applies and we end up with a Gaussian solution.
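Here is a minimal sketch of that recursion, assuming the simplest version of the walk: unit steps to the left or right with equal probability (the book chapter may set it up differently). The function name `walk_distribution` and the choice of 100 steps are assumptions. The code builds the exact binomial distribution step by step and compares it with the Gaussian limit.

```python
import math


def walk_distribution(n_steps):
    """Exact position probabilities after n_steps of a +/-1 random walk."""
    probs = {0: 1.0}
    for _ in range(n_steps):
        nxt = {}
        for x, p in probs.items():
            # Recursion: P(x, n+1) = P(x-1, n) / 2 + P(x+1, n) / 2
            nxt[x - 1] = nxt.get(x - 1, 0.0) + 0.5 * p
            nxt[x + 1] = nxt.get(x + 1, 0.0) + 0.5 * p
        probs = nxt
    return probs


n_steps = 100
probs = walk_distribution(n_steps)
variance = sum(x * x * p for x, p in probs.items())
# Gaussian limit with variance n_steps; the factor 2 appears because only
# every other site can be reached after an even number of steps.
gaussian_at_zero = 2.0 / math.sqrt(2.0 * math.pi * n_steps)
print(variance, probs[0], gaussian_at_zero)
```

The variance equals the number of steps, which is the discrete version of a mean square displacement that is linear in time, and the probability of being back at the origin is already close to the Gaussian value.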

How do we get Brownian motion?

In general, there are three conditions that need to be satisfied for Brownian motion:
1. Increments are independent
2. Increments are wide sense stationary. 1st moment and autocovariance don’t depend on time (this is a weaker condition than complete stationarity)
3. Zero mean

The third condition is often ignored by examining the motion relative to the mean displacement (i.e., the actual displacement is not Brownian, but fluctuations in the displacement could be Brownian). So really, the first two are the more important conditions.

Fundamental Questions

• Einstein made three major assumptions in his derivation. Two of the three are often violated in biology; which assumption is relatively safe?
• What biological processes do you think are actually diffusive vs sub/super-diffusive? Think about the 3 conditions for Brownian motion listed above. Note, this is a preview for the next post.