Interview Questions VII: Integrated Brownian Motion

For a standard Weiner process denoted , calculate

This integral is the limit of the sum of at each infinitessimal time slice from to , it is called Integrated Brownian Motion.

We see immediately that this will be a random variable with expectation zero, so the result of the squared expectation will simply be the variance of the random variable, which is what we need to calculate. Here I show two ways to solve this, first by taking the limit of the sum and second using stochastic integration by parts

Need help? Discuss this in the Interview Questions forum!

1) Limit of a sum

(1)

This sum of values along the Wenier process is not independent of one-another, since only the increments are independent. However, we can re-write them in terms of sums of these independent increments

(2)

where are the individual independent increments of the brownian motion. Substituting into our previous equation and reversing the order of the summation

(3)

which is simply a weighted sum of independent gaussians. To calculate the total variance, we sum the individual variances using the summation formula for

(4)

which is the solution.

2) Stochastic integration by parts

The stochastic version of integration by parts for potentially stochastic variables , looks like this:

Re-arranging this gives

Now setting and we have

(5)

We recognise this as a weighted sum of independent gaussian increments, which is (as expected) a gaussian variable with expectation 0 and variance that we can calculate with the Ito isometry

(6)

which is the solution.

Interview Questions VI: The Drunkard’s Walk

A drunk man is stumbling home at night after closing time. Every step he takes moves him either 1 metre closer or 1 metre further away from his destination, with an equal chance of going in either direction (!). His home is 70 metres down the road, but unfortunately, there is a cliff 30 metres behind him at the other end of the street. What are the chances that he makes it to his house BEFORE tumbling over the edge of the cliff?

This is a fun question and quite heavy on conditional probability. We are trying to find the probability that the drunkard has moved 70 metres forward BEFORE he has ever moved 30 metres backward, or visa versa. There are several ways of attempting this, including some pretty neat martingale maths, but I’m going to attempt it here in the language of matrices and markov chains.

Essentially, there are 100 states that the drunkard can be in, from right beside the cliff all the way down the road to right beside his front door. Let’s label these from 1 to 100, in terms of the number of metres away from the cliff, so that he starts in state 30. At each step, he transitions from his current state to either one state higher or one state lower with probability 50%, and the process continues until he reaches either the cliff or the door, at which point the process will cease in either a good or a very bad night’s rest. We call these two states, 0 and 100, ‘absorbers’, because the process stops at this point and no transitions to new states can happen. A markov diagram that illustrates this process can be drawn like this:

We can characterise each step in the process by a transition matrix acting on a state vector. The drunkard initially has a state of 30 metres, so his state vector is a long string of zeroes with a single 1 at the 30th position:

$S_0&space;=&space;\begin{pmatrix}&space;0&space;\\&space;\vdots&space;\\&space;1\\&space;\vdots\\&space;0&space;\end{pmatrix}$

This vector is probabalistic – a 1 indicates with 100% certainty that the drunkard is in the 30th state. However, with subsequent moves this probability density will be spread over the nearby states as his position’s probability density diffuses into other states. The transition matrix multiplies the drunkard’s state vector to give his state vector after another step:

$P&space;=&space;\begin{pmatrix}&space;1&space;&&space;0.5&space;&&space;0&space;&&space;\cdots&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0.5&space;&&space;\cdots&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0.5&space;&&space;0&space;&&space;\cdots&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0.5&space;&&space;&&space;0&space;&&space;0&space;\\&space;\vdots&space;&&space;\vdots&space;&&space;\vdots&space;&&space;\ddots&space;&&space;\vdots&space;&&space;\vdots\\&space;0&space;&&space;0&space;&&space;0&space;&&space;\cdots&space;&&space;0.5&space;&&space;1&space;\end{pmatrix};&space;\quad&space;S_{i+1}&space;=&space;P&space;\cdot&space;S_i$

So, after one step the drunkard’s state vector will have a 0.5 in the 29th and the 31st position and zeroes elsewhere, saying that he will be in either of these states with probability 0.5, and certainly nowhere else. Note that the 1’s at the top and bottom of the transition matrix will absorb and probability that arrives at that state for the rest of the process.

To keep things simple, let’s consider a much smaller problem with only six states, where state 1 is ‘down the cliff’ and state 6 is ‘home’; and we’ll come back to the larger problem at the end. We want to calculate the limit of the drunkard’s state as the transition matrix is applied a large number of times, ie. to calculate

$S_n&space;=&space;P^n&space;\cdot&space;S_0$

An efficient way to calculate powers of a large matrix is to first diagonalise it. We have $\inline&space;P&space;=&space;U&space;A\&space;U^{-1}$, where $\inline&space;A$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\inline&space;P$, and $\inline&space;U$ is the matrix whose columns are the eigenvectors of $\inline&space;P$. Note that, as $\inline&space;P$ is not symmetric, $\inline&space;U^{-1}$ will have row vectors that the the LEFT eigenvectors of $\inline&space;P$, which in general will be different to the right eigenvectors. It is easy to see how to raise $\inline&space;P$ to the power n

$P^n&space;=&space;(U&space;A\&space;U^{-1})^n&space;=&space;U&space;A^n\&space;U^{-1}$

since all of the $\inline&space;U$s and $\inline&space;U^{-1}$s in the middle cancel. Since $\inline&space;A$ is diagonal, to raise it to the power of n we simlpy raise each element (which are just the eigenvalues of $\inline&space;P$) to the power of n. To calculate the eigenvalues of P we solve the characteristic equation

$|P&space;-&space;\lambda_a&space;I|&space;=&space;0$

with

$P&space;=&space;\begin{pmatrix}&space;1&space;&&space;0.5&space;&&space;0&space;&&space;0&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0.5&space;&&space;0&space;\\&space;0&space;&&space;0&space;&&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0&space;&&space;0&space;&&space;0.5&space;&&space;1&space;\end{pmatrix}$

This gives six eigenvalues, two of which are one (these will turn out to correspond to the two absorbing states) and the remainder are strictly $\inline&space;0\leq&space;\lambda_a&space;<&space;1$. Consequently, when raised to the power n in the diagonal matrix above, all of the terms will disappear except for the first and the last eigenvalue which are 1 as n becomes large.

Calculating the eigenvectors is time consuming and we’d like to avoid it if possible. Luckily, in the limit that n gets large, we’ve seen that most of the eigenvalues raised to the power n will go to zero which will reduce significantly the amount we need to calculate. We have

$P^n\cdot&space;S&space;=&space;\bigl(&space;U&space;A^n&space;\&space;U^{-1}\bigr)\cdot&space;S$

and in the limit that n gets large, $\inline&space;U&space;\cdot&space;A^n$ is just a matrix of zeroes with a one in the upper left and lower right entry. At first sight it looks as though calculating $\inline&space;U^{-1}$ will be required, which is itself an eigenvector problem, but in fact we only have to calculate a single eigenvector – the first (or equivalently the last), which will give the probability of evolving from an initial state S to a final state 0 (or 100).

$\inline&space;U^{-1}$ is the matrix of left eigenvectors of $\inline&space;P$, each of which satisfies

$x_a&space;\cdot&space;P&space;=&space;\lambda_a&space;x_a$

and we are looking for the eigenvectors corresponding to eigenvalue of 1, so we need to solve the matrix equation

$x&space;\cdot&space;\begin{pmatrix}&space;0&space;&&space;0.5&space;&&space;0&space;&&space;0&space;&&space;0&space;&&space;0\\&space;0&space;&&space;-1&space;&&space;0.5&space;&&space;0&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0.5&space;&&space;-1&space;&&space;0.5&space;&&space;0&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0.5&space;&&space;-1&space;&&space;0.5&space;&&space;0&space;\\&space;0&space;&&space;0&space;&&space;0&space;&&space;0.5&space;&&space;-1&space;&&space;0\\&space;0&space;&&space;0&space;&&space;0&space;&&space;0&space;&&space;0.5&space;&&space;0&space;\end{pmatrix}&space;=&space;0$

We know that if he starts in the first state (ie. over the cliff) he must finish there (trivially, after 0 steps) with probability 100%, so that $\inline&space;x_1&space;=&space;1$ and $\inline&space;x_6&space;=&space;0$. The solution to this is

$x_n&space;=&space;{6&space;-&space;n&space;\over&space;5}$

which is plotted here for each initial state

which says that the probability of ending up in state 1 (down the cliff!) falls linearly with starting distance from the cliff. We can scale up this final matrix equation to the original 100 by 100 state space, and find that for someone starting in state 30, there is a 70/100 chance of ending up down the cliff, and consequently only a 30% chance of getting home!

This problem is basically the same as the Gambler’s ruin problem, where a gambler in a casino stakes $1 on each toss of a coin and leaves after reaching a goal of$N or when broke. There are some very neat methods for solving them via martingale methods that don’t use the mechanics above that I’ll look at in a future post.

Interview Quesions III

Today’s question will test some of the statistics and correlation I’ve discussed in the last couple of months. Assume throughout that  and  are jointly normally distributed such that

a) Calculate
b) Calculate

The first expectation is of a lognormal variate, and the second is of a lognormal variate conditional on some earlier value of the variate having been a particular value – these are very typical of the sorts of quantities that a quant deals with every day, so the solution will be quite instructive! Before reading the solution have a go at each one, the following posts may be useful: SDEs pt. 1, SDEs pt. 2, Results for Common Distributions

a) Here, we use the standard result for expectations

b) This one is a little tougher, so first of all I’ll discuss what it means and some possible plans of attack. We want to calculate the expectation of , given that takes a value of . Of course, if and were independent, this wouldn’t make any difference and the result would be the same. However, because they are correlated, the realised value of will have an effect on the distribution of .

To demonstrate this, I’ve plotted a few scatter-graphs illustrating the effect of specifying on , with and uncorrelated and then becoming increasing more correlated.

The simplest way of attempting this calculation is to use the result for jointly normal variates given in an earlier post, which says that if and have correlation , we can express in terms of and a new variate  which is uncorrelated with

so

Since the value of is already determined (ie. = ), I’ve separated this term out and the only thing I have to calculate is the expectation of the second term in . Since and are independent, we can calculate the expectation of the , which is the same process as before but featuring slightly more complicated pre-factors

We can check the limiting values of this – if  then and are independent [this is not a general result by the way – see wikipedia for example – but it IS true for jointly normally distributed variables], in this case  just as above. If , , which also makes sense since in this case , so fully determines the expectation of .

The more general way to solve this is to use the full 2D joint normal distribution as given in the previous post mentioned before,

This is the joint probability function of and , but it’s not quite what we need – the expectation we are trying to calculate is

So we need to calculate the conditional expectation of given , for which we need Bayes’ theorem

Putting this together, we have

This integral is left as an exercise to the reader, but it is very similar to those given above and should give the same answer as the previous expression for  after some simplification!

Stochastic Differential Equations Pt2: The Lognormal Distribution

This post follows from the earlier post on Stochastic Differential Equations.

I finished last time by saying that the solution to the BS SDE for terminal spot at time T was

$\inline&space;S_T&space;=&space;S_0&space;e^{(\mu&space;-&space;{1&space;\over&space;2}\sigma^2)T&space;+&space;\sigma&space;W_T}$

When we solve an ODE, it gives us an expression for the position of a particle at time T. But we’ve already said that we are uncertain about the price of an asset in the future, and this expression expresses that uncertainty through the $\inline&space;\sigma&space;W_T$ term in the exponent. We said in the last post that the difference between this quantity at two different times s and t was normally distributed, and since this term is the distance between t=0 and t=T (we have implicitly ignored a term $W_0$, but this is ok because we assumed that the process started at zero) it is also normally distributed,

$W_T&space;\sim&space;{\mathbb&space;N}(0,T)$

It’s a well-known property of the normal distribution (see the Wikipedia entry for this and many others) that if $X&space;\sim&space;{\mathbb&space;N}(0,1)$ then $aX&space;\sim&space;{\mathbb&space;N}(0,a^2)$ for constant a. We can use this in reverse to reduce $W_T$ to a standard normal variable x, by taking a square root of time outside of the distribution so $W_T&space;\sim&space;\sqrt{T}\cdot{\mathbb&space;N}(0,1)$ and we now only need standard normal variables, which we know lots about. We can repeat our first expression in these terms

$\inline&space;S_T&space;=&space;S_0&space;e^{(\mu&space;-&space;{1&space;\over&space;2}\sigma^2)T&space;+&space;\sigma&space;\sqrt{T}&space;X}$

What does all of this mean? In an ODE environment, we’d be able to specify the exact position of a particle at time T. Once we try to build in uncertainty via SDEs, we are implicitly sacrificing this ability, so instead we can only talk about expected positions, variances, and other probabilistic quantities. However, we certainly can do this, the properties of the normal distribution are very well understood from a probabilistic standpoint so we expect to be able to make headway! Just as X is a random variable distributed across a normal distribution, S(t) is now a random variable whose distribution is a function of random variable X and the other deterministic terms in the expression. We call this distribution the lognormal distribution since the log of S is distributed normally.

The random nature of S is determined entirely by the random nature of X. If we take a draw from X, that will entirely determine the corresponding value of S, since the remaining terms are deterministic. The first things we might want to do are calculate the expectation of S, its variance, and plot its distribution. To calculate the expectation, we integrate over all possible realisations of X weighted by their probability, complete the square and use the gaussian integral formula with a change of variables

${\mathbb&space;E}[S_t]&space;=&space;\int^{\infty}_{-\infty}&space;S_t(x)&space;p(x)&space;dx$

$={S_0&space;\over&space;\sqrt{2\pi}}\int^{\infty}_{-\infty}&space;e^{(\mu-{1\over&space;2}\sigma^2)t&space;+&space;\sigma&space;\sqrt{t}x}&space;e^{-{1\over&space;2}x^2}&space;dx$

$={&space;{S_0&space;e^{(\mu-{1\over&space;2}\sigma^2)t}}&space;\over&space;\sqrt{2\pi}}\int^{\infty}_{-\infty}&space;e^{-{1\over&space;2}x^2&space;+&space;\sigma&space;\sqrt{t}&space;x}&space;dx$

$={{S_0&space;e^{(\mu-{1\over&space;2}\sigma^2)t}}&space;\over&space;\sqrt{2\pi}}\int^{\infty}_{-\infty}&space;e^{-{1\over&space;2}&space;(x&space;-&space;\sigma&space;\sqrt{t})^2}&space;e^{{1\over&space;2}\sigma^2&space;t}&space;dx$

$={{S_0&space;e^{\mu&space;t}}&space;\over&space;\sqrt{2\pi}}\int^{\infty}_{-\infty}&space;e^{-{1\over&space;2}&space;y^2}&space;dy$

$=S_0&space;e^{\mu&space;t}$

which is just the linear growth term acting over time [exercise: calculate the variance in a similar way]. We know what the probability distribution of X looks like (it’s a standard normal variable), but what does the probability distribution of S look like? We can calculate the pdf using the change-of-variables technique, which says that if S = g(x), then the area under each curve in corresponding regions must be equal:

$\int_{x_1}^{x_2}&space;p_x(x)&space;dx&space;=&space;\int_{g(x_1)}^{g(x_2)}&space;p_S(S)&space;dS$

$p_x(x)&space;dx&space;=&space;p_S(S)&space;dS$

$p_S(S_t)&space;=&space;p_x(x)&space;{dx&space;\over&space;dS_t}$

We know the function S(x), but the easiest way to calculate this derivative is first to invert the function t make it ameanable to differentiation

$x&space;=&space;{\ln{S_t&space;\over&space;S_0}&space;-&space;(\mu&space;-&space;{1&space;\over&space;2}\sigma^2)t&space;\over&space;\sigma&space;\sqrt{t}}$

${dx&space;\over&space;dS_t}&space;=&space;{1&space;\over&space;\sigma&space;\sqrt{t}&space;S_t}$

So the pdf of S expressed in terms of S is

$p_S(S_t)&space;=&space;{1&space;\over&space;S_t&space;\sigma&space;\sqrt{2\pi&space;t}}&space;\exp{-\Bigl(\ln{S_t\over&space;S_0}&space;-&space;(\mu-{1\over&space;2}\sigma^2)t&space;\Bigr)^2\over&space;2&space;\sigma^2&space;t}$

Well it’s a nasty looking function indeed! I’ve plotted it below for a few typical parameter sets and evolution times.

This distribution is really central to a lot of what we do, so I’ll come back to it soon and discuss a few more of its properties. The one other thing to mention is that if we want to calculate an expected value over S (which will turn out to be something we do a lot), we have two approaches – either integrate over $\inline&space;p_S(S_t)$

${\mathbb&space;E}[f(S_t)]&space;=&space;\int_0^{\infty}&space;f(S_t)p_S(S_t)dS$

or,instead express the function in terms of x instead (using $\inline&space;S_t&space;=&space;S_0&space;e^{(\mu&space;-&space;{1&space;\over&space;2}\sigma^2)t&space;+&space;\sigma&space;\sqrt{t}&space;x}$) and instead integrate over the normal distribution

${\mathbb&space;E}[f(S_t(x))]&space;=&space;\int_{-\infty}^{\infty}&space;f(x)p_x(x)dx$

This is typically the easier option. I think it is called the Law of the Unconscious Statistician. On that note, we’ve certainly covered enough ground for the moment!

-QuantoDrifter