Interview Questions VII: Integrated Brownian Motion

For a standard Wiener process denoted W_t, calculate

    \[{\mathbb E} \Bigl[ \Bigl( \int^T_0 W_t dt \Bigr)^2 \Bigr]\]

This integral is the limit of the sum of W_t over each infinitesimal time slice dt from 0 to T; it is called Integrated Brownian Motion.

We see immediately that this will be a random variable with expectation zero, so the expectation of the square is simply the variance of the random variable, which is what we need to calculate. Here I show two ways to solve this: first by taking the limit of a sum, and second by using stochastic integration by parts.


1) Limit of a sum

(1)   \begin{align*} I &= \int^T_0 W_t dt \nonumber \\ &= \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} \nonumber \end{align*}

The values of the Wiener process in this sum are not independent of one another, since only the increments are independent. However, we can re-write them in terms of sums of these independent increments

(2)   \begin{align*} W_{\frac {Ti} n} &= \sum^i_{j=1} ( W_{\frac {Tj} n} - W_{\frac {T(j-1)} n} ) \nonumber \\ &= \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \end{align*}

where \overline{W}_{\frac {Tj} n} \sim {\mathbb N}(0,{\frac T n}) are the individual independent increments of the Brownian motion. Substituting into our previous equation and reversing the order of the summation

(3)   \begin{align*} \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} &= \lim_{n \to \infty}{\frac T n} \sum^n_{i=1} \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} \sum^n_{i=j} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} (n-j+1) \overline{W}_{\frac {Tj} n} \nonumber \\ \end{align*}

which is simply a weighted sum of independent Gaussians. To calculate the total variance, we sum the individual variances using the summation formula for \sum_1^N n^2

(4)   \begin{align*} {\mathbb V}[I] &= \lim_{n \to \infty} {\frac {T^2} {n^2}} \sum^n_{j=1} (n-j+1)^2 {\frac T n} \nonumber \\ &= \lim_{n \to \infty} {\frac {T^3} {n^3}} \Bigl( {\frac 1 3} n^3 + O(n^2) \Bigr) \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.
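The T^3/3 answer is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch (my own addition, not part of the derivation; it assumes NumPy is available, and the grid size and path count are arbitrary choices) that simulates the Riemann sum from equation (1):

```python
import numpy as np

# Estimate Var[I] by simulating the Riemann sum (T/n) * sum_i W_{Ti/n}
rng = np.random.default_rng(42)
T, n, n_paths = 2.0, 200, 20_000
dt = T / n

# Brownian increments are N(0, dt); cumulative sums give W_t on the grid
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
W = np.cumsum(dW, axis=1)

I = dt * W.sum(axis=1)   # discretised integral of W_t dt, one value per path
print(I.mean())          # should be close to 0
print(I.var())           # should be close to T**3 / 3
```

With T = 2 the sample variance should land near 8/3; the small residual discrepancy is the O(1/n) term visible in equation (4).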

2) Stochastic integration by parts

The stochastic version of integration by parts for potentially stochastic variables X_t, Y_t looks like this (strictly there is also a quadratic cross-variation term on the right-hand side, but it vanishes when either process has finite variation, as will be the case below):

    \[X_T \cdot Y_T = X_0 \cdot Y_0 + \int^T_0 X_t dY_t + \int^T_0 Y_t dX_t\]

Re-arranging this gives

    \[\int^T_0 Y_t dX_t = \Bigl[ X_t \cdot Y_t \Bigr]^T_0 - \int^T_0 X_t dY_t\]

Now setting Y_t = -W_t and X_t = (T-t) we have

(5)   \begin{align*} \int^T_0 W_t dt &= \Bigl[ -W_t \cdot (T-t) \Bigr]^T_0 + \int^T_0 (T-t) dW_t \nonumber \\ &= \int^T_0 (T-t) dW_t \nonumber \end{align*}

We recognise this as a weighted sum of independent Gaussian increments, which is (as expected) a Gaussian variable with expectation 0 and a variance that we can calculate with the Itô isometry

(6)   \begin{align*} {\mathbb E}\Bigl[ \Bigl( \int^T_0 (T-t) dW_t \Bigr)^2 \Bigr] &= {\mathbb E}\Bigl[ \int^T_0 (T-t)^2 dt \Bigr] \nonumber \\ &= \Bigl[ T^2t - Tt^2 + {\frac 1 3} t^3 \Bigr]^T_0 \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.
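The integration-by-parts form can be checked the same way: the integral of (T-t) dW_t is just a sum of weighted increments, so a direct simulation (again my own addition, assuming NumPy; parameters are arbitrary) should reproduce the same variance:

```python
import numpy as np

# Simulate the Ito integral of (T - t) dW_t as a weighted sum of increments
rng = np.random.default_rng(0)
T, n, n_paths = 2.0, 200, 20_000
dt = T / n
t = np.arange(n) * dt            # left endpoints, per the Ito convention

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
J = ((T - t) * dW).sum(axis=1)   # one draw of the integral per path

print(J.mean())                  # should be close to 0
print(J.var())                   # should be close to T**3 / 3
```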

Interview Questions VI: The Drunkard’s Walk

A drunk man is stumbling home at night after closing time. Every step he takes moves him either 1 metre closer or 1 metre further away from his destination, with an equal chance of going in either direction (!). His home is 70 metres down the road, but unfortunately, there is a cliff 30 metres behind him at the other end of the street. What are the chances that he makes it to his house BEFORE tumbling over the edge of the cliff?

This is a fun question and quite heavy on conditional probability. We are trying to find the probability that the drunkard has moved 70 metres forward BEFORE he has ever moved 30 metres backward, or vice versa. There are several ways of attempting this, including some pretty neat martingale maths, but I’m going to attempt it here in the language of matrices and Markov chains.

Essentially, there are 101 states that the drunkard can be in, from right beside the cliff all the way down the road to right beside his front door. Let’s label these from 0 to 100, in terms of the number of metres away from the cliff, so that he starts in state 30. At each step, he transitions from his current state to either one state higher or one state lower with probability 50%, and the process continues until he reaches either the cliff or the door, at which point the process ceases in either a good or a very bad night’s rest. We call the two end states, 0 and 100, ‘absorbers’, because the process stops there and no transitions to new states can happen. A Markov diagram that illustrates this process can be drawn like this:

A Markov chain displaying the transition probabilities for each state in the drunkard’s walk

We can characterise each step in the process by a transition matrix acting on a state vector. The drunkard initially has a state of 30 metres, so his state vector is a long string of zeroes with a single 1 at the 30th position:

S_0 = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}

This vector is probabilistic – a 1 indicates with 100% certainty that the drunkard is in the 30th state. However, with subsequent moves his position’s probability density will be spread over the nearby states. The transition matrix multiplies the drunkard’s state vector to give his state vector after another step:

P = \begin{pmatrix} 1 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & \cdots & 0 & 0\\ 0 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 0.5 & 1 \end{pmatrix}; \quad S_{i+1} = P \cdot S_i

So, after one step the drunkard’s state vector will have a 0.5 in the 29th and the 31st positions and zeroes elsewhere, saying that he will be in either of these states with probability 0.5, and certainly nowhere else. Note that the 1s at the top and bottom of the transition matrix will absorb any probability that arrives at those states for the rest of the process.

To keep things simple, let’s consider a much smaller problem with only six states, where state 1 is ‘down the cliff’ and state 6 is ‘home’; and we’ll come back to the larger problem at the end. We want to calculate the limit of the drunkard’s state as the transition matrix is applied a large number of times, ie. to calculate

S_n = P^n \cdot S_0

An efficient way to calculate powers of a large matrix is to first diagonalise it. We have \inline P = U A\ U^{-1}, where \inline A is a diagonal matrix whose diagonal elements are the eigenvalues of \inline P, and \inline U is the matrix whose columns are the eigenvectors of \inline P. Note that, as \inline P is not symmetric, \inline U^{-1} will have row vectors that are the LEFT eigenvectors of \inline P, which in general will be different to the right eigenvectors. It is easy to see how to raise \inline P to the power n

P^n = (U A\ U^{-1})^n = U A^n\ U^{-1}

since all of the \inline Us and \inline U^{-1}s in the middle cancel. Since \inline A is diagonal, to raise it to the power of n we simply raise each element (which are just the eigenvalues of \inline P) to the power of n. To calculate the eigenvalues of P we solve the characteristic equation

|P - \lambda_a I| = 0

where, for our six-state problem, the transition matrix is

P = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}

This gives six eigenvalues, two of which are equal to one (these will turn out to correspond to the two absorbing states), while the remainder satisfy \inline |\lambda_a| < 1. Consequently, when raised to the power n in the diagonal matrix above, all of the terms vanish as n becomes large except for the first and the last, corresponding to the eigenvalues equal to 1.
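Rather than grinding through the characteristic polynomial by hand, the eigenvalue structure is easy to confirm numerically. This sketch (my addition, assuming NumPy) builds the six-state matrix and prints its spectrum:

```python
import numpy as np

# The six-state transition matrix with absorbers in the first and last states
P = np.array([
    [1, 0.5, 0,   0,   0,   0],
    [0, 0,   0.5, 0,   0,   0],
    [0, 0.5, 0,   0.5, 0,   0],
    [0, 0,   0.5, 0,   0.5, 0],
    [0, 0,   0,   0.5, 0,   0],
    [0, 0,   0,   0,   0.5, 1],
])

eig = np.sort(np.linalg.eigvals(P).real)
print(eig)   # two eigenvalues equal 1; the other four lie strictly inside (-1, 1)
```

Note that some of the remaining eigenvalues come out negative, so it is the condition |λ_a| < 1 that guarantees they vanish when raised to a large power.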

Calculating the eigenvectors is time consuming and we’d like to avoid it if possible. Luckily, in the limit that n gets large, we’ve seen that most of the eigenvalues raised to the power n will go to zero which will reduce significantly the amount we need to calculate. We have

P^n\cdot S = \bigl( U A^n \ U^{-1}\bigr)\cdot S

and in the limit that n gets large, \inline A^n is just a matrix of zeroes with a one in the upper-left and lower-right entries, so \inline U \cdot A^n contains only the first and last columns of \inline U. At first sight it looks as though calculating \inline U^{-1} will be required, which is itself an eigenvector problem, but in fact we only have to calculate a single eigenvector – the first (or equivalently the last), which will give the probability of evolving from an initial state S to a final state 0 (or 100).

\inline U^{-1} is the matrix of left eigenvectors of \inline P, each of which satisfies

x_a \cdot P = \lambda_a x_a

and we are looking for the eigenvectors corresponding to eigenvalue of 1, so we need to solve the matrix equation

x \cdot \begin{pmatrix} 0 & 0.5 & 0 & 0 & 0 & 0\\ 0 & -1 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & -1 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & -1 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & -1 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 0 \end{pmatrix} = 0

We know that if he starts in the first state (ie. over the cliff) he must finish there (trivially, after 0 steps) with probability 100%, so that \inline x_1 = 1 and \inline x_6 = 0. The solution to this is

x_n = {6 - n \over 5}

which is plotted here for each initial state

The probability of the drunkard finishing in either final state given his initial state. Note that this sums to one in all cases as eventually he must finish in one or the other state.

which says that the probability of ending up in state 1 (down the cliff!) falls linearly with starting distance from the cliff. We can scale this final matrix equation up to the original problem, with states running from 0 to 100, and find that for someone starting in state 30 there is a 70% chance of ending up down the cliff, and consequently only a 30% chance of getting home!
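For completeness, here is a short numerical version of the full problem (my addition, assuming NumPy; the power 100,000 simply stands in for the n → ∞ limit), with states 0 to 100 and absorbers at both ends:

```python
import numpy as np

# Build the transition matrix for states 0..100 with absorbers at 0 and 100
N = 100
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0
for j in range(1, N):              # interior states move +/-1 with prob 0.5
    P[j - 1, j] = P[j + 1, j] = 0.5

S = np.zeros(N + 1)
S[30] = 1.0                        # the drunkard starts 30 m from the cliff

Pn = np.linalg.matrix_power(P, 100_000)   # effectively the large-n limit
final = Pn @ S
print(final[0], final[N])          # ~0.7 down the cliff, ~0.3 home
```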

This problem is basically the same as the Gambler’s ruin problem, where a gambler in a casino stakes $1 on each toss of a coin and leaves after reaching a goal of $N or when broke. There are some very neat ways of solving these problems via martingales that don’t use the mechanics above, which I’ll look at in a future post.

Interview Questions III

Today’s question will test some of the statistics and correlation I’ve discussed in the last couple of months. Assume throughout that x\sim {\mathbb N}(0,1) and y\sim {\mathbb N}(0,1) are jointly normally distributed such that {\mathbb E}[x \cdot y] = \rho.

a) Calculate {\mathbb E}[\ e^x \ ]
b) Calculate {\mathbb E}[\ e^x \ | \ y = b\ ]

The first expectation is of a lognormal variate, and the second is of a lognormal variate conditional on some earlier value of the variate having been a particular value – these are very typical of the sorts of quantities that a quant deals with every day, so the solution will be quite instructive! Before reading the solution have a go at each one, the following posts may be useful: SDEs pt. 1, SDEs pt. 2, Results for Common Distributions

a) Here, we use the standard result for expectations

    \begin{align*} {\mathbb E}[ \ e^x \ ] &= \int^{\infty}_{-\infty} e^x \cdot f(x) \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^x \cdot e^{-{1 \over 2}x^2} \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ x^2 - 2x + 1 - 1 \Bigr] \Bigr) \ dx \nonumber \\ \nonumber \\ &= {e^{1\over 2} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} (x-1)^2 \Bigr) \ dx \nonumber \\ \nonumber \\ &= e^{1\over 2} \nonumber \end{align*}

b) This one is a little tougher, so first of all I’ll discuss what it means and some possible plans of attack. We want to calculate the expectation of e^x, given that y takes a value of b. Of course, if x and y were independent, this wouldn’t make any difference and the result would be the same. However, because they are correlated, the realised value of y will have an effect on the distribution of x.

To demonstrate this, I’ve plotted a few scatter-graphs illustrating the effect of specifying y on x, with x and y uncorrelated and then becoming increasingly correlated.

When x and y are uncorrelated, the realised value of y doesn’t affect the distribution for x, which is still normally distributed around zero
When x and y are correlated, the realised value of y has an effect on the distribution of x, which is no longer centered on zero and has a smaller variance
When the correlation of x and y becomes high, the value of x is almost completely determined by y. Now, if y is specified then x is tightly centered around a value far from zero

The simplest way of attempting this calculation is to use the result for jointly normal variates given in an earlier post, which says that if x and y have correlation \rho, we can express x in terms of y and a new variate z \sim {\mathbb N}(0,1) which is uncorrelated with y

    \[x = \rho\cdot y + \sqrt{1 - \rho^2}\cdot z\]


so that

    \[(\ e^x\ |\ y = b\ ) = e^{\rho y + \sqrt{1-\rho^2}z} = e^{\rho b}\cdot e^{\sqrt{1-\rho^2}z}\]

Since the value of y is already determined (ie. y = b), I’ve separated this term out and the only thing I have to calculate is the expectation of the second term in z. Since y and z are independent, we can calculate the expectation over z, which is the same process as before but with slightly more complicated pre-factors

    \begin{align*} {\mathbb E}[ \ e^x \ |\ y = b\ ] &= e^{\rho b} \int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot f(z) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot e^{-{1 \over 2}z^2} \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ z^2 - 2\sqrt{1-\rho^2}z \nonumber \\ & \quad \quad \quad \quad \quad \quad \quad \quad + (1-\rho^2) - (1-\rho^2) \Bigr] \Bigr) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}} e^{{1 \over 2} (1-\rho^2)} \int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} \bigl(z-\sqrt{1-\rho^2}\bigr)^2 \Bigr) \ dz \nonumber \\ \nonumber \\ &= e^{{1\over 2}(1-\rho^2)+\rho b} \nonumber \end{align*}

We can check the limiting values of this – if \rho = 0 then x and y are independent [this is not a general result by the way – see Wikipedia for example – but it IS true for jointly normally distributed variables], and in this case {\mathbb E}[ \ e^x \ |\ y = b\ ] = e^{0.5} just as above. If \rho = \pm 1, {\mathbb E} [\ e^x \ |\ y=b\ ] = e^{\pm b}, which also makes sense since in this case x = \pm y = \pm b, so y fully determines the expectation of x.
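These results are also easy to verify by simulation. The sketch below (my addition, assuming NumPy; ρ and b are arbitrary example values) draws z directly from the conditional decomposition x = ρb + √(1-ρ²)z:

```python
import numpy as np

# Monte Carlo check of E[e^x | y = b] using x = rho*b + sqrt(1-rho^2)*z
rng = np.random.default_rng(1)
rho, b, n = 0.6, 0.8, 500_000

z = rng.standard_normal(n)
x_given_y = rho * b + np.sqrt(1 - rho**2) * z

mc = np.exp(x_given_y).mean()
exact = np.exp(0.5 * (1 - rho**2) + rho * b)
print(mc, exact)   # the two values should agree to a few decimal places
```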

The more general way to solve this is to use the full 2D joint normal distribution as given in the previous post mentioned before,

    \[f(x,y) = {1 \over {2\pi \sqrt{1-\rho^2}}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}\]

This is the joint probability function of x and y, but it’s not quite what we need – the expectation we are trying to calculate is

    \[{\mathbb E}[ \ e^x \ |\ y \ ] = \int^{\infty}_{-\infty} e^x \cdot f(\ x \ |\ y \ ) \ dx\]

So we need to calculate the conditional density of x given y, for which we need Bayes’ theorem

    \[f(x,y) = f( \ x \ | \ y \ ) \cdot f(y)\]

Putting this together, we have

    \[{\mathbb E}[ \ e^x \ |\ y = b\ ] = \int^{\infty}_{-\infty} e^x \cdot {f(x,y) \over f(y)}\ dx\]

    \[={1 \over \sqrt{2\pi (1-\rho^2)}} \int^{\infty}_{-\infty} { e^x \over e^{-{1\over 2}y^2}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)} dx\]

    \[={e^{{1\over 2}b^2} \over \sqrt{2\pi (1-\rho^2)}} \int^{\infty}_{-\infty} e^x \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + b^2 - 2\rho xb)\Bigr)} dx\]

This integral is left as an exercise to the reader, but it is very similar to those given above and should give the same answer as the previous expression for {\mathbb E}[\ e^x \ | \ y = b\ ] after some simplification!
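Rather than doing the final integral by hand, we can check numerically that the density-ratio route agrees with the closed form. This sketch (my addition, assuming NumPy; ρ and b are arbitrary) evaluates the integrand on a fine grid:

```python
import numpy as np

# Integrate e^x * f(x, b) / f(b) over x and compare with the closed form
rho, b = 0.6, 0.8
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]

# Joint density f(x, y) evaluated at y = b, and the marginal f(b)
joint = (np.exp(-(x**2 + b**2 - 2 * rho * x * b) / (2 * (1 - rho**2)))
         / (2 * np.pi * np.sqrt(1 - rho**2)))
marginal = np.exp(-0.5 * b**2) / np.sqrt(2 * np.pi)

numeric = (np.exp(x) * joint / marginal * dx).sum()
exact = np.exp(0.5 * (1 - rho**2) + rho * b)
print(numeric, exact)   # should match to many decimal places
```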

Stochastic Differential Equations Pt2: The Lognormal Distribution

This post follows from the earlier post on Stochastic Differential Equations.

I finished last time by saying that the solution to the BS SDE for terminal spot at time T was

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma W_T}

When we solve an ODE, it gives us an expression for the position of a particle at time T. But we’ve already said that we are uncertain about the price of an asset in the future, and this expression captures that uncertainty through the \inline \sigma W_T term in the exponent. We said in the last post that the difference between this quantity at two different times s and t was normally distributed, and since this term is the distance between t=0 and t=T (we have implicitly ignored a term W_0, but this is fine because we assumed that the process started at zero), it is also normally distributed,

W_T \sim {\mathbb N}(0,T)

It’s a well-known property of the normal distribution (see the Wikipedia entry for this and many others) that if X \sim {\mathbb N}(0,1) then aX \sim {\mathbb N}(0,a^2) for constant a. We can use this in reverse to reduce W_T to a standard normal variable x, by taking a square root of time outside of the distribution so W_T \sim \sqrt{T}\cdot{\mathbb N}(0,1) and we now only need standard normal variables, which we know lots about. We can repeat our first expression in these terms

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma \sqrt{T} X}

What does all of this mean? In an ODE environment, we’d be able to specify the exact position of a particle at time T. Once we try to build in uncertainty via SDEs, we are implicitly sacrificing this ability, so instead we can only talk about expected positions, variances, and other probabilistic quantities. However, we certainly can do this, the properties of the normal distribution are very well understood from a probabilistic standpoint so we expect to be able to make headway! Just as X is a random variable distributed across a normal distribution, S(t) is now a random variable whose distribution is a function of random variable X and the other deterministic terms in the expression. We call this distribution the lognormal distribution since the log of S is distributed normally.

The random nature of S is determined entirely by the random nature of X. If we take a draw from X, that will entirely determine the corresponding value of S, since the remaining terms are deterministic. The first things we might want to do are calculate the expectation of S, its variance, and plot its distribution. To calculate the expectation, we integrate over all possible realisations of X weighted by their probability, complete the square, and use the Gaussian integral formula with a change of variables

{\mathbb E}[S_t] = \int^{\infty}_{-\infty} S_t(x) p(x) dx

={S_0 \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{(\mu-{1\over 2}\sigma^2)t + \sigma \sqrt{t}x} e^{-{1\over 2}x^2} dx

={ {S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2}x^2 + \sigma \sqrt{t} x} dx

={{S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} (x - \sigma \sqrt{t})^2} e^{{1\over 2}\sigma^2 t} dx

={{S_0 e^{\mu t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} y^2} dy

=S_0 e^{\mu t}

which is just the linear growth term acting over time [exercise: calculate the variance in a similar way]. We know what the probability distribution of X looks like (it’s a standard normal variable), but what does the probability distribution of S look like? We can calculate the pdf using the change-of-variables technique, which says that if S = g(x), then the area under each curve in corresponding regions must be equal:
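The {\mathbb E}[S_t] = S_0 e^{\mu t} result can be confirmed with a quick simulation (my addition, assuming NumPy; the parameter values are arbitrary examples):

```python
import numpy as np

# Monte Carlo check of E[S_t] = S0 * exp(mu * t) for the lognormal model
rng = np.random.default_rng(7)
S0, mu, sigma, t = 100.0, 0.05, 0.2, 2.0

x = rng.standard_normal(500_000)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * x)

print(S_t.mean())            # sample mean of the simulated terminal spots
print(S0 * np.exp(mu * t))   # the closed-form expectation, for comparison
```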

\int_{x_1}^{x_2} p_x(x) dx = \int_{g(x_1)}^{g(x_2)} p_S(S) dS

p_x(x) dx = p_S(S) dS

p_S(S_t) = p_x(x) {dx \over dS_t}

We know the function S(x), but the easiest way to calculate this derivative is first to invert the function to make it amenable to differentiation

x = {\ln{S_t \over S_0} - (\mu - {1 \over 2}\sigma^2)t \over \sigma \sqrt{t}}

{dx \over dS_t} = {1 \over \sigma \sqrt{t} S_t}

So the pdf of S expressed in terms of S is

p_S(S_t) = {1 \over S_t \sigma \sqrt{2\pi t}} \exp\Biggl( -{\Bigl(\ln{S_t\over S_0} - (\mu-{1\over 2}\sigma^2)t \Bigr)^2\over 2 \sigma^2 t} \Biggr)

Well it’s a nasty looking function indeed! I’ve plotted it below for a few typical parameter sets and evolution times.

A lognormal PDF with typical parameter values. Of course, it is highly unlikely that parameter values will stay constant for 4 years – we’ll discuss time dependence of these another day
A more-volatile PDF. Note that even though the mode is falling over time, the mean (which is independent of vol) is still going up due to the fat tails at positive values.

This distribution is really central to a lot of what we do, so I’ll come back to it soon and discuss a few more of its properties. The one other thing to mention is that if we want to calculate an expected value over S (which will turn out to be something we do a lot), we have two approaches – either integrate over \inline p_S(S_t)

{\mathbb E}[f(S_t)] = \int_0^{\infty} f(S_t)p_S(S_t)dS

or instead express the function in terms of x (using \inline S_t = S_0 e^{(\mu - {1 \over 2}\sigma^2)t + \sigma \sqrt{t} x}) and integrate over the normal distribution

{\mathbb E}[f(S_t(x))] = \int_{-\infty}^{\infty} f(S_t(x))p_x(x)dx

This is typically the easier option. I think it is called the Law of the Unconscious Statistician. On that note, we’ve certainly covered enough ground for the moment!
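To see the two routes agree, here is a sketch (my addition, assuming NumPy; the payoff f(S) = max(S - K, 0) and all parameter values, including K, are arbitrary examples) that evaluates the same expectation both ways on fine grids:

```python
import numpy as np

S0, mu, sigma, t, K = 100.0, 0.05, 0.2, 1.0, 105.0

# Route 1: integrate f(S) against the lognormal pdf p_S(S)
S = np.linspace(1e-6, 1000.0, 2_000_001)
dS = S[1] - S[0]
p_S = (np.exp(-(np.log(S / S0) - (mu - 0.5 * sigma**2) * t)**2
              / (2 * sigma**2 * t))
       / (S * sigma * np.sqrt(2 * np.pi * t)))
route1 = (np.maximum(S - K, 0.0) * p_S * dS).sum()

# Route 2 (usually easier): integrate f(S_t(x)) against the normal pdf
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
S_of_x = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * x)
p_x = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
route2 = (np.maximum(S_of_x - K, 0.0) * p_x * dx).sum()

print(route1, route2)   # the two answers should agree closely
```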