Interview Questions VII: Integrated Brownian Motion

 
For a standard Wiener process denoted W_t, calculate

    \[{\mathbb E} \Bigl[ \Bigl( \int^T_0 W_t dt \Bigr)^2 \Bigr]\]

This integral is the limit of the sum of W_t over each infinitesimal time slice dt from 0 to T, and it is called Integrated Brownian Motion.




We see immediately that this will be a random variable with expectation zero, so the expectation of its square is simply its variance, which is what we need to calculate. Here I show two ways to solve this: first by taking the limit of a sum, and second by using stochastic integration by parts.


1) Limit of a sum

(1)   \begin{align*} I &= \int^T_0 W_t dt \nonumber \\ &= \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} \nonumber \end{align*}

The values of the Wiener process appearing in this sum are not independent of one another, since only the increments are independent. However, we can re-write them as sums of these independent increments

(2)   \begin{align*} W_{\frac {Ti} n} &= \sum^i_{j=1} ( W_{\frac {Tj} n} - W_{\frac {T(j-1)} n} ) \nonumber \\ &= \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \end{align*}

where \overline{W}_{\frac {Tj} n} \sim {\mathbb N}(0,{\frac T n}) are the individual independent increments of the brownian motion. Substituting into our previous equation and reversing the order of the summation

(3)   \begin{align*} \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} &= \lim_{n \to \infty}{\frac T n} \sum^n_{i=1} \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} \sum^n_{i=j} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} (n-j+1) \overline{W}_{\frac {Tj} n} \nonumber \\ \end{align*}

which is simply a weighted sum of independent gaussians. To calculate the total variance, we sum the individual variances using the summation formula for \sum_1^N n^2

(4)   \begin{align*} {\mathbb V}[I] &= \lim_{n \to \infty} {\frac {T^2} {n^2}} \sum^n_{j=1} (n-j+1)^2 {\frac T n} \nonumber \\ &= \lim_{n \to \infty} {\frac {T^3} {n^3}} \Bigl( {\frac 1 3} n^3 + O(n^2) \Bigr) \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.
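As a quick numerical sanity check of this result – an illustrative sketch with arbitrary parameters, approximating the integral by its Riemann sum along simulated Brownian paths – the sample variance should come out close to T^3/3:

import numpy as np

rng = np.random.default_rng(42)
T, n_steps, n_paths = 2.0, 1_000, 100_000
dt = T / n_steps

# independent Brownian increments for each path
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)            # W_t sampled on the time grid
integral = W.sum(axis=1) * dt        # Riemann-sum approximation of int_0^T W_t dt

print("sample variance:", integral.var())
print("T^3 / 3        :", T**3 / 3)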

2) Stochastic integration by parts

The stochastic version of integration by parts for potentially stochastic variables X_t, Y_t looks like this (I've dropped the quadratic covariation term \int^T_0 d\langle X, Y\rangle_t, which will vanish below because one of our processes is deterministic):

    \[X_T \cdot Y_T = X_0 \cdot Y_0 + \int^T_0 X_t dY_t + \int^T_0 Y_t dX_t\]

Re-arranging this gives

    \[\int^T_0 Y_t dX_t = \Bigl[ X_t \cdot Y_t \Bigr]^T_0 - \int^T_0 X_t dY_t\]

Now setting Y_t = -W_t and X_t = (T-t) we have

(5)   \begin{align*} \int^T_0 W_t dt &= \Bigl[ -W_t \cdot (T-t) \Bigr]^T_0 + \int^T_0 (T-t) dW_t \nonumber \\ &= \int^T_0 (T-t) dW_t \nonumber \end{align*}

We recognise this as a weighted sum of independent gaussian increments, which is (as expected) a gaussian variable with expectation 0 and variance that we can calculate with the Ito isometry

(6)   \begin{align*} {\mathbb E}\Bigl[ \Bigl( \int^T_0 (T-t) dW_t \Bigr)^2 \Bigr] &= {\mathbb E}\Bigl[ \int^T_0 (T-t)^2 dt \Bigr] \nonumber \\ &= \Bigl[ T^2t - Tt^2 + {\frac 1 3} t^3 \Bigr]^T_0 \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.

Improving Monte Carlo: Control Variates

I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.

One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated to the product that we’re trying to price, but which is easier to calculate or whose price we already know. When we simulate a path in MC, it will almost surely give either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of product price, it most likely also produces an over-estimate of the control variate and vice versa, which will allow us to improve our estimate of the product we’re trying to price.

The textbook example is the Asian Option. Although the arithmetic version of the asian option discussed in previous posts has no analytic expression in BS, a similar Geometric asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of the option. As a reminder, the payoff of an arithmetic asian option at expiry is

    \[C_{\rm arit}(T) = \Bigl({1\over N}\sum_{i=0}^{N-1} S(t_i) - K \Bigr)^+\]

and the payoff of the related geometric averaging asian is

    \[C_{\rm geo}(T) = \Bigl( \bigl(\prod_{i=0}^{N-1} S(t_i)\bigr)^{1\over N} - K \Bigr)^+\]

Denoting the payoff of the arithmetic option as X and that of the geometric option as Y, the traditional Monte Carlo approach is to generate M paths (I'll use M for the number of paths, since N is already the number of averaging dates), and for each one calculate X_i (the realisation of the payoff along the path) and take the average over all paths, so that

    \[{\mathbb E}[X] = {1 \over M} \sum_{i=1}^{M} X_i\]

which will get closer to the true price as M \to \infty.

Using Y as a control variate, we instead calculate

    \[{\mathbb E}[X] = {1\over M} \sum_{i=1}^{M}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr)\]

where {\mathbb E}[Y] is the price of the geometric option known from the analytical expression, and \lambda is a constant (in this case we will set it to 1).

What do we gain from this? Well, consider the variance of X_i - \lambda (Y_i - {\mathbb E } [Y])

    \[{\rm Var}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr) = {\rm Var}( X_i ) + \lambda^2 {\rm Var} ( Y_i ) - 2 \lambda {\rm Cov}( X_i, Y_i )\]

(since {\mathbb E } [Y] is known, so has zero variance) which is minimal for \lambda = \sqrt{ {\rm Var}(X_i) \over {\rm Var}(Y_i) }\cdot \rho(X_i,Y_i) in which case

    \[{ {\rm Var}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr) \over {\rm Var}( X_i )} = 1 - \rho(X_i,Y_i)^2\]

that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths – the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2000 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is roughly 45 times smaller standard error on pricing – well over an extra decimal place, which isn’t bad – and this is certainly much easier than running 2000 times as many paths to achieve the same result.

The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation. Parameters used were: r: 3%; vol: 10%; K: 105; S(0): 100; averaging dates: monthly intervals for a year
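Here is a minimal sketch (illustrative only, using the parameter values from the caption above) that simulates the paths, computes both payoffs along each path, and estimates the correlation and the implied variance reduction 1/(1 - \rho^2):

import numpy as np

rng = np.random.default_rng(0)
S0, K, r, vol = 100.0, 105.0, 0.03, 0.10
dates = np.arange(1, 13) / 12.0          # monthly averaging dates over one year
n_paths = 100_000

dt = np.diff(dates, prepend=0.0)
# evolve log-spot across the fixing dates for every path at once
z = rng.normal(size=(n_paths, len(dates)))
S = S0 * np.exp(np.cumsum((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z, axis=1))

arith = np.maximum(S.mean(axis=1) - K, 0.0)                # arithmetic Asian payoff
geo = np.maximum(np.exp(np.log(S).mean(axis=1)) - K, 0.0)  # geometric Asian payoff

rho = np.corrcoef(arith, geo)[0, 1]
print("sample correlation :", rho)
print("variance reduction :", 1.0 / (1.0 - rho**2))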

Fitting the initial discount curve in a stochastic rates model

 
I’ve introduced the Vasicek stochastic rates model in an earlier post, and here I’m going to introduce a development of it called the Hull-White (or sometimes Hull-White extended Vasicek) model.

The rates are modelled by a mean-reverting stochastic process

    \[dr = \bigl( \theta(t) - a r(t) \bigr) dt + \sigma dW_t\]

which is similar to the Vasicek model, except that the \theta(t) term is now allowed to vary with time (in general a and \sigma are too, but I’ll ignore those issues for today).

The freedom to set \theta(t) as a deterministic function of time allows us to calibrate the model to match the initial discount curve, which means at least initially our model will give the right price for things like FRAs. Calibrating models to match market prices is one of the main things quants do. Of course, the market doesn’t really obey our model. This means that, in time, the market prices and the prices predicted by our model will drift apart, and the model will need to be re-calibrated. But the better a model captures the various risk factors in the market, the less often this procedure will be needed.

Using the trick

    \[d \bigl( e^{at} r(t)\bigr) = e^{at} \bigl(dr + ar(t)dt \bigr)\]

to re-express the equation and integrating gives

    \[r(t) = r_0 e^{-at} + e^{-at}\int^t_0 \theta(u) e^{au} du + \sigma e^{-at} \int^t_0 e^{au} dW_u\]

where r_0 is the rate at t=0. The observable quantities from the discount curve are the initial discount factors (or equivalently the initial forward rates) P(0,t), where

    \[P(0,t) = {\mathbb E}\bigl[ e^{-\int^t_0 r(u) du} \bigr]\]

The rate r(t) is normally distributed, so the integral \int^t_0 r(u) du must be too. This is because an integral is essentially a sum, and a sum of normal distributions is also normally distributed. Applying the Ito isometry as discussed before, the expectation of this variable will come wholly from the deterministic terms and the variance will come entirely from the stochastic terms, giving

    \begin{align*} {\mathbb E} \bigl[ \int^t_0 r(u) du \bigr] &= r_0 B(0,t) + \int^t_0 e^{-au} (\int^u_0 \theta(s) e^{as} ds ) du \nonumber \\ {\mathbb V}\bigl[ \int^t_0 r(u) du \bigr] &= \sigma^2 \int^t_0 B(u,t)^2 du \nonumber \end{align*}




where throughout

    \[B(t,t') = {1 \over a}\bigl( 1 - e^{-a(t'-t)}\bigr)\]

and since

    \[{\mathbb E} \bigl[ e^x | x \sim {\mathbb N}(\mu,\sigma^2) \bigr] = e^{\mu + {1\over 2}\sigma^2}\]

we have

    \[P(0,t) = \exp \Bigl\{ -r_0 B(0,t) - \int^t_0 e^{-au} \Bigl( \int_0^u \theta(s) e^{as} ds \Bigr) du + {1 \over 2} \sigma^2 \int^t_0 B(u,t)^2 du \Bigr\}\]

Two differentiations of this expression give

    \[-{\partial \over \partial t} \ln P(0,t) = r_0 e^{-at} + e^{-at} \int^t_0 \theta(u) e^{au} du - {1\over 2} \sigma^2 B(0,t)^2\]

    \[-{\partial^2 \over \partial t^2} \ln P(0,t) = -ar_0 e^{-at} + \theta(t) - ae^{-at} \int^t_0 \theta(u) e^{au} du - \sigma^2 e^{-at} B(0,t)\]

and combining these equations gives an expression for \theta(t) that exactly fits the initial discount curve for the given currency

    \[\theta(t) = -{\partial^2 \over \partial t^2} \ln P(0,t) - a{\partial \over \partial t} \ln P(0,t) + {\sigma^2\over 2a}( 1 - e^{-2at} )\]

and since -{\partial \over \partial t} \ln P(0,t) = f(0,t) is simply the initial market observed forward rate to each time horizon coming from the discount curve, this can be compactly expressed as

    \[\theta(t) = {\partial \over \partial t} f(0,t) + a f(0,t) + {\sigma^2 \over 2a} ( 1 - e^{-2at} )\]
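As an illustration, here is a minimal sketch of this fit (with an assumed toy forward curve f(0,t) and assumed constant a and \sigma – not calibrated to any real market data), using finite differences for the time derivative:

import numpy as np

a, sigma = 0.1, 0.01
ts = np.linspace(0.0, 30.0, 301)

def f0(t):
    # hypothetical initial instantaneous forward curve f(0, t)
    return 0.02 + 0.01 * (1.0 - np.exp(-0.3 * t))

df_dt = np.gradient(f0(ts), ts)      # numerical d f(0,t) / dt
theta = df_dt + a * f0(ts) + sigma**2 / (2 * a) * (1.0 - np.exp(-2 * a * ts))
print(theta[:5])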

Today we’ve seen how a simple extension to the ‘basic’ Vasicek model allows us to match the initial discount curve seen in the market. Allowing the volatility parameter to vary will allow us to match market prices of other products such as swaptions (an option to enter into a swap), which I’ll discuss another time. But we’re gradually building up a suite of simple models that we can combine later to model much more complicated environments.

Interview Questions VI: The Drunkard’s Walk

A drunk man is stumbling home at night after closing time. Every step he takes moves him either 1 metre closer or 1 metre further away from his destination, with an equal chance of going in either direction (!). His home is 70 metres down the road, but unfortunately, there is a cliff 30 metres behind him at the other end of the street. What are the chances that he makes it to his house BEFORE tumbling over the edge of the cliff?




This is a fun question and quite heavy on conditional probability. We are trying to find the probability that the drunkard has moved 70 metres forward BEFORE he has ever moved 30 metres backward, or vice versa. There are several ways of attempting this, including some pretty neat martingale maths, but I’m going to attempt it here in the language of matrices and markov chains.

Essentially, there are 101 states that the drunkard can be in, from right beside the cliff all the way down the road to right beside his front door. Let’s label these from 0 to 100, in terms of the number of metres away from the cliff, so that he starts in state 30. At each step, he transitions from his current state to either one state higher or one state lower with probability 50%, and the process continues until he reaches either the cliff or the door, at which point the process will cease in either a good or a very bad night’s rest. We call these two states, 0 and 100, ‘absorbers’, because the process stops at this point and no transitions to new states can happen. A markov diagram that illustrates this process can be drawn like this:

A markov chain displaying the transition probabilities for each state in the drunkard’s walk

We can characterise each step in the process by a transition matrix acting on a state vector. The drunkard initially has a state of 30 metres, so his state vector is a long string of zeroes with a single 1 at the 30th position:

S_0 = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}

This vector is probabilistic – a 1 indicates with 100% certainty that the drunkard is in the 30th state. However, with subsequent moves this probability density will be spread over the nearby states as his position’s probability density diffuses into other states. The transition matrix multiplies the drunkard’s state vector to give his state vector after another step:

P = \begin{pmatrix} 1 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & \cdots & 0 & 0\\ 0 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 0.5 & 1 \end{pmatrix}; \quad S_{i+1} = P \cdot S_i

So, after one step the drunkard’s state vector will have a 0.5 in the 29th and the 31st position and zeroes elsewhere, saying that he will be in either of these states with probability 0.5, and certainly nowhere else. Note that the 1’s at the top and bottom of the transition matrix will absorb any probability that arrives at that state for the rest of the process.

To keep things simple, let’s consider a much smaller problem with only six states, where state 1 is ‘down the cliff’ and state 6 is ‘home’; and we’ll come back to the larger problem at the end. We want to calculate the limit of the drunkard’s state as the transition matrix is applied a large number of times, ie. to calculate

S_n = P^n \cdot S_0

An efficient way to calculate powers of a large matrix is to first diagonalise it. We have \inline P = U A\ U^{-1}, where \inline A is a diagonal matrix whose diagonal elements are the eigenvalues of \inline P, and \inline U is the matrix whose columns are the eigenvectors of \inline P. Note that, as \inline P is not symmetric, \inline U^{-1} will have row vectors that are the LEFT eigenvectors of \inline P, which in general will be different to the right eigenvectors. It is easy to see how to raise \inline P to the power n

P^n = (U A\ U^{-1})^n = U A^n\ U^{-1}

since all of the \inline Us and \inline U^{-1}s in the middle cancel. Since \inline A is diagonal, to raise it to the power of n we simply raise each element (these are just the eigenvalues of \inline P) to the power of n. To calculate the eigenvalues of P we solve the characteristic equation

|P - \lambda_a I| = 0

with

P = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}

This gives six eigenvalues, two of which are one (these will turn out to correspond to the two absorbing states) and the remainder of which have modulus strictly less than one. Consequently, when raised to the power n in the diagonal matrix above, all of the terms will disappear as n becomes large except for the first and the last, whose eigenvalues are 1.

Calculating the eigenvectors is time consuming and we’d like to avoid it if possible. Luckily, in the limit that n gets large, we’ve seen that most of the eigenvalues raised to the power n will go to zero which will reduce significantly the amount we need to calculate. We have

P^n\cdot S = \bigl( U A^n \ U^{-1}\bigr)\cdot S

and in the limit that n gets large, \inline A^n is just a matrix of zeroes with a one in the upper left and lower right entries. At first sight it looks as though calculating \inline U^{-1} will be required, which is itself an eigenvector problem, but in fact we only have to calculate a single left eigenvector – the first (or equivalently the last) row of \inline U^{-1} – which will give the probability of evolving from an initial state S to the absorbing state at the cliff (or at home).

\inline U^{-1} is the matrix of left eigenvectors of \inline P, each of which satisfies

x_a \cdot P = \lambda_a x_a

and we are looking for the eigenvectors corresponding to eigenvalue of 1, so we need to solve the matrix equation

x \cdot \begin{pmatrix} 0 & 0.5 & 0 & 0 & 0 & 0\\ 0 & -1 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & -1 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & -1 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & -1 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 0 \end{pmatrix} = 0

We know that if he starts in the first state (ie. over the cliff) he must finish there (trivially, after 0 steps) with probability 100%, so that \inline x_1 = 1 and \inline x_6 = 0. The solution to this is

x_n = {6 - n \over 5}

which is plotted here for each initial state

The probability of the drunkard finishing in either final state given his initial state. Note that this sums to one in all cases as eventually he must finish in one or the other state.

which says that the probability of ending up in state 1 (down the cliff!) falls linearly with starting distance from the cliff. We can scale up this final matrix equation to the original 101-state space, and find that for someone starting in state 30, there is a 70/100 chance of ending up down the cliff, and consequently only a 30% chance of getting home!
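Here is a small sketch (illustrative only) that checks this numerically by building the full 101-state transition matrix and applying it repeatedly to the initial state vector, exactly as described above:

import numpy as np

n = 101                                    # states 0 (cliff) .. 100 (home)
P = np.zeros((n, n))
P[0, 0] = P[n - 1, n - 1] = 1.0            # the two absorbing states
for j in range(1, n - 1):                  # from state j, step up or down with prob 0.5
    P[j - 1, j] = 0.5
    P[j + 1, j] = 0.5

state = np.zeros(n)
state[30] = 1.0                            # start 30 metres from the cliff
for _ in range(50_000):                    # S_{i+1} = P . S_i, applied many times
    state = P @ state

print("P(cliff):", state[0])               # ~ 0.7
print("P(home) :", state[-1])              # ~ 0.3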

This problem is basically the same as the Gambler’s ruin problem, where a gambler in a casino stakes $1 on each toss of a coin and leaves after reaching a goal of $N or when broke. There are some very neat ways of solving these problems via martingales that don’t use the mechanics above, which I’ll look at in a future post.

Asian Options II: Monte Carlo

In a recent post, I introduced Asian options, an exotic option which pays the average of the underlying over a sequence of fixing dates.

The payoff of an Asian call is

C(T) = \Big({1 \over N}\sum^N_{i=1} S_{t_i} - K \Big)^+

and to price the option at an earlier time via risk-neutral valuation requires us to solve the integral

C(t) = P_{t,T}\ \iint\cdots\int \Big({1 \over N}\sum^N_{i=1} S_{t_i} - K \Big)^+ \phi(S_{t_1},S_{t_2},\cdots,S_{t_N})\ dS_{t_1}dS_{t_2}\cdots dS_{t_N}

A simple way to do this is using Monte Carlo – we simulate a spot path drawn from the distribution \inline \phi(S_{t_1},S_{t_2},\cdots,S_{t_N}) and evaluate the payoff at expiry of that path, then average over a large number of paths to get an estimate of C(t).

It’s particularly easy to do this in Black-Scholes: all we have to do is simulate N gaussian variables and evolve the underlying according to a geometric brownian motion.

I’ve updated the C++ and Excel Monte Carlo pricer on the blog to be able to price asian puts and calls by Monte Carlo – have a go and see how they behave relative to vanilla options. One subtlety is that we can no longer input a single expiry date, we now need an array of dates as our function input. If you have a look in the code, you’ll see that I’ve adjusted the function optionPricer to take a datatype MyArray, which is the default array used by XLW (it won’t be happy if you tell it to take std::vector). This array can be entered as a row or column of excel cells, and should be the dates, expressed as years from the present date, to average over.
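For a stand-alone illustration of the same idea (this is a minimal sketch with arbitrary parameters, not the blog's C++/XLW pricer itself), the whole procedure fits in a few lines:

import numpy as np

def asian_call_mc(S0, K, r, vol, fixing_dates, n_paths=200_000, seed=1):
    """Arithmetic Asian call under Black-Scholes by straightforward Monte Carlo."""
    rng = np.random.default_rng(seed)
    fixing_dates = np.asarray(fixing_dates, dtype=float)
    dt = np.diff(fixing_dates, prepend=0.0)
    z = rng.normal(size=(n_paths, len(fixing_dates)))
    # evolve log-spot through each fixing date
    logS = np.log(S0) + np.cumsum((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z, axis=1)
    payoff = np.maximum(np.exp(logS).mean(axis=1) - K, 0.0)
    return np.exp(-r * fixing_dates[-1]) * payoff.mean()

# monthly averaging dates over one year, expressed as year fractions
print(asian_call_mc(100.0, 100.0, 0.05, 0.2, np.arange(1, 13) / 12.0))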

The Dupire Local Vol Model

In this post I’m going to look at a further generalisation of the Black-Scholes model, which will allow us to re-price any arbitrary market-observed volatility surface, including those featuring a volatility smile.

I’ve previously looked at how we can produce different at-the-money vols at different times by using a piecewise constant volatility \inline \sigma(t), but we were still unable to produce smiley vol surfaces which are often observed in the market. We can go further by allowing vol to depend on both t and the value of the underlying S, so that the full BS model dynamics are given by the following SDE

dS_t = S_t\cdot \Big(\ r(t) dt\ +\ \sigma(t,S_t) dW_t\ \Big)

Throughout this post we will make constant use of the probability distribution of the underlying implied by this SDE at future times, which I will denote \inline \phi(t,S_t). It can be shown [and I will show in a later post!] that the evolution of this distribution obeys the Kolmogorov Forward Equation (sometimes called the Fokker-Planck equation)

{\partial \phi(t,S_t) \over \partial t} = -{\partial \over \partial S_t}\big(rS_t\phi(t,S_t)\big) + {1\over 2}{\partial^2 \over \partial S_t^2}\big(\sigma^2(t,S_t) S_t^2 \phi(t,S_t)\big)

This looks a mess, but it essentially tells us how the probability distribution changes with time – we can see that it looks very much like a heat equation with an additional driving term due to the SDE drift.

Vanilla call option prices are given by

C(K,t) = P(0,t)\int_K^{\infty} \big(S_t - K\big)\phi(t, S_t) dS_t

Assuming the market provides vanilla call option prices at all times and strikes [or at least enough for us to interpolate across the region of interest], we can calculate the time derivative of the call price which is equal to

{\partial C \over \partial T} = -rC + P(0,T)\int_K^{\infty}\big( S_T - K \big)\ {\partial \phi \over \partial T}\ dS_T

and we can substitute in the value of the time derivative of the probability distribution from the Kolmogorov equation above

rC + {\partial C \over \partial T} = P(0,T)\int_K^{\infty}\big( S_T - K \big)\Big[ -{\partial \over \partial S_T}\big( rS_T\phi\big) + {1\over 2}{\partial^2 \over \partial S_T^2}\big( \sigma^2S_T^2\phi\big) \Big]dS_T

These two integrals can be solved by integration by parts with a little care

\begin{align*} -\int_K^{\infty}\big( S_T - K \big){\partial \over \partial S_T}\big( rS_T\phi\big)dS_T & = -\Big[rS_T\phi \big( S_T - K \big)\Big]^{\infty}_{K} + \int_K^{\infty}rS_T\phi\ dS_T \\ & = r\int_K^{\infty} (S_T\phi)\ dS_T \end{align*}

\begin{align*} \int_K^{\infty}\big( S_T - K \big){\partial^2 \over \partial S_T^2}\big( \sigma^2 S_T^2\phi\big)dS_T & =\Big[\big( S_T - K \big){\partial \over \partial S_T}(\sigma^2 S_T^2\phi) \Big]^{\infty}_{K} - \int_K^{\infty}{\partial \over \partial S_T}(\sigma^2 S_T^2\phi)\ dS_T \\ & = \sigma^2 K^2\phi(K,T) \end{align*}

where in both cases, the boundary terms disappear at the upper limit due to the distribution \inline \phi(t,S_t) and its derivatives, which go to zero rapidly at high spot.

We already have an expression for \inline \phi(t,S_t) in terms of C and its derivatives from our survey of risk-neutral probabilities,

\phi(t,S_t) = {1 \over P(0,t)}{\partial^2 C \over \partial K^2}

and we can re-arrange the formula above for call option prices

\begin{align*} P(0,T)\int_K^{\infty} S_T\ \phi\ dS_T & = C + P(0,T)\int_K^{\infty} K\phi\ dS_T \\ & = C - K {\partial C \over \partial K} \end{align*}

(using {\partial C \over \partial K} = -P(0,T)\int_K^{\infty}\phi\ dS_T), and substituting these expressions for \inline \phi(t,S_t) and \inline \int^{\infty}_K (S_T \phi)\ dS_T into the equation above

\begin{align*} rC + {\partial C \over \partial T} & = P(0,T)\cdot \Big[ r\int_K^{\infty} (S_T\phi)\ dS_T + {1\over 2}\sigma^2 K^2\phi(K,T) \Big]\\ & = rC - rK {\partial C \over \partial K} + {1\over 2}\sigma^2 K^2{\partial^2 C \over \partial K^2} \end{align*}

and remember that \inline \sigma = \sigma(t,S_t), which is our Dupire local vol. Cancelling the rC terms from each side and re-arranging gives

\sigma(T,K) = \sqrt{ {\partial C \over \partial T} + rK {\partial C \over \partial K} \over {1\over 2} K^2{\partial^2 C \over \partial K^2}}

It’s worth taking a moment to think what this means. From the market, we will have access to call prices at many strikes and expiries. If we can choose a robust interpolation method across time and strike, we will be able to calculate the derivative of price with time and with strike, and plug those into the expression above to give us a Dupire local vol at each point on the time-price plane. If we are running a Monte-Carlo engine, this is the vol that we will need to plug in to evolve the spot from a particular spot level and at a particular time, in order to recover the vanilla prices observed on the market.
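As a quick self-consistency check – an illustrative sketch assuming a flat 20% implied vol and r = 2% – we can generate Black-Scholes call prices, finite-difference them, and confirm that the expression above hands back the flat vol:

import numpy as np
from scipy.stats import norm

S0, r, vol = 100.0, 0.02, 0.20

def bs_call(K, T):
    d1 = (np.log(S0 / K) + (r + 0.5 * vol**2) * T) / (vol * np.sqrt(T))
    d2 = d1 - vol * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

K, T, dK, dT = 95.0, 1.0, 0.01, 0.001
C_T = (bs_call(K, T + dT) - bs_call(K, T - dT)) / (2 * dT)
C_K = (bs_call(K + dK, T) - bs_call(K - dK, T)) / (2 * dK)
C_KK = (bs_call(K + dK, T) - 2 * bs_call(K, T) + bs_call(K - dK, T)) / dK**2

local_vol = np.sqrt((C_T + r * K * C_K) / (0.5 * K**2 * C_KK))
print(local_vol)    # should come back close to 0.20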

A nice property of the local vol model is that it can match uniquely any observed market call price surface. However, the model has weaknesses as well – by including only one source of uncertainty (the volatility), we are making too much of a simplification. Although vanilla prices match, exotics priced using local vol models typically have prices that are much lower than prices observed on the market. The local vol model tends to put most of the vol at early times, so that longer running exotics significantly underprice.

It is important to understand that this is NOT the implied vol used when quoting vanilla prices. The implied vol and the local vol are related along a spot path by the expression

\Sigma^2 T = \int_0^T\sigma^2(S_t,t)dt

(where \inline \Sigma is the implied vol) and the two are quite different. Implied vol is the square root of the average variance per unit time, while the local vol gives the amount of additional variance being added at particular positions on the S-t plane. Since we have an expression for local vol in terms of the call price surface, and there is a 1-to-1 correspondence between call prices and implied vols, we can derive an expression to calculate local vols directly from an implied vol surface. The derivation is long and tedious but trivial mathematically so I don’t present it here; the result is that the local vol is given by (rates are excluded here for simplicity)

\sigma(y,T) = \sqrt{{\partial w \over \partial T} \over \Big[ 1 - {y \over w}{\partial w \over \partial y} + {1\over 2}{\partial^2 w \over \partial y^2}+ {1 \over 4}\Big( -{1 \over 4} - {1 \over w}+ {y^2 \over w^2}\Big)\Big({\partial w \over \partial y}\Big)^2 \Big]}

where \inline w = \Sigma^2 T is the total implied variance to a maturity and strike and \inline y = \ln{K \over F_T} is the log of ‘moneyness’.

This is probably about as far as Black-Scholes alone will take you. Although we can reprice any vanilla surface, we’re still not pricing exotics very well – to correct this we’re going to need to consider other sources of uncertainty in our models. There are a wide variety of ways of doing this, and I’ll begin to look at more advanced models in future posts!

Forwards vs. Futures

I’ve covered Forwards and Futures in previous posts, and now that I’ve covered the basics of Stochastic Interest Rates as well, we can have a look at the difference between Forwards and Futures Contracts from a financial perspective.

As discussed before, the price of a Forward Contract is enforceable by arbitrage if the underlying is available and freely storable and there are Zero Coupon Bonds available to the Forward Contract delivery time. In this case, the forward price is

F(t,T) = S(t) \cdot {1 \over {\rm ZCB}(t,T)}

In this post I’m going to assume a general interest rate model, which in particular may well be stochastic. In such cases, the price of a ZCB at the present time is given by

    \[{\rm ZCB}(t,T) = {\mathbb E}\Big[ \exp{\Big\{-\int_t^T r(t') dt'\Big\} } \Big]\]

Futures Contracts are a bit more complicated, and we need to extend our earlier description in the case that there are interest rates. The basic description was given before, but additionally in the presence of interest rates, any deposit that is in either party’s account is delivered to the OTHER party at the end of each time period. So, taking the example from the previous post, on day 4 we had $4 on account with the exchange – if rates on that day were 10% p.a., over that day the $4 balance would accrue roughly a tenth of a cent of interest, which would be paid to the other party.

Let’s say we’re at time s, and want to calculate the Futures price to time T. Our replication strategy is now as follows, following the classic proof due to Cox, Ingersoll and Ross but in continuous time. Futures Contracts are free to enter into and break out of due to the margin in each account, so entering X Futures Contracts at time t and closing them at time t+dt will lead to a net receipt (or payment if negative) of \inline {\rm X}\cdot\big[ H(t+dt,T) - H(t,T)\big]. From t+dt to T, we invest (borrow) this amount at the short rate and thus receive (need to pay)

{\rm X}\cdot\big[ H(t+\tau,T) - H(t,T)\big]\cdot\prod_t^T \big( 1 + r(t)\tau \big)

and now moving to continuous time

{\rm X}\cdot\big[ H(t+dt,T) - H(t,T)\big]\cdot e^{\int_t^T r(u)\, du}

We follow this strategy in continuous time, constantly opening contracts and closing them in the following time period [I’m glossing over discrete vs. continuous time here – as long as the short rate corresponds to the discrete time step involved this shouldn’t be a problem], and investing our profits and financing our losses both at the corresponding short rate. We choose a different X for each period [t,t+dt] so that \inline {\rm X}(t) = \exp{\{\int_s^t r(t')dt'\}}. We also invest an amount H(s,T) at time s at the short rate, and continually roll this over so that it is worth \inline H(s,T)\cdot \exp{\{\int_s^T r(t)dt\}} at time T

Only the final step of this strategy costs money to enter, so the net price of the portfolio and trading strategy is H(s,T). The net payoff at expiry is

H(s,T)\cdot e^{\int_s^T r(t)dt} + \sum_s^T {\rm X}\cdot[H(t+dt,T)-H(t,T)]\cdot e^{\int_t^T r(u)du}

= H(s,T)\cdot e^{\int_s^T r(t)dt} + \sum_s^T e^{\int_s^t r(u)du}\cdot[H(t+dt,T)-H(t,T)]\cdot e^{\int_t^T r(u)du}

= H(s,T)\cdot e^{\int_s^T r(t)dt} + e^{\int_s^T r(t)dt} \cdot \sum_s^T [H(t+dt,T)-H(t,T)]

= H(s,T)\cdot e^{\int_s^T r(t)dt} + e^{\int_s^T r(t)dt} \cdot [H(T,T)-H(s,T)] = H(T,T) \cdot e^{\int_s^T r(t)dt}

And H(T,T) is S(T), so the net payoff of a portfolio costing H(s,T) is

= S(T) \cdot e^{\int_s^T r(t)dt}

How does this differ from a portfolio costing the Forward price? Remember that in Risk-Neutral Valuation, the present value of an asset is equal to the expectation of its future value discounted by a numeraire. In the risk-neutral measure, this numeraire is a unit of cash B continually re-invested at the short rate, which is worth \inline B(t,T) = e^{\int_t^T r(t')dt' } at time T. Applying this to the strategy above, which costs H(s,T) at time s and pays \inline S(T)\cdot e^{\int_s^T r(t)dt} at time T, the discounting by the numeraire exactly cancels the compounding, so we see that the Futures Price is a martingale in the risk-neutral measure (sometimes called the ‘cash measure’ because of its numeraire). So the current value of a Futures Contract on some underlying should be

H(t,T) = {\mathbb E}^{\rm RN}\big[ S(T) | {\cal F}_t \big]

ie. the undiscounted expectation of the future spot in the risk-neutral measure. The Forward Price is instead the expected price in the T-forward measure whose numeraire is a ZCB expiring at time T

F(t,T) = {\mathbb E}^{\rm T}\big[ S(T) | {\cal F}_t \big]

We can express these in terms of each other remembering F(T,T) = S(T) = H(T,T) and using a change of numeraire (post on this soon!). I also use the expression for two correlated lognormals, which I derived at the bottom of this post

\begin{align*} F(t,T) &= {\mathbb E}^{T}\big[ F(T,T) | {\cal F}_t \big] \\ &= {\mathbb E}^{T}\big[ S(T) | {\cal F}_t \big] \\ &= {\mathbb E}^{T}\big[ H(T,T) | {\cal F}_t \big] \\ &= {\mathbb E}^{RN}\Big[ H(T,T)\, {B(t)\over B(T)}\, {{\rm ZCB}(T,T)\over {\rm ZCB}(t,T)}\ \Big|\ {\cal F}_t \Big] \\ &= {1 \over {\rm ZCB}(t,T)}\,{\mathbb E}^{RN}\Big[ H(T,T)\, e^{-\int_t^T r(t')dt'}\ \Big|\ {\cal F}_t \Big] \\ &= {1 \over {\rm ZCB}(t,T)}\,{\mathbb E}^{RN}\big[ H(T,T)\big]\, {\mathbb E}^{RN}\big[ e^{-\int_t^T r(t')dt'} \big]\, e^{\sigma_H \sigma_B \rho} \\ &= H(t,T) \cdot e^{\sigma_H \sigma_B \rho} \\ \end{align*}

where \inline \sigma_H is the volatility of the Futures price, and \inline \sigma_B is the volatility of a ZCB – in general the algebra will be rather messy!

As a concrete example, let’s consider the following model for asset prices, with S driven by a geometric brownian motion and rates driven by the Vasicek model discussed before

{dS \over S} = r(t) dt + \sigma_S dW_t

dr = a \big[ \theta - r(t)\big] dt + \sigma_r \widetilde{dW_t}

And (critically) assuming that the two brownian processes are correlated according to \rho

dW_t \cdot \widetilde{dW_t} = \rho dt

In this case, the volatility \inline \sigma_B is the volatility of \inline e^{-\int_t^T r(t')dt'}, and as I discussed in the post on stochastic rates, this is tractable and lognormally distributed in this model.

We can see that in the case of correlated stochastic rates, these two prices are not the same – which means that Futures and Forward Contracts are fundamentally different financial products.

 

For two standard normal variates x and y with correlation rho, we have:

\begin{align*} {\mathbb E}\big[ e^ {\sigma_1 x} \big]& = e^ {{1\over 2}\sigma_1^2 } \end{align*}

and

\begin{align*} {\mathbb E}\big[ e^ {\sigma_1 x + \sigma_2 y} \big]& = {\mathbb E}\big[ e^ {\sigma_1 x + \sigma_2 \rho x + \sigma_2 \sqrt{1-\rho^2}z} \big]\\ & = {\mathbb E}\big[ e^ {(\sigma_1 + \sigma_2 \rho) x + \sigma_2 \sqrt{1-\rho^2}z} \big]\\ & = \big[ e^ {{1\over 2}(\sigma_1 + \sigma_2 \rho)^2 + {1\over 2}(\sigma_2 \sqrt{1-\rho^2})^2} \big]\\ & = \big[ e^ {{1\over 2}\sigma_1^2 + {1\over 2}\sigma_2^2 + \sigma_1 \sigma_2 \rho} \big]\\ & = {\mathbb E}\big[ e^ {\sigma_1 x }\big] {\mathbb E} \big[ e^{\sigma_2 y}\big] e^{ \sigma_1 \sigma_2 \rho} \end{align*}
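A quick numerical check of this identity (illustrative, with arbitrarily chosen \sigma_1, \sigma_2 and \rho):

import numpy as np

rng = np.random.default_rng(3)
s1, s2, rho = 0.3, 0.2, 0.7
x = rng.normal(size=1_000_000)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=1_000_000)  # correlated normal

print(np.exp(s1 * x + s2 * y).mean())                      # Monte Carlo estimate
print(np.exp(0.5 * s1**2 + 0.5 * s2**2 + s1 * s2 * rho))   # closed form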

Stochastic Rates Models

In a previous post, I introduced the Black-Scholes model, in which the price of an underlying stock is modeled with a stochastic variable that changes unpredictably with time. I’ve also discussed the basic model-independent rates products whose value can be determined at the present time exactly. However, to progress further with interest rate derivatives, we’re going to need to model interest rates more carefully. We’ve assumed rates are deterministic so far, but of course this isn’t true – just like stocks, they change with time in an uncertain manner, so we need to allow them to become stochastic as well.

One way of doing this is by analogy with the BS case, by allowing the short rate (which is the instantaneous risk-neutral interest rate r(t)) to become stochastic as well, and then integrating it over the required period of time to calculate forward rates.

A very basic example of this is the Vasicek Model. In this model the short rate is defined to be stochastic, with behaviour governed by the following SDE

    \[dr = k\ [\ \theta - r(t)\ ]\ dt\ +\ \sigma\ dW\]

where k, \theta and \sigma are constants and dW is the standard Wiener increment as described before. This is marginally more complicated than the BS model, but still belongs to the small family of SDEs that are analytically tractable. Unlike stock prices, we expect rates to be mean-reverting – stock price variance grows with time, but we expect the distribution of rates to be confined to a fairly narrow range by comparison. The term in square brackets above achieves this, since if r(t) is greater than \theta then it will be negative and cause the rate to be pulled down, while if it is below \theta the term will be positive and push the rate up towards \theta.

Solving the equation requires a trick, which is instead of thinking about the rate alone to think about the quantity d\big(\ e^{kt}\cdot r(t)\ \big). This is equal to e^{kt}\cdot dr(t)\ + \ ke^{kt} dt \cdot r(t), and substituting in the incremental rate term from the original equation we have

    \[d\big(\ e^{kt}\cdot r(t)\ \big) = e^{kt} \Big(k\ [ \theta - r(t) ] dt + \sigma dW \Big) + \ k\, e^{kt}\, r(t)\, dt\]

    \[d\big(\ e^{kt}\cdot r(t)\ \big) = e^{kt} \Big( k\cdot \theta\cdot dt + \sigma\cdot dW \Big)\]

note that the term in r(t) has been cancelled out, and the remaining terms can be integrated directly from a starting time s to a finishing time T

    \[\Big[\ e^{kt}\cdot r(t)\ \Big]_{t=s}^{t=T} = \int_{t=s}^{t=T} k\ \theta\ e^{kt}\cdot dt + \int_{t=s}^{t=T} e^{kt} \sigma\cdot dW\]

    \[e^{kT}\cdot r(T)\ - e^{ks}\cdot r(s)\ = \Big[\ \theta\ e^{kt}\ \Big]_{t=s}^{t=T} + \sigma \int_{t=s}^{t=T} e^{kt}\cdot dW\]

    \[r(T)\ = e^{-k(T-s)}\cdot r(s)\ + \theta\Big(\ 1 - e^{-k(T-s)}\ \Big) + \sigma \int_{s}^{T} e^{-k(T-t)}\cdot dW(t)\]

where r(s) is the initial rate. This is simply a gaussian distribution with mean and variance given by

    \[\mathbb{E} \big[\ r(T)\ |\ {\cal F}_{\rm s}\ \big] = e^{-k(T-s)}\cdot r(s)\ +\ \theta\Big(\ 1 - e^{-k(T-s)}\ \Big)\]

    \[\mathbb{V} \big[\ r(T)\ |\ {\cal F}_{\rm s}\ \big] = {\sigma^2 \over 2k} \Big(\ 1 - e^{-2k(T-s)}\ \Big)\]

where the variance is calculated using the Ito isometry

    \[\mathbb{V} \big[\ r(T)\ |\ {\cal F}_{\rm s}\ \big] = \mathbb{E} \big[\ \big(r(T)\big)^2\ |\ {\cal F}_{\rm s}\ \big] - \big(\ \mathbb{E} \big[\ r(T)\ |\ {\cal F}_{\rm s}\ \big]\ \big)^2\]

    \[= {\mathbb E}\Big[\ \Big( \sigma \int_s^T e^{-k(T-t)} dW(t) \Big)^2\ \Big]\]

    \[=\sigma^2 \int_s^T \Big( e^{-k(T-t)}\Big)^2 dt\]

    \[= {\sigma^2 \over 2k} \Big(\ 1 - e^{-2k(T-s)}\ \Big)\]

as stated above. Note that this allows the possibility of rates going negative, which is generally considered to be a weakness of the model, but the chance is usually rather small.
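The distribution is easy to check by simulation; here is an illustrative sketch (assumed parameters, simple Euler stepping) comparing the sample mean and variance of r(T) with the two formulas above:

import numpy as np

rng = np.random.default_rng(7)
k, theta, sigma, r0 = 0.5, 0.03, 0.01, 0.02
T, n_steps, n_paths = 5.0, 500, 100_000
dt = T / n_steps

r = np.full(n_paths, r0)
for _ in range(n_steps):                   # Euler discretisation of the Vasicek SDE
    r += k * (theta - r) * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)

print("mean:", r.mean(), "vs", np.exp(-k*T)*r0 + theta*(1 - np.exp(-k*T)))
print("var :", r.var(),  "vs", sigma**2/(2*k) * (1 - np.exp(-2*k*T)))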




As we know the distribution of the short rate, we can calculate some other relevant quantities. Of primary importance are the Zero Coupon Bonds, which are required for calculation of forward interest rates. A ZCB is a derivative that pays $1 at a future time t, and we can price this using the Risk-Neutral Valuation technique.

According to the fundamental theorem of asset pricing, the current price of a derivative divided by our choice of numeraire must be equal to its future expected price at any time divided by the value of the numeraire at that time, with the expectation taken in the probability measure corresponding to the choice of numeraire.

In the risk-neutral measure, the numeraire is just a unit of currency, initially worth $1 but continually re-invested at the instantaneous short rate, so that its price at time t is $1 \cdot e^{\int_0^t r(t') dt'}. Now, the price of the ZCB is given by

    \[{Z(0,t) \over \$1} = {\mathbb E}^{RN}\Big[\ {Z(s,t)\over \$1 \cdot e^{\int_{0}^s r(t') dt' }}\ |\ {\cal F}_0\ \Big]\]

    \[Z(0,t) = {\mathbb E}\Big[\ {Z(s,t)\cdot e^{-\int_{0}^s r(t') dt' }}\ |\ {\cal F}_0\ \Big]\]

Although the RHS is true at any time, we only know the value of the ZCB exactly at a single time at the moment – the expiry date t, at which it is worth exactly $1. Plugging in these values we have

    \[Z(0,t) = \$1 \cdot {\mathbb E}\Big[\ e^{-\int_{0}^t r(t') dt' }\ |\ {\cal F}_0\ \Big]\]

So the ZCB is given by the expectation of the exponential of (minus) the integral of the rate over a period of time. Since the rate is itself gaussian, and an integral is the limit of a sum, it’s not surprising that this integral is also gaussian (it’s effectively the sum of many correlated gaussians, which is also gaussian, as discussed before), but it’s rather tricky to calculate, so I’ve included the derivation at the bottom of the post to save space here. The mean and variance are given by

    \[\mathbb{E} \big[\int_s^T r(t)\ dt \ |\ {\cal F}_{\rm s}\ \big] = \big(r(s) - \theta \big) \cdot B(s,T)\ +\ \theta\cdot\big( T- s \big)\]

    \[\mathbb{V} \big[\int^T_s r(t)\ dt \ |\ {\cal F}_{\rm s}\ \big] = {\sigma^2 \over k^2} \Big( (T-s) -B(s,T) - {k\over 2} B(s,T)^2 \Big)\]

where

    \[B(s,T) = {1\over k}\big( 1 - e^{-k(T-s)}\big)\]

The ZCB is given by the expectation of the exponential of a gaussian variable – and we’ve seen on several occasions that

    \[\mathbb{E} \big[\ \exp{(x)}\ |\ x\sim {\mathbb N}(\mu, \sigma^2)\ \big] = \exp{ \Big(\mu + {1\over 2}\sigma^2 \Big) }\]

So the ZCB prices are

    \[Z(s,T) = A(s,T)\ e^{-B(s,T)\ r(s)}\]

with B(s,T) as defined above and

    \[A(s,T) = \exp{ \Big[ \Big( \theta - {\sigma^2 \over 2 k^2}\Big)\Big( B(s,T) - T + s \Big) - {\sigma^2 \over 4 k} B(s,T)^2 \Big]}\]

and as these expressions only depend on the rates at the initial time, we can calculate ZCB bond prices and hence forward rates for any future expiry dates at any given time if we have the current instantaneous rate.
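Putting the last few expressions together, here is a short sketch (with assumed parameter values) that computes the resulting discount curve Z(0,T):

import numpy as np

k, theta, sigma, r0 = 0.5, 0.03, 0.01, 0.02

def vasicek_zcb(T):
    B = (1.0 - np.exp(-k * T)) / k
    A = np.exp((theta - sigma**2 / (2 * k**2)) * (B - T) - sigma**2 / (4 * k) * B**2)
    return A * np.exp(-B * r0)

for T in (1.0, 2.0, 5.0, 10.0, 30.0):
    print(T, vasicek_zcb(T))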

Although we can calculate a discount curve for a given set of parameters, the Vasicek model can’t calibrate to an initial ZCB curve taken from the market, which is a serious disadvantage. There are more advanced generalisations which can, and I’ll discuss some soon, but they will use all of the same tricks and algebra that I’ve covered here.

I’ve written enough for one day here – in later posts I’ll discuss changing to the t-forward measure, in which the ZCB Z(0,t) forms the numeraire instead of a unit of currency, which simplifies many calculations, and I’ll use it to price caplets under stochastic rates, and show that these are equivalent to european options on ZCBs.

An alternative to the short-rate approach discussed today, and one which is very popular these days, is the Libor Market Model (LMM) approach, in which instead of simulating the short rate and calculating the required forwards, the various forward rates are modelled directly and in tandem – I’ll look further at this approach in another post.

 

Here is the calculation of the distribution of the integral of the instantaneous rate over the period s to T:

    \[r(t)\ = e^{-k(t-s)}\cdot r(s)\ + \theta\Big(\ 1 - e^{-k(t-s)}\ \Big) + \sigma \int_{s}^{t} e^{-k(t-u)}\cdot dW(u)\]

    \[\int_s^T r(t)\ dt = \int_s^T \Big[\ e^{-k(t-s)} r(s)\ + \theta\Big(\ 1 - e^{-k(t-s)}\ \Big) + \sigma \int_{s}^{t} e^{-k(t-u)} dW(u)\Big] dt\]

and splitting this into the terms that contribute to the expectation and the variance we have

    \[\mathbb{E} \big[\int_s^T r(t)\ dt \ |\ {\cal F}_{\rm s}\ \big] = \int_s^T \Big[\ e^{-k(t-s)} r(s)\ + \theta\Big(\ 1 - e^{-k(t-s)}\ \Big)\Big] dt\]

    \[= r(s) \int_s^T e^{-k(t-s)} \ dt + \theta \int_s^T \Big(\ 1 - e^{-k(t-s)}\ \Big) dt\]

    \[= r(s) \Big[ {1 \over -k}\ e^{-k(t-s)} \Big]^T_s + \theta \Big[ t - {1 \over -k}\ e^{-k(t-s)}\ \Big]^T_s\]

    \[= \big(r(s) - \theta \big)B(s,T)\ +\ \theta\cdot\big( T- s \big)\]

to calculate the variance, we first need to deal with the following term

    \[\int_s^T \Big[\ \sigma \int_{s}^{t} e^{-k(t-u)} dW(u)\Big] dt\]

we use stochastic integration by parts

    \[\sigma \int_s^T \int_{s}^{t} e^{-k(t-u)} dW(u)\ dt = \sigma \int_s^T e^{-kt}\big(\int_{s}^{t} e^{ku} dW(u)\big)\ dt\]

    \[= \sigma\int_s^T \big(\int_{s}^{t} e^{ku} dW(u)\big)\ d_t \Big( \int^t_s e^{-kv}dv \Big)\]

    \[= \sigma\Big[ \int_{s}^{t} e^{ku} dW(u)\cdot \int^t_s e^{-kv}dv\ \Big]^T_s - \sigma\int_s^T dW(t)\Big( e^{kt} \int^t_s e^{-kv}dv\ \Big)\]

    \[= \sigma\int_{s}^{T} e^{ku} dW(u)\cdot \int^T_s e^{-kv}dv - \sigma\int_s^T dW(t) e^{kt} \Big( \int^T_s e^{-kv}dv - \int^T_t e^{-kv}dv \Big)\]

    \[= \sigma\int_s^T e^{kt} \Big(\int^T_t e^{-kv}dv\Big) dW(t)\]

    \[= {\sigma \over k}\int_s^T \big(1 - e^{-k(T-t)}\big)\ dW(t)\]

and we’re now in a position to try and find the variance of the integral

    \[\mathbb{V} \big[\int^T_s r(t)\ dt \ |\ {\cal F}_{\rm s}\ \big] = {\mathbb E}\Big[\ \Big({\sigma \over k}\int_s^T \big(1 - e^{-k(T-t)}\big)\ dW(t)\Big)^2\ \Big]\]

    \[= {\mathbb E}\Big[\ {\sigma^2 \over k^2}\int_s^T \big(1 - e^{-k(T-t)}\big)^2\ dt\ \Big]\]

where Ito’s isometry has been used again, and several more lines of routine algebra lead to the result

    \[= {\sigma^2 \over k^2} \Big( (T-s) -B(s,T) - {k\over 2} B(s,T)^2 \Big)\]

Bootstrapping the Discount Curve from Swap Rates

Today’s post will be a short one about calculation of discount curves from swap rates. I’ve discussed both swaps and discount curves in previous posts, you should read those before this one or it might not make much sense!

Although bonds can be used to calculate discount bond prices, typically swaps are the most liquid products on the market and will go to the longest expiry times (often 80+ years for major currencies), so these are used to calculate many of the points on the discount curve [and often both of these can be done simultaneously to give better reliability].

In the previous post on swaps, I calculated the swap rate X that makes swaps zero-valued at the current time

    \[X = {Z(0,t_0) - Z(0,t_N) \over \sum_{n=0}^{N-1} \tau \cdot Z(0,t_{n+1})}\]

where the t‘s here represent the fixing dates of the swap (payment for the n‘th period is made at its end, t_{n+1}).

Consider the sequence of times \{t_0 = 0, t_1, t_2, \dots , t_n\} for which a sequence of swaps are quoted on the markets, with swap rate X_{i-1} for the swap running from t_0 up to t_i. We can back out the discount factor at each time as follows:

    \[X_0 = {Z(0,0) - Z(0,t_1) \over \tau \cdot Z(0,t_1)}\]

    \[\Rightarrow Z(0,t_1) = {1 \over 1 + \tau X_0 }\]

    \[X_1 = {Z(0,0) - Z(0,t_2) \over \tau \big(Z(0,t_1) + Z(0,t_2)\big) }\]

    \[\Rightarrow Z(0,t_2) = {1 - \tau \cdot X_1 \cdot Z(0,t_1) \over 1 + \tau X_1}\]

and we can see from this the general procedure, calculating another ZCB from each successive swap rate using the expression

    \[Z(0,t_i) = {1 - \tau\, X_{i-1} \sum_{j=1}^{i-1} Z(0,t_j) \over 1 + \tau X_{i-1}}\]

These swaps and ZCBs are called co-initial because they all start at the same time t_0.
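In code, this co-initial bootstrap is just a short loop; here is a minimal sketch with made-up par swap rates and a single accrual fraction \tau = 1:

tau = 1.0
swap_rates = [0.020, 0.022, 0.025, 0.027]    # hypothetical par rates for swaps ending at t_1..t_4

zcbs = []                                    # will hold Z(0, t_1), Z(0, t_2), ...
for x in swap_rates:
    z = (1.0 - tau * x * sum(zcbs)) / (1.0 + tau * x)
    zcbs.append(z)

print(zcbs)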

Now imagine that instead the swaps X_i have the first fixing at time t_i and their final fixing at time t_n for 0 \leq i < n – such swaps are called co-terminal swaps as they start at different times but finish at the same one. Once again we can calculate the discount factors up to a constant factor, this time by working backwards:

    \[X_{n-1} = {Z(0,t_{n-1}) - Z(0,t_n) \over \tau \cdot Z(0,t_n)}\]

    \[\Rightarrow Z(0,t_{n-1}) = Z(0,t_n)\cdot\big( { 1 + \tau X_{n-1} }\big)\]

    \[X_{n-2} = {Z(0,t_{n-2}) - Z(0,t_n) \over \tau \big(Z(0,t_{n-1}) + Z(0,t_n)\big) }\]

    \[\Rightarrow Z(0,t_{n-2}) = Z(0,t_{n})\big( 1 + \tau X_{n-2} ( 2 + \tau X_{n-1}) \big)\]

and so on – the discount factors can be backed out one-by-one, each as a multiple of Z(0,t_n).

To pin down the exact values of the discount factors from co-terminal swaps, we need to know at least one discount factor exactly. In general the co-initial case will also require this – I implicitly assumed that the swaps started fixing at t=0, where we know Z(0,0) = 1, but for general co-initial swaps we would also have this issue.

Solving the Heat Equation

This post follows on from my earlier post on the BS Equation from Delta Hedging. In that post, I showed how delta hedging arguments within the BS model lead to the heat equation

{1\over 2}\sigma^2 {\partial^2 D\over \partial y^2} - {\partial D \over \partial \tau} = 0

where \inline D = C e^{r \tau} is the undiscounted (forward) value of the derivative price C(S,t), \inline y = \ln S + \big( r - {1\over 2}\sigma^2 \big)\tau, and \inline \tau = T - t. In this post I’m going to solve this equation to give the BS option price for a vanilla call. S can only take positive values, but due to the logarithm y runs from \inline -\infty to \inline \infty.

In physics, this equation is usually encountered with periodic boundary conditions coming from the boundaries of the region of interest, but here we don’t have that luxury so will need to be more careful. In this post I’m going to solve the equation using the method of separation of variables, in a later post I’ll re-solve it using the method of Fourier transforms.

The heat equation is linear, which means that if \inline f(y,\tau) and \inline g(y,\tau) are both solutions, then so is the sum \inline \big( f(y,\tau) + g(y,\tau) \big). This means that if we can find a selection of possible solutions, we can combine them to match the boundary condition given by the known form of \inline D(y,\tau = 0) at expiry (coming from \inline C(S,\tau = 0) = \big( S - K \big)^+). We look for separable solutions, which are those of the form \inline f(y,\tau) = Y(y)T(\tau), in which case the derivatives in the heat equation only act on their respective terms:

{1\over 2}\sigma^2 T(\tau){\partial^2 Y(y)\over \partial y^2} - Y(y){\partial T(\tau) \over \partial \tau} = 0

{1\over 2}\sigma^2 Y''T = Y \dot{T}

{1\over 2}\sigma^2 {Y''\over Y} = {\dot{T} \over T} = \lambda^2

where a dash denotes differentiation by y and a dot differentiation by tau. In the final equation, both sides depend only on the independent variables y and tau respectively – the only way this can be true is if both sides are equal to some constant, which we call \inline \lambda^2.

In general, we should consider the possibilities that \inline \lambda^2 is positive, zero or negative – in fact, to build the decaying solutions we need below, \inline \lambda will turn out to be purely imaginary, so that \inline \lambda^2 is negative. Two equations can be extracted, linked by \inline \lambda^2

Y'' - 2 {\lambda^2 \over \sigma^2} Y = 0

\dot{T} - \lambda^2 T = 0

Solving these equations,

Y_\lambda(y) = Y_{0,\lambda} e^{\sqrt{2}{\lambda \over \sigma }y}

T_\lambda (\tau) = e^{\lambda^2 \tau}

with any value of \inline \lambda allowed. We first try to find the fundamental solution of the heat equation, which is the solution that satisfies \inline f(y,\tau = 0) = \delta(y-a) where \inline \delta is the Dirac delta function, and I’ll use this to find the particular solution for a vanilla call at the end. At \inline \tau = 0 all of the functions T are 1, so these can be neglected, and we simply sum up the right number of Y terms, each with a different value of \inline \lambda which we index with n, to produce the delta function:

\sum_n Y_{0,\lambda_n} e^{\sqrt{2}{\lambda_n \over \sigma }y} = \delta(y-a)

Because \inline \lambda can take any value on the continuous interval \inline -\infty < \lambda < \infty, we change the sum to an integral, and also express the delta function using its integral representation, and compare coefficients to equate the two expressions

\int_{-\infty}^{\infty} Y_{0,\lambda_n} e^{\sqrt{2}{\lambda_n \over \sigma }y}dn = {1 \over 2\pi} \int_{-\infty}^{\infty} e^{in(y-a)}dn

so

Y_{0,n} = {1 \over 2\pi}e^{-in a} \quad ; \quad \lambda_n = {1 \over \sqrt{2}}\sigma i n

And plugging these into our earlier expressions for Y and T, and combining these in the same way for \inline \tau > 0 to provide the temporal behavior, we arrive at

f(y,\tau) = \int_{-\infty}^{\infty} Y_{\lambda_n}(y) T_{\lambda_n}(\tau) dn

= {1\over 2\pi} \int_{-\infty}^{\infty} e^{in(y-a)} e^{-{\sigma^2 n^2 \tau \over 2}}dn

and once again, this is a gaussian integral that we’ll tackle by completing the square…

= {1\over 2\pi} \int_{-\infty}^{\infty} \exp\Big\{ -{1 \over 2} \sigma^2 \tau \Big( n^2 - 2n {i(y-a)\over \sigma^2 \tau} + \big( {i(y-a)\over \sigma^2 \tau}\big)^2-\big( {i(y-a)\over \sigma^2 \tau}\big)^2 \Big) \Big\} dn

= {1\over 2\pi} \int_{-\infty}^{\infty} \exp\Big\{ -{1 \over 2} \sigma^2 \tau \Big(n- {i(y-a)\over \sigma^2 \tau} \Big)^2 \Big\} \cdot \exp\Big\{{-(y-a)^2 \over 2 \sigma^2 \tau}\Big\} dn

= {1\over 2\pi}\cdot\sqrt{2\pi \over \sigma^2 \tau}\cdot \exp{-(y-a)^2 \over 2 \sigma^2 \tau}

= {1\over \sqrt{2\pi \sigma^2 \tau}}\cdot \exp{-(y-a)^2 \over 2 \sigma^2 \tau}

and this is the fundamental solution given at the end of the first post and also at wikipedia. Note that although we used separation of variables to find the forms of the functions Y and T, the separability hasn’t been carried through the summation of many different T-Y combinations – the fundamental solution that we’ve arrived at ISN’T separable!

The fundamental solution is the time and space evolution of a function that is initially a delta-function, such that the evolution always obeys the heat equation. We can use the following property of the delta function to construct specific solutions for particular payoff functions:

f(y) = \int_{-\infty}^{\infty} \delta(y-a) f(a) da

So, because the fundamental solution \inline f(y,\tau=0) = \delta(y-a), we can construct the payoff function using this integral at \inline \tau = 0

D(y,\tau=0) = \int_{-\infty}^{\infty} f(a,\tau=0) D(a,\tau=0) da

and consequently, at later times

D(y,\tau) = \int_{-\infty}^{\infty} f(a,\tau) D(a,\tau=0) da

and we can solve this for the specific case that \inline C(S,\tau = 0) = \big( S(\tau=0) - K \big)^+ as follows (denoting a as y')

C = e^{-r\tau} D(y,\tau) = {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{-\infty}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big)^+ dy'

The second factor is only non-zero when \inline y' > \ln K = y_L, so we can absorb this condition into the lower limit

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big) dy'

and splitting this into two integrals, which we do in turn

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} K\exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} dy' = e^{-r\tau}\cdot K\cdot \Phi\Big({y - y_L \over \sigma \sqrt{\tau}}\Big)

and

{e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot e^{y'} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( y'^2 + y^2 - 2y'y -2y'\sigma^2\tau \pm ( y + \sigma^2\tau )^2 \Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( \big(y'-(y+\sigma^2 \tau)\big)^2 - \big(y + \sigma^2 \tau\big)^2 + y^2\Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( \big(y'-(y+\sigma^2 \tau)\big)^2 - \sigma^2\tau\big(2y + \sigma^2 \tau\big)\Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {\big(y'-(y+\sigma^2 \tau)\big)^2\over 2\sigma^2 \tau}\Big\}\cdot S e^{r\tau} dy'

= e^{-r\tau} \cdot F \cdot \Phi \Big( {y - y_L + \sigma^2 \tau \over \sigma \sqrt{\tau}} \Big)

where as usual, \inline \Phi(x) is the cumulative normal distribution; and we’ve used the expression \inline y = \ln S + (r - {1\over 2}\sigma^2)\tau and also \inline e^{y + {1 \over 2}\sigma^2 \tau} = S e^{r\tau} = F. Putting all of this together, we finally arrive at the Black-Scholes expression for a vanilla call price,

{e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big) dy'

= e^{-r\tau} \Big( F \cdot \Phi(d_1) - K\cdot \Phi(d_2)\Big)
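To close the loop, here is a small sketch (with illustrative parameters) implementing this formula and checking it against a brute-force risk-neutral Monte Carlo under the same dynamics:

import numpy as np
from scipy.stats import norm

S, K, r, vol, tau = 100.0, 105.0, 0.05, 0.2, 1.0

F = S * np.exp(r * tau)                          # forward price
d1 = (np.log(F / K) + 0.5 * vol**2 * tau) / (vol * np.sqrt(tau))
d2 = d1 - vol * np.sqrt(tau)
call = np.exp(-r * tau) * (F * norm.cdf(d1) - K * norm.cdf(d2))

rng = np.random.default_rng(0)
ST = S * np.exp((r - 0.5 * vol**2) * tau + vol * np.sqrt(tau) * rng.normal(size=1_000_000))
mc = np.exp(-r * tau) * np.maximum(ST - K, 0.0).mean()

print(call, mc)                                   # the two numbers should agree closely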

 

 

Wow! Finally! Hopefully this will convince you that the method of risk-neutral valuation is a bit more straight-forward!