Hedging in a finite-state model (Binary Trees)

Today’s post is about hedging in a world where there are only a few outcomes. It demonstrates a lot of principles that are important in asset pricing, and it’s also a frequent topic for interview questions so almost got included as one of those!

A very basic approximation for a stock is a binary tree, as in the diagram shown below. The stock is worth 100 today, and after some time (say a year) it will be worth either 110 with probability p or 90 with probability (1-p). In this very basic model, we assume no other values are possible – the stock either does well and goes up by 10, or it does badly and goes down by 10. Furthermore, we assume that there is a risk-free yearly interest rate r at which we can deposit money in the bank and receive (1+r) times our initial deposit in a year.

The two possibilities for a stock's evolution in the binary tree model

The two possibilities for a stock’s evolution in the binary tree model

We’re going to try and price options in this model. Let’s say we’ve got a call option to buy the stock in a year’s time for 100. How much is this worth? Because of the simple state of the world, we can enumerate the two possibilities – either the stock is worth 110, in which case the call option to buy it for 100 is worth 10, or the stock is worth 90, and the option is worthless. We’ll try and simulate that payoff using a combination of cash-in-the-bank and the stock itself, and then by no-arbitrage the price must be the same today.

Our portfolio consists of \alpha stocks S and \beta cash-in-the-bank. The value of the cash will certainly be (1+r)\beta in a year, while the stock will be 110 \alpha or 90 \alpha depending on the state, and we want the sum of these two to equal the option payoff (in either the up or the down scenario), which gives us two simultaneous equations

    \[(1+r) \beta + 110 \alpha = 10\]

    \[(1+r) \beta + 90 \alpha = 0\]


    \[\alpha = 0.5\]

    \[\beta = {-45 \over (1+r)}\]

This says that a portfolio containing half of a stock and minus 45 pounds in the bank will exactly replicate the payoff of the option in either world state. The value of this portfolio today is just 0.5*100 – 45/(1+r), and I’ve plotted that as a function of r below.

The price of a call option on a stock in the two-state world given above

The price of a call option on a stock as a function of the risk-free rate in the two-state world given above

This gives meaningless prices if r > 0.1 (the price of the option must be between 0 and 10/(1+r) as these are the discounted values of the least/most that it can pay out at expiry). What does this mean intuitively? Well, if r > 0.1 then we have an arbitrage: 100 in the bank will *always* yield more than the stock at expiry, so we should short the stock as much as possible and put the proceeds in the bank. If r < -0.1 the situation is exactly reversed

The important point about all of this was that the option price DOESN’T depend on the relative chances of the stock increasing to 110 or falling to 90. As long as both of these are strictly greater than zero and less than one, then ANY set of probabilities will lead to the same price for the option. This is the discrete analogue of a result I presented before (here) – that the expected growth of the stock in the real world doesn’t matter to the option’s price, it is the risk-free rate that affects its price. Indeed, it is possible to derive the continuous result using a binary tree by letting the time period go to zero, I won’t attempt it here but the derivation can be found in many textbooks (eg. on wikipedia).

Things really get interesting when we try to extend the model in a slightly different way. Often banks will be interested in models that have not just a diffusive component, but also a ‘jump’ component, which gives a finite chance that a price will fall suddenly by a large amount (I’ll present one such model in the near future). Unlike a straight-forward Black-Scholes model, because these jumps happen suddenly and unexpectedly they are very difficult to hedge, and are meant to represent market crashes that can result in sudden sharp losses for banks.

In our simple tree model, this can be incorporated by moving to a three-branch model, shown below

The three possibilities for a stock's evolution in the three-branch tree model

The three possibilities for a stock’s evolution in the three-branch tree model

We have the same two branches as before, but an additional state now exists in which the stock price crashes to 50 with probability q. In this case, again trying to price an option to buy the stock for 100 gives three simultaneous equations

    \[(1+r) \beta + 110 \alpha = 10\]

    \[(1+r) \beta + 90 \alpha = 0\]

    \[(1+r) \beta + 50 \alpha = 0\]

Unlike before, we can’t find a single alpha and beta that will replicate the payoff in all three world states, as we have three equations and only two unknowns. Consequently, the best we will be able to to is sub-replicate or super-replicate the payoff. That is, find portfolios that either always pay equal to or greater than the option, or always pay less than or equal to the option. These will give bounds on the lowest and highest arbitrage-free option prices, but that’s as far as arbitrage-free prices will take us (in fact ANY of the prices between this limit is arbitrage-free) – choosing an actual price will require knowing the probabilities of p and q and deciding on our personal risk-premium.

In the simple three-state model, lower and upper bounds can be calculated by ignoring the second and third equation respectively above, and they give the limits shown in the graph below. Once again, as r \to 0.1 they converge, but note that a case where r < -0.1 is now possible, as the ‘crash’ option means that the stock can still pay out less than the bank account

The limiting values given by the no-arbitrage requirement on the price of a call option as a function of the risk-free rate in the three-branch tree model given above

The limiting values given by the no-arbitrage requirement on the price of a call option as a function of the risk-free rate in the three-branch tree model given above

This is what’s called an incomplete market. The securities that exist in the market aren’t sufficient to hedge us against all future possible states of the universe. In this case, we can’t uniquely determine prices by risk-neutral pricing – sice we can’t hedge out all of the risk, risk preferences of investors will play a part in determining the market-clearing prices of securities. Most models that are used in the industry are incomplete in this way – I’ll be looking at some examples of this soon which will include models involving jump processes and stochastic volatility.

Improving Monte Carlo: Control Variates

I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.

One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated to the product that we’re trying to price, but which is more easy to calculate or which we already know the price of. When we simulate a path in MC, it will almost surely be either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of product price, it most likely also produces an over-estimate of the control variate and visa versa, which will allow us to improve our estimate of the product we’re trying to price.

The textbook example is the Asian Option. Although the arithmetic version of the asian option discussed in previous posts has no analytic expression in BS, a similar Geometric asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of the option. As a reminder, the payoff of an arithmetic asian option at expiry is

C_{\rm arit}(T) = \Bigl({1\over N}\sum_{i=0}^{N-1} S(t_i) - K \Bigr)^+

and the payoff of the related geometric averaging asian is

C_{\rm geo}(T) = \Bigl( \bigl(\prod_{i=0}^{N-1} S(t_i)\bigr)^{1\over N} - K \Bigr)^+

Denoting the price of the arithmetic option as \inline X and the geometric option as \inline Y, the traditional monte carlo approach is to generate \inline N paths, and for each one calculate \inline X_i (the realisation of the payoff along the path) and take the average over all paths, so that

{\mathbb E}[X] = {1 \over N} \sum_{i=0}^{N-1} X_i

which will get closer to the true price as \inline N \to \infty.

Using \inline Y as a control variate, we instead calculate

{\mathbb E}[X] = {1\over N} \sum_{i=0}^{N-1}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr)

where \inline {\mathbb E}[Y] is the price of the geometric option known from the analytical expression, and \inline \lambda is a constant (in this case we will set it to 1).

What do we gain from this? Well, consider the variance of X_i - \lambda (Y_i - {\mathbb E } [Y])

{\rm Var}\bigl( X_i - ( Y_i - {\mathbb E}[Y] ) \bigr) = {\rm Var}( X_i ) + \lambda^2 {\rm Var} ( Y_i ) - 2 \lambda {\rm Cov}( X_i Y_i )

(since {\mathbb E } [Y] is known, so has zero variance) which is minimal for \inline \lambda = \sqrt{ {\rm Var}(X_i) \over {\rm Var}(Y_i) }\cdot \rho(X_i,Y_i) in which case

{ {\rm Var}\bigl( X_i - ( Y_i - {\rm E}[Y] ) \bigr) \over {\rm Var}( X_i )} = 1 - \rho(X_i,Y_i)^2

that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be a significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths – the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2000 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is roughly 45 times smaller standard error on pricing – well over an extra decimal place, which isn’t bad – and this is certainly much easier than running 2000 times as many paths to achieve the same result.

The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation

The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation. Parameters used were: r: 3%; vol: 10%; K: 105; S(0): 100; averaging dates: monthly intervals for a year

Asian options III: the Geometric Asian

 I’ve introduced the asian option here, it is similar to a vanilla option but its payoff is based on the average price over a period of time, rather than solely the price at expiry. We saw that it was exotic – it is not possible to give it a unique price using only the information available from the market.

Today I’m going to introduce another type of option – the geometric asian option. This option is similar, but its payoff is now based on the geometric average, rather than the arithmetic average, of the spot over the averaging dates

C(T) = \Bigl( \bigl( \prod_{i=1}^{N} S_i \bigr)^{1 \over N} - K \Bigr)^+

This option is exotic, just like the regular (arithmetic-average) asian option. At first sight it seems even more complicated; and what’s more, it is almost never encountered in practice, so why bother with it? Well, it’s a very useful as an exercise because in many models where the arithmetic asian’s price has no closed form, the geometric asian happens to have one! Further, since an arithmetic average is ALWAYS higher than a geometric average for a set of numbers, the price of the geometric asian will give us a strict lower bound on the price of the arithmetic asian.

Considering the Black-Scholes model in its simplest form for the moment (although the pricing formula can be extended to more general models), let’s consider what the spot will look like at each of the averaging times, and as we did in the earlier post, considering a simple geometric asian averaging over only two times \inline t_1 and \inline t_2 so that the payoff is C(T) = \Bigl( \sqrt{ S_1 \cdot S_2 } - K \Bigr)^+

At \inline t_0\inline S(t_0) = S_0

At \inline t_1,

S(t_1) = S_1 = S_0 \exp{ \Bigl\{ (r - {1\over 2} \sigma^2 )t_1 + \sigma \sqrt{t_1} x_1 \Bigr\} }

where \inline x_1 \sim {\mathbb N}(0,1). At \inline t_2,

S(t_2) = S_2 = S_1 \exp{ \Bigl\{ (r - {1\over 2} \sigma^2 )(t_2 - t_1) + \sigma \sqrt{t_2-t_1} x_2 \Bigr\} }

where \inline x_2 \sim {\mathbb N}(0,1) also, and importantly \inline x_2 is uncorrelated with \inline x_1 due to the independent increments of the Brownian motion

Now the reason we couldn’t make a closed form solution for the arithmetic asian was that \inline S_1 + S_2 is the sum of two lognormal distributions, which itself is NOT lognormally distributed. However, as discussed in my post on distributions, the product \inline S_1\cdot S_2 of two lognormal distributions IS lognormal, so valuing an asian that depends on the product of these two is similar to pricing a vanilla, with a slight change to the parameters that we need

\begin{matrix} \sqrt{ S_1 \cdot S_2 } & = & S_0 \exp \Bigl\{ {1\over 2}(r - {1\over 2} \sigma^2 )(t_1 + t_2) + \sigma ( \sqrt{t_1} x_1 + {1\over 2} \sqrt{t_2 - t_1} x_2 ) \Bigr\} \\ & = & S_0 \exp \Bigl\{ {1\over 2}(r - {1\over 2} \sigma^2 )(t_1 + t_2) + {1 \over 2} \sigma \sqrt{ 3 t_1 + t_2 } x_3 \Bigr\}    \end{matrix}

where \inline x_3 is another normal variable (we’ve used the result about the sum of two normally distributed variables here)

If we re-write this as

\begin{matrix} \sqrt{ S_1 \cdot S_2 } & = & S_0 \exp \Big\{ {1 \over 2} r ( t_1 + t_2 ) \Bigr\} \exp \Big\{ -{1 \over 2} \sigma^2 (t_1 + t_2) + {1 \over 2} \sigma \sqrt{ 3 t_1 + t_2 } x_3 \Big\} \\ & = & \sqrt{ F_1 \cdot F_2 } \exp \Big\{ {1 \over 2} \sigma^2 ( \tilde{t} - \tilde{\tilde{t}} ) \Big\} \exp \Big\{ -{1 \over 2} \sigma^2 \tilde{t} + \sigma \tilde{t} x_3 \Big\} \end{matrix}

where F_1 and F_2 are the forwards at the respective times and \tilde{t} and \tilde{\tilde{t}} are defined below. This is just the same as the vanilla pricing problem solved here. So, we can use a vanilla pricer to price a geometric asian with two averaging dates, but we need to enter transformed parameters

\begin{matrix} \tilde{t} & = & {1 \over 2} \sqrt{ 3t_1 + t_3 } \\ \tilde{\tilde{t}} & = & {1 \over 2} (t_1 + t_2) \\ F & \to & \sqrt{ F_1 \cdot F_2 } \cdot e^{{1 \over 2} \sigma^2 (\tilde{t} - \tilde{\tilde{t}}) } \\ \sigma^2 t & \to & \sigma^2 \tilde{\tilde{t}} \end{matrix}

In fact this result is quite general, we can price a geometric asian with any number of averaging dates, using the general transformations below (have a go at demonstrating this following the logic above)

 \begin{matrix} \tilde{t} & = & {1 \over n} \sqrt{  \sum_{i , j=0}^n {\rm min}(t_i, t_j) } \\ \tilde{\tilde{t}} & = & {1 \over n} \sum_{i =1}^n t_i \\ F & \to & \Big( \prod_{i=1}^n F_i \Big)^{1 \over n} \cdot e^{{1 \over 2} \sigma^2 (\tilde{t} - \tilde{\tilde{t}}) } \\ \sigma^2 t & \to & \sigma^2 \tilde{\tilde{t}} \end{matrix}

NB. I’m trialling some new systems of writing equations in wordpress as my old solution won’t update equations – please let me know if you have opinions! This version was achieved by installing the “WP QuickLatex” plugin and using ... tags

Credit Default Swaps Pt II – Credit Spreads

As well as governments, companies also need to borrow money and one way of achieving this is to issue bonds. Like government bonds, these pay a fixed rate of interest each year. Of course, this rate of interest will typically be quite a bit higher than government bonds in order to justify the increased credit risk associated with them. The difference between the two rates of interest is called the credit spread.

As we did before, today I’m just going to look at pricing a ZCB, but coupon-bearing bonds can be constructed as a product of these so it’s a relatively straight-forward extension.

Consider a ZCB issued by a risky company. It’s going to pay £1 at expiry, and nothing in-between. However, if the company defaults in the mean-time we get nothing. In fact, we’ll extend this a bit – instead of getting nothing we say that we will receive a fraction R of £1, since the company’s assets will be run down and some of their liabilities will get paid. This also allows us to distinguish between different classes of debt based on their seniority (we’re getting dangerously close to corporate finance here so let’s steer back to the maths).

In order to protect against the risk of loss, we might enter a Credit Default Swap (CDS) with a third party. In exchange for us making regular payments of £K/year to them (here we’re assuming that the payment accrues continuously for mathematical ease), they will pay us if and when the counterparty defaults. There are a few varieties of CDS – they might take the defaulted bond from us and reimburse what we originally paid for it, or they might just pay out an amount equal to our loss. We’ll assume the second variety here, so in the event of default they pay us £(1-R), since this is what we lose (up to some discount factors, but we’re keeping it simple here!). Also, if default occurs we stop making the regular payments, since we no longer need protection, so this product is essentially an insurance policy.

A cartoon of the payment schedule for a Credit Default Swap. We make continuous payments to the insurer of K per unit time, and in return we recieve a one-off payment of (1-R) should the counterparty default, in which case the contract finishes early and no more payments happen

A cartoon of the payment schedule for a Credit Default Swap. We make continuous payments to the insurer of K per unit time, and in return we receive a one-off payment of (1-R) should the counterparty default, in which case the contract finishes early and no more payments happen

We value this product in the normal way. The value of the fixed leg is just the discounted cash flow of each payment multiplied by the probability that the counterparty hasn’t defaulted (because we stop paying in that case) – and remember that we said the payments are accruing continuously, so we use an integral instead of a sum:

{\rm fixed} = -K \int_0^T \delta(0,t) S(t) dt

and the value of the insurance leg is the value of the payment times the chance that it defaults in any given interval

{\rm floating} = -(1-R) \int_0^T \delta(0,t) p(t) dt

where in keeping with our notation from before, \inline p(t) is the probability of default at a given time and \inline S(t) is the chance that the counterparty hasn’t defaulted by a given time.

The contract has a fair value if these two legs are of equal value, which happens when

0 = \int \delta(0,t) \Bigl(-(1-R) {dS \over dt} - K\cdot S(t) \Bigr) dt

At this point we refer to our exponential default model from Part I of this post.

In the exponential default model described in the previous post, we postulated a probability of default \inline p(t) = \lambda e^{-\lambda t} which led to Survival Function \inline S(t) = e^{-\lambda t}. Substituting these in to the equation above we have

0 = \int \delta(0,t) e^{-\lambda t} \Bigl((1-R) \lambda - K \Bigr) dt

which is only zero when the integrand is also zero, so

K = (1-R)\lambda

This is called the credit triangle, and it tells us the price of insuring a risky bond against default if we have it’s hazard rate. If the expected lifetime of the firm increases (ie. \inline \lambda decreases) then the spreads will fall, as insurance is less likely to be required. If the expected recovery after default increases (\inline R increases) then spreads also fall, as the payout will be smaller if it is required.

Although a little care is needed when moving from ZCBs to coupon-bearing bonds, it can be seen that the payments K (normally called the spreads) paid to insure the bond against default should be essentially the difference in interest payments between government (‘risk-free’) bonds and the risky bonds we are considering.

The default probabilities we have used here can be calibrated from market observed values for spreads K between government and corporate bonds. This resembles the process used to calculate volatility – the observable quantity on the market is actually the price, not the volatility/hazard rate, and we back this out from market prices using simple products and then use these parameters to price more exotic products to give consistent prices. This is the market’s view of the default probabilities in the risk-neutral measure, which means that they may not actually bear any resemblance to real world default probabilities (eg. historical default probabilities) and in fact are usually larger and often several times larger. See for example this paper for more on the subject.

Of course, throughout this analysis we have ignored the risk of the insurer themselves defaulting – indeed, credit derivatives rapidly become a nightmare when considered in general, and the Lehmann Brothers default was largely down to some fairly heroic approximations that were made in pricing credit products that ultimately didn’t hold up in extreme environments. I’ll explore some of this behaviour soon in a post on correlated defaults and the gaussian copula model.

Credit Default Swaps Pt I: A Default Model for Firms

In today’s post I’m going to discuss a simple model for default of a firm, and in Part II I’ll discuss the price of insuring against losses caused by the default. As usual, the model I discuss today will be a vast over-simplification of reality, but it will serve as a building block for development. Indeed, there are many people in the credit derivatives industry who take these things to a much higher level of complexity.

Modelling the default of a firm is an interesting challenge. Some models are based around ratings agencies, giving firms a certain grade and treating transitions between grades as a Markov chain (similar to a problem I discussed before). I’m going to start with a simpler exponential default model. This says that the event of a given firm defaulting on all of its liabilities is a random event obeying a Poisson process. That is, it is characterised by a single parameter \inline \lambda which gives the likelihood of default in a given time period. We make a simplifying assumption that this is independent of all previous time periods, so default CAN’T be foreseen (this may be a weakness of the model, but perhaps not… discuss!).

Also, the firm can’t default more than once, so the process stops after default. A generalisation of the model will treat \inline \lambda as a function of time \inline \lambda(t) and even potentially a stochastic variable, but we won’t think about that for now.

Mathematically, the probability of default time tau occurring in the small window
\inline [t,t+dt] is

\lim_{dt \to 0} p( \tau < t + dt\ |\ \tau > t ) = \lambda dt

If we start at \inline t=0 with the firm not yet having defaulted, this tells us that

p ( \tau < dt\ |\ \tau > 0 ) = \lambda dt

and in a second and third narrow window, using the independence of separate time windows in the Poisson process and taking a physicist’s view on the meaning of \inline 2dt and similar terms,

\begin{matrix} p ( dt < \tau < 2dt ) & = & p ( \tau < 2dt\ |\ \tau > dt )\cdot p(\tau > dt ) \\ &=& \lambda dt \cdot (1- \lambda dt) \end{matrix}\begin{matrix} p (\ 2dt < \tau < 3dt\ ) &=& \lambda dt \cdot \bigl(1 - \lambda dt \cdot(1 - \lambda dt) - \lambda dt \bigr)\\ &=& \lambda dt \cdot (1 - 2\lambda dt - \lambda^2 dt^2)\\ &=& \lambda dt \cdot (1 - \lambda dt)^2\end{matrix}

and in general

p (\ t < \tau < t+dt\ ) = (1- \lambda dt)^n\cdot \lambda dtwhere \inline n = {t \over dt}} and we must take the limit that \inline dt \to 0, which we can do by making use of the identity

\lim_{a\to \infty} (1 + {x \over a})^a = e^x

we see that

\begin{matrix} \lim_{dt \to 0}p (\ t < \tau < t+dt\ ) & = & \lambda e^{-\lambda t}\\ & = & p(\tau = t) \end{matrix}

and its cumulative density function is

{\rm F}(t) = \int_0^t p(\tau = u) du = 1 - e^{-\lambda t}which gives the probability that default has happened before time \inline t (the survival function \inline {\rm S}(t) = 1 - {\rm F}(t) gives the reverse – the chance the firm is still alive at time \inline t) *another derivation of \inline p(\tau=t) is given at the bottom

A few comments on this result. Firstly, note that the form of \inline p(\tau=t) is the Poisson distribution for \inline n=1, which makes a lot of sense since this was what we started with! It implies a mean survival time of \inline \lambda^{-1}, which gives us some intuition about the physical meaning of lambda. The CDF (and consequently the survival function) are just exponential decays, I’ve plotted them below.

The survival/default probabilities for a firm in the exponential default model with various values of  decay parameter

The survival/default probabilities for a firm in the exponential default model with various values of decay parameter

Having characterised the default probability of the firm, in Part II I will think about how it affects the price of products that they issue.


*Another derivation of \inline p(\tau=t) is as follows:

\lambda dt = p ( \tau < t+dt\ |\ \tau > t ) = {p(t < \tau < t+dt)\over p(\tau > t)}

the first of these terms is approximately \inline p(\tau=t)dt, while the second is simply the survival function \inline S(t), which by definition obeys

p(\tau=t) = -{\partial S \over \partial t}

combining this we have

\lambda = -{1\over S}\cdot {\partial S \over \partial t}

and integrating gives

-\lambda t = \ln {S(t) \over S(0)}from which we get the result above, that

S(t) = e^{-\lambda t}

Fitting the initial discount curve in a stochastic rates model

I’ve introduced the Vasicek stochastic rates model in an earlier post, and here I’m going to introduce a development of it called the Hull-White (or sometimes Hull-White extended Vasicek) model.

The rates are modelled by a mean-reverting stochastic process

dr = \bigl( \theta(t) - a r(t) \bigr) dt + \sigma dW_t

which is similar to the Vasicek model, except that the \inline \theta(t) term is now allowed to vary with time (in general \inline a and \inline \sigma are too, but I’ll ignore those issues for today).

The freedom to set theta as a deterministic function of time allows us to calibrate the model to match the initial discount curve, which means at least initially our model will give the right price for things like FRAs. Calibrating models to match market prices is one of the main things quants do. Of course, the market doesn’t really obey our model. this means that, in time, the market prices and the prices predicted by our model will drift apart, and the model will need to be re-calibrated. But the better a model captures the various risk factors in the market, the less often this procedure will be needed.

Using the trick

d \bigl( e^{at} r(t)\bigr) = e^{at} \bigl(dr + ar(t)dt \bigr) to re-express the equation and integrating gives

r(t) = r_0 e^{-at} + e^{-at}\int^t_0 \theta(u) e^{au} du + \sigma e^{-at} \int^t_0 e^{au} dW_u

where \inline r_0 is the rate at \inline t=0. The observable quantities from the discount curve are the initial discount factors (or equivalently the initial forward rates) \inline P(0,t), where

P(0,t) = {\mathbb E}\bigl[ e^{-\int^t_0 r(u) du} \bigr]

The rate \inline r_0 is normally distributed, so the integral \inline \int^t_0 r(u) du must be too. This is because an integral is essentially a sum, and a sum of normal distributions is also normally distributed. Applying the Ito isometry as discussed before, the expectation of this variable will come wholly from the deterministic terms and the variance will come entirly from the stochastic terms, giving

{\mathbb E} \bigl[ \int^t_0 r(u) du \bigr] = r_0 B(0,t) + \int^t_0 e^{-au} (\int^u_0 \theta(s) e^{as} ds ) du{\mathbb V}\bigl[ \int^t_0 r(u) du \bigr] = \sigma^2 \int^t_0 B(u,t)^2 du

where throughout

B(t,t') = {1 \over a}\bigl( 1 - e^{-a(t'-t)}\bigr)and since

{\mathbb E} \bigl[ e^x | x \sim {\mathbb N}(\mu,\sigma^2) \bigr] = e^{\mu + {1\over 2}\sigma^2}we have

P(0,t) = \exp \Bigl\{ -r_0 B(0,t) - \int^t_0 e^{-au} \bigl( \int_0^u \theta(s) e^{as} ds + {1 \over 2} \sigma^2 B(u,t)^2 \bigr) du \Bigr\}

Two differentiations of this expression give

-{\partial \over \partial t} \ln P(0,t) = r_0 e^{-at} + e^{-at} \int^t_0 \theta(u) e^{au} du + {1\over 2} \sigma^2 B(0,t)^2-{\partial^2 \over \partial t^2} \ln P(0,t) = -ar_0 e^{-at} + \theta(t) - ae^{-at} \int^t_0 \theta(u) e^{au} du + \sigma^2 e^{-at} B(0,t)

and combining these equations gives an expression for \inline \theta(t) that exactly fits the initial discount curve for the given currency

\theta(t) = -{\partial^2 \over \partial t^2} \ln P(0,t) - a{\partial \over \partial t} \ln P(0,t) + {\sigma^2\over 2a}( 1 - e^{-2at} )and since \inline -{\partial \over \partial t} \ln P(0,t) = f(0,t) is simply the initial market observed forward rate to each time horizon coming from the discount curve, this can be compactly expressed as

\theta(t) = {\partial \over \partial t} f(0,t) + a f(0,t) + {\sigma^2 \over 2a} ( 1 - e^{-2at} )

Today we’ve seen how a simple extension to the ‘basic’ Vasicek model allows us to match the initial discount curve seen in the market. Allowing the volatility parameter to vary will allow us to match market prices of other products such as swaptions (an option to enter into a swap), which I’ll discuss another time. But we’re gradually building up a suite of simple models that we can combine later to model much more complicated environments.

Interview Questions VI: The Drunkard’s Walk

A drunk man is stumbling home at night after closing time. Every step he takes moves him either 1 metre closer or 1 metre further away from his destination, with an equal chance of going in either direction (!). His home is 70 metres down the road, but unfortunately, there is a cliff 30 metres behind him at the other end of the street. What are the chances that he makes it to his house BEFORE tumbling over the edge of the cliff?

This is a fun question and quite heavy on conditional probability. We are trying to find the probability that the drunkard has moved 70 metres forward BEFORE he has ever moved 30 metres backward, or visa versa. There are several ways of attempting this, including some pretty neat martingale maths, but I’m going to attempt it here in the language of matrices and markov chains.

Essentially, there are 100 states that the drunkard can be in, from right beside the cliff all the way down the road to right beside his front door. Let’s label these from 1 to 100, in terms of the number of metres away from the cliff, so that he starts in state 30. At each step, he transitions from his current state to either one state higher or one state lower with probability 50%, and the process continues until he reaches either the cliff or the door, at which point the process will cease in either a good or a very bad night’s rest. We call these two states, 0 and 100, ‘absorbers’, because the process stops at this point and no transitions to new states can happen. A markov diagram that illustrates this process can be drawn like this:

A markov chain displaying the transition probabilities for each state in the drunkard's walk

A markov chain displaying the transition probabilities for each state in the drunkard’s walk

We can characterise each step in the process by a transition matrix acting on a state vector. The drunkard initially has a state of 30 metres, so his state vector is a long string of zeroes with a single 1 at the 30th position:

S_0 = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}

This vector is probabalistic – a 1 indicates with 100% certainty that the drunkard is in the 30th state. However, with subsequent moves this probability density will be spread over the nearby states as his position’s probability density diffuses into other states. The transition matrix multiplies the drunkard’s state vector to give his state vector after another step:

P = \begin{pmatrix} 1 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & \cdots & 0 & 0\\ 0 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 0.5 & 1 \end{pmatrix}; \quad S_{i+1} = P \cdot S_i

So, after one step the drunkard’s state vector will have a 0.5 in the 29th and the 31st position and zeroes elsewhere, saying that he will be in either of these states with probability 0.5, and certainly nowhere else. Note that the 1’s at the top and bottom of the transition matrix will absorb and probability that arrives at that state for the rest of the process.

To keep things simple, let’s consider a much smaller problem with only six states, where state 1 is ‘down the cliff’ and state 6 is ‘home'; and we’ll come back to the larger problem at the end. We want to calculate the limit of the drunkard’s state as the transition matrix is applied a large number of times, ie. to calculate

S_n = P^n \cdot S_0

An efficient way to calculate powers of a large matrix is to first diagonalise it. We have \inline P = U A\ U^{-1}, where \inline A is a diagonal matrix whose diagonal elements are the eigenvalues of \inline P, and \inline U is the matrix whose columns are the eigenvectors of \inline P. Note that, as \inline P is not symmetric, \inline U^{-1} will have row vectors that the the LEFT eigenvectors of \inline P, which in general will be different to the right eigenvectors. It is easy to see how to raise \inline P to the power n

P^n = (U A\ U^{-1})^n = U A^n\ U^{-1}

since all of the \inline Us and \inline U^{-1}s in the middle cancel. Since \inline A is diagonal, to raise it to the power of n we simlpy raise each element (which are just the eigenvalues of \inline P) to the power of n. To calculate the eigenvalues of P we solve the characteristic equation

|P - \lambda_a I| = 0


P = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}

This gives six eigenvalues, two of which are one (these will turn out to correspond to the two absorbing states) and the remainder are strictly \inline 0\leq \lambda_a < 1. Consequently, when raised to the power n in the diagonal matrix above, all of the terms will disappear except for the first and the last eigenvalue which are 1 as n becomes large.

Calculating the eigenvectors is time consuming and we’d like to avoid it if possible. Luckily, in the limit that n gets large, we’ve seen that most of the eigenvalues raised to the power n will go to zero which will reduce significantly the amount we need to calculate. We have

P^n\cdot S = \bigl( U A^n \ U^{-1}\bigr)\cdot S

and in the limit that n gets large, \inline U \cdot A^n is just a matrix of zeroes with a one in the upper left and lower right entry. At first sight it looks as though calculating \inline U^{-1} will be required, which is itself an eigenvector problem, but in fact we only have to calculate a single eigenvector – the first (or equivalently the last), which will give the probability of evolving from an initial state S to a final state 0 (or 100).

\inline U^{-1} is the matrix of left eigenvectors of \inline P, each of which satisfies

x_a \cdot P = \lambda_a x_a

and we are looking for the eigenvectors corresponding to eigenvalue of 1, so we need to solve the matrix equation

x \cdot \begin{pmatrix} 0 & 0.5 & 0 & 0 & 0 & 0\\ 0 & -1 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & -1 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & -1 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & -1 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 0 \end{pmatrix} = 0

We know that if he starts in the first state (ie. over the cliff) he must finish there (trivially, after 0 steps) with probability 100%, so that \inline x_1 = 1 and \inline x_6 = 0. The solution to this is

x_n = {6 - n \over 5}

which is plotted here for each initial state

The drunkard's final state probabilities

The probability of the drunkard finishing in either final state given his initial state. Note that this sums to one in all cases as eventually he must finish in one or the other state.

which says that the probability of ending up in state 1 (down the cliff!) falls linearly with starting distance from the cliff. We can scale up this final matrix equation to the original 100 by 100 state space, and find that for someone starting in state 30, there is a 70/100 chance of ending up down the cliff, and consequently only a 30% chance of getting home!

This problem is basically the same as the Gambler’s ruin problem, where a gambler in a casino stakes $1 on each toss of a coin and leaves after reaching a goal of $N or when broke. There are some very neat methods for solving them via martingale methods that don’t use the mechanics above that I’ll look at in a future post.

Asian Options II: Monte Carlo

In a recent post, I introduced Asian options, an exotic option which pay the average of the underlying over a sequence of fixing dates.

The payoff of an Asian call is

C(T) = \Big({1 \over N}\sum^N_{i=0} S_{t_i} - K \Big)^+

and to price the option at an earlier time via risk-neutral valuation requires us to solve the integral

C(t) = P_{t,T}\ \big( \iint\cdots\int \big) \Big({1 \over N}\sum^N_{i=0} S_{t_i} - K \Big)^+ \phi(S_{t_1},S_{t_1},\cdots,S_{t_N}) dS_{t_1}dS_{t_2}\cdots dS_{t_N}

A simple way to do this is using Monte Carlo – we simulate a spot path drawn from the distribution \inline \phi(S_{t_1},S_{t_1},\cdots,S_{t_N}) and evaluate the payoff at expiry of that path, then average over a large number of paths to get an estimate of C(t).

It’s particularly easy to do this in Black-Scholes, all we have to do is simulate N gaussian variables, and evolve the underlying according to a geometric brownian motion.

I’ve updated the C++ and Excel Monte Carlo pricer on the blog to be able to price asian puts and calls by Monte Carlo – have a go and see how they behave relative to vanilla options. One subtlety is that we can no longer input a single expiry date, we now need an array of dates as our function input. If you have a look in the code, you’ll see that I’ve adjusted the function optionPricer to take a datatype MyArray, which is the default array used by XLW (it won’t be happy if you tell it to take std::vector). This array can be entered as a row or column of excel cells, and should be the dates, expressed as years from the present date, to average over.

The Dupire Local Vol Model

In this post I’m going to look at a further generalisation of the Black-Scholes model, which will allow us to re-price any arbitrary market-observed volatility surface, including those featuring a volatility smile.

I’ve previously looked at how we can produce different at-the-money vols at different times by using a piecewise constant volatility \inline \sigma(t), but we were still unable to produce smiley vol surfaces which are often observed in the market. We can go further by allowing vol to depend on both t and the value of the underlying S, so that the full BS model dynamics are given by the following SDE

dS_t = S_t\cdot \Big(\ r(t) dt\ +\ \sigma(t,S_t) dW_t\ \Big)

Throughout this post we will make constant use of the probability distribution of the underlying implied by this SDE at future times, which I will denote \inline \phi(t,S_t). It can be shown [and I will show in a later post!] that the evolution of this distribution obeys the Kolmogorov Forward Equation (sometimes called the Fokker-Planck equation)

{\partial \phi(t,S_t) \over \partial t} = -{\partial \over \partial S_t}\big(rS_t\phi(t,S_t)\big) + {1\over 2}{\partial^2 \over \partial S_t^2}\big(\sigma^2(t,S_t) S_t^2 \phi(t,S_t)\big)

This looks a mess, but it essentially tells us how the probability distribution changes with time – we can see that is looks very much like a heat equation with an additional driving term due to the SDE drift.

Vanilla call option prices are given by

C(t,S_t) = P(0,t)\int_K^{\infty} \big(S_t - K\big)\phi(t, S_t) dS_t

Assuming the market provides vanilla call option prices at all times and strikes [or at least enough for us to interpolate across the region of interest], we can calculate the time derivative of the call price which is equal to

{\partial C \over \partial T} = -rC + P(0,T)\int_K^{\infty}\big( S_T - K \big)\ {\partial \phi \over \partial T}\ dS_T

and we can substitute in the value of the time derivative of the probability distribution from the Kolmogorov equation above

rC + {\partial C \over \partial T} = P(0,T)\int_K^{\infty}\big( S_T - K \big)\Big[ -{\partial \over \partial S_T}\big( rS_T\phi\big) + {1\over 2}{\partial^2 \over \partial S_T^2}\big( \sigma^2S_T^2\phi\big) \Big]dS_T

These two integrals can be solved by integration by parts with a little care

\begin{align*} -\int_K^{\infty}\big( S_T - K \big){\partial \over \partial S_T}\big( rS_T\phi\big)dS_T & = -\Big[rS_T\phi \big( S_T - K \big)\Big]^{\infty}_{K} + \int_K^{\infty}rS_T\phi\ dS_T \\ & = r\int_K^{\infty} (S_T\phi)\ dS_T \end{align*}\begin{align*} \int_K^{\infty}\big( S_T - K \big){\partial^2 \over \partial S_T^2}\big( \sigma^2 S_T^2\phi\big)dS_T & =\Big[\big( S_T - K \big){\partial \over \partial S_T}(\sigma^2 S_T^2\phi) \Big]^{\infty}_{K} - \int_K^{\infty}{\partial \over \partial S_T}(\sigma^2 S_T^2\phi)\ dS_T \\ & = -\sigma^2 K^2\phi(K,T) \end{align*}where in both cases, the boundary terms disappear at the upper limit due to the distribution \inline \phi(t,S_t) and its derivatives, which go to zero rapidly at high spot.

We already have an expression for \inline \phi(t,S_t) in terms of C and its derivatives from our survey of risk-neutral probabilities,

\phi(t,S_t) = {1 \over P(0,t)}{\partial^2 C \over \partial K^2}

and we can re-arrange the formula above for call option prices

\begin{align*} P(0,T)\int_K^{\infty} S_T\ \phi\ dS_T & = C + P(0,T)\int_K^{\infty} K\phi\ dS_T \\ & = C + K {\partial C \over \partial K} \end{align*}and substituting these expressions for \inline \phi(t,S_t) and \inline \int^{\infty}_K (S_T \phi)\ dS_T into the equation above

\begin{align*} rC + {\partial C \over \partial T} & = P(0,T)\cdot \Big[ r\int_K^{\infty} (S_T\phi)\ dS_T + \sigma^2 K^2\phi(K,T) \Big]\\ & = rC + rK {\partial C \over \partial K} + \sigma^2 K^2{\partial^2 C \over \partial K^2} \end{align*}

and remember that \inline \sigma = \sigma(t,S_t), which is our Dupire local vol. Cancelling the rC terms from each side and re-arranging gives

\sigma(T,K) = \sqrt{ {\partial C \over \partial T} + rK {\partial C \over \partial K} \over K^2{\partial^2 C \over \partial K^2}}

It’s worth taking a moment to think what this means. From the market, we will have access to call prices at many strikes and expires. If we can choose a robust interpolation method across time and strike, we will be able to calculate the derivative of price with time and with strike, and plug those into the expression above to give us a Dupire local vol at each point on the time-price plane. If we are running a Monte-Carlo engine, this is the vol that we will need to plug in to evolve the spot from a particular spot level and at a particular time, in order to recover the vanilla prices observed on the market.

A nice property of the local vol model is that it can match uniquely any observed market call price surface. However, the model has weaknesses as well – by including only one source of uncertainty (the volatility), we are making too much of a simplification. Although vanilla prices match, exotics priced using local vol models typically have prices that are much lower than prices observed on the market. The local vol model tends to put most of the vol at early times, so that longer running exotics significantly underprice.

It is important to understand that this is NOT the implied vol used when calculating vanilla vol prices. The implied vol and the local vol are related along a spot path by the expression

\Sigma^2 T = \oint_0^T\sigma^2(S_t,t)dt

(where \inline \Sigma is the implied vol) and the two are quite different. Implied vol is the square root of the average variance per unit time, while the local vol gives the amount of additional variance being added at particular positions on the S-t plane. Since we have an expression for local vol in terms of the call price surface, and there is a 1-to-1 correspondence between call prices and implied vols, we can derive an expression to calculate local vols directly from an implied vol surface. The derivation is long and tedious but trivial mathematically so I don’t present it here, the result is that the local vol is given by (rates are excluded here for simplicity)

\sigma(y,T) = \sqrt{{\partial w \over \partial T} \over \Big[ 1 - {y \over w}{\partial w \over \partial y} + {1\over 2}{\partial^2 w \over \partial y^2}+ {1 \over 4}\Big( -{1 \over 4} - {1 \over w}+ {y^2 \over w}\Big)\Big({\partial w \over \partial y}\Big)^2 \Big]}

where \inline w = \Sigma^2 T is the total implied variance to a maturity and strike and \inline y = \ln{K \over F_T} is the log of ‘moneyness’.

This is probably about as far as Black-Scholes alone will take you. Although we can reprice any vanilla surface, we’re still not pricing exotics very well – to correct this we’re going to need to consider other sources of uncertainty in our models. There are a wide variety of ways of doing this, and I’ll being to look at more advanced models in future posts!

Asian Options I

Today I’m going to introduce the first exotic option that I’ve looked at in my blog, the Asian option. There is quite a lot to be said about them so I’ll do it over a few posts. I’ll also be updating the pricers to price these options in a variety of ways.

The simplest form of Asian option has a payoff similar to a vanilla option, but instead of depending on the value of the underlying spot price at expiry, it instead depends on the average the underlying spot price over a series of dates specified in the option contract. That is, the value of the option at expiry C(T) is the payoff

C(T) = \Big( {1\over N}\sum^{N}_{i=0} S_{t_i} - K\Big)^+

where K is the strike, N is the number of averaging dates, and \inline S_t is the value of the spot that is realised at each of these dates.

First of all, what does it mean that this is an exotic option? From the market, we will be able to see the prices of vanilla options at or near each of the dates that the contract depends on. As I discussed in my post on Risk-Neutral Valuation, from these prices we can calculate the expected distribution of possible future spot prices at each date – shouldn’t we be able to use this to calculate a unique price of the Asian option consistent with the vanilla prices?

The answer, unfortunately, is no. Although we know the distribution at each time, this doesn’t tell us anything about the correlations between the price at time \inline t_1 and time \inline t_2, which will be critical for the Asian option. To illustrate this, let’s consider a simple Asian that only averages over two dates, \inline t_1 and \inline t_2, and we’ll take \inline t_2 to be the payoff date for ease, and try to calculate its price via risk-neutral valuation. The price at expiry will be

C(t_2) = \Big( {1\over 2}\big(S_{t_1} + S_{t_2}\big) - K\Big)^+

To calculate the price at earlier times, we use the martingale property of the price in the risk-neutral measure

C(0) = N(0)\cdot{\mathbb E}^{RN} [\ {C(t_2) \over N(t_2)}\ |\ {\cal F}_0\ ]

and expanding this out as an integral, we have

C(0) = P_{0,t_2} \int \int \Big( {1 \over 2} \big( S_{t_1} + S_{t_2} \big) - K \Big)^+\ P(S_{t_1},S_{t_2})\ dS_{t_1} dS_{t_2}From the market, we know both \inline P(S_{t_1}) and \inline P(S_{t_2}), which are respectively the market-implied distributions of the spot price at times \inline t_1 and \inline t_2 calculated from vanilla call prices using the expression we derived before

P(S_{t}) = {1 \over \delta_{0,t}}{\partial^2 C(t) \over \partial K^2}

 But, we don’t know the JOINT distribution \inline P(S_{t_1},S_{t_2}) which expresses the effect of the dependence of the two distributions. We know from basic statistics that

P(S_{t_2},S_{t_1}) = P(S_{t_2} | S_{t_1})\cdot P(S_{t_1})

and since the spot at the later time depends on the realised value of the spot at the earlier time \inline P(S_{t_2} | S_{t_1}) \neq P(S_{t_2}), so we don’t have enough information to solve the problem from only the market information available from vanilla options.

What is this telling us? It is saying that the market-available information isn’t enough to uniquely determine a model for underlying price movements with time. There are a variety of models that will all re-create the same vanilla prices, but will disagree on the price of anything more exotic, including our simple Asian option. However, observed vanilla prices do allow us to put some bounds on the price of the option. For example, a careful application of the triangle inequality tells us that

\Big( {1\over 2}\big(S_{t_1} + S_{t_2}\big) - K \Big)^+ \leq {1\over 2} \Big( S_{t_1} - K_1 \Big)^+ + {1\over 2} \Big( S_{t_2} - K_2 \Big)^+

for any \inline K_1 + K_2 = 2K; the expressions on the right are the payoff at expiry of call options on \inline S_{t_1} and \inline S_{t_2} respectively, and we can see the prices of these on the market, which allow an upper bound on the asian price. This means that, while we can’t uniquely price the option using the market information alone, we CAN be sure that any model consistent with market vanilla prices will produce an asian price that is lower than the limit implied by this inequality.

In order to go further, we need to make a choice of model. Our choice will determine the price that we get for our option, and changing model may well lead to a change in price – analysing this ‘model risk’ is one of a quant’s main jobs. A very simple choice such as Black Scholes [but note that BS might not be able to re-create an arbitrary vanilla option surface, for example if there is a vol smile at some time slices] will allow us to solve the integral above by numerical integration or by Monte Carlo (this option form still doesn’t have a closed form solution even in BS).

In the coming posts I’m going to develop the Monte Carlo pricer on the blog to price asian options in BS, and look at a number of simplifying assumptions that can be made which will allow us to find closed form approximations to the price of Asian options.


Finally, a brief note on the naming and usage of Asian options. Opinion seems divided – some people believe they are called Asian options because they’re neither European nor American, while some people say it’s because they were first traded in Japan. They are often traded so that the days on which the averaging is done are the business days of the last month of the option’s life. As the look up period is spread out over a period, they are less susceptible to market manipulation on any particular date, which can be important in less liquid markets, although the theoretical prices of Asian options like this are usually very similar in price to vanilla options at any of the look up dates. More exotic Asians might average over the last day of a month for a year. This average is much less volatile than the underlying over the same period, so Asian options are much cheaper than a portfolio of vanillas, but it’s easy to imagine that they might well match the risk exposure of a company very well, since companies trading in two or more currency areas will probably be happy to hedge their average exchange rate over a period of time rather than the exchange rate at any one particular time.