Interview Questions VII: Integrated Brownian Motion

 
For a standard Wiener process denoted W_t, calculate

    \[{\mathbb E} \Bigl[ \Bigl( \int^T_0 W_t dt \Bigr)^2 \Bigr]\]

This integral is the limit of the sum of W_t over each infinitesimal time slice dt from 0 to T; it is called Integrated Brownian Motion.




We see immediately that this will be a random variable with expectation zero, so the expectation of the square is simply the variance of the random variable, which is what we need to calculate. Here I show two ways to solve this: first by taking the limit of the sum, and second by using stochastic integration by parts.


1) Limit of a sum

(1)   \begin{align*} I &= \int^T_0 W_t dt \nonumber \\ &= \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} \nonumber \end{align*}

The values of the Wiener process in this sum are not independent of one another, since only the increments are independent. However, we can re-write them in terms of sums of these independent increments:

(2)   \begin{align*} W_{\frac {Ti} n} &= \sum^i_{j=1} ( W_{\frac {Tj} n} - W_{\frac {T(j-1)} n} ) \nonumber \\ &= \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \end{align*}

where \overline{W}_{\frac {Tj} n} \sim {\mathbb N}(0,{\frac T n}) are the individual independent increments of the Brownian motion. Substituting into our previous equation and reversing the order of the summation gives

(3)   \begin{align*} \lim_{n \to \infty}{\frac T n}\sum^n_{i=1} W_{\frac {Ti} n} &= \lim_{n \to \infty}{\frac T n} \sum^n_{i=1} \sum^i_{j=1} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} \sum^n_{i=j} \overline{W}_{\frac {Tj} n} \nonumber \\ &= \lim_{n \to \infty}{\frac T n} \sum^n_{j=1} (n-j+1) \overline{W}_{\frac {Tj} n} \nonumber \\ \end{align*}

which is simply a weighted sum of independent Gaussians. To calculate the total variance, we sum the individual variances using the summation formula for \sum_1^N n^2.

(4)   \begin{align*} {\mathbb V}[I] &= \lim_{n \to \infty} {\frac {T^2} {n^2}} \sum^n_{j=1} (n-j+1)^2 {\frac T n} \nonumber \\ &= \lim_{n \to \infty} {\frac {T^3} {n^3}} \Bigl( {\frac 1 3} n^3 + O(n^2) \Bigr) \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.

2) Stochastic integration by parts

The stochastic version of integration by parts for potentially stochastic variables X_t, Y_t looks like this:

    \[X_T \cdot Y_T = X_0 \cdot Y_0 + \int^T_0 X_t dY_t + \int^T_0 Y_t dX_t\]

Re-arranging this gives

    \[\int^T_0 Y_t dX_t = \Bigl[ X_t \cdot Y_t \Bigr]^T_0 - \int^T_0 X_t dY_t\]

Now setting Y_t = -W_t and X_t = (T-t) we have

(5)   \begin{align*} \int^T_0 W_t dt &= \Bigl[ -W_t \cdot (T-t) \Bigr]^T_0 + \int^T_0 (T-t) dW_t \nonumber \\ &= \int^T_0 (T-t) dW_t \nonumber \end{align*}

We recognise this as a weighted sum of independent Gaussian increments, which is (as expected) a Gaussian variable with expectation 0 and a variance that we can calculate with the Itô isometry:

(6)   \begin{align*} {\mathbb E}\Bigl[ \Bigl( \int^T_0 (T-t) dW_t \Bigr)^2 \Bigr] &= {\mathbb E}\Bigl[ \int^T_0 (T-t)^2 dt \Bigr] \nonumber \\ &= \Bigl[ T^2t - Tt^2 + {\frac 1 3} t^3 \Bigr]^T_0 \nonumber \\ &= {\frac 1 3} T^3 \end{align*}

which is the solution.
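Both derivations are easy to sanity-check numerically. Below is a minimal Monte Carlo sketch (my own addition, with illustrative path counts): build discretised Brownian paths, approximate the time integral with a Riemann sum, and compare the sample variance of the integral against T^3/3.

    import numpy as np

    rng = np.random.default_rng(42)
    T, n_steps, n_paths = 2.0, 500, 20_000
    dt = T / n_steps

    # Brownian increments ~ N(0, dt), accumulated into paths W_t
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.cumsum(dW, axis=1)

    # Riemann-sum approximation of I = int_0^T W_t dt on each path
    I = W.sum(axis=1) * dt

    print("sample mean:    ", I.mean())    # ~ 0
    print("sample variance:", I.var())     # ~ T^3 / 3
    print("T^3 / 3:        ", T**3 / 3)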

Credit Valuation Adjustment (CVA)

Now that we’ve covered a basic model for the default of firms and the pricing of Credit Default Swaps, we’re ready to consider the implication of our counterparty’s credit risk for the price of a derivative contract signed with them. This is called the ‘Credit Valuation Adjustment’, or CVA, and it is the amount by which one should adjust the value of an uncollateralised, credit-risk-free derivative to reflect the counterparty’s risk of default.

When we have priced derivatives under Black-Scholes assumptions, we’ve implicitly assumed that the contract runs to expiry. However, if our counterparty defaults before the end of the contract, this won’t happen (typically, in the event of default a liquidator will attempt to collect everything the firm is owed from counterparties, and use the finite resources collected to pay back creditors as much as possible).

If we hold a single derivative contract against a defaulting counterparty and it’s out-of-the-money (i.e. the value is negative: we owe them money), we will still be required to hold the position. On the other hand, if we’re in-the-money, we won’t necessarily get all of our contract value back: as with CDS contracts, we assume that of the true value of the contract, we will only receive a fraction R, the ‘Recovery Rate’. Since we gain nothing when the value is negative but lose something when it is positive, to calculate the pricing impact we need the expected value of the positive part of the contract value at each future time, ignoring the scenarios in which it is negative. This is called the Expected Positive Exposure (EPE).

For a contract depending on underlier S with value C(S,t) at future time t, we can calculate this by integrating across all possible underlier values that generate a positive contract value at that time

    \[EPE(t) = \int_{C(S,t) > 0} C(S,t) \cdot p(S) \, dS\]

This will be larger for contracts whose values have a higher volatility, since they are more likely to have a high positive or negative value at some point in the future.

Our expected loss due to a counterparty default event at time t, discounted and accounting for the finite recovery value, is therefore

    \[(1 - R) \cdot EPE(t) \cdot P(0,t)\]

where P(0,t) is the discount factor to time t as usual.

In order to calculate the CVA for a whole contract, we need to integrate the product of the loss-given-default at time t and the probability of default at time t

    \[CVA = (1-R) \int_0^{T_f} EPE(t) \cdot P(0,t) \cdot p(t) \; dt\]

where T_f is the time of the final cashflow due to the contract and p(t) = -S'(t) is the probability density of default at time t, as discussed in the previous post on firm default models. If this is uncorrelated with the value of the contract, the integral is relatively straightforward to calculate. Default probabilities can be calibrated from observed CDS prices on the counterparties; most major counterparties (typically large investment banks) will have CDS contracts available on the market to use for calibration.
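To make this concrete, here is a small numerical sketch (all assumptions mine, not from the post): the trade is taken to be a forward on a GBM underlier struck at-market, so that EPE(t) has a closed Black-formula form; default follows the exponential model with hazard rate lambda; discounting is at a flat rate; and the time integral is approximated with a trapezoid rule.

    import numpy as np
    from scipy.stats import norm

    S0, r, sigma = 100.0, 0.03, 0.2      # illustrative market parameters
    T_f = 5.0                            # final cashflow of the trade
    K = S0 * np.exp(r * T_f)             # forward struck at-market
    lam, R = 0.02, 0.4                   # hazard rate and recovery rate

    def epe_forward(t):
        """E[max(V_t, 0)] for a forward struck at K expiring at T_f."""
        t = np.maximum(t, 1e-12)         # avoid division by zero at t = 0
        F0 = S0 * np.exp(r * T_f)
        s = sigma * np.sqrt(t)
        d1 = (np.log(F0 / K) + 0.5 * s**2) / s
        return np.exp(-r * (T_f - t)) * (F0 * norm.cdf(d1) - K * norm.cdf(d1 - s))

    t = np.linspace(0.0, T_f, 1001)
    integrand = epe_forward(t) * np.exp(-r * t) * lam * np.exp(-lam * t)
    cva = (1 - R) * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    print("CVA charge:", cva)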

It’s worth mentioning that CVA is a pricing problem: we’re treating the pricing adjustment as the price of an exotic derivative and hedging it with a portfolio of CDS contracts. This means we’re calculating an actual dollar value, and when a trader quotes a derivative price to a counterparty they should also quote a ‘CVA adjustment’ to represent the counterparty’s chance of default. Internally, the trader will often have to pass this straight through to a ‘CVA Desk’, which manages the risk the bank faces due to the default of any given counterparty, and which will then reimburse the trader if they face a loss due to such an event!

It’s worth noting that many simple products like ZCBs, forwards and FRAs have model-independent prices, so we can calculate their value independent of any assumptions about the market’s future behaviour. However, we’ve had to make some assumptions about the underlying volatility in the market to calculate EPEs, so the CVA for these products is model-dependent – we need to make some assumptions about the model driving the market and can’t enforce a price today purely by no-arbitrage considerations (although it will be informed by the prices of other model-dependent products whose prices we can see).

We’ve also discussed only the CVA adjustment from an individual contract, which is already quite computationally complex (in a future post, I’ll calculate this for some simple products in a simple model). If we have a portfolio of products in place against a specific counterparty, these should all be considered together and may lead to some offsetting exposures that reduce the CVA adjustment on a portfolio basis (although even this isn’t always the case, depending upon ‘netting agreements’ in the Credit Support Annex – CSA – that you have in place with the given counterparty…).

Further, CVA only applies to non-collateralised positions. Many positions (like futures) require the deposit of margin in an escrow account, precisely to cover the risk of counterparty default. This leads instead to ‘gap risk’ (i.e. the risk that the margin might not be sufficient) and also to the related ‘Funding Valuation Adjustment’ (FVA, essentially the cost of funding the margin you provide).

Hedging in a finite-state model (Binary Trees)

Today’s post is about hedging in a world where there are only a few outcomes. It demonstrates a lot of principles that are important in asset pricing, and it’s also a frequent topic for interview questions so almost got included as one of those!

A very basic approximation for a stock is a binary tree, as in the diagram shown below. The stock is worth 100 today, and after some time (say a year) it will be worth either 110 with probability p or 90 with probability (1-p). In this very basic model, we assume no other values are possible – the stock either does well and goes up by 10, or it does badly and goes down by 10. Furthermore, we assume that there is a risk-free yearly interest rate r at which we can deposit money in the bank and receive (1+r) times our initial deposit in a year.

The two possibilities for a stock’s evolution in the binary tree model

We’re going to try and price options in this model. Let’s say we’ve got a call option to buy the stock in a year’s time for 100. How much is this worth? Because of the simple state of the world, we can enumerate the two possibilities – either the stock is worth 110, in which case the call option to buy it for 100 is worth 10, or the stock is worth 90, and the option is worthless. We’ll try and simulate that payoff using a combination of cash-in-the-bank and the stock itself, and then by no-arbitrage the price must be the same today.

Our portfolio consists of \alpha stocks S and \beta cash-in-the-bank. The value of the cash will certainly be (1+r)\beta in a year, while the stock will be 110 \alpha or 90 \alpha depending on the state, and we want the sum of these two to equal the option payoff (in either the up or the down scenario), which gives us two simultaneous equations

    \[(1+r) \beta + 110 \alpha = 10\]

    \[(1+r) \beta + 90 \alpha = 0\]

so

    \[\alpha = 0.5\]

    \[\beta = {-45 \over (1+r)}\]

This says that a portfolio containing half of a stock and minus 45 pounds in the bank will exactly replicate the payoff of the option in either world state. The value of this portfolio today is just 0.5*100 - 45/(1+r), and I’ve plotted that as a function of r below.

The price of a call option on a stock as a function of the risk-free rate in the two-state world given above

This gives meaningless prices if r > 0.1 (the price of the option must be between 0 and 10/(1+r), as these are the discounted values of the least/most that it can pay out at expiry). What does this mean intuitively? Well, if r > 0.1 then we have an arbitrage: 100 in the bank will *always* yield more than the stock at expiry, so we should short the stock as much as possible and put the proceeds in the bank. If r < -0.1 the situation is exactly reversed.
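The replication argument is simple enough to carry out in a couple of lines of code. Here is a sketch (the rate is an illustrative assumption of mine): solve the two simultaneous equations for (alpha, beta) and price the option as today’s cost of the replicating portfolio.

    import numpy as np

    S0, S_up, S_down = 100.0, 110.0, 90.0
    payoff = np.array([10.0, 0.0])     # option payoff in the up/down states
    r = 0.05                           # illustrative risk-free rate

    # (1+r)*beta + S_state*alpha = payoff, in both states
    A = np.array([[S_up,   1.0 + r],
                  [S_down, 1.0 + r]])
    alpha, beta = np.linalg.solve(A, payoff)

    price = alpha * S0 + beta
    print(alpha, beta, price)          # 0.5, -45/(1+r), 50 - 45/(1+r)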

The important point about all of this is that the option price DOESN’T depend on the relative chances of the stock increasing to 110 or falling to 90. As long as both of these probabilities are strictly greater than zero and less than one, ANY set of probabilities will lead to the same price for the option. This is the discrete analogue of a result I presented before (here): the expected growth of the stock in the real world doesn’t matter to the option’s price; it is the risk-free rate that affects it. Indeed, it is possible to derive the continuous result using a binary tree by letting the time period go to zero. I won’t attempt it here, but the derivation can be found in many textbooks (e.g. on Wikipedia).

Things really get interesting when we try to extend the model in a slightly different way. Often banks will be interested in models that have not just a diffusive component but also a ‘jump’ component, which gives a finite chance that a price will fall suddenly by a large amount (I’ll present one such model in the near future). Unlike in a straightforward Black-Scholes model, because these jumps happen suddenly and unexpectedly they are very difficult to hedge; they are meant to represent market crashes that can result in sudden sharp losses for banks.

In our simple tree model, this can be incorporated by moving to a three-branch model, shown below

The three possibilities for a stock’s evolution in the three-branch tree model

We have the same two branches as before, but an additional state now exists in which the stock price crashes to 50 with probability q. In this case, again trying to price an option to buy the stock for 100 gives three simultaneous equations

    \[(1+r) \beta + 110 \alpha = 10\]

    \[(1+r) \beta + 90 \alpha = 0\]

    \[(1+r) \beta + 50 \alpha = 0\]

Unlike before, we can’t find a single alpha and beta that will replicate the payoff in all three world states, as we have three equations and only two unknowns. Consequently, the best we will be able to do is sub-replicate or super-replicate the payoff: that is, find portfolios that either always pay less than or equal to the option, or always pay greater than or equal to it. These give bounds on the lowest and highest arbitrage-free option prices, but that’s as far as arbitrage-free prices will take us (in fact ANY price between these limits is arbitrage-free). Choosing an actual price will require knowing the probabilities p and q and deciding on our personal risk-premium.

In the simple three-state model, the lower and upper bounds can be calculated by ignoring the third and the second equation respectively above, and they give the limits shown in the graph below. Once again, as r \to 0.1 they converge, but note that a case where r < -0.1 is now possible, as the ‘crash’ branch means that the stock can still pay out less than the bank account.

The limiting values given by the no-arbitrage requirement on the price of a call option as a function of the risk-free rate in the three-branch tree model given above
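These bounds can also be found mechanically, which scales better when there are more states than hedging instruments. Here is a sketch (parameters are my own illustrative choices) posing sub- and super-replication as small linear programs using scipy’s linprog: the upper bound is the cheapest portfolio that dominates the payoff in every state, the lower bound the dearest portfolio dominated by it.

    import numpy as np
    from scipy.optimize import linprog

    S0, r = 100.0, 0.02
    states = np.array([110.0, 90.0, 50.0])    # up / down / crash
    payoff = np.maximum(states - 100.0, 0.0)  # call struck at 100

    # Portfolio (alpha, beta) pays alpha*S_state + (1+r)*beta at expiry
    # and costs alpha*S0 + beta today.
    A = np.column_stack([states, np.full(3, 1.0 + r)])
    cost = np.array([S0, 1.0])
    bnds = [(None, None), (None, None)]       # alpha, beta unconstrained

    # Super-replication: minimise cost subject to A @ x >= payoff
    upper = linprog(cost, A_ub=-A, b_ub=-payoff, bounds=bnds)
    # Sub-replication: maximise cost subject to A @ x <= payoff
    lower = linprog(-cost, A_ub=A, b_ub=payoff, bounds=bnds)

    print("lower bound:", -lower.fun)
    print("upper bound:", upper.fun)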

This is what’s called an incomplete market. The securities that exist in the market aren’t sufficient to hedge us against all future possible states of the universe. In this case, we can’t uniquely determine prices by risk-neutral pricing: since we can’t hedge out all of the risk, the risk preferences of investors will play a part in determining the market-clearing prices of securities. Most models that are used in the industry are incomplete in this way. I’ll be looking at some examples of this soon, including models involving jump processes and stochastic volatility.

Improving Monte Carlo: Control Variates

I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.

One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated to that of the product we’re trying to price, but which is easier to calculate or which we already know the price of. When we simulate a path in MC, it will almost surely give either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of the product price, it most likely also produces an over-estimate of the control variate and vice versa, which allows us to improve our estimate of the product we’re trying to price.

The textbook example is the Asian option. Although the arithmetic version of the Asian option discussed in previous posts has no analytic expression in Black-Scholes, a similar geometric Asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of that option exactly. As a reminder, the payoff of an arithmetic Asian option at expiry is

    \[C_{\rm arit}(T) = \Bigl({1\over N}\sum_{i=0}^{N-1} S(t_i) - K \Bigr)^+\]

and the payoff of the related geometric-averaging Asian is

    \[C_{\rm geo}(T) = \Bigl( \bigl(\prod_{i=0}^{N-1} S(t_i)\bigr)^{1\over N} - K \Bigr)^+\]

Denoting the price of the arithmetic option as X and that of the geometric option as Y, the traditional Monte Carlo approach is to generate N paths, calculate X_i (the realisation of the payoff along the i-th path) for each one, and take the average over all paths, so that

    \[{\mathbb E}[X] = {1 \over N} \sum_{i=0}^{N-1} X_i\]

which will get closer to the true price as N \to \infty.

Using Y as a control variate, we instead calculate

    \[{\mathbb E}[X] = {1\over N} \sum_{i=0}^{N-1}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr)\]

where {\mathbb E}[Y] is the price of the geometric option known from the analytical expression, and \lambda is a constant (in this case we will set it to 1).

What do we gain from this? Well, consider the variance of X_i - \lambda (Y_i - {\mathbb E } [Y])

    \[{\rm Var}\bigl( X_i - \lambda ( Y_i - {\mathbb E}[Y] ) \bigr) = {\rm Var}( X_i ) + \lambda^2 {\rm Var} ( Y_i ) - 2 \lambda {\rm Cov}( X_i, Y_i )\]

(since {\mathbb E } [Y] is known, so has zero variance) which is minimal for \lambda = \sqrt{ {\rm Var}(X_i) \over {\rm Var}(Y_i) }\cdot \rho(X_i,Y_i) in which case

    \[{ {\rm Var}\bigl( X_i - \lambda ( Y_i - {\mathbb E}[Y] ) \bigr) \over {\rm Var}( X_i )} = 1 - \rho(X_i,Y_i)^2\]

that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths; the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2500 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is a roughly 50 times smaller standard error on pricing, well over an extra decimal place, which isn’t bad; and this is certainly much easier than running 2500 times as many paths to achieve the same result.

The relationship between payout for a geometric and an arithmetic Asian option, which here demonstrates a 99.98% sample correlation. Parameters used were: r: 3%; vol: 10%; K: 105; S(0): 100; averaging dates: monthly intervals for a year
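Here is a sketch of the whole procedure under the stated parameters (the layout and path count are my own choices): the geometric Asian’s exact price comes from the lognormal moments of the geometric average, and the arithmetic Asian is then estimated both with and without the control variate, taking lambda = 1 as in the text.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    r, vol, K, S0 = 0.03, 0.10, 105.0, 100.0
    t = np.arange(1, 13) / 12.0          # monthly averaging dates for a year
    T, n = t[-1], len(t)

    # Exact geometric Asian price: ln G ~ N(mu, var)
    mu = np.log(S0) + (r - 0.5 * vol**2) * t.mean()
    var = vol**2 * np.minimum.outer(t, t).sum() / n**2
    s = np.sqrt(var)
    d1 = (mu - np.log(K) + var) / s
    geo_price = np.exp(-r * T) * (np.exp(mu + 0.5 * var) * norm.cdf(d1)
                                  - K * norm.cdf(d1 - s))

    # Monte Carlo paths sampled at the fixing dates
    n_paths = 100_000
    dts = np.diff(t, prepend=0.0)
    z = rng.standard_normal((n_paths, n))
    log_s = np.log(S0) + np.cumsum((r - 0.5 * vol**2) * dts
                                   + vol * np.sqrt(dts) * z, axis=1)
    S = np.exp(log_s)

    disc = np.exp(-r * T)
    X = disc * np.maximum(S.mean(axis=1) - K, 0.0)              # arithmetic
    Y = disc * np.maximum(np.exp(log_s.mean(axis=1)) - K, 0.0)  # geometric
    cv = X - (Y - geo_price)                                    # lambda = 1

    for name, est in [("plain MC", X), ("control variate", cv)]:
        print(name, est.mean(), "+/-", est.std() / np.sqrt(n_paths))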

Asian options III: the Geometric Asian

I’ve introduced the Asian Option before, it is similar to a vanilla option but its payoff is based on the average price over a period of time, rather than solely the price at expiry. We saw that it was exotic – it is not possible to give it a unique price using only the information available from the market.

Today I’m going to introduce another type of option – the geometric asian option. This option is similar, but its payoff is now based on the geometric average, rather than the arithmetic average, of the spot over the averaging dates

    \[C(T) = \Bigl( \bigl( \prod_{i=1}^{N} S_i \bigr)^{1 \over N} - K \Bigr)^+\]

This option is exotic, just like the regular (arithmetic-average) Asian option. At first sight it seems even more complicated; what’s more, it is almost never encountered in practice, so why bother with it? Well, it’s very useful as an exercise, because in many models where the arithmetic Asian’s price has no closed form, the geometric Asian happens to have one! Further, since the arithmetic average of a set of numbers is always at least as large as the geometric average, the price of the geometric Asian will give us a lower bound on the price of the arithmetic Asian.

Considering the Black-Scholes model in its simplest form for the moment (although the pricing formula can be extended to more general models), let’s consider what the spot will look like at each of the averaging times. As we did in the earlier post, we take a simple geometric Asian averaging over only two times t_1 and t_2, so that the payoff is C(T) = \Bigl( \sqrt{ S_1 \cdot S_2 } - K \Bigr)^+

At t_0, S(t_0) = S_0. At t_1,

    \[S(t_1) = S_1 = S_0 \exp{ \Bigl\{ (r - {1\over 2} \sigma^2 )t_1 + \sigma \sqrt{t_1} x_1 \Bigr\} }\]

where x_1 \sim {\mathbb N}(0,1). At t_2,

    \[S(t_2) = S_2 = S_1 \exp{ \Bigl\{ (r - {1\over 2} \sigma^2 )(t_2 - t_1) + \sigma \sqrt{t_2-t_1} x_2 \Bigr\} }\]

where x_2 \sim {\mathbb N}(0,1) also, and importantly x_2 is independent of x_1 due to the independent increments of the Brownian motion.

Now the reason we couldn’t make a closed form solution for the arithmetic asian was that S_1 + S_2 is the sum of two lognormal distributions, which itself is NOT lognormally distributed. However, as discussed in my post on distributions, the product S_1 \cdot S_2 of two lognormal distributions IS lognormal, so valuing an asian that depends on the product of these two is similar to pricing a vanilla, with a slight change to the parameters that we need

    \[\begin{matrix} \sqrt{ S_1 \cdot S_2 } & = & S_0 \exp \Bigl\{ {1\over 2}(r - {1\over 2} \sigma^2 )(t_1 + t_2) + \sigma ( \sqrt{t_1} x_1 + {1\over 2} \sqrt{t_2 - t_1} x_2 ) \Bigr\} \\ & = & S_0 \exp \Bigl\{ {1\over 2}(r - {1\over 2} \sigma^2 )(t_1 + t_2) + {1 \over 2} \sigma \sqrt{ 3 t_1 + t_2 } x_3 \Bigr\} \end{matrix}\]

where x_3 is another normal variable (we’ve used the result about the sum of two normally distributed variables here).

If we re-write this as

    \[\begin{matrix} \sqrt{ S_1 \cdot S_2 } & = & S_0 \exp \Big\{ {1 \over 2} r ( t_1 + t_2 ) \Big\} \exp \Big\{ -{1 \over 4} \sigma^2 (t_1 + t_2) + {1 \over 2} \sigma \sqrt{ 3 t_1 + t_2 } x_3 \Big\} \\ & = & \sqrt{ F_1 \cdot F_2 } \exp \Big\{ {1 \over 2} \sigma^2 ( \tilde{t}^2 - \tilde{\tilde{t}} ) \Big\} \exp \Big\{ -{1 \over 2} \sigma^2 \tilde{t}^2 + \sigma \tilde{t} x_3 \Big\} \end{matrix}\]

where F_1 and F_2 are the forwards at the respective times and \tilde{t} and \tilde{\tilde{t}} are defined below. This is just the same as the vanilla pricing problem solved here. So, we can use a vanilla pricer to price a geometric asian with two averaging dates, but we need to enter transformed parameters

\begin{matrix} \tilde{t} & = & {1 \over 2} \sqrt{ 3t_1 + t_2 } \\ \tilde{\tilde{t}} & = & {1 \over 2} (t_1 + t_2) \\ F & \to & \sqrt{ F_1 \cdot F_2 } \cdot e^{{1 \over 2} \sigma^2 (\tilde{t}^2 - \tilde{\tilde{t}}) } \\ \sigma^2 t & \to & \sigma^2 \tilde{t}^2 \end{matrix}

In fact this result is quite general: we can price a geometric Asian with any number of averaging dates, using the general transformations below (have a go at demonstrating this following the logic above)

 \begin{matrix} \tilde{t} & = & {1 \over n} \sqrt{  \sum_{i , j=1}^n {\rm min}(t_i, t_j) } \\ \tilde{\tilde{t}} & = & {1 \over n} \sum_{i =1}^n t_i \\ F & \to & \Big( \prod_{i=1}^n F_i \Big)^{1 \over n} \cdot e^{{1 \over 2} \sigma^2 (\tilde{t}^2 - \tilde{\tilde{t}}) } \\ \sigma^2 t & \to & \sigma^2 \tilde{t}^2 \end{matrix}
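Here is a minimal sketch of these transformations in code (the Black-formula helper and parameters are my own illustrative choices): the geometric Asian is priced by feeding the transformed forward and total variance into a standard vanilla Black pricer.

    import numpy as np
    from scipy.stats import norm

    def black_call(F, K, total_var, discount):
        s = np.sqrt(total_var)
        d1 = (np.log(F / K) + 0.5 * total_var) / s
        return discount * (F * norm.cdf(d1) - K * norm.cdf(d1 - s))

    def geometric_asian_call(S0, K, r, sigma, t):
        t = np.asarray(t, dtype=float)
        n = len(t)
        t_tilde_sq = np.minimum.outer(t, t).sum() / n**2   # tilde-t squared
        t_bar = t.mean()                                   # double-tilde-t
        F = np.prod(S0 * np.exp(r * t)) ** (1.0 / n) \
            * np.exp(0.5 * sigma**2 * (t_tilde_sq - t_bar))
        return black_call(F, K, sigma**2 * t_tilde_sq, np.exp(-r * t[-1]))

    # illustrative example: monthly averaging for a year
    print(geometric_asian_call(100.0, 105.0, 0.03, 0.10, np.arange(1, 13) / 12.0))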

Credit Default Swaps Pt II – Credit Spreads

As well as governments, companies need to borrow money, and one way of achieving this is to issue bonds. Like government bonds, these pay a fixed rate of interest each year. Of course, this rate of interest will typically be quite a bit higher than that of government bonds, to compensate for the increased credit risk associated with them. The difference between the two rates of interest is called the credit spread.

As we did before, today I’m just going to look at pricing a ZCB, but coupon-bearing bonds can be constructed as a portfolio of these so it’s a relatively straightforward extension.

Consider a ZCB issued by a risky company. It’s going to pay £1 at expiry, and nothing in between. However, if the company defaults in the meantime we get nothing. In fact, we’ll extend this a bit: instead of getting nothing we say that we will receive a fraction R of the £1, since the company’s assets will be run down and some of their liabilities will get paid. This also allows us to distinguish between different classes of debt based on their seniority (we’re getting dangerously close to corporate finance here so let’s steer back to the maths).

In order to protect against the risk of loss, we might enter a Credit Default Swap (CDS) with a third party. In exchange for us making regular payments of £K/year to them (here we’re assuming that the payment accrues continuously for mathematical ease), they will pay us if and when the counterparty defaults. There are a few varieties of CDS – they might take the defaulted bond from us and reimburse what we originally paid for it, or they might just pay out an amount equal to our loss. We’ll assume the second variety here, so in the event of default they pay us £(1-R), since this is what we lose (up to some discount factors, but we’re keeping it simple here!). Also, if default occurs we stop making the regular payments, since we no longer need protection, so this product is essentially an insurance policy.

A cartoon of the payment schedule for a Credit Default Swap. We make continuous payments to the insurer of K per unit time, and in return we receive a one-off payment of (1-R) should the counterparty default, in which case the contract finishes early and no more payments happen

We value this product in the normal way. The value of the fixed leg is just the discounted cash flow of each payment multiplied by the probability that the counterparty hasn’t defaulted (because we stop paying in that case) – and remember that we said the payments are accruing continuously, so we use an integral instead of a sum:

    \[{\rm fixed} = -K \int_0^T \delta(0,t) S(t) dt\]

and the value of the insurance leg is the value of the payment times the chance that it defaults in any given interval

    \[{\rm floating} = (1-R) \int_0^T \delta(0,t) p(t) dt\]

where, in keeping with our notation from before, p(t) is the probability density of default at a given time and S(t) is the probability that the counterparty hasn’t defaulted by that time.

The contract has a fair value if these two legs sum to zero, which happens when

    \[0 = \int_0^T \delta(0,t) \Bigl(-(1-R) {dS \over dt} - K\cdot S(t) \Bigr) dt\]

At this point we refer to our exponential default model from Part I of this post.

In the exponential default model described in the previous post, we postulated a default probability density p(t) = \lambda e^{-\lambda t}, which led to the survival function S(t) = e^{-\lambda t}. Substituting these in to the equation above we have

    \[0 = \int \delta(0,t) e^{-\lambda t} \Bigl((1-R) \lambda - K \Bigr) dt\]

which is only zero when the integrand is also zero, so

    \[K = (1-R)\lambda\]

This is called the credit triangle, and it tells us the price of insuring a risky bond against default if we have its hazard rate. If the expected lifetime of the firm increases (i.e. \lambda decreases) then the spreads will fall, as insurance is less likely to be required. If the expected recovery after default increases (R increases) then spreads also fall, as the payout will be smaller if it is required.
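As a quick numerical check of the triangle (with illustrative numbers of my own): with \lambda = 2% and R = 40%, the fair spread should be 120bp, and valuing both legs by numerical integration with a flat discount rate confirms they match at that spread.

    import numpy as np

    lam, R, r, T = 0.02, 0.4, 0.03, 5.0
    K = (1 - R) * lam                    # credit triangle: 0.012 = 120bp

    t = np.linspace(0.0, T, 10_001)
    disc = np.exp(-r * t)                # delta(0, t)
    surv = np.exp(-lam * t)              # S(t)
    dens = lam * np.exp(-lam * t)        # p(t)

    trap = lambda y: np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    fixed_leg = K * trap(disc * surv)
    insurance_leg = (1 - R) * trap(disc * dens)
    print(K, fixed_leg, insurance_leg)   # the two legs agree at the fair spread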

Although a little care is needed when moving from ZCBs to coupon-bearing bonds, it can be seen that the payments K (normally called the spreads) paid to insure the bond against default should be essentially the difference in interest payments between government (‘risk-free’) bonds and the risky bonds we are considering.

The default probabilities we have used here can be calibrated from market-observed values of the spreads K between government and corporate bonds. This resembles the process used to calculate volatility: the observable quantity on the market is actually the price, not the volatility/hazard rate, and we back this out from market prices using simple products, then use these parameters to price more exotic products consistently. This is the market’s view of the default probabilities in the risk-neutral measure, which means that they may not actually bear any resemblance to real-world default probabilities (e.g. historical default probabilities); in fact risk-neutral default probabilities are usually larger, often several times larger. See for example this paper for more on the subject.

Of course, throughout this analysis we have ignored the risk of the insurer themselves defaulting – indeed, credit derivatives rapidly become a nightmare when considered in general, and the Lehman Brothers default was largely down to some fairly heroic approximations that were made in pricing credit products that ultimately didn’t hold up in extreme environments. I’ll explore some of this behaviour soon in a post on correlated defaults and the gaussian copula model.

Credit Default Swaps Pt I: A Default Model for Firms

In today’s post I’m going to discuss a simple model for default of a firm, and in Part II I’ll discuss the price of insuring against losses caused by the default. As usual, the model I discuss today will be a vast over-simplification of reality, but it will serve as a building block for development. Indeed, there are many people in the credit derivatives industry who take these things to a much higher level of complexity.

Modelling the default of a firm is an interesting challenge. Some models are based around ratings agencies, giving firms a certain grade and treating transitions between grades as a Markov chain (similar to a problem I discussed before). I’m going to start with a simpler exponential default model. This says that the event of a given firm defaulting on all of its liabilities is a random event obeying a Poisson process. That is, it is characterised by a single parameter \lambda which gives the likelihood of default in a given time period. We make a simplifying assumption that this is independent of all previous time periods, so default CAN’T be foreseen (this may be a weakness of the model, but perhaps not… discuss!).

Also, the firm can’t default more than once, so the process stops after default. A generalisation of the model will treat \lambda as a function of time \lambda(t) and even potentially a stochastic variable, but we won’t think about that for now.

Mathematically, the probability of the default time \tau occurring in the small window [t, t+dt], given that the firm has survived to time t, is

p( \tau < t + dt\ |\ \tau > t ) = \lambda \, dt

If we start at t=0 with the firm not yet having defaulted, this tells us that

p ( \tau < dt\ |\ \tau > 0 ) = \lambda dt

and in a second and third narrow window, using the independence of separate time windows in the Poisson process and taking a physicist’s view on the meaning of 2dt and similar terms,

\begin{matrix} p ( dt < \tau < 2dt ) & = & p ( \tau < 2dt\ |\ \tau > dt )\cdot p(\tau > dt ) \\ &=& \lambda dt \cdot (1- \lambda dt) \end{matrix}

\begin{matrix} p (\ 2dt < \tau < 3dt\ ) &=& \lambda dt \cdot \bigl(1 - \lambda dt \cdot(1 - \lambda dt) - \lambda dt \bigr)\\ &=& \lambda dt \cdot (1 - 2\lambda dt + \lambda^2 dt^2)\\ &=& \lambda dt \cdot (1 - \lambda dt)^2\end{matrix}

and in general

p (\ t < \tau < t+dt\ ) = (1- \lambda dt)^n\cdot \lambda dt

where n = {t \over dt} and we must take the limit that dt \to 0, which we can do by making use of the identity

\lim_{a\to \infty} (1 + {x \over a})^a = e^x

we see that

\begin{matrix} \lim_{dt \to 0}\ {p (\ t < \tau < t+dt\ ) \over dt} & = & \lambda e^{-\lambda t}\\ & = & p(\tau = t) \end{matrix}

and its cumulative distribution function is

{\rm F}(t) = \int_0^t p(\tau = u) du = 1 - e^{-\lambda t}

which gives the probability that default has happened before time t (the survival function {\rm S}(t) = 1 - {\rm F}(t) gives the reverse, the chance the firm is still alive at time t). *Another derivation of p(\tau=t) is given at the bottom.

A few comments on this result. Firstly, note that p(\tau=t) is the density of the waiting time to the first event of a Poisson process, which makes a lot of sense since this was what we started with! It implies a mean survival time of \lambda^{-1}, which gives us some intuition about the physical meaning of \lambda. The CDF (and consequently the survival function) are just exponential decays; I’ve plotted them below.

The survival/default probabilities for a firm in the exponential default model with various values of decay parameter

Having characterised the default probability of the firm, in Part II I will think about how it affects the price of products that they issue.

 

*Another derivation of p(\tau=t) is as follows:

\lambda dt = p ( \tau < t+dt\ |\ \tau > t ) = {p(t < \tau < t+dt)\over p(\tau > t)}

the numerator here is approximately p(\tau=t)\,dt, while the denominator is simply the survival function S(t), which by definition obeys

p(\tau=t) = -{\partial S \over \partial t}

combining these we have

\lambda = -{1\over S}\cdot {\partial S \over \partial t}

and integrating gives

-\lambda t = \ln {S(t) \over S(0)}

from which we get the result above, that

S(t) = e^{-\lambda t}
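As a final sanity check (a quick simulation sketch of my own, not from the post): approximate the Poisson process with small Bernoulli(\lambda dt) windows, exactly as in the derivation above, and compare the simulated survival probabilities with e^{-\lambda t}.

    import numpy as np

    rng = np.random.default_rng(7)
    lam, dt, T, n_firms = 0.5, 0.01, 5.0, 20_000
    n_steps = int(T / dt)

    # default in each small window with probability lambda*dt,
    # independently of everything that came before
    defaults = rng.random((n_firms, n_steps)) < lam * dt
    first = np.argmax(defaults, axis=1)          # first window with a default
    never = ~defaults.any(axis=1)
    tau = np.where(never, np.inf, first * dt)

    for t in [0.5, 1.0, 2.0, 4.0]:
        print(t, (tau > t).mean(), np.exp(-lam * t))  # simulated vs S(t)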

Fitting the initial discount curve in a stochastic rates model

 
I’ve introduced the Vasicek stochastic rates model in an earlier post, and here I’m going to introduce a development of it called the Hull-White (or sometimes Hull-White extended Vasicek) model.

The rates are modelled by a mean-reverting stochastic process

    \[dr = \bigl( \theta(t) - a r(t) \bigr) dt + \sigma dW_t\]

which is similar to the Vasicek model, except that the \theta(t) term is now allowed to vary with time (in general a and \sigma are too, but I’ll ignore those issues for today).

The freedom to set \theta(t) as a deterministic function of time allows us to calibrate the model to match the initial discount curve, which means that at least initially our model will give the right price for things like FRAs. Calibrating models to match market prices is one of the main things quants do. Of course, the market doesn’t really obey our model. This means that, in time, the market prices and the prices predicted by our model will drift apart, and the model will need to be re-calibrated. But the better a model captures the various risk factors in the market, the less often this procedure will be needed.

Using the trick

    \[d \bigl( e^{at} r(t)\bigr) = e^{at} \bigl(dr + ar(t)dt \bigr)\]

to re-express the equation and integrating gives

    \[r(t) = r_0 e^{-at} + e^{-at}\int^t_0 \theta(u) e^{au} du + \sigma e^{-at} \int^t_0 e^{au} dW_u\]

where r_0 is the rate at t=0. The observable quantities from the discount curve are the initial discount factors (or equivalently the initial forward rates) P(0,t), where

    \[P(0,t) = {\mathbb E}\bigl[ e^{-\int^t_0 r(u) du} \bigr]\]

The short rate r(t) is normally distributed, so the integral \int^t_0 r(u) du must be too. This is because an integral is essentially a sum, and a sum of normal distributions is also normally distributed. Applying the Itô isometry as discussed before, the expectation of this variable comes wholly from the deterministic terms and the variance comes entirely from the stochastic term, giving

    \begin{align*} {\mathbb E} \bigl[ \int^t_0 r(u) du \bigr] &= r_0 B(0,t) + \int^t_0 e^{-au} (\int^u_0 \theta(s) e^{as} ds ) du \nonumber \\ {\mathbb V}\bigl[ \int^t_0 r(u) du \bigr] &= \sigma^2 \int^t_0 B(u,t)^2 du \nonumber \end{align*}




where throughout

    \[B(t,t') = {1 \over a}\bigl( 1 - e^{-a(t'-t)}\bigr)\]

and since

    \[{\mathbb E} \bigl[ e^x | x \sim {\mathbb N}(\mu,\sigma^2) \bigr] = e^{\mu + {1\over 2}\sigma^2}\]

we have

    \[P(0,t) = \exp \Bigl\{ -r_0 B(0,t) - \int^t_0 e^{-au} \Bigl( \int_0^u \theta(s) e^{as} ds \Bigr) du + {1 \over 2} \sigma^2 \int^t_0 B(u,t)^2 du \Bigr\}\]

Two differentiations of this expression give

    \[-{\partial \over \partial t} \ln P(0,t) = r_0 e^{-at} + e^{-at} \int^t_0 \theta(u) e^{au} du - {1\over 2} \sigma^2 B(0,t)^2\]

    \[-{\partial^2 \over \partial t^2} \ln P(0,t) = -ar_0 e^{-at} + \theta(t) - ae^{-at} \int^t_0 \theta(u) e^{au} du - \sigma^2 e^{-at} B(0,t)\]

and combining these equations gives an expression for \theta(t) that exactly fits the initial discount curve for the given currency

    \[\theta(t) = -{\partial^2 \over \partial t^2} \ln P(0,t) - a{\partial \over \partial t} \ln P(0,t) + {\sigma^2\over 2a}( 1 - e^{-2at} )\]

and since -{\partial \over \partial t} \ln P(0,t) = f(0,t) is simply the initial market observed forward rate to each time horizon coming from the discount curve, this can be compactly expressed as

    \[\theta(t) = {\partial \over \partial t} f(0,t) + a f(0,t) + {\sigma^2 \over 2a} ( 1 - e^{-2at} )\]
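Here is a sketch of this formula in use (the toy forward curve, parameters and finite-difference step are my own assumptions): given any smooth initial forward curve f(0,t), we can read off the \theta(t) that makes the model reproduce it.

    import numpy as np

    a, sigma = 0.1, 0.01                 # illustrative Hull-White parameters

    def f0(t):
        """A toy initial instantaneous forward curve f(0, t)."""
        return 0.02 + 0.01 * (1 - np.exp(-0.5 * t))

    def theta(t, dt=1e-4):
        dfdt = (f0(t + dt) - f0(t - dt)) / (2 * dt)   # d f(0,t) / dt
        return dfdt + a * f0(t) + sigma**2 / (2 * a) * (1 - np.exp(-2 * a * t))

    print([theta(t) for t in [0.5, 1.0, 5.0, 10.0]])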

Today we’ve seen how a simple extension to the ‘basic’ Vasicek model allows us to match the initial discount curve seen in the market. Allowing the volatility parameter to vary will allow us to match market prices of other products such as swaptions (an option to enter into a swap), which I’ll discuss another time. But we’re gradually building up a suite of simple models that we can combine later to model much more complicated environments.

Interview Questions VI: The Drunkard’s Walk

A drunk man is stumbling home at night after closing time. Every step he takes moves him either 1 metre closer or 1 metre further away from his destination, with an equal chance of going in either direction (!). His home is 70 metres down the road, but unfortunately, there is a cliff 30 metres behind him at the other end of the street. What are the chances that he makes it to his house BEFORE tumbling over the edge of the cliff?




This is a fun question and quite heavy on conditional probability. We are trying to find the probability that the drunkard moves 70 metres forward BEFORE he has ever moved 30 metres backward, or vice versa. There are several ways of attempting this, including some pretty neat martingale maths, but I’m going to attempt it here in the language of matrices and Markov chains.

Essentially, there are 101 states that the drunkard can be in, from right beside the cliff all the way down the road to right beside his front door. Let’s label these from 0 to 100, in terms of the number of metres away from the cliff, so that he starts in state 30. At each step, he transitions from his current state to either one state higher or one state lower with probability 50%, and the process continues until he reaches either the cliff or the door, at which point the process will cease in either a good or a very bad night’s rest. We call these two states, 0 and 100, ‘absorbers’, because the process stops at this point and no transitions to new states can happen. A Markov diagram that illustrates this process can be drawn like this:

A Markov chain displaying the transition probabilities for each state in the drunkard’s walk

We can characterise each step in the process by a transition matrix acting on a state vector. The drunkard starts in state 30, so his state vector is a long string of zeroes with a single 1 at the position corresponding to state 30:

S_0 = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}

This vector is probabilistic: the 1 indicates with 100% certainty that the drunkard is in state 30. With subsequent moves, this probability will be spread over the nearby states as his position’s probability density diffuses into other states. The transition matrix multiplies the drunkard’s state vector to give his state vector after another step:

P = \begin{pmatrix} 1 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & \cdots & 0 & 0\\ 0 & 0.5 & 0 & \cdots & 0 & 0\\ 0 & 0 & 0.5 & & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 0.5 & 1 \end{pmatrix}; \quad S_{i+1} = P \cdot S_i

So, after one step the drunkard’s state vector will have a 0.5 in the 29th and the 31st positions and zeroes elsewhere, saying that he will be in either of these states with probability 0.5, and certainly nowhere else. Note that the 1’s at the top and bottom of the transition matrix will absorb any probability that arrives at those states for the rest of the process.

To keep things simple, let’s consider a much smaller problem with only six states, where state 1 is ‘down the cliff’ and state 6 is ‘home’; we’ll come back to the larger problem at the end. We want to calculate the limit of the drunkard’s state vector as the transition matrix is applied a large number of times, i.e. to calculate

S_n = P^n \cdot S_0

An efficient way to calculate powers of a large matrix is to first diagonalise it. We have P = U A\ U^{-1}, where A is a diagonal matrix whose diagonal elements are the eigenvalues of P, and U is the matrix whose columns are the eigenvectors of P. Note that, as P is not symmetric, U^{-1} will have row vectors that are the LEFT eigenvectors of P, which in general will be different to the right eigenvectors. It is easy to see how to raise P to the power n

P^n = (U A\ U^{-1})^n = U A^n\ U^{-1}

since all of the Us and U^{-1}s in the middle cancel. Since A is diagonal, to raise it to the power of n we simply raise each element (which are just the eigenvalues of P) to the power of n. To calculate the eigenvalues of P we solve the characteristic equation

|P - \lambda_a I| = 0

with

P = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}

This gives six eigenvalues, two of which are one (these will turn out to correspond to the two absorbing states); the remainder satisfy |\lambda_a| < 1. Consequently, when raised to the power n in the diagonal matrix above, all of the terms will disappear as n becomes large, except for the first and the last eigenvalue which are 1.

Calculating the eigenvectors is time consuming and we’d like to avoid it if possible. Luckily, in the limit that n gets large, we’ve seen that most of the eigenvalues raised to the power n will go to zero, which significantly reduces the amount we need to calculate. We have

P^n\cdot S = \bigl( U A^n \ U^{-1}\bigr)\cdot S

and in the limit that n gets large, U \cdot A^n is just a matrix of zeroes with a one in the upper-left and lower-right entries. At first sight it looks as though calculating U^{-1} will be required, which is itself an eigenvector problem, but in fact we only have to calculate a single eigenvector: the first (or equivalently the last), which will give the probability of evolving from any initial state to the first (or last) absorbing state.

U^{-1} is the matrix of left eigenvectors of P, each of which satisfies

x_a \cdot P = \lambda_a x_a

and we are looking for the eigenvectors corresponding to an eigenvalue of 1, so we need to solve the matrix equation

x \cdot \begin{pmatrix} 0 & 0.5 & 0 & 0 & 0 & 0\\ 0 & -1 & 0.5 & 0 & 0 & 0\\ 0 & 0.5 & -1 & 0.5 & 0 & 0\\ 0 & 0 & 0.5 & -1 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & -1 & 0\\ 0 & 0 & 0 & 0 & 0.5 & 0 \end{pmatrix} = 0

We know that if he starts in the first state (i.e. over the cliff) he must finish there (trivially, after 0 steps) with probability 100%, so that x_1 = 1 and x_6 = 0. The solution to this is

x_n = {6 - n \over 5}

which is plotted here for each initial state

The probability of the drunkard finishing in either final state given his initial state. Note that this sums to one in all cases as eventually he must finish in one or the other state.

which says that the probability of ending up in state 1 (down the cliff!) falls linearly with starting distance from the cliff. We can scale this final matrix equation up to the original 101-state space, and find that for someone starting in state 30 there is a 70% chance of ending up down the cliff, and consequently only a 30% chance of getting home!
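This is easy to verify numerically (a quick sketch of my own): build the 101-state transition matrix with states 0 to 100, raise it to a large power, and read off the absorption probabilities starting from state 30.

    import numpy as np

    N = 100
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = P[N, N] = 1.0            # absorbing states: cliff and home
    for j in range(1, N):              # from state j, step to j-1 or j+1
        P[j - 1, j] = 0.5
        P[j + 1, j] = 0.5

    s = np.zeros(N + 1)
    s[30] = 1.0                        # start 30 metres from the cliff

    s = np.linalg.matrix_power(P, 1_000_000) @ s   # P^n s for large n
    print("P(cliff):", s[0])           # ~ 0.7
    print("P(home): ", s[N])           # ~ 0.3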

This problem is basically the same as the Gambler’s Ruin problem, where a gambler in a casino stakes $1 on each toss of a coin and leaves after reaching a goal of $N or when broke. There are some very neat ways of solving it via martingale methods that don’t use the mechanics above, which I’ll look at in a future post.

Asian Options II: Monte Carlo

In a recent post, I introduced Asian options, an exotic option whose payoff depends on the average of the underlying over a sequence of fixing dates.

The payoff of an Asian call is

C(T) = \Big({1 \over N}\sum^N_{i=1} S_{t_i} - K \Big)^+

and to price the option at an earlier time via risk-neutral valuation requires us to solve the integral

C(t) = P_{t,T}\ \iint\cdots\int \Big({1 \over N}\sum^N_{i=1} S_{t_i} - K \Big)^+ \phi(S_{t_1},S_{t_2},\cdots,S_{t_N})\ dS_{t_1}dS_{t_2}\cdots dS_{t_N}

A simple way to do this is using Monte Carlo: we simulate a spot path drawn from the distribution \phi(S_{t_1},S_{t_2},\cdots,S_{t_N}), evaluate the payoff at expiry of that path, then average over a large number of paths to get an estimate of C(t).

It’s particularly easy to do this in Black-Scholes: all we have to do is simulate N Gaussian variables and evolve the underlying according to a geometric Brownian motion between the fixing dates.
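The pricer mentioned below is in C++, but here is a minimal Python sketch of the same procedure (parameters are my own illustrative choices): evolve the GBM between the fixing dates, average the spots, and discount the expected payoff.

    import numpy as np

    rng = np.random.default_rng(0)
    S0, K, r, vol = 100.0, 105.0, 0.03, 0.10
    fixings = np.arange(1, 13) / 12.0      # averaging dates, in years from today
    n_paths = 200_000

    dts = np.diff(fixings, prepend=0.0)
    z = rng.standard_normal((n_paths, len(fixings)))
    log_s = np.log(S0) + np.cumsum((r - 0.5 * vol**2) * dts
                                   + vol * np.sqrt(dts) * z, axis=1)
    avg = np.exp(log_s).mean(axis=1)

    payoff = np.maximum(avg - K, 0.0)      # Asian call; use (K - avg)^+ for a put
    disc = np.exp(-r * fixings[-1])
    print(disc * payoff.mean(), "+/-", disc * payoff.std() / np.sqrt(n_paths))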

I’ve updated the C++ and Excel Monte Carlo pricer on the blog to be able to price Asian puts and calls by Monte Carlo; have a go and see how they behave relative to vanilla options. One subtlety is that we can no longer input a single expiry date: we now need an array of dates as our function input. If you have a look in the code, you’ll see that I’ve adjusted the function optionPricer to take the datatype MyArray, which is the default array used by XLW (it won’t be happy if you tell it to take std::vector). This array can be entered as a row or column of Excel cells, and should contain the dates, expressed as years from the present date, to average over.