Improving Monte Carlo: Control Variates

I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.

One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated with the product that we’re trying to price, but which is easier to calculate or whose price we already know. When we simulate a path in MC, it will almost surely give either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of the product price, it most likely also produces an over-estimate of the control variate and vice versa, which will allow us to improve our estimate of the product we’re trying to price.

The textbook example is the Asian option. Although the arithmetic version of the Asian option discussed in previous posts has no analytic expression in BS, a similar geometric Asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of the geometric option exactly. As a reminder, the payoff of an arithmetic Asian option at expiry is

    \[C_{\rm arit}(T) = \Bigl({1\over N}\sum_{i=0}^{N-1} S(t_i) - K \Bigr)^+\]

and the payoff of the related geometric averaging asian is

    \[C_{\rm geo}(T) = \Bigl( \bigl(\prod_{i=0}^{N-1} S(t_i)\bigr)^{1\over N} - K \Bigr)^+\]

Denoting the price of the arithmetic option as X and the geometric option as Y, the traditional Monte Carlo approach is to generate N paths (here N counts paths, not the averaging dates in the payoffs above), and for each one calculate X_i (the realisation of the payoff along the path) and take the average over all paths, so that

    \[{\mathbb E}[X] \approx {1 \over N} \sum_{i=0}^{N-1} X_i\]

which will get closer to the true price as N \to \infty.

Using Y as a control variate, we instead calculate

    \[{\mathbb E}[X] \approx {1\over N} \sum_{i=0}^{N-1}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr)\]

where {\mathbb E}[Y] is the price of the geometric option known from the analytical expression, and \lambda is a constant (in this case we will set it to 1).

What do we gain from this? Well, consider the variance of X_i - \lambda (Y_i - {\mathbb E } [Y])

    \[{\rm Var}\bigl( X_i - \lambda ( Y_i - {\mathbb E}[Y] ) \bigr) = {\rm Var}( X_i ) + \lambda^2 {\rm Var} ( Y_i ) - 2 \lambda {\rm Cov}( X_i, Y_i )\]

(since {\mathbb E } [Y] is known, so has zero variance) which is minimal for \lambda = \sqrt{ {\rm Var}(X_i) \over {\rm Var}(Y_i) }\cdot \rho(X_i,Y_i) in which case

    \[{ {\rm Var}\bigl( X_i - \lambda ( Y_i - {\mathbb E}[Y] ) \bigr) \over {\rm Var}( X_i )} = 1 - \rho(X_i,Y_i)^2\]

that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths – the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2000 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is roughly 45 times smaller standard error on pricing – well over an extra decimal place, which isn’t bad – and this is certainly much easier than running 2000 times as many paths to achieve the same result.
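
For reference, the optimal \lambda quoted above comes from minimising the variance, which is a quadratic in \lambda. A short sketch of that step, using the shorthand \sigma_X^2 = {\rm Var}(X_i) and \sigma_Y^2 = {\rm Var}(Y_i) (notation introduced here for brevity):

    \[{d \over d\lambda}\Bigl( \sigma_X^2 + \lambda^2 \sigma_Y^2 - 2\lambda\, {\rm Cov}(X_i,Y_i) \Bigr) = 2\lambda\sigma_Y^2 - 2\,{\rm Cov}(X_i,Y_i) = 0 \quad \Rightarrow \quad \lambda^* = {{\rm Cov}(X_i,Y_i) \over \sigma_Y^2} = {\sigma_X \over \sigma_Y}\cdot \rho(X_i,Y_i)\]

Substituting \lambda^* back in, and using {\rm Cov}(X_i,Y_i) = \rho\, \sigma_X \sigma_Y, gives

    \[\sigma_X^2 + \rho^2 \sigma_X^2 - 2\rho^2\sigma_X^2 = \sigma_X^2 \bigl( 1 - \rho(X_i,Y_i)^2 \bigr)\]

Note that the 1 - \rho^2 ratio holds at \lambda = \lambda^*; with \lambda = 1 the reduction is slightly smaller unless the two payoffs happen to have almost equal variance.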

The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation. Parameters used were: r: 3%; vol: 10%; K: 105; S(0): 100; averaging dates: monthly intervals for a year
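
To make the recipe concrete, here is a minimal Python sketch of the control variate estimator for the Asian option. It rests on several assumptions that are not in the post itself: spot follows geometric Brownian motion under Black-Scholes, the averaging dates are the twelve month-ends t = 1/12, ..., 1, payoffs are discounted at r, and the function names (`geometric_asian_call`, `asian_call_mc`) are hypothetical. The closed form used for the geometric option is the standard lognormal result for a discretely monitored geometric average, not a formula quoted from this post.

```python
import numpy as np
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def geometric_asian_call(s0, k, r, vol, times):
    """Black-Scholes price of a discretely monitored geometric Asian call.

    The log of the geometric average is normal with mean m and variance v,
    so the price is a Black-Scholes-style expression in m and v.
    """
    t = np.asarray(times, dtype=float)
    m = log(s0) + (r - 0.5 * vol**2) * t.mean()
    # Cov(ln S(t_i), ln S(t_j)) = vol^2 * min(t_i, t_j)
    v = vol**2 * np.minimum.outer(t, t).sum() / len(t) ** 2
    d2 = (m - log(k)) / sqrt(v)
    d1 = d2 + sqrt(v)
    return exp(-r * t[-1]) * (exp(m + 0.5 * v) * norm_cdf(d1) - k * norm_cdf(d2))


def asian_call_mc(s0, k, r, vol, times, n_paths, seed=0):
    """Price the arithmetic Asian call with and without the control variate.

    Returns (estimate, standard error) pairs for the plain estimator,
    lambda = 1, and a lambda estimated from the same sample.
    """
    t = np.asarray(times, dtype=float)
    dt = np.diff(t, prepend=0.0)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, len(t)))
    # Exact GBM simulation of ln S at each averaging date
    log_s = log(s0) + np.cumsum((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z, axis=1)
    disc = exp(-r * t[-1])
    x = disc * np.maximum(np.exp(log_s).mean(axis=1) - k, 0.0)   # arithmetic payoff X_i
    y = disc * np.maximum(np.exp(log_s.mean(axis=1)) - k, 0.0)   # geometric payoff Y_i
    ey = geometric_asian_call(s0, k, r, vol, t)                  # known E[Y]
    # Sample estimate of the optimal lambda (reusing the sample adds a small bias)
    lam_opt = np.cov(x, y)[0, 1] / y.var(ddof=1)
    out = {
        "rho": np.corrcoef(x, y)[0, 1],
        "lambda_opt": lam_opt,
        "geo_exact": ey,
        "geo_mc": (y.mean(), y.std(ddof=1) / sqrt(n_paths)),
    }
    for name, lam in (("plain", 0.0), ("cv_lambda_1", 1.0), ("cv_lambda_opt", lam_opt)):
        est = x - lam * (y - ey)
        out[name] = (est.mean(), est.std(ddof=1) / sqrt(n_paths))
    return out


if __name__ == "__main__":
    # Parameters from the figure caption; month-end averaging dates are an assumption
    res = asian_call_mc(100.0, 105.0, 0.03, 0.10, np.arange(1, 13) / 12.0, n_paths=20000)
    for key, value in res.items():
        print(key, value)
```

Comparing the standard errors of the `plain` and `cv_lambda_1` entries shows the size of the reduction directly, and the `geo_mc` entry gives a quick check that the simulated geometric price agrees with the closed form.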

Interview Questions IV

Another question about correlations today. This time I thought we could have a look at a simple type of random walk, in which each step goes either backwards or forwards and has a random length, and at how to deal with the issues that come up.

Let’s say at each step we move a distance along the x-axis that is distributed randomly and uniformly between -1 and +1, and importantly that each step is independent of the others. So, after N steps the total distance travelled, L, is

L_N = \sum_{i=0}^{N} x_i\ ; \qquad x_i\sim {\mathbb U}[-1,+1]

where x_i is the i-th step length.

Calculate:

i) the expected distance travelled after N steps

ii) the standard deviation of the distance travelled after N steps

iii) the autocorrelation between the distance travelled at N steps and the distance travelled at N+n steps

Since we’re dealing with uniform variables, it makes sense to start by calculating the expectation and variance of a single realisation of a variable of this type. The expectation is trivially 0, while the variance is

\begin{align*} {\rm Var}[x_i] & = {\mathbb E}[x_i^2] - {\mathbb E}[x_i]^2\\ & = \int_{-1}^{+1} {1\over 2} x^2 dx - 0 \\ & = {1\over 3} \end{align*}

where the factor of 1/2 is the probability density of a uniform variate on [-1,+1].

We’ll also make use of the independence of the individual variables at several points. Recall that for independent variables x and y, {\mathbb E}[xy] = {\mathbb E}[x] {\mathbb E}[y]

 

i) This one is fairly straightforward. Expectation is a linear operator, so we can take it inside the sum. We know the expectation of an individual variate, so the expectation of the sum is just the sum of these

\begin{align*} {\mathbb E}\Big[\sum_{i=0}^N x_i \Big] & = \sum_{i=0}^N {\mathbb E}[ x_i ]\\ & = N\cdot 0\\ & = 0 \end{align*}

 

ii) The standard deviation is the square root of the variance, which is the expectation of the square minus the square of the expectation. We know the second of these is 0, so we only need to calculate the first,

\begin{align*} {\rm Var}\Big[\sum_{i=0}^N x_i \Big] & = {\mathbb E}\Big[\Big(\sum_{i=0}^N x_i \Big)^2\Big]\\ & = {\mathbb E}\Big[\sum_{i,j=0}^N x_i x_j\Big]\\ &=\sum_{i,j=0}^N {\mathbb E} [x_i x_j] \end{align*}

There are two types of term here. When i and j are not equal, we can use the independence criterion given above to express this as the product of the two individual expectations, which are both 0, so these terms don’t contribute. So we are left with

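\begin{align*} {\rm Var}\Big[\sum_{i=0}^N x_i \Big] &=\sum_{i=0}^N {\mathbb E} [(x_i)^2] \\ &= N\cdot {1\over3} \end{align*}

(using {\mathbb E}[x_i^2] = 1/3 for a uniform variate on [-1,+1], whose density is 1/2) and the standard deviation is simply the square root of this, \sqrt{N/3}.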

 

iii) This is where things get more interesting – the autocorrelation is the correlation of the sum at one time with its value at a later time. This is a quantity that quants are frequently interested in, since the value of a derivative that depends on values of an underlying stock at several times will depend sensitively on the autocorrelation. We recall the expression for correlation

\rho(x,y) = {{\rm Cov}(x,y) \over \sqrt{{\rm Var}(x){\rm Var}(y) } }

So we are trying to calculate

\begin{align*} \rho(L_N, L_{N+n}) = {\rm Cov}\Big[\sum_{i=0}^N x_i\ ,\ \sum_{j=0}^{N+n} x_j \Big] \cdot {3 \over \sqrt{N (N+n)}} \end{align*}

where I’ve substituted in the variances of the two sums, N/3 and (N+n)/3, which follow from the single-step variance of 1/3 (the density of a uniform variate on [-1,+1] is 1/2).

We can again use the independence property of the steps to separate the later sum into two, the earlier sum and the sum of the additional terms. Also, since the expectation of each sum is zero, the covariance of the sums is just the expectation of their product

\begin{align*} \rho(L_N, L_{N+n})&= {\rm Cov}\Big[\sum_{i=0}^N x_i\ ,\ \sum_{j=0}^{N} x_j + \sum_{j=N+1}^{N+n} x_j \Big] \cdot {3 \over \sqrt{N (N+n)}}\\&= {\mathbb E}\Big[\sum_{i=0}^N x_i \cdot \Big(\sum_{j=0}^{N} x_j + \sum_{j=N+1}^{N+n} x_j \Big) \Big] \cdot {3 \over \sqrt{N (N+n)}}\\&= {\mathbb E}\Big[\sum_{i,j=0}^N x_i x_j + \sum_{i=0}^N x_i \cdot\sum_{j=N+1}^{N+n} x_j \Big] \cdot {3 \over \sqrt{N (N+n)}} \end{align*}

and using the results above and the independence of the final two sums (because they are the sums of different sets of terms, and each term is independent of all the others) we know

{\mathbb E}\Big[\sum_{i,j=0}^N x_i x_j \Big] = {1 \over 3}N

{\mathbb E}\Big[\sum_{i=0}^N x_i \cdot\sum_{j=N+1}^{N+n} x_j \Big] ={\mathbb E}\Big[\sum_{i=0}^N x_i \Big]\cdot {\mathbb E}\Big[\sum_{j=N+1}^{N+n} x_j \Big] = 0

so

\begin{align*}\rho(L_N, L_{N+n}) & = {N\over \sqrt{N(N+n)}}\\ &= \sqrt{N\over N+n} \end{align*}

What does this tell us? Roughly that the sum of the sequence up to N+n terms is correlated to its value at earlier points, but as n gets larger the correlation decreases, as the new random steps blur out the position due to the initial N steps.

We can test our expressions using the RAND() function in Excel (RAND() is uniform on [0,1], so 2*RAND()-1 gives a step that is uniform on [-1,+1]). Try plotting a sequence of sets of random numbers and summing them, and then plotting the set of sums of 100 terms against the set of sums of 120 or 200 terms (nb. in Excel, you probably want to turn auto-calculate off first to stop the randoms from refreshing every time you make a change – instructions can be found here for Excel 2010; for Excel 2013 I found the option inside the “FORMULAS” tab and at the far end – set the ‘Calculation Options’ to manual). I’ve done exactly that, and you can see the results below.

The sum of 100 terms vs. the sum of 120 terms. These are of course highly correlated, as the additional 20 terms usually don’t affect the overall sum to a significant extent
The sum of the first 100 terms against the sum of 200 terms. We can see that the sums are slowly becoming less correlated
This is the sum of the first 100 terms against the first 500. The correlation is much lower than in the graphs above, but note that from the formula we derived we still expect a correlation of around 45% despite the large number of extra terms in the second sum.

You can also try calculating the correlation of the variables that you generate using Excel’s CORREL() – this should tend towards the expression above as the number of sums that you compute gets large (if you press F9, all of the random numbers in your sheet will be recomputed and you can see the actual correlation jump around, but these jumps will be smaller as the number of sums gets larger).
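
The same experiment can be sketched outside a spreadsheet. The Python snippet below is an illustrative check rather than part of the original solution: it simulates many independent walks with U[-1,+1] steps, then compares the sample mean, the sample variance and the sample correlation of the two sums against 0, N/3 and \sqrt{N/(N+n)}. The function name `simulate_sums`, the seed and the number of walks are arbitrary choices.

```python
import numpy as np


def simulate_sums(n_first, n_extra, n_walks, seed=0):
    """Return (L_N, L_{N+n}) for n_walks independent walks with U[-1, +1] steps."""
    rng = np.random.default_rng(seed)
    steps = rng.uniform(-1.0, 1.0, size=(n_walks, n_first + n_extra))
    l_n = steps[:, :n_first].sum(axis=1)   # position after the first N steps
    l_total = steps.sum(axis=1)            # position after all N + n steps
    return l_n, l_total


if __name__ == "__main__":
    n_first = 100
    for n_extra in (20, 100, 400):
        l_n, l_total = simulate_sums(n_first, n_extra, n_walks=10_000)
        sample_rho = np.corrcoef(l_n, l_total)[0, 1]
        predicted_rho = np.sqrt(n_first / (n_first + n_extra))
        print(f"n = {n_extra}: sample correlation {sample_rho:.3f}, "
              f"predicted {predicted_rho:.3f}")
    # Var of a single U[-1, +1] step is (b - a)^2 / 12 = 1/3, so Var(L_N) = N/3
    print("sample mean of L_N:", l_n.mean())
    print("sample variance of L_N:", l_n.var(ddof=1), "predicted:", n_first / 3)
```

The three values of n correspond to the three scatter plots above (sums of 120, 200 and 500 terms against the sum of the first 100).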

Interview Questions III

Today’s question will test some of the statistics and correlation I’ve discussed in the last couple of months. Assume throughout that x\sim {\mathbb N}(0,1) and y\sim {\mathbb N}(0,1) are jointly normally distributed such that {\mathbb E}[x \cdot y] = \rho

a) Calculate {\mathbb E}[\ e^x \ ]
b) Calculate {\mathbb E}[\ e^x \ | \ y = b\ ]

The first expectation is of a lognormal variate, and the second is of a lognormal variate conditional on a correlated variate having taken a particular value – these are very typical of the sorts of quantities that a quant deals with every day, so the solution will be quite instructive! Before reading the solution, have a go at each one. The following posts may be useful: SDEs pt. 1, SDEs pt. 2, Results for Common Distributions

a) Here, we use the standard result for expectations

    \begin{align*} {\mathbb E}[ \ e^x \ ] &= \int^{\infty}_{-\infty} e^x \cdot f(x) \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^x \cdot e^{-{1 \over 2}x^2} \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ x^2 - 2x + 1 - 1 \Bigr] \Bigr) \ dx \nonumber \\ \nonumber \\ &= {e^{1\over 2} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} (x-1)^2 \Bigr) \ dx \nonumber \\ \nonumber \\ &= e^{1\over 2} \nonumber \end{align*}

b) This one is a little tougher, so first of all I’ll discuss what it means and some possible plans of attack. We want to calculate the expectation of e^x, given that y takes a value of b. Of course, if x and y were independent, this wouldn’t make any difference and the result would be the same. However, because they are correlated, the realised value of y will have an effect on the distribution of x.

To demonstrate this, I’ve plotted a few scatter-graphs illustrating the effect of specifying y on x, with x and y uncorrelated and then becoming increasingly correlated.

When x and y are uncorrelated, the realised value of y doesn’t affect the distribution for x, which is still normally distributed around zero
When x and y are correlated, the realised value of y has an effect on the distribution of x, which is no longer centered on zero and has a smaller variance
When the correlation of x and y becomes high, the value of x is almost completely determined by y. Now, if y is specified then x is tightly centered around a value far from zero

The simplest way of attempting this calculation is to use the result for jointly normal variates given in an earlier post, which says that if x and y have correlation \rho, we can express x in terms of y and a new variate z \sim {\mathbb N}(0,1) which is uncorrelated with y

    \[x = \rho\cdot y + \sqrt{1 - \rho^2}\cdot z\]

so

    \[(\ e^x\ |\ y = b\ ) = e^{\rho y + \sqrt{1-\rho^2}z} = e^{\rho b}\cdot e^{\sqrt{1-\rho^2}z}\]

Since the value of y is already determined (ie. y = b), I’ve separated this term out and the only thing I have to calculate is the expectation of the second term in z. Since y and z are independent, conditioning on y doesn’t change the distribution of z, so we can calculate the expectation of the term in z by the same process as before, but featuring slightly more complicated pre-factors

    \begin{align*} {\mathbb E}[ \ e^x \ |\ y = b\ ] &= e^{\rho b} \int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot f(z) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot e^{-{1 \over 2}z^2} \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ z^2 - 2\sqrt{1-\rho^2}z \nonumber \\ & \quad \quad \quad \quad \quad \quad \quad \quad + (1-\rho^2) - (1-\rho^2) \Bigr] \Bigr) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}} e^{{1 \over 2} (1-\rho^2)} \int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} \bigl(z-\sqrt{1-\rho^2}\bigr)^2 \Bigr) \ dz \nonumber \\ \nonumber \\ &= e^{{1\over 2}(1-\rho^2)+\rho b} \nonumber \end{align*}

We can check the limiting values of this – if \rho = 0 then x and y are independent [this is not a general result by the way – see wikipedia for example – but it IS true for jointly normally distributed variables], in this case {\mathbb E}[ \ e^x \ |\ y = b\ ] = e^{0.5} just as above. If \rho = \pm 1, {\mathbb E} [\ e^x \ |\ y=b\ ] = e^{\pm b}, which also makes sense since in this case x = \pm y = \pm b, so y fully determines the expectation of x.
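
Both results can also be checked by brute force. The sketch below is an illustration with arbitrary choices of \rho, b, sample size and seed: it draws jointly normal pairs (x, y), averages e^x over all samples for part a), and averages e^x over only those samples whose y lands in a narrow band around b for part b). The band half-width is an approximation device, since a continuous variate never equals b exactly, and the function name `conditional_mean_exp_x` is hypothetical.

```python
import numpy as np


def conditional_mean_exp_x(rho, b, n_samples=1_000_000, half_width=0.02, seed=0):
    """Monte Carlo estimates of E[e^x] and E[e^x | y close to b].

    Returns (unconditional mean, conditional mean, number of samples in the band).
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]          # unit variances, correlation rho
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    x, y = xy[:, 0], xy[:, 1]
    near_b = np.abs(y - b) < half_width     # crude stand-in for conditioning on y = b
    return np.exp(x).mean(), np.exp(x[near_b]).mean(), int(near_b.sum())


if __name__ == "__main__":
    rho, b = 0.6, 0.5
    uncond, cond, n_used = conditional_mean_exp_x(rho, b)
    print("E[e^x]       sample:", uncond, " formula:", np.exp(0.5))
    print("E[e^x | y=b] sample:", cond,
          " formula:", np.exp(0.5 * (1 - rho**2) + rho * b),
          f" ({n_used} samples in band)")
```

Only a small fraction of the samples fall in the band, so the conditional estimate is much noisier than the unconditional one; shrinking the band reduces the bias but increases the noise.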

The more general way to solve this is to use the full 2D joint normal distribution as given in the previous post mentioned before,

    \[f(x,y) = {1 \over {2\pi \sqrt{1-\rho^2}}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}\]

This is the joint probability function of x and y, but it’s not quite what we need – the expectation we are trying to calculate is

    \[{\mathbb E}[ \ e^x \ |\ y \ ] = \int^{\infty}_{-\infty} e^x \cdot f(\ x \ |\ y \ ) \ dx\]

So we need to calculate the conditional density of x given y, for which we need Bayes’ theorem

    \[f(x,y) = f( \ x \ | \ y \ ) \cdot f(y)\]

Putting this together, we have

    \[{\mathbb E}[ \ e^x \ |\ y = b\ ] = \int^{\infty}_{-\infty} e^x \cdot {f(x,y) \over f(y)}\ dx\]

    \[={1 \over \sqrt{2\pi (1-\rho^2)}} \int^{\infty}_{-\infty} { e^x \over e^{-{1\over 2}y^2}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)} dx\]

    \[={e^{{1\over 2}b^2} \over \sqrt{2\pi (1-\rho^2)}} \int^{\infty}_{-\infty} e^x \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + b^2 - 2\rho xb)\Bigr)} dx\]

where the pre-factor comes from dividing the 1/(2\pi\sqrt{1-\rho^2}) of the joint density by the 1/\sqrt{2\pi} of the marginal density f(y).

This integral is left as an exercise to the reader, but it is very similar to those given above and should give the same answer as the previous expression for {\mathbb E}[\ e^x \ | \ y = b\ ] after some simplification!
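
Without spoiling the exercise, the claim can at least be checked numerically. The sketch below builds the conditional density directly as f(x,b)/f(b) from the joint and marginal densities given above, integrates e^x against it on a fine grid, and compares the result with e^{{1\over 2}(1-\rho^2)+\rho b}. The grid range, grid size and function names are arbitrary choices made for this illustration.

```python
import numpy as np


def joint_density(x, y, rho):
    """Joint density of two standard normals with correlation rho."""
    norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    return norm * np.exp(-(x**2 + y**2 - 2.0 * rho * x * y) / (2.0 * (1.0 - rho**2)))


def marginal_density(y):
    """Standard normal density."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)


def conditional_expectation_exp_x(rho, b, half_range=12.0, n_points=200_001):
    """Numerically integrate e^x * f(x, b) / f(b) over x with a simple Riemann sum."""
    x = np.linspace(-half_range, half_range, n_points)
    dx = x[1] - x[0]
    integrand = np.exp(x) * joint_density(x, b, rho) / marginal_density(b)
    return float(np.sum(integrand) * dx)


if __name__ == "__main__":
    for rho, b in [(0.0, 0.5), (0.6, 0.5), (-0.9, -1.0)]:
        numeric = conditional_expectation_exp_x(rho, b)
        closed_form = np.exp(0.5 * (1.0 - rho**2) + rho * b)
        print(f"rho = {rho:+.1f}, b = {b:+.1f}: numeric {numeric:.8f}, "
              f"closed form {closed_form:.8f}")
```

The grid must stay away from \rho = \pm 1, where the joint density degenerates and the conditional distribution collapses to a point.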