I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.
One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated to the product that we’re trying to price, but which is more easy to calculate or which we already know the price of. When we simulate a path in MC, it will almost surely give either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of product price, it most likely also produces an over-estimate of the control variate and visa versa, which will allow us to improve our estimate of the product we’re trying to price.
The textbook example is the Asian Option. Although the arithmetic version of the asian option discussed in previous posts has no analytic expression in BS, a similar Geometric asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of the option. As a reminder, the payoff of an arithmetic asian option at expiry is
and the payoff of the related geometric averaging asian is
Denoting the price of the arithmetic option as and the geometric option as , the traditional monte carlo approach is to generate paths, and for each one calculate (the realisation of the payoff along the path) and take the average over all paths, so that
which will get closer to the true price as .
Using as a control variate, we instead calculate
where is the price of the geometric option known from the analytical expression, and is a constant (in this case we will set it to 1).
What do we gain from this? Well, consider the variance of
(since is known, so has zero variance) which is minimal for in which case
that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be a significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths – the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2000 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is roughly 45 times smaller standard error on pricing – well over an extra decimal place, which isn’t bad – and this is certainly much easier than running 2000 times as many paths to achieve the same result.
A simple way to do this is using Monte Carlo – we simulate a spot path drawn from the distribution and evaluate the payoff at expiry of that path, then average over a large number of paths to get an estimate of C(t).
It’s particularly easy to do this in Black-Scholes, all we have to do is simulate N gaussian variables, and evolve the underlying according to a geometric brownian motion.
I’ve updated the C++ and Excel Monte Carlo pricer on the blog to be able to price asian puts and calls by Monte Carlo – have a go and see how they behave relative to vanilla options. One subtlety is that we can no longer input a single expiry date, we now need an array of dates as our function input. If you have a look in the code, you’ll see that I’ve adjusted the function optionPricer to take a datatype MyArray, which is the default array used by XLW (it won’t be happy if you tell it to take std::vector). This array can be entered as a row or column of excel cells, and should be the dates, expressed as years from the present date, to average over.
Now that I’ve got some Monte Carlo code up, it’s inevitable that I will eventually need an implementation of the Inverse of the Normal Cumulative Density Function (CDF). The inverse of a CDF is called a Quantile function by the way, so I’ll often refer to this as the Normal Quantile function. The reason this is so important is because, as I discussed in my post on Random Number Generators, it can be used to convert uniformly distributed random numbers into normally distributed random numbers, which we will need for Monte Carlo simulations if we believe that underlyings are distributed according to a lognormal distribution.
As a reminder, I’ve repeated the graph from the previous post here to show the behaviour of the Normal Quantile:
As with the implementation of an algorithm for the Normal CDF (discussed here), there were several possibilities for implementation. In the end I’ve decided to go with the analytic approximation due to Beasley, Springer and Moro as presented in Joshi’s text on computational derivative pricing:
A polynomial form is used for the central region of the quantile, where :
And if or then:
With a plus at the front if and a minus in front if .
This is fairly swift as an algorithm, and the quantile method of generating normal variates will turn out to have some key advantages over the various other methods discussed before which will become apparent in future. In a future post I’ll show how to integrate this method of converting uniform variates to gaussian variates into the Monte Carlo code that I’ve made available here, thanks to the factory pattern that I’ve used it will turn out to be incredibly straight-forward!
Finally, if you are wondering why the numbers in these numerical algorithms seem so arbitrary, it’s because they [sort-of] are! Once I’ve implemented some optimisation routines (this might not be for a while…) I’ll have a look at how we can find our own expressions for these functions, and hopefully be able to back out some of the numbers above!!
I’m in the process of preparing some C++ code for the blog to allow user-downloadable and -compilable Monte Carlo pricers that won’t overload my server. That’s got me thinking about random number generators, and I’m going to talk a bit about them here [In fact, all of the algorithms discussed below are entirely deterministic, so I should really call them pseudo-random number generators. Whether or not particular algorithms are suitable for use in applications requiring random numbers is an area of active research and occasional catastrophic failure].
Generally, in Monte Carlo simulations we want to be able to generate many ‘sample paths’, each of which is a realisation of a possible evolution of the underlying price, and evaluate the payoff for that path. As I discussed in an earlier post, we usually model these quantities as depending on brownian motions and results are normally distributed, which means we need to generate random variables that are also normally distributed (so, for example, we are more likely to get values near to zero than at large values).
Generating these directly is quite tricky, so the first step is usually to generate uniform random variables and then transform these to gaussian variates. A uniform random variable is a variable with a constant probability of being anywhere within a certain range, typically [0,1] or [-1,1] – for the rest of this post, I denote a uniform variable as U[…], and define u~[0,1] and v~[-1,1]. The pdf of u and v are shown here:
A straight-forward method of generating these numbers uses modular multiplication [this is variously called a Park-Miller generator, a linear congruential generator and a Lehmer generator]:
For good choices of the variables, this will generate a sequence with a period of n that is uniformly distributed across the interval [0,n], so dividing by n rescales the variables to be distributed roughly according to u. The choices of g, n and (the ‘seed’) are non-trivial and several are given in the literature, as usual I’ve followed the advice given on wikipedia and lumped for n = 2^32 – 5, g = 279470273, and the choice of seed is left to the user but should be coprime to modulus n. Note that this will have a period of about 2^32, so the upper limit on the number of monte-carlo evaluations that can meaningfully be done is around 4 billion steps. It is also easy to see how the generator could be hampered by a poor choice of parameters – if the multiplier and the modulus are not co-prime, for example, the actual period of the sequence can be much less than n.
The more advanced Marsenne Twister algorithm aims to correct many of the problems with the Park-Miller generator, such as increasing the period length of the sequence and improving the statistical randomness of the values produced. It’s too complicated to go through here (it’s a full post in itself!) in full detail but many good summaries of the theory and performance of the twister algorithm can be found online.
One other thing to point out is that if initilised in the same way, each time one of the above algorithms is run it will return the same sequence of numbers. At first this might seem like an unwanted property as it deviates from ‘true’ randomness. However, in our context it will be extremely valuable as it allows us to test code using the same sequence of numbers each time. This is vital for calculating greeks by bump-and-revalue using monte carlo as otherwise monte carlo error in price will usually swamp small changes due to greeks, and it also helpful when optimising code to make sure that results aren’t being affected.
Once we have a source of uniform random variates, we need to consider methods for converting them to gaussian variates. The simplest and most direct is the inversion method, which involves finding a mapping between uniform random variate u and the standard normal random variate z. Consider first the CDF of the normal variable (I have discussed this before here), shown below:
This shows us the likelihood that z is below a certain value X, and is equal to the integral of the PDF from to X. Because the PDF is normalised to integrate to 1, the CDF maps each possible outcome of z to a value between 0 and 1, with the most likely outcomes being where the CDF is steepest (since these cover the y-axis values more quickly) which is exactly where the PDF is largest. This is exactly the opposite of what we want, so we consider instead the inverse CDF shown here:
This mapping is exactly what we’re after [although there are others – it’s not unique]. It maps uniformly distributed variates to normally distributed variates and they’re in a 1-to-1 correspondence. This procedure is robust and in principle it is exact. However, it is not always the fastest method, and it requires us to know the inverse CDF, which to evaluate in a reasonable time requires us to make approximations (this technique works for any distribution – although there is no closed form CDF for the normal distribution, for many other distributions the inverse CDF can be expressed in closed form, in which case this is almost certainly the best method).
There are several other techniques for obtaining normal distributions from uniform variates, a few of which I have implemented and describe them here because they’re quite interesting. The first is called the Box-Muller technique, which converts two independent uniform variates into two independent normal variates by combining the inversion technique described above, the trick that we use to calculate a gaussian integral and the change of variables law for Pdfs that I discussed here.
Consider two independent normally-distributed variables x and y. The joint PDF of the two is
As for gaussian integration, we can re-express this in polar co-ordinates remembering that
If we can simulate a value of r and theta using uniforms that obeys these densities, we will be able to transform them to normals x and y straightforwardly. There is radial symmetry so we can integrate over to give us the radial cumulative probability density (ie. radial CDF)
Unlike x or y, we can calculate this in closed form and invert it
This quantile function allows us to select a point with the correct radial distribution using a first uniform variate, and we then need to select a theta from a uniform distribution around angles – this can be done by multiplying a second uniform variate by [nb. we can simplify the radial quantile further – note that if u is uniformly distributed across [0,1], then so is (1-u), so we can substitute u for (1-u)]. The values of normally distributed variates are given by the x- and y-coordinates of the resulting point, ie.
This is a really useful trick, quickly converting two uniform variates into two gaussian variates. We can improve the speed still further if we can get rid of the computationally expensive sine and cosine functions, for which there is another nifty trick using rejection. In the final step, we inserted two uniform variates to produce and . We can use the following transformation to turn our initial uniform variates into two new variates that avoid the need for a sine or cosine:
Instead of taking and , consider instead two uniform variables and across [-1,1]. Let , and if s > 1 then reject the values and start again. The variables now lie within the unit sphere as shown below:
We can use change of variables to show that after rejecting external points, both theta and s are also uniform on [0,1]:
Above, we used and to generate a value of r and angle, but since the s and that we’ve generated here are also uniform on [0,1] they can be used just as well. Because of our definitions, , so new expressions are:
We’ve got an even more efficient set of formulae, at the cost of having to throw out around 20% of our values. It will turn out that this is often quicker than the basic version of the algorithm, but is totally useless if we move to pseudo-random numbers, as I will discuss another time.
The final technique I want to talk about is a generalisation of acceptance-rejection, which we’ve just touched on and was also used implicitly in my first post on Monte Carlo.
The idea here is once again to simulate a distribution f(x) by sampling instead from another distribution g(x) that overlaps it everywhere (we can rescale g(x) by a constant A to ensure this), and rejecting values that fall inside A.g(x) but outside of f(x). This is usually used when we can simulate g(x) directly from uniforms by using its inverse CDF – perhaps the best way to demonstrate how this works is through another example. We let f(x) be the normal PDF, and g(x) the exponential distribution, such that
This only has support on positive x, so we redefine it to cover positive and negative x, and rescale it so that it is everywhere larger than the normal pdf
and calculating the inverse CDF gives
We generate two independent uniform draws and . The first is used to simulate a variate distributed according to A.g(x) via the inverse CDF given here. Then we compare the values of A.g(x) at that point to the value of f(x) evaluated at that point. The second uniform variate is compared to the ratio of the two – if it is GREATER than the ratio, our point has fallen ‘outside’ of f(x), so it is rejected and we start again. If it is LOWER than the ratio, we accept the value of as a realisation of our target distribution f(x).
An obvious problem with this technique is that if the distributions don’t overlap well, many variates will be rejected, so we need a good uniform generator to provide enough independent variables for the algorithm to work.
The algorithm can be made arbitrarily complex by designing trial distributions g(x) that match f(x) as closely as possible, such as the ‘Ziggaraut algorithm’ which optimises the approach by breaking g(x) into many parts matched to different sections of f(x), which and is often used to generate normal random variates. As with the Marsenne Twister, there are many good guides to this algorithm available online.
One of the nice things about this job is the variety. There’s lots of maths, but the topics aren’t limited to set areas. So one day I might be working with stochastic calculus or statistics, and the next I have to delve into numerical techniques to solve a problem and then code it up.
I hinted about this briefly in my last post about implied vol. In order to do this, I had to solve an equation of the form
by varying sigma, and because the form of C in general isn’t tractable, numerical techniques must be used.
The easiest solver to implement is a bisection solver. This needs to be given two points on either side of the root, so that is negative and is positive (for vols, sensible values might be = 0.01% and = 200%). Then it simply takes the mid-point of these two (call it ) and evaluates the function there. If it’s positive, becomes the new , and if negative becomes the new . At each iteration this halves the remaining interval between the values. It’s very robust and will always home in to the root, but is considered to be relatively slow. The difference between the n-th value and the true value obeys
where b and a are the initial values. If we want this difference to be less than we have
since . This means that for each additional decimal point of accuracy, a further 3.32 iterations will be needed on average.
Faster ways are less reliable. For example, the secant method is a finite difference version of Newton-Raphson. The first points chosen don’t need to be on opposite sides of the root, but they should be near to the root. The expression used for successive approximations to the root is
This is an interpolation technique, it works roughly as shown in the figures below. It’s speed of convergence is much faster than the bisection technique when close to the root, but like Newton-Raphson, it is not guaranteed to converge. This isn’t much use to us – we’d like to combine the benefits of both techniques.
The industry standard technique for this is the Brent Method. The algorithm is a little complicated but it essentially combines these two approaches, using the secant method but checking that it performs better than the bisection method, and falling back on that if not. It also includes a few extra provisions for added speed and stability – once again, Wikipedia is a great reference for this. I’ve coded up a version of this for the pricer, and there’s a demonstration of how this outperforms the bisection method below. Choose any of the functions in the list, enter some points and see how many iterations each one takes to find the root!
As discussed in my last post, I had some issues with an implementation of the standard normal cdf function, which I address here. This post will be my first test of the equation editor that I’ve just installed!
The standard normal cdf looks like this:
I had a few thoughts as to how to go about this. It’s symmetrical around x = 0, so we only need to worry about solving for positive x, and then we can calculate negative x values by reflection. Possibilities were:
The problem with this expression is that the terms oscillate between positive and negative, which means that for large enough x it will explode to either very big positive or negative values. We could include a fixing point and build in some sort of curve above a certain x, but I’m looking for a fairly robust solution that doesn’t involve too much testing for the moment, and this sounds like a lot of work!
Again, this solution would be a lot of work to set up, but would have a guaranteed level of accuracy and would find the answer in a fixed amount of time. A vector could be set up containing the value of x that gives cdf(x) = 0.5+n*delta; delta might be one millionth or so, the array would have 2^19 members and a binary search would locate the closest value to x in approximately 18 operations (assuming constant time to check a vector). The index of this value would then give the value of the cdf, and an interpolation scheme between adjacent values could improve accuracy further.
Pros of this solution are efficiency and accuracy, but there would be a large setup cost and possibly memory cost to load up the look up table. If we had to run the function inside a Monte Carlo regime or numerical integration, this might be justified by the runtime savings, but for the moment it’s a bit much work!
To get the pricer described in my last post running, I implemented the following approximation to the error function:
Well, it got the pricer working but turns out to be not too accurate. For example, I got this running on excel and compared it to the inbuild NORMSDIST implementation, for some typical numbers (forward = 100, strike = 90, vol=10%, dcf = 1, time = 1). For NORMSDIST(d1) in excel, I get 0.86512, while for my Phi(d1) I get 0.86489, so the error is in the 4th sf. This leads to a price difference of 10.7124 for excel and 10.7095 for my implementation, and I’d like to do a bit better. Since I’d ruled out the possibilities described above, I went back to wikipedia and implemented the following version of the cdf, which gave a much more agreeable price (similar to excel’s now to 7sf):
It’s worth noting that although this implementation seems to give better accuracy for most positive x, it relies on some library functions (square root and exponential) itself, so we’re leaving ourselves slightly at the mercy of how those are programmed. It’s also going to take much longer than the previous implementation, which only used polynomial arithmetic. Since we only need a couple of evaluations in a vanilla pricer, it’s not a big concern, but for applications that call it repeatedly we might want to do better.