Improving Monte Carlo: Control Variates

I’ve already discussed quite a lot about Monte Carlo in quantitative finance. MC can be used to value products for which an analytical price is not available in a given model, which includes most exotic derivatives in most models. However, two big problems are the time that it takes to run and the ‘Monte Carlo error’ in results.

One technique for improving MC is to use a ‘control variate’. The idea is to find a product whose price is strongly correlated to the product that we’re trying to price, but which is more easy to calculate or which we already know the price of. When we simulate a path in MC, it will almost surely give either an under-estimate or an over-estimate of the true price, but we don’t know which, and averaging all of these errors is what leads to the Monte Carlo error in the final result. The insight in the control variate technique is to use the knowledge given to us by the control variate to reduce this error. If the two prices are strongly correlated and a path produces an over-estimate of product price, it most likely also produces an over-estimate of the control variate and visa versa, which will allow us to improve our estimate of the product we’re trying to price.

The textbook example is the Asian Option. Although the arithmetic version of the asian option discussed in previous posts has no analytic expression in BS, a similar Geometric asian option does have an analytic price. So, for a given set of model parameters, we can calculate the price of the option. As a reminder, the payoff of an arithmetic asian option at expiry is

    \[C_{\rm arit}(T) = \Bigl({1\over N}\sum_{i=0}^{N-1} S(t_i) - K \Bigr)^+\]

and the payoff of the related geometric averaging asian is

    \[C_{\rm geo}(T) = \Bigl( \bigl(\prod_{i=0}^{N-1} S(t_i)\bigr)^{1\over N} - K \Bigr)^+\]

Denoting the price of the arithmetic option as X and the geometric option as Y, the traditional monte carlo approach is to generate N paths, and for each one calculate X_i (the realisation of the payoff along the path) and take the average over all paths, so that

    \[{\mathbb E}[X] = {1 \over N} \sum_{i=0}^{N-1} X_i\]

which will get closer to the true price as N \to \infty.

Using Y as a control variate, we instead calculate

    \[{\mathbb E}[X] = {1\over N} \sum_{i=0}^{N-1}\bigl( X_i - \lambda( Y_i - {\mathbb E}[Y] ) \bigr)\]

where {\mathbb E}[Y] is the price of the geometric option known from the analytical expression, and \lambda is a constant (in this case we will set it to 1).

What do we gain from this? Well, consider the variance of X_i - \lambda (Y_i - {\mathbb E } [Y])

    \[{\rm Var}\bigl( X_i - ( Y_i - {\mathbb E}[Y] ) \bigr) = {\rm Var}( X_i ) + \lambda^2 {\rm Var} ( Y_i ) - 2 \lambda {\rm Cov}( X_i Y_i )\]

(since {\mathbb E } [Y] is known, so has zero variance) which is minimal for \lambda = \sqrt{ {\rm Var}(X_i) \over {\rm Var}(Y_i) }\cdot \rho(X_i,Y_i) in which case

    \[{ {\rm Var}\bigl( X_i - ( Y_i - {\rm E}[Y] ) \bigr) \over {\rm Var}( X_i )} = 1 - \rho(X_i,Y_i)^2\]

that is, if the two prices are strongly correlated, the variance of the price calculated using the control variate will be a significantly smaller. I’ve plotted a sketch of the prices of the two types of average for 100 paths – the correlation is about 99.98%. Consequently, we expect to see a reduction in variance of about 2000 times for a given number of paths (although we now have to do a little more work on each path, as we need to calculate the geometric average as well as the arithmetic average of spots). This is roughly 45 times smaller standard error on pricing – well over an extra decimal place, which isn’t bad – and this is certainly much easier than running 2000 times as many paths to achieve the same result.

The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation
The relationship between payout for a geometric and an arithmetic asian option, which here demonstrate a 99.98% sample correlation. Parameters used were: r: 3%; vol: 10%; K: 105; S(0): 100; averaging dates: monthly intervals for a year

Asian Options II: Monte Carlo

In a recent post, I introduced Asian options, an exotic option which pay the average of the underlying over a sequence of fixing dates.

The payoff of an Asian call is

C(T) = \Big({1 \over N}\sum^N_{i=0} S_{t_i} - K \Big)^+

and to price the option at an earlier time via risk-neutral valuation requires us to solve the integral

C(t) = P_{t,T}\ \big( \iint\cdots\int \big) \Big({1 \over N}\sum^N_{i=0} S_{t_i} - K \Big)^+ \phi(S_{t_1},S_{t_1},\cdots,S_{t_N}) dS_{t_1}dS_{t_2}\cdots dS_{t_N}

A simple way to do this is using Monte Carlo – we simulate a spot path drawn from the distribution \inline \phi(S_{t_1},S_{t_1},\cdots,S_{t_N}) and evaluate the payoff at expiry of that path, then average over a large number of paths to get an estimate of C(t).

It’s particularly easy to do this in Black-Scholes, all we have to do is simulate N gaussian variables, and evolve the underlying according to a geometric brownian motion.

I’ve updated the C++ and Excel Monte Carlo pricer on the blog to be able to price asian puts and calls by Monte Carlo – have a go and see how they behave relative to vanilla options. One subtlety is that we can no longer input a single expiry date, we now need an array of dates as our function input. If you have a look in the code, you’ll see that I’ve adjusted the function optionPricer to take a datatype MyArray, which is the default array used by XLW (it won’t be happy if you tell it to take std::vector). This array can be entered as a row or column of excel cells, and should be the dates, expressed as years from the present date, to average over.

Inverse Normal CDF

Now that I’ve got some Monte Carlo code up, it’s inevitable that I will eventually need an implementation of the Inverse of the Normal Cumulative Density Function (CDF). The inverse of a CDF is called a Quantile function by the way, so I’ll often refer to this as the Normal Quantile function. The reason this is so important is because, as I discussed in my post on Random Number Generators, it can be used to convert uniformly distributed random numbers into normally distributed random numbers, which we will need for Monte Carlo simulations if we believe that underlyings are distributed according to a lognormal distribution.

As a reminder, I’ve repeated the graph from the previous post here to show the behaviour of the Normal Quantile:

The inverse CDF - called the quantile function
The standard Normal Quantile function. Simple transformations of the normal distribution will allow any non-standard normal quantile to be expressed using the standard version.

 

As with the implementation of an algorithm for the Normal CDF (discussed here), there were several possibilities for implementation. In the end I’ve decided to go with the analytic approximation due to Beasley, Springer and Moro as presented in Joshi’s text on computational derivative pricing:

a1 = 2.50662823884
a2 = -18.61500062529
a3 = 41.39119773534
a4 = -25.44106049637

b1 = -8.47351093090
b2 = 23.08336743743
b3 = -21.06224101826
b4 = 3.13082909833

c1 = 0.3374754822726147
c2 = 0.9761690190917186
c3 = 0.1607979714918209
c4 = 0.0276438810333863
c5 = 0.0038405729373609
c6= 0.0003951896511919
c7 = 0.0000321767881768
c8 = 0.0000002888167364
c9 = 0.0000003960315187

 

A polynomial form is used for the central region of the quantile, where \inline 0.08 < x < 0.92:

y = x - 0.5 \ ; \quad z = y^2

\Phi^{-1}(x) \simeq y \cdot { (a_1 + a_2 z + a_3 z^2 + a_4 z^3 ) \over (1 + b_1 z + b_2 z^2 + b_3 z^3 + b_4 z^4 ) }

 

And if \inline x \leq 0.08 or \inline x \geq 0.92 then:

y = x - 0.5 \ ; \quad z = \Bigl\{ \begin{matrix} x & y\leq 0\\ 1 - x & y > 0 \end{matrix} \ ; \quad \kappa = \ln (-\ln z)

\Phi^{-1}(x) = \pm (c_1 + c_2\kappa + c_3\kappa^2 + c_4\kappa^3 + c_5\kappa^4 + c_6\kappa^5 + c_7\kappa^6 + c_8\kappa^7 + c_9\kappa^8)

With a plus at the front if \inline y \geq 0 and a minus in front if \inline y < 0.

This is fairly swift as an algorithm, and the quantile method of generating normal variates will turn out to have some key advantages over the various other methods discussed before which will become apparent in future. In a future post I’ll show how to integrate this method of converting uniform variates to gaussian variates into the Monte Carlo code that I’ve made available here, thanks to the factory pattern that I’ve used it will turn out to be incredibly straight-forward!

Finally, if you are wondering why the numbers in these numerical algorithms seem so arbitrary, it’s because they [sort-of] are! Once I’ve implemented some optimisation routines (this might not be for a while…) I’ll have a look at how we can find our own expressions for these functions, and hopefully be able to back out some of the numbers above!!

Random Number Generators

I’m in the process of preparing some C++ code for the blog to allow user-downloadable and -compilable Monte Carlo pricers that won’t overload my server. That’s got me thinking about random number generators, and I’m going to talk a bit about them here [In fact, all of the algorithms discussed below are entirely deterministic, so I should really call them pseudo-random number generators. Whether or not particular algorithms are suitable for use in applications requiring random numbers is an area of active research and occasional catastrophic failure].

Uniform RNGs

Generally, in Monte Carlo simulations we want to be able to generate many ‘sample paths’, each of which is a realisation of a possible evolution of the underlying price, and evaluate the payoff for that path. As I discussed in an earlier post, we usually model these quantities as depending on brownian motions and results are normally distributed, which means we need to generate random variables that are also normally distributed (so, for example, we are more likely to get values near to zero than at large values).

Generating these directly is quite tricky, so the first step is usually to generate uniform random variables and then transform these to gaussian variates. A uniform random variable is a variable with a constant probability of being anywhere within a certain range, typically [0,1] or [-1,1] – for the rest of this post, I denote a uniform variable as U[…], and define u~[0,1] and v~[-1,1]. The pdf of u and v are shown here:

The Pdfs of two uniform random variables
The Pdfs of two uniform random variables, over ranges [0,1] and [-1,1]

A straight-forward method of generating these numbers uses modular multiplication [this is variously called a Park-Miller generator, a linear congruential generator and a Lehmer generator]:

x_{k+1} = g \cdot x_k \mod n

For good choices of the variables, this will generate a sequence with a period of n that is uniformly distributed across the interval [0,n], so dividing by n rescales the variables to be distributed roughly according to u. The choices of g, n and \inline x_0 (the ‘seed’) are non-trivial and several are given in the literature, as usual I’ve followed the advice given on wikipedia and lumped for n = 2^32 – 5, g = 279470273, and the choice of seed is left to the user but should be coprime to modulus n. Note that this will have a period of about 2^32, so the upper limit on the number of monte-carlo evaluations that can meaningfully be done is around 4 billion steps. It is also easy to see how the generator could be hampered by a poor choice of parameters – if the multiplier and the modulus are not co-prime, for example, the actual period of the sequence can be much less than n.

The more advanced Marsenne Twister algorithm aims to correct many of the problems with the Park-Miller generator, such as increasing the period length of the sequence and improving the statistical randomness of the values produced. It’s too complicated to go through here (it’s a full post in itself!) in full detail but many good summaries of the theory and performance of the twister algorithm can be found online.

One other thing to point out is that if initilised in the same way, each time one of the above algorithms is run it will return the same sequence of numbers. At first this might seem like an unwanted property as it deviates from ‘true’ randomness. However, in our context it will be extremely valuable as it allows us to test code using the same sequence of numbers each time. This is vital for calculating greeks by bump-and-revalue using monte carlo as otherwise monte carlo error in price will usually swamp small changes due to greeks, and it also helpful when optimising code to make sure that results aren’t being affected.

Gaussian RNGs

Once we have a source of uniform random variates, we need to consider methods for converting them to gaussian variates. The simplest and most direct is the inversion method, which involves finding a mapping between uniform random variate u and the standard normal random variate z. Consider first the CDF of the normal variable (I have discussed this before here), shown below:

Cdf of the normal distribution
CDF of the normal distribution, which gives the probability of the variable being less-than-or-equal-to the value of the CDF at each point

This shows us the likelihood that z is below a certain value X, and is equal to the integral of the PDF from \inline -\infty to X. Because the PDF is normalised to integrate to 1, the CDF maps each possible outcome of z to a value between 0 and 1, with the most likely outcomes being where the CDF is steepest (since these cover the y-axis values more quickly) which is exactly where the PDF is largest. This is exactly the opposite of what we want, so we consider instead the inverse CDF shown here:

The inverse CDF - called the quantile function
The inverse CDF – called the quantile function – of the normal distribution (as usual, the inverse function is the function reflected in the line x=y). This function allows us to map a uniform variate onto the distribution in question. For example, the graph shows a uniform variate draw of 0.8 would map to a value of z = 0.8416 for z distributed according to the standard normal distribution

This mapping is exactly what we’re after [although there are others – it’s not unique]. It maps uniformly distributed variates to normally distributed variates and they’re in a 1-to-1 correspondence. This procedure is robust and in principle it is exact. However, it is not always the fastest method, and it requires us to know the inverse CDF, which to evaluate in a reasonable time requires us to make approximations (this technique works for any distribution – although there is no closed form CDF for the normal distribution, for many other distributions the inverse CDF can be expressed in closed form, in which case this is almost certainly the best method).

There are several other techniques for obtaining normal distributions from uniform variates, a few of which I have implemented and describe them here because they’re quite interesting. The first is called the Box-Muller technique, which converts two independent uniform variates into two independent normal variates by combining the inversion technique described above, the trick that we use to calculate a gaussian integral and the change of variables law for Pdfs that I discussed here.

Consider two independent normally-distributed variables x and y. The joint PDF of the two is

p(x,y)dxdy = p_z(x)p_z(y)dxdy = {1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy

As for gaussian integration, we can re-express this in polar co-ordinates remembering that \inline dxdy = rdrd\theta

{1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy = {1 \over 2 \pi}e^{-{1\over 2}r^2} rdrd\theta = p_R(r)p_{\theta}(\theta)rdrd\theta

If we can simulate a value of r and theta using uniforms that obeys these densities, we will be able to transform them to normals x and y straightforwardly. There is radial symmetry so we can integrate over \inline d\theta to give us the radial cumulative probability density (ie. radial CDF)

P(r\leq R) = \int^R_0 r' e^{-{1\over 2}r'^2} dr'

Unlike x or y, we can calculate this in closed form and invert it

\int^R_0 r' e^{-{1\over 2}r'^2} dr' = \Bigl[ -e^{-{1\over 2}r'^2} \Bigr]^R_0 = 1 - e^{-{1\over 2}R^2}

P^{-1}(u) = \sqrt{-2 \ln(1 - u)}

This quantile function allows us to select a point with the correct radial distribution using a first uniform variate, and we then need to select a theta from a uniform distribution around angles – this can be done by multiplying a second uniform variate by \inline 2\pi [nb. we can simplify the radial quantile further – note that if u is uniformly distributed across [0,1], then so is (1-u), so we can substitute u for (1-u)]. The values of normally distributed variates are given by the x- and y-coordinates of the resulting point, ie.

z_1 = \sqrt{-2 \ln u_1}\cdot \sin(2\pi u_2)

z_2 = \sqrt{-2 \ln u_1}\cdot \cos(2\pi u_2)

This is a really useful trick, quickly converting two uniform variates into two gaussian variates. We can improve the speed still further if we can get rid of the computationally expensive sine and cosine functions, for which there is another nifty trick using rejection. In the final step, we inserted two uniform variates to produce \inline z_1 and \inline z_2. We can use the following transformation to turn our initial uniform variates into two new variates that avoid the need for a sine or cosine:

Instead of taking \inline u_1 and \inline u_2, consider instead two uniform variables \inline v_1 and \inline v_2 across [-1,1]. Let \inline s = r^2 = v_1^2 + v_2^2, and if s > 1 then reject the values and start again. The variables now lie within the unit sphere as shown below:

Polar Box Muller procedure
Polar Box Muller procedure. Two alternative values s and theta are generated from the initial random variables U1 and U2. If these fall inside the unit circle, it can be shown that they are also uniformly distributed, so are equally suitable for use in the Box-Muller formulae. The advantage is algebraic simplicity – because theta is an angle, the ratio of s and the initial variates is already a sine or cosine, so we don’t need to calculate these explicitly

We can use change of variables to show that after rejecting external points, both theta and s are also uniform on [0,1]:

p(s)ds = p(r)rdr = 2rdr

p(s) = 2r{dr \over ds} = 2r \cdot {1 \over 2r} = 1

Above, we used \inline u_1 and \inline u_2 to generate a value of r and angle, but since the s and \inline \theta that we’ve generated here are also uniform on [0,1] they can be used just as well. Because of our definitions, \inline \sin(2\pi \theta) = {v_1 \over r}, so new expressions are:

z_1 = \sqrt{-2 \ln s}\cdot {v_1 \over r}

z_2 = \sqrt{-2 \ln s}\cdot {v_2 \over r}

We’ve got an even more efficient set of formulae, at the cost of having to throw out around 20% of our values. It will turn out that this is often quicker than the basic version of the algorithm, but is totally useless if we move to pseudo-random numbers, as I will discuss another time.

The final technique I want to talk about is a generalisation of acceptance-rejection, which we’ve just touched on and was also used implicitly in my first post on Monte Carlo.

The idea here is once again to simulate a distribution f(x) by sampling instead from another distribution g(x) that overlaps it everywhere (we can rescale g(x) by a constant A to ensure this), and rejecting values that fall inside A.g(x) but outside of f(x). This is usually used when we can simulate g(x) directly from uniforms by using its inverse CDF – perhaps the best way to demonstrate how this works is through another example. We let f(x) be the normal PDF, and g(x) the exponential distribution, such that

g(x) = \lambda e^{-\lambda x}

This only has support on positive x, so we redefine it to cover positive and negative x, and rescale it so that it is everywhere larger than the normal pdf

A\cdot g(x) = e^1 \cdot e^{-|x|}

and calculating the inverse CDF gives

G^{-1}(u) = \begin{matrix} -\ln{2u} & 0.5 \leq u < 1 \\ \ln{2u} & 0 \leq u < 0.5 \end{matrix}

We generate two independent uniform draws \inline u_1 and \inline u_2. The first is used to simulate a variate distributed according to A.g(x) via the inverse CDF given here. Then we compare the values of A.g(x) at that point to the value of f(x) evaluated at that point. The second uniform variate is compared to the ratio of the two – if it is GREATER than the ratio, our point has fallen ‘outside’ of f(x), so it is rejected and we start again. If it is LOWER than the ratio, we accept the value of \inline G^{-1}(u) as a realisation of our target distribution f(x).

An example of Acceptance Sampling
An example of Acceptance Sampling. Because of the form of g(x), we are able to write the inverse CDF in closed form and use it to generate variates distributed according to g(x) from uniform variates. For each variable, we compare the ratio of f(x) and g(x), and take another uniform variate to compare to. If it is greater than the ratio, all of the variables are discarded, while if it is less, we include the original value taken from g(x) in our set of random numbers. This set is distributed according to f(x), but we may need to sample many times if the functions f(x) and g(x) aren’t similar (in which case we will have numerous rejections)

An obvious problem with this technique is that if the distributions don’t overlap well, many variates will be rejected, so we need a good uniform generator to provide enough independent variables for the algorithm to work.

The algorithm can be made arbitrarily complex by designing trial distributions g(x) that match f(x) as closely as possible, such as the ‘Ziggaraut algorithm’ which optimises the approach by breaking g(x) into many parts matched to different sections of f(x), which and is often used to generate normal random variates. As with the Marsenne Twister, there are many good guides to this algorithm available online.

Root Finders

One of the nice things about this job is the variety. There’s lots of maths, but the topics aren’t limited to set areas. So one day I might be working with stochastic calculus or statistics, and the next I have to delve into numerical techniques to solve a problem and then code it up.



I hinted about this briefly in my last post about implied vol. In order to do this, I had to solve an equation of the form

C_{BS}}({\rm Fwd},K,\sigma,t,\phi) - {\rm Market\_ Price} = 0

by varying sigma, and because the form of C in general isn’t tractable, numerical techniques must be used.

The easiest solver to implement is a bisection solver. This needs to be given two points on either side of the root, so that C_{BS}(...,\sigma_1,...) is negative and C_{BS}(...,\sigma_2,...) is positive (for vols, sensible values might be \inline \sigma_1 = 0.01% and \inline \sigma_2 = 200%). Then it simply takes the mid-point of these two (call it \inline \sigma_3) and evaluates the function there. If it’s positive, \inline \sigma_3 becomes the new \inline \sigma_2, and if negative \inline \sigma_3 becomes the new \inline \sigma_1. At each iteration this halves the remaining interval between the values. It’s very robust and will always home in to the root, but is considered to be relatively slow. The difference between the n-th value \inline c_n and the true value \inline c obeys

| c_n - c | \leq {| b - a| \over 2^n}

where b and a are the initial values. If we want this difference to be less than \inline 1\times 10^d we have

1\times 10^d \leq {|b-a|\over 2^n}

n \geq \ln_2 |b-a| + 3.32\cdot d

since \inline \ln_2 10 \simeq 3.32. This means that for each additional decimal point of accuracy, a further 3.32 iterations will be needed on average.

Faster ways are less reliable. For example, the secant method is a finite difference version of Newton-Raphson. The first points chosen don’t need to be on opposite sides of the root, but they should be near to the root. The expression used for successive approximations to the root is

x_n = x_{n-1} - f(x_{n-1}){x_{n-1}-x_{n-2} \over f(x_{n-1}) - f(x_{n-2})}

This is an interpolation technique, it works roughly as shown in the figures below. It’s speed of convergence is much faster than the bisection technique when close to the root, but like Newton-Raphson, it is not guaranteed to converge. This isn’t much use to us – we’d like to combine the benefits of both techniques.

A successful setup for the secant method
Here we can see the secant method working. The first step interpolates between the function values f(x1) and f(x2) and finds a better approximation x3. The next step intertpolates f(x2) and f(x3), leading to a new approximation x4 outside the interval [x2,x3]. From this point the procedure proceeds rapidly – it takes 14 steps to achieve 16 d.pds of accuracy (cf. about 55 steps for similar accuracy from the bisection method).
Here we see the secant method failing, due to the curvature of the function and the poor initial choice of x1 and x2. f(x2) and f(x3) are interpolated and lead to a chord that may or may not cross the function (depending on its behaviour at negative x). Clearly secant is not robust to poor choices of initial values or pathological function behaviour.

The industry standard technique for this is the Brent Method. The algorithm is a little complicated but it essentially combines these two approaches, using the secant method but checking that it performs better than the bisection method, and falling back on that if not. It also includes a few extra provisions for added speed and stability – once again, Wikipedia is a great reference for this. I’ve coded up a version of this for the pricer, and there’s a demonstration of how this outperforms the bisection method below. Choose any of the functions in the list, enter some points and see how many iterations each one takes to find the root!

Just a brief note on the coding as well, since I said I’d cover everything here. I’m using javascript at the moment, mostly for speed. Javascript is very forgiving (although consequently a pain to debug!), it allows variables to be defined as functions and passed around as parameters. I created a function

 BrentSolver( functionToSolve, lowerBound, upperBound, tolerance )

which takes a function of one argument functionToSolve(z) that returns the value of the function at z; and finds a root of that function between the lower and upper bounds. Usually we will need this function to have more than one parameter, but again javascript is kind to us – in the block of code that calls it, we call the parameters we need, and define a new function dynamically:

price = 5;
fwd = 100;
strike = 100;
time = 1;
optionStyle = -1;

functionToSolve = function ( vol ) {
  return vanillaPrice( fwd, strike, vol, time, optionStyle ) - price;
}

brentSolver( functionToSolve, 0.001, 200, 0.00000001 );

Much easier than in C++!! Below is a basic doodle to let you try out the different solvers on fairly well-known functions – have a play and see what happens.

Function: 

Root Finder: 

Lower Bound:
Upper Bound:
Accuracy:

Root:
# Iterations:

Cumulative Normal Distribution

As discussed in my last post, I had some issues with an implementation of the standard normal cdf function, which I address here. This post will be my first test of the equation editor that I’ve just installed!

The standard normal cdf looks like this:

\Phi(x) = {1 \over \sqrt{2\pi}}\int ^{x}_{-\infty} e^{-{1 \over 2} y^2 } dy

I had a few thoughts as to how to go about this. It’s symmetrical around x = 0, so we only need to worry about solving for positive x, and then we can calculate negative x values by reflection. Possibilities were:

  • Series Approximation

e^x = \sum_{n = 0}^{\infty} {1 \over n!} x^n

{1\over \sqrt{2\pi}}\int_{-\infty}^x e^{-{1 \over 2}y^2} dy = {1 \over 2} + \sum_{n = 0}^{\infty} \int_{0}^x {-1^n \over 2^n n!} {y^{2n}} dy = {1 \over 2} + \sum_{n = 0}^{\infty} {-1^n \over (2n+1) 2^n n!} {x^{2n+1}}

The problem with this expression is that the terms oscillate between positive and negative, which means that for large enough x it will explode to either very big positive or negative values. We could include a fixing point and build in some sort of curve above a certain x, but I’m looking for a fairly robust solution that doesn’t involve too much testing for the moment, and this sounds like a lot of work!

  • Lookup Table

Again, this solution would be a lot of work to set up, but would have a guaranteed level of accuracy and would find the answer in a fixed amount of time. A vector could be set up containing the value of x that gives cdf(x) = 0.5+n*delta; delta might be one millionth or so, the array would have 2^19 members and a binary search would locate the closest value to x in approximately 18 operations (assuming constant time to check a vector). The index of this value would then give the value of the cdf, and an interpolation scheme between adjacent values could improve accuracy further.

Pros of this solution are efficiency and accuracy, but there would be a large setup cost and possibly memory cost to load up the look up table. If we had to run the function inside a Monte Carlo regime or numerical integration, this might be justified by the runtime savings, but for the moment it’s a bit much work!

  • Analytic Approximation

To get the pricer described in my last post running, I implemented the following approximation to the error function:

a1 = 0.278393
a2 = 0.230389
a3 = 0.000972
a4 = 0.078108

{\rm erf}(x) = 1 - {1 \over (a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4)^4}

\Phi(x) = 0.5\Bigl[ 1 + {\rm erf}\Bigl({x \over \sqrt{2}}\Bigr) \Bigr]

Well, it got the pricer working but turns out to be not too accurate. For example, I got this running on excel and compared it to the inbuild NORMSDIST implementation, for some typical numbers (forward = 100, strike = 90, vol=10%, dcf = 1, time = 1). For NORMSDIST(d1) in excel, I get 0.86512, while for my Phi(d1) I get 0.86489, so the error is in the 4th sf. This leads to a price difference of 10.7124 for excel and 10.7095 for my implementation, and I’d like to do a bit better. Since I’d ruled out the possibilities described above, I went back to wikipedia and implemented the following version of the cdf, which gave a much more agreeable price (similar to excel’s now to 7sf):

b0 = 0.2316419;
b1 = 0.31938153;
b2 = -0.356563782;
b3 = 1.781477937;
b4 = -1.821255978;
b5 = 1.330274429;

t = {1 \over 1+b_0 x}

\phi(x) = {1 \over \sqrt{2\pi}} e^{-{1 \over 2}x^2}

\Phi(x) = 1 - \phi(x)\cdot(b_1t + b_2t^2 +b_3t^3+b_4t^4+b_5t^5)

It’s worth noting that although this implementation seems to give better accuracy for most positive x, it relies on some library functions (square root and exponential) itself, so we’re leaving ourselves slightly at the mercy of how those are programmed. It’s also going to take much longer than the previous implementation, which only used polynomial arithmetic. Since we only need a couple of evaluations in a vanilla pricer, it’s not a big concern, but for applications that call it repeatedly we might want to do better.

That’s all for today!

-QuantoDrifer