Random Number Generators

I’m in the process of preparing some C++ code for the blog to allow user-downloadable and -compilable Monte Carlo pricers that won’t overload my server. That’s got me thinking about random number generators, and I’m going to talk a bit about them here [In fact, all of the algorithms discussed below are entirely deterministic, so I should really call them pseudo-random number generators. Whether or not particular algorithms are suitable for use in applications requiring random numbers is an area of active research and occasional catastrophic failure].

Uniform RNGs

Generally, in Monte Carlo simulations we want to be able to generate many ‘sample paths’, each of which is a realisation of a possible evolution of the underlying price, and evaluate the payoff for that path. As I discussed in an earlier post, we usually model these quantities as driven by Brownian motions, whose increments are normally distributed, which means we need to generate random variables that are also normally distributed (so, for example, we are more likely to get values near to zero than at large values).

Generating these directly is quite tricky, so the first step is usually to generate uniform random variables and then transform these to Gaussian variates. A uniform random variable has a constant probability density of falling anywhere within a certain range, typically [0,1] or [-1,1] – for the rest of this post, I denote a uniform variable as U[…], and define u ~ U[0,1] and v ~ U[-1,1]. The PDFs of u and v are shown here:

The PDFs of two uniform random variables, over the ranges [0,1] and [-1,1]

A straightforward method of generating these numbers uses modular multiplication [this is variously called a Lehmer generator, a multiplicative linear congruential generator and – for particular parameter choices – a Park-Miller generator]:

x_{k+1} = g \cdot x_k \mod n

For good choices of the parameters, this will generate a sequence with a period of up to n − 1 that is roughly uniformly distributed across the interval (0, n), so dividing by n rescales the variables to be distributed approximately according to u. The choices of g, n and \inline x_0 (the ‘seed’) are non-trivial and several are given in the literature; as usual I’ve followed the advice given on Wikipedia and plumped for n = 2^32 – 5 (a prime) and g = 279470273, with the choice of seed left to the user – it should be coprime to the modulus n, which for a prime modulus means any value in [1, n−1]. Note that this will have a period of about 2^32, so the upper limit on the number of Monte Carlo evaluations that can meaningfully be done is around 4 billion. It is also easy to see how the generator could be hampered by a poor choice of parameters – if the multiplier and the modulus are not co-prime, for example, the actual period of the sequence can be much less than n.
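To make this concrete, here is a minimal sketch of the generator in C++ using the parameters above (the class and member names are purely illustrative, and this is a sketch rather than production code):

```cpp
#include <cstdint>

// Sketch of the modular-multiplication (Lehmer) generator described above,
// using the parameters quoted in the text: n = 2^32 - 5 (prime), g = 279470273.
// The seed should lie in [1, n-1].
class LehmerRng {
public:
    explicit LehmerRng(std::uint64_t seed) : x_(seed % kModulus) {
        if (x_ == 0) x_ = 1;   // avoid the degenerate all-zero sequence
    }

    // Advance the state x_{k+1} = g * x_k mod n and rescale to (0, 1).
    double next() {
        x_ = (kMultiplier * x_) % kModulus;   // g * (n-1) < 2^63, so 64-bit arithmetic suffices
        return static_cast<double>(x_) / static_cast<double>(kModulus);
    }

private:
    static constexpr std::uint64_t kModulus    = 4294967291ULL;  // 2^32 - 5
    static constexpr std::uint64_t kMultiplier = 279470273ULL;
    std::uint64_t x_;
};
```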

The more advanced Mersenne Twister algorithm aims to correct many of the problems with the Park-Miller generator, such as increasing the period length of the sequence and improving the statistical randomness of the values produced. It’s too complicated to go through in full detail here (it’s a full post in itself!), but many good summaries of the theory and performance of the twister algorithm can be found online.

One other thing to point out is that if initialised in the same way, each time one of the above algorithms is run it will return the same sequence of numbers. At first this might seem like an unwanted property, as it deviates from ‘true’ randomness. However, in our context it is extremely valuable, as it allows us to test code using the same sequence of numbers each time. This is vital for calculating Greeks by bump-and-revalue using Monte Carlo, as otherwise the Monte Carlo error in the price will usually swamp the small changes due to the Greeks, and it is also helpful when optimising code, to make sure that results aren’t being affected.
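As a quick illustration, the sketch below seeds the standard library's Mersenne Twister (std::mt19937 from <random>) with a fixed value, so it produces exactly the same five uniforms on every run:

```cpp
#include <iostream>
#include <random>

// Seeding the Mersenne Twister with a fixed value makes the sequence
// reproducible, which is what bump-and-revalue Greek calculations rely on.
int main() {
    std::mt19937 rng(12345);                               // fixed seed
    std::uniform_real_distribution<double> u01(0.0, 1.0);  // u ~ U[0,1)

    for (int i = 0; i < 5; ++i)
        std::cout << u01(rng) << "\n";   // identical output on every run
}
```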

Gaussian RNGs

Once we have a source of uniform random variates, we need to consider methods for converting them to Gaussian variates. The simplest and most direct is the inversion method, which involves finding a mapping between the uniform random variate u and the standard normal random variate z. Consider first the CDF of the normal variable (I have discussed this before here), shown below:

CDF of the normal distribution, which at each point X gives the probability of the variable being less than or equal to X

This shows us the likelihood that z is below a certain value X, and is equal to the integral of the PDF from \inline -\infty to X. Because the PDF is normalised to integrate to 1, the CDF maps each possible outcome of z to a value between 0 and 1, with the most likely outcomes being where the CDF is steepest (since these cover the y-axis values more quickly), which is exactly where the PDF is largest. This maps z-values to uniform values – exactly the opposite direction to the mapping we want – so we consider instead the inverse CDF shown here:

The inverse CDF – called the quantile function – of the normal distribution (as usual, the inverse function is the function reflected in the line x=y). This function allows us to map a uniform variate onto the distribution in question. For example, the graph shows a uniform draw of 0.8 would map to a value of z = 0.8416 for z distributed according to the standard normal distribution

This mapping is exactly what we’re after [although there are others – it’s not unique]. It maps uniformly distributed variates to normally distributed variates, and they’re in a 1-to-1 correspondence. This procedure is robust and in principle it is exact. However, it is not always the fastest method, and it requires us to know the inverse CDF, which for the normal distribution can only be evaluated in reasonable time by numerical approximation (the technique works for any distribution – although there is no closed-form CDF for the normal distribution, for many other distributions the inverse CDF can be expressed in closed form, in which case this is almost certainly the best method).
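As a small illustration of the inversion method for a case where the inverse CDF is available in closed form, here is a sketch for the exponential distribution (the function name and parameters are just illustrative); for the normal distribution the same idea applies, but the inverse CDF has to be approximated numerically:

```cpp
#include <cmath>

// Inversion method for the exponential distribution with rate lambda:
// the CDF is F(x) = 1 - exp(-lambda * x), so F^{-1}(u) = -ln(1 - u) / lambda.
// Feeding in uniform draws u on [0, 1) gives exponentially distributed variates.
double exponentialByInversion(double u, double lambda) {
    return -std::log(1.0 - u) / lambda;
}
```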

There are several other techniques for obtaining normal distributions from uniform variates, a few of which I have implemented and describe here because they’re quite interesting. The first is the Box-Muller technique, which converts two independent uniform variates into two independent normal variates by combining the inversion technique described above, the trick that we use to calculate a Gaussian integral, and the change-of-variables law for PDFs that I discussed here.

Consider two independent normally-distributed variables x and y. The joint PDF of the two is

p(x,y)dxdy = p_z(x)p_z(y)dxdy = {1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy

As with the Gaussian integral, we can re-express this in polar co-ordinates, remembering that \inline dxdy = rdrd\theta

{1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy = {1 \over 2 \pi}e^{-{1\over 2}r^2} rdrd\theta = p_R(r)p_{\theta}(\theta)rdrd\theta

If we can simulate values of r and \inline \theta that obey these densities using uniforms, we will be able to transform them to normals x and y straightforwardly. There is radial symmetry, so we can integrate over \inline d\theta to give us the radial cumulative probability density (i.e. the radial CDF)

P(r\leq R) = \int^R_0 r' e^{-{1\over 2}r'^2} dr'

Unlike the CDF of x or y, we can calculate this in closed form and invert it

\int^R_0 r' e^{-{1\over 2}r'^2} dr' = \Bigl[ -e^{-{1\over 2}r'^2} \Bigr]^R_0 = 1 - e^{-{1\over 2}R^2}

P^{-1}(u) = \sqrt{-2 \ln(1 - u)}

This quantile function allows us to select a point with the correct radial distribution using a first uniform variate, and we then need to select an angle uniformly around the circle – this can be done by multiplying a second uniform variate by \inline 2\pi [nb. we can simplify the radial quantile further – note that if u is uniformly distributed across [0,1], then so is (1-u), so we can substitute u for (1-u)]. The normally distributed variates are then given by the x- and y-coordinates of the resulting point, i.e.

z_1 = \sqrt{-2 \ln u_1}\cdot \sin(2\pi u_2)

z_2 = \sqrt{-2 \ln u_1}\cdot \cos(2\pi u_2)
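A minimal sketch of the transform in C++ (the function name is illustrative), taking two uniform draws on (0,1] and returning the pair of normal variates above:

```cpp
#include <cmath>
#include <utility>

// Basic Box-Muller: two independent uniforms in, two independent standard normals out.
std::pair<double, double> boxMuller(double u1, double u2) {
    const double pi    = 3.14159265358979323846;
    const double r     = std::sqrt(-2.0 * std::log(u1));   // radial quantile
    const double theta = 2.0 * pi * u2;                     // uniformly distributed angle
    return { r * std::sin(theta), r * std::cos(theta) };
}
```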

This is a really useful trick, quickly converting two uniform variates into two Gaussian variates. We can improve the speed still further if we can get rid of the computationally expensive sine and cosine functions, for which there is another nifty trick using rejection. In the final step, we inserted two uniform variates to produce \inline z_1 and \inline z_2. We can use the following transformation to turn our initial uniform variates into two new variates that avoid the need for a sine or cosine:

Instead of taking \inline u_1 and \inline u_2, consider two uniform variables \inline v_1 and \inline v_2 across [-1,1]. Let \inline s = r^2 = v_1^2 + v_2^2, and if s > 1 then reject the values and start again. The accepted points now lie within the unit circle, as shown below:

Polar Box-Muller procedure. Two alternative values s and theta are generated from the initial random variables U1 and U2. If these fall inside the unit circle, it can be shown that they are also uniformly distributed, so they are equally suitable for use in the Box-Muller formulae. The advantage is algebraic simplicity – because theta is an angle, the ratio of the initial variates to r is already a sine or cosine, so we don’t need to calculate these explicitly

We can use a change of variables to show that, after rejecting the external points, both theta and s are also uniform on [0,1]:

p(s)ds = p(r)dr = 2rdr

p(s) = 2r{dr \over ds} = 2r \cdot {1 \over 2r} = 1

Above, we used \inline u_1 and \inline u_2 to generate a radius and an angle, but since the s and \inline \theta that we’ve generated here are also uniform on [0,1], they can be used just as well. Because of our definitions, \inline \sin(2\pi \theta) = {v_1 \over r}, so the new expressions are:

z_1 = \sqrt{-2 \ln s}\cdot {v_1 \over r}

z_2 = \sqrt{-2 \ln s}\cdot {v_2 \over r}

We’ve got an even more efficient set of formulae, at the cost of having to throw out the roughly 21% of pairs that fall outside the unit circle (\inline 1 - \pi/4). It will turn out that this is often quicker than the basic version of the algorithm, but it is totally useless if we move to quasi-random (low-discrepancy) numbers, as I will discuss another time.
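Here is a sketch of the polar variant with the rejection step included (the template parameter stands in for any uniform generator, such as those discussed above; names are illustrative):

```cpp
#include <cmath>
#include <random>
#include <utility>

// Polar Box-Muller: draw v1, v2 uniformly on [-1,1], keep the pair only if it
// lands inside the unit circle, then use s = v1^2 + v2^2 in place of the sine
// and cosine calls of the basic method.
template <class Rng>
std::pair<double, double> polarBoxMuller(Rng& rng) {
    std::uniform_real_distribution<double> v(-1.0, 1.0);
    double v1, v2, s;
    do {
        v1 = v(rng);
        v2 = v(rng);
        s  = v1 * v1 + v2 * v2;
    } while (s >= 1.0 || s == 0.0);   // reject points outside the unit circle

    // sqrt(-2 ln s) / r, with r = sqrt(s)
    const double factor = std::sqrt(-2.0 * std::log(s) / s);
    return { factor * v1, factor * v2 };
}
```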

The final technique I want to talk about is a generalisation of acceptance-rejection, which we’ve just touched on and was also used implicitly in my first post on Monte Carlo.

The idea here is once again to simulate a distribution f(x) by sampling instead from another distribution g(x) that, once rescaled by a constant A, dominates it everywhere (i.e. A.g(x) \geq f(x)), and rejecting draws that fall under A.g(x) but above f(x). This is usually used when we can simulate g(x) directly from uniforms by using its inverse CDF – perhaps the best way to demonstrate how this works is through another example. We let f(x) be the normal PDF, and g(x) the exponential distribution, such that

g(x) = \lambda e^{-\lambda x}

This only has support on positive x, so we redefine it to cover positive and negative x, and rescale it so that it is everywhere larger than the normal PDF

A\cdot g(x) = e^1 \cdot e^{-|x|}

and calculating the inverse CDF gives

G^{-1}(u) = \begin{cases} -\ln\bigl(2(1-u)\bigr) & 0.5 \leq u < 1 \\ \ln(2u) & 0 < u < 0.5 \end{cases}

We generate two independent uniform draws \inline u_1 and \inline u_2. The first is used to simulate a variate distributed according to g(x) via the inverse CDF given above. Then we compare the value of A.g(x) at that point to the value of f(x) evaluated at the same point. The second uniform variate is compared to the ratio f(x) / A.g(x) – if it is GREATER than the ratio, our point has fallen ‘outside’ of f(x), so it is rejected and we start again. If it is LOWER than the ratio, we accept the value of \inline G^{-1}(u_1) as a realisation of our target distribution f(x).

An example of acceptance sampling. Because of the form of g(x), we are able to write the inverse CDF in closed form and use it to generate variates distributed according to g(x) from uniform variates. For each candidate, we compute the ratio of f(x) to A.g(x), and take another uniform variate to compare to it. If the uniform is greater than the ratio, the candidate is discarded, while if it is less, we include the original value taken from g(x) in our set of random numbers. This set is distributed according to f(x), but we may need to sample many times if the functions f(x) and g(x) aren’t similar (in which case we will have numerous rejections)

An obvious problem with this technique is that if the distributions don’t overlap well, many variates will be rejected, so we need a good uniform generator to provide enough independent variables for the algorithm to work.
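Putting the pieces together, here is a sketch of the acceptance-rejection sampler for a standard normal, using the double-exponential trial distribution and the envelope A.g(x) = e^{1-|x|} from above (function and variable names are illustrative):

```cpp
#include <cmath>
#include <random>

// Acceptance-rejection: candidates come from the closed-form inverse CDF of the
// double exponential, and each is accepted with probability f(x) / A.g(x).
template <class Rng>
double normalByRejection(Rng& rng) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    const double invSqrt2Pi = 0.3989422804014327;   // 1 / sqrt(2 * pi)

    while (true) {
        const double u1 = u01(rng);
        // Inverse CDF of the (normalised) double exponential
        const double x = (u1 < 0.5) ? std::log(2.0 * u1)
                                    : -std::log(2.0 * (1.0 - u1));

        const double f  = invSqrt2Pi * std::exp(-0.5 * x * x);   // normal PDF f(x)
        const double Ag = std::exp(1.0 - std::fabs(x));          // envelope A.g(x)

        if (u01(rng) < f / Ag)   // compare the second uniform to the ratio
            return x;            // accept: x is a realisation of the normal
    }
}
```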

The algorithm can be made arbitrarily sophisticated by designing trial distributions g(x) that match f(x) as closely as possible, such as in the ‘Ziggurat algorithm’, which optimises the approach by breaking g(x) into many parts matched to different sections of f(x), and which is often used to generate normal random variates. As with the Mersenne Twister, there are many good guides to this algorithm available online.
