Digital Options

Today I’m going to talk about the valuation of another type of option, the digital (or binary) option. This can be seen as a bit of a case study, as I’ll present the option payoff and the analytical price and greeks under BS assumptions, and give add-ons to allow pricing with the MONTE CARLO pricer. I’ve also updated the ANALYTICAL pricer to calculate the price and greeks of these options.

Digital options are very straight-forward, they are written on an underlying S and expire at a particular date t, at which point digital calls pay $1 if S is greater than a certain strike K or $0 if it is below that, and digital puts pay the reverse – ie. the payoff is

P_{\rm dig\ call} = \Big\{ \ \begin{matrix} \$1 \quad S \geq K\\ \$0 \quad S < K\end{matrix}

We can calculate the price exactly in the BS approximation using the same method that I used to calculate vanilla option prices by risk-neutral valuation as follows

C_{\rm dig\ call}(0) = \delta(0,t)\ {\mathbb E}[ C_{\rm dig\ call}(t) ]

where \inline C_{\rm dig\ call}(t') is the price of the option at time \inline t', and we know that this must converge to the payoff as \inline t \to t', so \inline C_{\rm dig\ call}(t) = P_{\rm dig\ call}

= \delta(0,t)\ \int^{\infty}_{-\infty} P_{\rm dig\ call} \cdot e^{-{1\over 2}x^2} dx

\inline P_{\rm dig\ call} is zero if \inline S = S_0 e^{(r - {1\over 2}\sigma^2)t + \sigma \sqrt{t} x} < K which corresponds to \inline x < {\ln{K \over S_0} - (r-{1\over 2}\sigma^2)t \over \sigma \sqrt{t}} = -d_2

= \delta(0,t)\ \cdot \$1 \cdot \int^{\infty}_{-d_2} e^{-{1\over 2}x^2} dx

= \$ \delta(0,t)\cdot \Phi(d_2)


d_1 = {\ln{S\over K}+(r+{1\over 2}\sigma^2)t \over \sigma \sqrt{t}} \quad ;\quad d_2 = {\ln{S\over K}+(r-{1\over 2}\sigma^2)t \over \sigma \sqrt{t}}

Since we have an analytical price, we can also calculate an expression for the GREEKS of this option by differentiating by the various parameters that appear in the price. Analytical expressions for a digital call’s greeks are:

\Delta = {\partial C \over \partial S} = e^{-rt}\cdot {\phi(d_2)\over S\sigma \sqrt{t}}

\nu = {\partial C \over \partial \sigma} = -e^{-rt}\cdot {d_1 \ \phi(d_2)\over \sigma}

\gamma = {\partial^2 C\over \partial S^2} = -e^{-rt}\cdot {d_1 \ \phi(d_2)\over S^2 \sigma^2 t} = -e^{-rt}\cdot {d_1\ \phi(d_1)\over S K \sigma^2 t}

{\rm Vanna} = {\partial^2 C \over \partial S \partial \sigma} =e^{-rt}\cdot {\phi(d_2)\over S \sigma^2 \sqrt{t}}\Big( d_1 d_2 - 1 \Big)

{\rm Volga} = {\partial^2 C \over \partial \sigma^2} = e^{-rt}\cdot {\phi(d_2)\over \sigma^2}\cdot \Big( d_1 + d_2 - d_1^2 d_2 \Big)


\phi(d_1) = {1\over \sqrt{2 \pi}}\ {e^{-{1\over 2}d_1^2} }

Holding a binary put and a binary call with the same strike is just the same as holding a zero-coupon bond, since we are guaranteed to receive $1 wherever the spot ends up, so the price of a binary put must be

e^{-rt} = e^{-rt}\cdot \Phi(d_2) + C_{\rm dig\ put}(t'=0)\quad \quad \quad \quad [1]

C_{\rm dig\ put}(0) = e^{-rt} \cdot \Phi(-d_2)

Moreover, differentiating equation [1] above shows that the greeks of a digital put are simply the negative of the greeks of a digital call with the same strike.

Graphs of these are shown for a typical binary option in the following graphs. Unlike vanilla options, these option prices aren’t monotonic in volatility: if they’re in-the-money, increasing vol will actually DECREASE the price since it makes them more likely to end out-of-the-money!

Price and first-order greeks for a digital call option.
Price and first-order greeks for a digital call option.
Second-order greeks for a digital call option. Greeks for digital puts are simply the negative of these values

One final point on pricing, note that the payoff of a digital call is the negative of the derivative of a vanilla call payoff wrt. strike; and the payoff of a digital put is the positive of the derivative of a vanilla put payoff wrt. strike. This means that any binary greek can be calculated from the corresponding vanilla greek as follows

\Omega_{\rm dig\ call} = -{\partial \Omega_{\rm call}\over \partial K}

\Omega_{\rm dig\ put} = {\partial \Omega_{\rm put}\over \partial K}

where here \inline \Omega represents a general greek.

If you haven’t yet installed the MONTE CARLO pricer, you can find some instructions for doing so in a previous post. The following links give the header and source files for binary calls and puts which can be dropped in to the project in your C++ development environment

These will register the option types with the option factory and allow monte carlo pricing of the options (so far, all of the options in the factory also have analytical expressions, but I’ll soon present some options that can only be priced by Monte Carlo).

Interview Quesions III

Today’s question will test some of the statistics and correlation I’ve discussed in the last couple of months. Assume throughout that x\sim {\mathbb N}(0,1) and y\sim {\mathbb N}(0,1) are jointly normally distributed such that {\mathbb E}[x \cdot y] = \rho

a) Calculate {\mathbb E}[\ e^x \ ]
b) Calculate {\mathbb E}[\ e^x \ | \ y = b\ ]

The first expectation is of a lognormal variate, and the second is of a lognormal variate conditional on some earlier value of the variate having been a particular value – these are very typical of the sorts of quantities that a quant deals with every day, so the solution will be quite instructive! Before reading the solution have a go at each one, the following posts may be useful: SDEs pt. 1, SDEs pt. 2, Results for Common Distributions

a) Here, we use the standard result for expectations

    \begin{align*} {\mathbb E}[ \ e^x \ ] &= \int^{\infty}_{-\infty} e^x \cdot f(x) \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^x \cdot e^{-{1 \over 2}x^2} \ dx \nonumber \\ \nonumber \\ &= {1 \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ x^2 - 2x + 1 - 1 \Bigr] \Bigr) \ dx \nonumber \\ \nonumber \\ &= {e^{1\over 2} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} (x-1)^2 \Bigr) \ dx \nonumber \\ \nonumber \\ &= e^{1\over 2} \nonumber \end{align}

b) This one is a little tougher, so first of all I’ll discuss what it means and some possible plans of attack. We want to calculate the expectation of e^x, given that y takes a value of b. Of course, if x and y were independent, this wouldn’t make any difference and the result would be the same. However, because they are correlated, the realised value of y will have an effect on the distribution of x.

To demonstrate this, I’ve plotted a few scatter-graphs illustrating the effect of specifying y on x, with x and y uncorrelated and then becoming increasing more correlated.

When x and y are uncorrelated, the realised value of y doesn't affect the distribution for x, which is still normally distributed around zero
When x and y are uncorrelated, the realised value of y doesn’t affect the distribution for x, which is still normally distributed around zero
When x and y are correlated, the realised value of y has an effect on the distribution of x, which is no longer centered on zero and has a smaller variance
When x and y are correlated, the realised value of y has an effect on the distribution of x, which is no longer centered on zero and has a smaller variance
When the correlation of x and y becomes high, the value of x is almost completely determined by y. Now, if y is specified then x is tightly centered around a value far from zero
When the correlation of x and y becomes high, the value of x is almost completely determined by y. Now, if y is specified then x is tightly centered around a value far from zero

The simplest way of attempting this calculation is to use the result for jointly normal variates given in an earlier post, which says that if x and y have correlation \rho, we can express x in terms of y and a new variate z \sim {\mathbb N}(0,1) which is uncorrelated with y

    \[x = \rho\cdot y + \sqrt{1 - \rho^2}\cdot z\]


    \[(\ e^x\ |\ y = b\ ) = e^{\rho y + \sqrt{1-\rho^2}z} = e^{\rho b}\cdot e^{\sqrt{1-\rho^2}z}\]

Since the value of y is already determined (ie. y = b), I’ve separated this term out and the only thing I have to calculate is the expectation of the second term in z. Since y and z are independent, we can calculate the expectation of the z, which is the same process as before but featuring slightly more complicated pre-factors

    \begin{align*} {\mathbb E}[ \ e^x \ |\ y = b\ ] &= e^{\rho b} \int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot f(z) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} e^{\sqrt{1-\rho^2}z} \cdot e^{-{1 \over 2}z^2} \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}}\int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2}\Bigl[ z^2 - 2\sqrt{1-\rho^2}z \nonumber \\ & \quad \quad \quad \quad \quad \quad \quad \quad + (1-\rho^2) - (1-\rho^2) \Bigr] \Bigr) \ dz \nonumber \\ \nonumber \\ &= {e^{\rho b} \over \sqrt{2 \pi}} e^{{1 \over 2} (1-\rho^2)} \int^{\infty}_{-\infty} \exp\Bigl( -{1\over 2} \bigl(z-\sqrt{1-\rho^2}\bigr)^2 \Bigr) \ dz \nonumber \\ \nonumber \\ &= e^{{1\over 2}(1-\rho^2)+\rho b} \nonumber \end{align}

We can check the limiting values of this – if \rho = 0 then x and y are independent [this is not a general result by the way – see wikipedia for example – but it IS true for jointly normally distributed variables], in this case {\mathbb E}[ \ e^x \ |\ y = b\ ] = e^{0.5} just as above. If \rho = \pm 1, {\mathbb E} [\ e^x \ |\ y=b\ ] = e^{\pm b}, which also makes sense since in this case x = \pm y = \pm b, so y fully determines the expectation of x.

The more general way to solve this is to use the full 2D joint normal distribution as given in the previous post mentioned before,

    \[f(x,y) = {1 \over {2\pi \sqrt{1-\rho^2}}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}\]

This is the joint probability function of x and y, but it’s not quite what we need – the expectation we are trying to calculate is

    \[{\mathbb E}[ \ e^x \ |\ y \ ] = \int^{\infty}_{-\infty} e^x \cdot f(\ x \ |\ y \ ) \ dx\]

So we need to calculate the conditional expectation of x given y, for which we need Bayes’ theorem

    \[f(x,y) = f( \ x \ | \ y \ ) \cdot f(y)\]

Putting this together, we have

    \[{\mathbb E}[ \ e^x \ |\ y = b\ ] = \int^{\infty}_{-\infty} e^x \cdot {f(x,y) \over f(y)}\ dx\]

    \[={1 \over {2\pi \sqrt{1-\rho^2}}} \int^{\infty}_{-\infty} { e^x \over e^{-{1\over 2}y^2}} \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)} dx\]

    \[={e^{{1\over 2}b^2} \over {2\pi \sqrt{1-\rho^2}}} \int^{\infty}_{-\infty} e^x \exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + b^2 - 2\rho xb)\Bigr)} dx\]

This integral is left as an exercise to the reader, but it is very similar to those given above and should give the same answer as the previous expression for {\mathbb E}[\ e^x \ | \ y = b\ ] after some simplification!

Inverse Normal CDF

Now that I’ve got some Monte Carlo code up, it’s inevitable that I will eventually need an implementation of the Inverse of the Normal Cumulative Density Function (CDF). The inverse of a CDF is called a Quantile function by the way, so I’ll often refer to this as the Normal Quantile function. The reason this is so important is because, as I discussed in my post on Random Number Generators, it can be used to convert uniformly distributed random numbers into normally distributed random numbers, which we will need for Monte Carlo simulations if we believe that underlyings are distributed according to a lognormal distribution.

As a reminder, I’ve repeated the graph from the previous post here to show the behaviour of the Normal Quantile:

The inverse CDF - called the quantile function
The standard Normal Quantile function. Simple transformations of the normal distribution will allow any non-standard normal quantile to be expressed using the standard version.


As with the implementation of an algorithm for the Normal CDF (discussed here), there were several possibilities for implementation. In the end I’ve decided to go with the analytic approximation due to Beasley, Springer and Moro as presented in Joshi’s text on computational derivative pricing:

a1 = 2.50662823884
a2 = -18.61500062529
a3 = 41.39119773534
a4 = -25.44106049637

b1 = -8.47351093090
b2 = 23.08336743743
b3 = -21.06224101826
b4 = 3.13082909833

c1 = 0.3374754822726147
c2 = 0.9761690190917186
c3 = 0.1607979714918209
c4 = 0.0276438810333863
c5 = 0.0038405729373609
c6= 0.0003951896511919
c7 = 0.0000321767881768
c8 = 0.0000002888167364
c9 = 0.0000003960315187


A polynomial form is used for the central region of the quantile, where \inline 0.08 < x < 0.92:

y = x - 0.5 \ ; \quad z = y^2

\Phi^{-1}(x) \simeq y \cdot { (a_1 + a_2 z + a_3 z^2 + a_4 z^3 ) \over (1 + b_1 z + b_2 z^2 + b_3 z^3 + b_4 z^4 ) }


And if \inline x \leq 0.08 or \inline x \geq 0.92 then:

y = x - 0.5 \ ; \quad z = \Bigl\{ \begin{matrix} x & y\leq 0\\ 1 - x & y > 0 \end{matrix} \ ; \quad \kappa = \ln (-\ln z)

\Phi^{-1}(x) = \pm (c_1 + c_2\kappa + c_3\kappa^2 + c_4\kappa^3 + c_5\kappa^4 + c_6\kappa^5 + c_7\kappa^6 + c_8\kappa^7 + c_9\kappa^8)

With a plus at the front if \inline y \geq 0 and a minus in front if \inline y < 0.

This is fairly swift as an algorithm, and the quantile method of generating normal variates will turn out to have some key advantages over the various other methods discussed before which will become apparent in future. In a future post I’ll show how to integrate this method of converting uniform variates to gaussian variates into the Monte Carlo code that I’ve made available here, thanks to the factory pattern that I’ve used it will turn out to be incredibly straight-forward!

Finally, if you are wondering why the numbers in these numerical algorithms seem so arbitrary, it’s because they [sort-of] are! Once I’ve implemented some optimisation routines (this might not be for a while…) I’ll have a look at how we can find our own expressions for these functions, and hopefully be able to back out some of the numbers above!!

BS from Delta-Hedging

Today I’m going to look at another method of getting to the BS equations, by constructing a delta-hedge. This is the way that the equation was in fact first reached historically, and it’s a nice illustration of the principle of hedging. All of the same assumptions are made as in the post that derived the BS equation via Risk Neutral Valuation.

The principle is that because the price of some derivative C(S,t) is a function of the stochastic underlying S, then all of the uncertainty in C comes from the same source as the uncertainty in S. We try to construct a risk-free portfolio made up of the two of these that perfectly cancels out all of the risk. If the portfolio is risk-free, we know it must grow at the risk free rate r(t) or else we have an arbitrage opportunity.

Our model for S is the geometric brownian motion, note that we allow the rate of growth \mu in general to be different from r

    \[dS = \mu Sdt + \sigma SdW_t\]

We can express dC in terms of its derivatives with respect to t and S using Ito’s lemma, which I discussed in a previous post,

    \[dC = \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} \Bigl] dt + \sigma S{\partial C \over \partial S} dW_t\]

Our portfolio is made up of one derivative worth C(S,t) and a fraction \alpha of the underlying stock, worth \alpha \cdot S; so the net price is C(S,t) + \alpha S. We combine the above two results to give

    \begin{eqnarray*} d(C+\alpha S) &=& \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + \mu S\alpha \Bigl] dt \\ \nonumber \\ &   & + \, \sigma S \bigl( {\partial C \over \partial S} +\alpha \bigr) dW_t \nonumber \end{eqnarray}

We are trying to find a portfolio that is risk-free, which means we would like the stochastic term to cancel. We see immediately that this happens for \alpha = -{\partial C \over \partial S}, which gives

    \[d(C+\alpha S) = \Bigr[ {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2}\Bigl] dt\]

Since this portfolio is risk-free, to prevent arbitrage it must grow deterministically at the risk free rate

    \[d ( C + \alpha S) = r ( C + \alpha S) dt\]

and so

    \[rC= {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

This is the BS partial differential equation (pde). Note that despite the fact that the constant growth term for the underlying had a rate \mu, this has totally disappeared in the pde above – we might disagree with someone else about the expected rate of growth of the stock, but no-arbitrage still demands that we agree with them about the price of the option [as long as we agree about \sigma, that is!]

As for any pde, we can only solve for a specific situation if we have boundary conditions – in this case, given by the payoff at expiry t=T. At that point we know the exact form the value that C(S,T) must take

    \[C_{\rm call}(S,T) = \max(K-S,0)\]

Our job is to use the pdf to evolve the value of C(S,t) backwards to t=0. In the case of vanilla options this can be done exactly, while for more complicated payoffs we would need to discretise and solve numerically. This gives us another way of valuing options that is complementary (and equivalent) to the expectations approach discussed previously.

To solve the equation above, it is useful to first make some substitutions. As we are interested in time-to-expiry only, we make the change of variables \tau = T - t which yields

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

We can eliminate the S terms by considering change-of-variables M = \ln S. This means that

    \[{\partial C \over \partial S} = {\partial C \over \partial M}{\partial M \over \partial S} = {1 \over S}{\partial C \over \partial M}\]

    \begin{eqnarray*} \partial^2 C \over \partial S^2} = {\partial \over \partial S} \Bigl({\partial C \over \partial S}\Bigr) & = & {\partial \over \partial S} \Bigl({1 \over S}{\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}{\partial \over \partial S} \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}\bigl({\partial M \over \partial S}{\partial \over \partial M}\bigr) \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S^2}{\partial^2 C \over \partial M^2 \nonumber \end{eqnarray}

Combining these the BS equation becomes

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 C \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial C \over \partial M}\]

The linear term in C can be removed by another transformation D = C e^{-r\tau} so that

    \begin{eqnarray*} {\partial D \over \partial \tau} &=& {\partial C \over \partial \tau}e^{-r\tau} - rCe^{-r\tau} \nonumber \\ \nonumber \\ {\partial^n D \over \partial M^n} &=& {\partial^n C \over \partial M^n}e^{-r\tau} \nonumber \end{eqnarray}

The exponential terms cancel throughout, and we are left with

    \[0 = -{\partial D \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 D \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial D \over \partial M}\]

One final transformation will be needed before putting in boundary conditions. The transformation will be

    \[y = M + \Bigr( r -{1\over 2 }\sigma^2\Bigl)\tau\]

But unlike the other transformations I’ve suggested so far, this one mixes the two variables that we are using, so a bit of care is required about what we mean. When I most recently wrote the BS equation, D was a function of M and \tau – this means that the partial differentials with respect to \tau were implicitly holding M constant and vise versa. I’m now going to write D as a function of y and \tau instead, and because the relationship features all three variables we need to take a bit of care with our partial derivatives:

    \[dD(M,\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} dM\]

where vertical lines indicate the variable that is being held constant during evaluation. Now, to move from M to y, we expand out the dM term in the same way as we did for dD above

    \[dD(M(y,\tau),\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} \Bigr({\partial M \over \partial \tau}\bigg|_{y} d\tau + {\partial M \over \partial y}\bigg|_{\tau}dy\Bigl)\]

    \[= dD(y,\tau) = {\partial D \over \partial \tau}\bigg|_{y} d\tau + {\partial D \over \partial y}\bigg|_{\tau} dy\]

We can compare these last two equations to give expressions for the derivatives that we need after the transformation by comparing the coefficients of d\tau and dy

    \begin{eqnarray*} {\partial D \over \partial \tau}\bigg|_y &=& {\partial D \over \partial \tau}\bigg|_M + {\partial D \over \partial M}\bigg|_{\tau} {\partial M\over \partial \tau}\bigg|_y \nonumber \\ \nonumber \\ {\partial D \over \partial y}\bigg|_{\tau} &=& {\partial D \over \partial M}\bigg|_{\tau}{\partial M \over \partial y}\bigg|_{\tau} \nonumber \end{eqnarray}

Computing and inserting these derivatives [I’ve given a graphical representation of the first of these equations below, because the derivation is a little dry at present!] into the BS equation gives

    \[0 = -{\partial D \over \partial \tau}\bigg|_y + {1 \over 2}\sigma^2 {\partial^2 D \over \partial y^2}\bigg|_{\tau}\]

This is the well-known Heat Equation in physics. For the sake of brevity I won’t solve it here, but the solution is well known – see for example the wikipedia page – which gives the general solution:

    \[D(x,\tau) = {1 \over \sqrt{2\pi \sigma^2 \tau}}\int^{\infty}_{-\infty} e^{-(x-y)^2 \over 2\sigma^2 \tau} p(y) dy\]

Where p(y) is the payoff condition (it’s now an initial condition, as expiry is at \tau = 0). The algebra is quite involved so I give the solution its own post, and you can show by substitution that the BS option formulae given previously is a solution to the equation.

An illustration of the difference between partial differentials when a change of variables involving both current varibles is used
An illustration of the difference between partial differentials when a change of variables involving both current variables is used. This should be thought of as a contour plot with the value of D on the out-of-plane axis. The amount D changes when moving a small amount dt depends on which direction you are moving in, as shown above.

As an aside, what was the portfolio that I was considering all of the way through? Comparing \alpha to the vanilla greeks, we recognise it as the option delta – the hedging portfolio is just the portfolio of the option with just enough stock to hedge out the local delta risk. Of course, as time goes by this value will change, and we need to constantly adjust our hedge to account for this. This shows the breakdown caused by one of our assumptions – that we could trade whenever we want and without transaction costs. In fact, because we need to re-hedge at every moment to enforce this portfolio’s risk free nature, in the presence of transaction costs the hedging costs in this strategy will be infinite! This demonstrates a significant failing of one of our assumptions, I’ll come back again to the effect of this in the real world in future posts.

Random Number Generators

I’m in the process of preparing some C++ code for the blog to allow user-downloadable and -compilable Monte Carlo pricers that won’t overload my server. That’s got me thinking about random number generators, and I’m going to talk a bit about them here [In fact, all of the algorithms discussed below are entirely deterministic, so I should really call them pseudo-random number generators. Whether or not particular algorithms are suitable for use in applications requiring random numbers is an area of active research and occasional catastrophic failure].

Uniform RNGs

Generally, in Monte Carlo simulations we want to be able to generate many ‘sample paths’, each of which is a realisation of a possible evolution of the underlying price, and evaluate the payoff for that path. As I discussed in an earlier post, we usually model these quantities as depending on brownian motions and results are normally distributed, which means we need to generate random variables that are also normally distributed (so, for example, we are more likely to get values near to zero than at large values).

Generating these directly is quite tricky, so the first step is usually to generate uniform random variables and then transform these to gaussian variates. A uniform random variable is a variable with a constant probability of being anywhere within a certain range, typically [0,1] or [-1,1] – for the rest of this post, I denote a uniform variable as U[…], and define u~[0,1] and v~[-1,1]. The pdf of u and v are shown here:

The Pdfs of two uniform random variables
The Pdfs of two uniform random variables, over ranges [0,1] and [-1,1]

A straight-forward method of generating these numbers uses modular multiplication [this is variously called a Park-Miller generator, a linear congruential generator and a Lehmer generator]:

x_{k+1} = g \cdot x_k \mod n

For good choices of the variables, this will generate a sequence with a period of n that is uniformly distributed across the interval [0,n], so dividing by n rescales the variables to be distributed roughly according to u. The choices of g, n and \inline x_0 (the ‘seed’) are non-trivial and several are given in the literature, as usual I’ve followed the advice given on wikipedia and lumped for n = 2^32 – 5, g = 279470273, and the choice of seed is left to the user but should be coprime to modulus n. Note that this will have a period of about 2^32, so the upper limit on the number of monte-carlo evaluations that can meaningfully be done is around 4 billion steps. It is also easy to see how the generator could be hampered by a poor choice of parameters – if the multiplier and the modulus are not co-prime, for example, the actual period of the sequence can be much less than n.

The more advanced Marsenne Twister algorithm aims to correct many of the problems with the Park-Miller generator, such as increasing the period length of the sequence and improving the statistical randomness of the values produced. It’s too complicated to go through here (it’s a full post in itself!) in full detail but many good summaries of the theory and performance of the twister algorithm can be found online.

One other thing to point out is that if initilised in the same way, each time one of the above algorithms is run it will return the same sequence of numbers. At first this might seem like an unwanted property as it deviates from ‘true’ randomness. However, in our context it will be extremely valuable as it allows us to test code using the same sequence of numbers each time. This is vital for calculating greeks by bump-and-revalue using monte carlo as otherwise monte carlo error in price will usually swamp small changes due to greeks, and it also helpful when optimising code to make sure that results aren’t being affected.

Gaussian RNGs

Once we have a source of uniform random variates, we need to consider methods for converting them to gaussian variates. The simplest and most direct is the inversion method, which involves finding a mapping between uniform random variate u and the standard normal random variate z. Consider first the CDF of the normal variable (I have discussed this before here), shown below:

Cdf of the normal distribution
CDF of the normal distribution, which gives the probability of the variable being less-than-or-equal-to the value of the CDF at each point

This shows us the likelihood that z is below a certain value X, and is equal to the integral of the PDF from \inline -\infty to X. Because the PDF is normalised to integrate to 1, the CDF maps each possible outcome of z to a value between 0 and 1, with the most likely outcomes being where the CDF is steepest (since these cover the y-axis values more quickly) which is exactly where the PDF is largest. This is exactly the opposite of what we want, so we consider instead the inverse CDF shown here:

The inverse CDF - called the quantile function
The inverse CDF – called the quantile function – of the normal distribution (as usual, the inverse function is the function reflected in the line x=y). This function allows us to map a uniform variate onto the distribution in question. For example, the graph shows a uniform variate draw of 0.8 would map to a value of z = 0.8416 for z distributed according to the standard normal distribution

This mapping is exactly what we’re after [although there are others – it’s not unique]. It maps uniformly distributed variates to normally distributed variates and they’re in a 1-to-1 correspondence. This procedure is robust and in principle it is exact. However, it is not always the fastest method, and it requires us to know the inverse CDF, which to evaluate in a reasonable time requires us to make approximations (this technique works for any distribution – although there is no closed form CDF for the normal distribution, for many other distributions the inverse CDF can be expressed in closed form, in which case this is almost certainly the best method).

There are several other techniques for obtaining normal distributions from uniform variates, a few of which I have implemented and describe them here because they’re quite interesting. The first is called the Box-Muller technique, which converts two independent uniform variates into two independent normal variates by combining the inversion technique described above, the trick that we use to calculate a gaussian integral and the change of variables law for Pdfs that I discussed here.

Consider two independent normally-distributed variables x and y. The joint PDF of the two is

p(x,y)dxdy = p_z(x)p_z(y)dxdy = {1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy

As for gaussian integration, we can re-express this in polar co-ordinates remembering that \inline dxdy = rdrd\theta

{1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy = {1 \over 2 \pi}e^{-{1\over 2}r^2} rdrd\theta = p_R(r)p_{\theta}(\theta)rdrd\theta

If we can simulate a value of r and theta using uniforms that obeys these densities, we will be able to transform them to normals x and y straightforwardly. There is radial symmetry so we can integrate over \inline d\theta to give us the radial cumulative probability density (ie. radial CDF)

P(r\leq R) = \int^R_0 r' e^{-{1\over 2}r'^2} dr'

Unlike x or y, we can calculate this in closed form and invert it

\int^R_0 r' e^{-{1\over 2}r'^2} dr' = \Bigl[ -e^{-{1\over 2}r'^2} \Bigr]^R_0 = 1 - e^{-{1\over 2}R^2}

P^{-1}(u) = \sqrt{-2 \ln(1 - u)}

This quantile function allows us to select a point with the correct radial distribution using a first uniform variate, and we then need to select a theta from a uniform distribution around angles – this can be done by multiplying a second uniform variate by \inline 2\pi [nb. we can simplify the radial quantile further – note that if u is uniformly distributed across [0,1], then so is (1-u), so we can substitute u for (1-u)]. The values of normally distributed variates are given by the x- and y-coordinates of the resulting point, ie.

z_1 = \sqrt{-2 \ln u_1}\cdot \sin(2\pi u_2)

z_2 = \sqrt{-2 \ln u_1}\cdot \cos(2\pi u_2)

This is a really useful trick, quickly converting two uniform variates into two gaussian variates. We can improve the speed still further if we can get rid of the computationally expensive sine and cosine functions, for which there is another nifty trick using rejection. In the final step, we inserted two uniform variates to produce \inline z_1 and \inline z_2. We can use the following transformation to turn our initial uniform variates into two new variates that avoid the need for a sine or cosine:

Instead of taking \inline u_1 and \inline u_2, consider instead two uniform variables \inline v_1 and \inline v_2 across [-1,1]. Let \inline s = r^2 = v_1^2 + v_2^2, and if s > 1 then reject the values and start again. The variables now lie within the unit sphere as shown below:

Polar Box Muller procedure
Polar Box Muller procedure. Two alternative values s and theta are generated from the initial random variables U1 and U2. If these fall inside the unit circle, it can be shown that they are also uniformly distributed, so are equally suitable for use in the Box-Muller formulae. The advantage is algebraic simplicity – because theta is an angle, the ratio of s and the initial variates is already a sine or cosine, so we don’t need to calculate these explicitly

We can use change of variables to show that after rejecting external points, both theta and s are also uniform on [0,1]:

p(s)ds = p(r)rdr = 2rdr

p(s) = 2r{dr \over ds} = 2r \cdot {1 \over 2r} = 1

Above, we used \inline u_1 and \inline u_2 to generate a value of r and angle, but since the s and \inline \theta that we’ve generated here are also uniform on [0,1] they can be used just as well. Because of our definitions, \inline \sin(2\pi \theta) = {v_1 \over r}, so new expressions are:

z_1 = \sqrt{-2 \ln s}\cdot {v_1 \over r}

z_2 = \sqrt{-2 \ln s}\cdot {v_2 \over r}

We’ve got an even more efficient set of formulae, at the cost of having to throw out around 20% of our values. It will turn out that this is often quicker than the basic version of the algorithm, but is totally useless if we move to pseudo-random numbers, as I will discuss another time.

The final technique I want to talk about is a generalisation of acceptance-rejection, which we’ve just touched on and was also used implicitly in my first post on Monte Carlo.

The idea here is once again to simulate a distribution f(x) by sampling instead from another distribution g(x) that overlaps it everywhere (we can rescale g(x) by a constant A to ensure this), and rejecting values that fall inside A.g(x) but outside of f(x). This is usually used when we can simulate g(x) directly from uniforms by using its inverse CDF – perhaps the best way to demonstrate how this works is through another example. We let f(x) be the normal PDF, and g(x) the exponential distribution, such that

g(x) = \lambda e^{-\lambda x}

This only has support on positive x, so we redefine it to cover positive and negative x, and rescale it so that it is everywhere larger than the normal pdf

A\cdot g(x) = e^1 \cdot e^{-|x|}

and calculating the inverse CDF gives

G^{-1}(u) = \begin{matrix} -\ln{2u} & 0.5 \leq u < 1 \\ \ln{2u} & 0 \leq u < 0.5 \end{matrix}

We generate two independent uniform draws \inline u_1 and \inline u_2. The first is used to simulate a variate distributed according to A.g(x) via the inverse CDF given here. Then we compare the values of A.g(x) at that point to the value of f(x) evaluated at that point. The second uniform variate is compared to the ratio of the two – if it is GREATER than the ratio, our point has fallen ‘outside’ of f(x), so it is rejected and we start again. If it is LOWER than the ratio, we accept the value of \inline G^{-1}(u) as a realisation of our target distribution f(x).

An example of Acceptance Sampling
An example of Acceptance Sampling. Because of the form of g(x), we are able to write the inverse CDF in closed form and use it to generate variates distributed according to g(x) from uniform variates. For each variable, we compare the ratio of f(x) and g(x), and take another uniform variate to compare to. If it is greater than the ratio, all of the variables are discarded, while if it is less, we include the original value taken from g(x) in our set of random numbers. This set is distributed according to f(x), but we may need to sample many times if the functions f(x) and g(x) aren’t similar (in which case we will have numerous rejections)

An obvious problem with this technique is that if the distributions don’t overlap well, many variates will be rejected, so we need a good uniform generator to provide enough independent variables for the algorithm to work.

The algorithm can be made arbitrarily complex by designing trial distributions g(x) that match f(x) as closely as possible, such as the ‘Ziggaraut algorithm’ which optimises the approach by breaking g(x) into many parts matched to different sections of f(x), which and is often used to generate normal random variates. As with the Marsenne Twister, there are many good guides to this algorithm available online.

Some Results for Common Distributions

[NB. I try to make note of technical conditions where relevant, from now on I’ll put these in square brackets. On a first reading these can probably be ignored]


I review here a few commonly used results for normal and lognormal distributions. They get used over and over across quantitative finance, so I wanted to have lots of them in once place where I can refer back to them later.

Univariate Normal Distribution

The probability density function for X is

p(X=x) = {1\over \sigma\sqrt{2\pi}}e^{-{{(x-\mu)^2}\over 2 \sigma^2}}

with mean \inline \mu and standard deviation \inline \sigma. I tend to use the notation \inline X\sim{\mathbb N}(\mu,\sigma^2) to describe a normally distributed variable with these parameters.

As already referred to, we can take the location and the scale out of the variable X as follows:

X = a + bz

where a and b are constants and z \sim {\mathbb N}(0,1) is called a ‘standard normal variable’.

This comes from the result that the sum of any two normal variables is also normal – for \inline X_1\sim{\mathbb N}(\mu_1, \sigma_1) and \inline X_2\sim{\mathbb N}(\mu_2, \sigma_2), then [as long as \inline X_1 and \inline X_2 are independent]

aX_1 + bX_2 = X_3\sim{\mathbb N}(a\mu_1 + b\mu_2, a^2 \sigma_1^2 + b^2\sigma_2^2)

The cdf of the normal distribution isn’t analytically tractable, but I’ve already discussed a few numerical approximations to it here. I include a doodle that re-implements this function for the standard normal cdf for you to play with:

\Phi() =

The normal distribution comes up time and time again due to the central limit theorem. This says that [usually], as we take the mean value of a sequence of [independent, identically distributed] random variables, it will tend to a normally distributed variate regardless of the distribution of the underlying variables:

\lim_{n\rightarrow \infty} \Bigl( {1 \over n}\sum_{i=0}^n X_i\Bigr) \sim{\mathbb N}(\mu,{\sigma^2\over n})

This is of very broad importance. For example, it is the basis of Monte Carlo and the square root N convergence rate, since by taking many simulations, we are sampling the distribution of the mean of N realisations of the payoff of the option. Although the payoff probably isn’t normally distributed across the r-n probability distribution of the underlying, a very large number of payoffs will approach a normal distribution, and its mean is an estimator for the payoff mean. The variance also gives us error bounds, as variance should decrease with increasing number of samples as 1/n.

Lognormal Distribution

A lognormal variable is one whose log is distributed normally – so if \inline X\sim {\mathbb N}(\mu,\sigma^2) then \inline S \sim e^{a + bX} is lognormally distributed.

The pdf for a lognormal distribution is

p(X=x) = {1\over x\sigma\sqrt{2\pi}}e^{-{{(\ln x-\mu)^2}\over 2 \sigma^2}}; x>0

once again with mean \inline \mu and standard deviation \inline \sigma [Exercise: \inline \mu and \inline \sigma are actually the mean and std. dev. for the normal distribution in the exponent, NOT for the lognormal itself – calculate their true values for the lognormal distribution]. I tend to use the notation \inline X\sim {\mathbb L}{\mathbb N}(\mu,\sigma^2) to describe a lognormally distributed variable with these parameters.

Special properties of lognormal variables are mostly due to the properties of normal in the exponent. The two most important are related to the products of lognormal variates [here I am still assuming independence – I’ll generalise this another time], if X_1 \sim {\mathbb L}{\mathbb N}(\mu_1, \sigma_1^2) and X_2 \sim {\mathbb L}{\mathbb N}(\mu_2, \sigma_2^2) then:

X_1 \cdot X_2 = X_3 \sim {\mathbb L} {\mathbb N}(\mu_1 + \mu_2, \sigma_1^2+\sigma_2^2)

(X_1)^n \sim {\mathbb L}{\mathbb N}(n \mu_1, n^2\sigma_1^2)

{1\over X_1} = (X_1)^{-1}\sim{\mathbb L}{\mathbb N}(-\mu_1,\sigma_1^2)

Although the third of these is a special case of the second, it is worth taking a little bit of time to think about. It says that the distribution of the inverse of a lognormal variable is also lognormal. This will be useful areas like foreign exchange, since it says that if the future exchange rate is randomly distributed in this way, then the inverse of the exchange rate (which is of course just the exchange rate from the opposite currency perspective) is also lognormal. Secondly, what does \inline -\mu mean for the distribution? Well, don’t forget that this isn’t the distribution mean but the mean of the normal in the exponent – even when this is negative, the lognormal is positive, so this is an acceptable value.

Unlike the normal, there is no closed form expression for the sum of two lognormal variables. Two approaches are typically used in this case – for a large, independent selection of variables with the same mean and variance the CLT implies that the distribution of the average will be roughly normal, while for small, highly correlated sequences with similar means they are still roughly lognormal with an adjusted mean and variance. The second technique is an example of ‘moment matching’, I’ll discuss it later in more detail.

Multivariate Normal Distribution

In the case that we have more than one normally distributed random variable [we assume here that they are ‘jointly normally distributed’], we need to build into our calculations the possibility that they might not be independent, which will lead to the multivariate normal distribution (a trivial example of the failure of our above expressions for non-independent variables is if \inline X_2 = –\inline X_1; in which case \inline X_2 + \inline X_1 = 0, and it is certainly NOT true that \inline 0\sim {\mathbb N}(\mu_1+\mu_2,\sigma_1^2 + \sigma_2^2)!

To measure the dependence of two normal variables, our starting point is their covariance. Similarly to variance, this measures the difference between the expectation of the product of two of these variables from the individual expectations

{\rm Cov}[X_1,X_2] = {\mathbb E}[X_1\cdot X_2] - {\mathbb E}[X_1]{\mathbb E}[X_2]

which is 0 for independent variables. A scaled version of the covariance is called the correlation

\rho(X_1,X_2) = {{\rm Cov}[X_1,X_2] \over \sqrt{{\rm Var}[X_1]{\rm Var}[X_2]}}

so that \inline \rho \in [-1,1]. The combined pdf for two or more variables becomes rapidly more complicated, I’ll look at ways of thinking about it another time but state here the case for two variables each with a standard normal distribution and correlation \inline \rho:

P(X=x, Y=y) = {1 \over 2\pi \sqrt{1-\rho^2}}\exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}

[Exercise: derive the product rule for two normal and for two lognormal variables when the normals involved are allowed to be correlated]

Finally, for normal variables only, we have the important result that if X and Y have correlation \inline \rho, we can re-write X as the sum of Y and Z as

X = \rho \cdot Y + \sqrt{1-\rho^2}\cdot Zwhere Y and Z are uncorrelated. I’ll be using this expression lots in the future!

Risk Neutral Valuation

There are a few different but equivalent ways of viewing derivatives pricing. The first to be developed was the partial differential equations method, which was how the Black Scholes equation was originally derived. There’s lots of discussion of this on the wikipedia page, and I’ll talk about it at some stage – it’s quite intuitive and a lot of the concepts fall out of it very naturally. However, probably the more powerful method, and the one that I use almost all of the time in work, is the risk-neutral pricing method.

The idea is quite simple, although the actual mechanics can be a little intricate. Since the distribution of the underlying asset at expiry is known, it makes sense that the price of a derivative might be the expected value of the payoff of the option at expiry (eg. (S_t-K)^+ for a call option, where a superscript ‘+’ means “The greater of this value or zero”) over the underlying distribution. In fact, it turns out that this isn’t quite right: due to the risk aversion of investors, this will usually produce an overestimate of the true price. However, in arbitrage-free markets, there exists another probability distribution under which the expectation does give the true price – this is called the risk-neutral distribution of the asset. Further, [as long as the market is complete] any price other than the risk-neutral price allows the possibility of arbitrage. Taken together, these are roughly a statement of the Fundamental Theorem of Asset Pricing. In the case of vanilla call options, a portfolio of the underlying stock and a risk-free bond can be constructed that exactly replicate the option and could be used in such an arbitrage.

In this risk-neutral distribution, all risky assets grow at the risk-free rate, so that the ‘price of risk’ is exactly zero. Let’s say a government bond – which we’ll treat as risk-free – exists, has a price B and pays an interest rate of r, so that

dB = r B dt

Then, the stochastic process for the underlying stock that we looked at before

dS = \mu dt + \sigma dW_t

is modified so that mu becomes r, and the process is

dS = rdt + \sigma dW_t

 so the risk-neutral distribution of the asset is still lognormal, but with mu’s replaced by r’s:

S_t = S_0 e^{(r - {1\over 2}\sigma^2)t + \sigma\sqrt{t}z}


I’ve not provided the explicit formula yet, so I’ll demonstrate here how this can be used to price vanilla call options

C(F,K,\sigma,t,\phi) = \delta(t){\mathbb E^{S_t}}[(S_t-K)^+]

= \delta(t)\int^\infty_{0} (S_t - K)^+ p_S(S_t)dS_t

= \delta(t)\int^\infty_{K} (S_t - K) p_S(S_t)dS_t

= {\delta(t) \over \sqrt{2\pi}}\int^\infty_{x_K} (S_0 e^{(r-{1\over 2}\sigma^2)t + \sigma \sqrt{t}x} - K) e^{-{1\over 2}x^2}dx

= {\delta(t) \over \sqrt{2\pi}}\Bigr[S_0 e^{(r-{1\over 2}\sigma^2)t} \int^\infty_{x_K} e^{\sigma \sqrt{t}x -{1\over 2}x^2}dx - \int^\infty_{x_K} K e^{-{1\over 2}x^2} dx \Bigl]

= {\delta(t) \over \sqrt{2\pi}}\Bigr[S_0 e^{(r-{1\over 2}\sigma^2)t} \int^\infty_{x_K} e^{-{1\over 2}(x-\sigma\sqrt{t})^2 + {1\over 2}\sigma^2t}dx - \sqrt{2\pi} K\Phi(-x_K)\Bigl]

= {\delta(t) \over \sqrt{2\pi}}\Bigr[S_0 e^{rt} \int^\infty_{x_K - \sigma \sqrt{t}} e^{-{1\over 2}x^2} dx - \sqrt{2\pi} K \Phi(-x_K)\Bigl]

= \delta(t)\Bigr[S_0 e^{rt} \Phi(-x_K + \sigma \sqrt{t}) - K \Phi(-x_K)\Bigl]

= \delta(t)\Bigr[F \Phi(d_1) - K \Phi(d_2)\Bigl]

which is the celebrated BS formula! In the above, F = forward price = \inline S_0 e^{rt}\inline \Phi(x) is the standard normal cumulative density of x, \inline x_K is the value of x corresponding to strike S=K, ie.

x_K = {\ln{K \over F} + {1\over 2}\sigma^2t \over \sigma \sqrt{t}}

it is typical to use the variables d1 and d2 for the values in the cfds, such that

d_1 = {\ln{F \over K} + {1\over 2}\sigma^2t \over \sigma \sqrt{t}}

d_2 = {\ln{F \over K} - {1\over 2}\sigma^2t \over \sigma \sqrt{t}} = d_1 - \sigma \sqrt{t}

In reality, certain we have made certain assumptions that aren’t justified in reality. Some of these are:

1. No arbitrage – we assume that there is no opportunity for a risk-free profit

2. No transaction costs – we can freely buy and sell the underlying at a single price

3. Can go long/short as we please – we have no funding constraints, and can buy/sell an arbitrarily large amount of stock/options and balance it with an opposite position in bonds

4. Constant vol and r – we assume that vol and r are constant and don’t vary with strike. In fact, it’s an easy extension to allow them to vary with time, I’ll come back to this later

I’ll look at the validity of these and other assumptions in a future post.

If prices of vanillas have non-constant vols that vary with strike, doesn’t that make all of the above useless? Not at all – but we do need to turn it on its head! Instead of using the r-n distribution to find prices, we use prices to find the r-n distribution! Lets assume that we have access to a liquid market of vanilla calls and puts that we can trade in freely. If we look at their prices and apply the Fundamental Theorem of Calculus twice

C(t)= \delta(t) \int^\infty_{K} (S_t - K)p_S(S_t)dS_t

{\partial C \over \partial K} = -\delta(t) \int^\infty_K p_S(S_t)dS_t

{1\over \delta(t)}{\partial^2 C \over \partial K^2} =p_S(K)

 So the curvature of call prices wrt. strike tells us the local risk neutral probability! This means for each expiry time that we can see vanillas option prices, we can calculate the market-implied r-n distribution (which probably won’t be lognormal, telling us that the market doesn’t really believe the BS assumptions as stated above either). Once we know this, we can use it calibrate our choice of market model and to price other, more exotic options.

[Post script: It is worth noting that although this looks like alchemy, we haven’t totally tamed the distribution, because although we know the underlying marginal distribution at each expiry time, we still don’t know anything about the correlations between them. That is, we know the marginal distributions of the underlying at each time, but not the full distribution. For concreteness, consider two times \inline t_1 and \inline t_2. We know \inline P(S_{t_1}) and \inline P(S_{t_2}) but not \inline P(S_{t_1},S_{t_2}). To price an option paying off the average of the spot at these two times, knowing the first two isn’t enough, we need to know the second, as the expectation is \inline \int^\infty_0 \int^\infty_0 {1\over 2}(S_{t_1} + S_{t_2})P(S_{t_1},S_{t_2}) dS_{t_1}dS_{t_2}. To see the difference, from Bayes Theorem we have that \inline P(S_{t_1},S_{t_2}) = \inline P(S_{t_1}).\inline P(S_{t_2}|S_{t_1}). So, although we know how the spot will be distributed at each time, we don’t know how each distribution is conditional on the times before, which we’d need to know to price exactly – our modelling challenge will be to choose some sensible process for this that is consistent with the marginal distributions.]

Root Finders

One of the nice things about this job is the variety. There’s lots of maths, but the topics aren’t limited to set areas. So one day I might be working with stochastic calculus or statistics, and the next I have to delve into numerical techniques to solve a problem and then code it up.

I hinted about this briefly in my last post about implied vol. In order to do this, I had to solve an equation of the form

C_{BS}}({\rm Fwd},K,\sigma,t,\phi) - {\rm Market\_ Price} = 0

by varying sigma, and because the form of C in general isn’t tractable, numerical techniques must be used.

The easiest solver to implement is a bisection solver. This needs to be given two points on either side of the root, so that C_{BS}(...,\sigma_1,...) is negative and C_{BS}(...,\sigma_2,...) is positive (for vols, sensible values might be \inline \sigma_1 = 0.01% and \inline \sigma_2 = 200%). Then it simply takes the mid-point of these two (call it \inline \sigma_3) and evaluates the function there. If it’s positive, \inline \sigma_3 becomes the new \inline \sigma_2, and if negative \inline \sigma_3 becomes the new \inline \sigma_1. At each iteration this halves the remaining interval between the values. It’s very robust and will always home in to the root, but is considered to be relatively slow. The difference between the n-th value \inline c_n and the true value \inline c obeys

| c_n - c | \leq {| b - a| \over 2^n}

where b and a are the initial values. If we want this difference to be less than \inline 1\times 10^d we have

1\times 10^d \leq {|b-a|\over 2^n}

n \geq \ln_2 |b-a| + 3.32\cdot d

since \inline \ln_2 10 \simeq 3.32. This means that for each additional decimal point of accuracy, a further 3.32 iterations will be needed on average.

Faster ways are less reliable. For example, the secant method is a finite difference version of Newton-Raphson. The first points chosen don’t need to be on opposite sides of the root, but they should be near to the root. The expression used for successive approximations to the root is

x_n = x_{n-1} - f(x_{n-1}){x_{n-1}-x_{n-2} \over f(x_{n-1}) - f(x_{n-2})}

This is an interpolation technique, it works roughly as shown in the figures below. It’s speed of convergence is much faster than the bisection technique when close to the root, but like Newton-Raphson, it is not guaranteed to converge. This isn’t much use to us – we’d like to combine the benefits of both techniques.

A successful setup for the secant method
Here we can see the secant method working. The first step interpolates between the function values f(x1) and f(x2) and finds a better approximation x3. The next step intertpolates f(x2) and f(x3), leading to a new approximation x4 outside the interval [x2,x3]. From this point the procedure proceeds rapidly – it takes 14 steps to achieve 16 d.pds of accuracy (cf. about 55 steps for similar accuracy from the bisection method).
Here we see the secant method failing, due to the curvature of the function and the poor initial choice of x1 and x2. f(x2) and f(x3) are interpolated and lead to a chord that may or may not cross the function (depending on its behaviour at negative x). Clearly secant is not robust to poor choices of initial values or pathological function behaviour.

The industry standard technique for this is the Brent Method. The algorithm is a little complicated but it essentially combines these two approaches, using the secant method but checking that it performs better than the bisection method, and falling back on that if not. It also includes a few extra provisions for added speed and stability – once again, Wikipedia is a great reference for this. I’ve coded up a version of this for the pricer, and there’s a demonstration of how this outperforms the bisection method below. Choose any of the functions in the list, enter some points and see how many iterations each one takes to find the root!

Just a brief note on the coding as well, since I said I’d cover everything here. I’m using javascript at the moment, mostly for speed. Javascript is very forgiving (although consequently a pain to debug!), it allows variables to be defined as functions and passed around as parameters. I created a function

 BrentSolver( functionToSolve, lowerBound, upperBound, tolerance )

which takes a function of one argument functionToSolve(z) that returns the value of the function at z; and finds a root of that function between the lower and upper bounds. Usually we will need this function to have more than one parameter, but again javascript is kind to us – in the block of code that calls it, we call the parameters we need, and define a new function dynamically:

price = 5;
fwd = 100;
strike = 100;
time = 1;
optionStyle = -1;

functionToSolve = function ( vol ) {
  return vanillaPrice( fwd, strike, vol, time, optionStyle ) - price;

brentSolver( functionToSolve, 0.001, 200, 0.00000001 );

Much easier than in C++!! Below is a basic doodle to let you try out the different solvers on fairly well-known functions – have a play and see what happens.


Root Finder: 

Lower Bound:
Upper Bound:

# Iterations:

Stochastic Differential Equations Pt2: The Lognormal Distribution

This post follows from the earlier post on Stochastic Differential Equations.

I finished last time by saying that the solution to the BS SDE for terminal spot at time T was

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma W_T}

When we solve an ODE, it gives us an expression for the position of a particle at time T. But we’ve already said that we are uncertain about the price of an asset in the future, and this expression expresses that uncertainty through the \inline \sigma W_T term in the exponent. We said in the last post that the difference between this quantity at two different times s and t was normally distributed, and since this term is the distance between t=0 and t=T (we have implicitly ignored a term W_0, but this is ok because we assumed that the process started at zero) it is also normally distributed,

W_T \sim {\mathbb N}(0,T)

It’s a well-known property of the normal distribution (see the Wikipedia entry for this and many others) that if X \sim {\mathbb N}(0,1) then aX \sim {\mathbb N}(0,a^2) for constant a. We can use this in reverse to reduce W_T to a standard normal variable x, by taking a square root of time outside of the distribution so W_T \sim \sqrt{T}\cdot{\mathbb N}(0,1) and we now only need standard normal variables, which we know lots about. We can repeat our first expression in these terms

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma \sqrt{T} X}

What does all of this mean? In an ODE environment, we’d be able to specify the exact position of a particle at time T. Once we try to build in uncertainty via SDEs, we are implicitly sacrificing this ability, so instead we can only talk about expected positions, variances, and other probabilistic quantities. However, we certainly can do this, the properties of the normal distribution are very well understood from a probabilistic standpoint so we expect to be able to make headway! Just as X is a random variable distributed across a normal distribution, S(t) is now a random variable whose distribution is a function of random variable X and the other deterministic terms in the expression. We call this distribution the lognormal distribution since the log of S is distributed normally.

The random nature of S is determined entirely by the random nature of X. If we take a draw from X, that will entirely determine the corresponding value of S, since the remaining terms are deterministic. The first things we might want to do are calculate the expectation of S, its variance, and plot its distribution. To calculate the expectation, we integrate over all possible realisations of X weighted by their probability, complete the square and use the gaussian integral formula with a change of variables

{\mathbb E}[S_t] = \int^{\infty}_{-\infty} S_t(x) p(x) dx

={S_0 \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{(\mu-{1\over 2}\sigma^2)t + \sigma \sqrt{t}x} e^{-{1\over 2}x^2} dx

={ {S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2}x^2 + \sigma \sqrt{t} x} dx

={{S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} (x - \sigma \sqrt{t})^2} e^{{1\over 2}\sigma^2 t} dx

={{S_0 e^{\mu t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} y^2} dy

=S_0 e^{\mu t}

which is just the linear growth term acting over time [exercise: calculate the variance in a similar way]. We know what the probability distribution of X looks like (it’s a standard normal variable), but what does the probability distribution of S look like? We can calculate the pdf using the change-of-variables technique, which says that if S = g(x), then the area under each curve in corresponding regions must be equal:

\int_{x_1}^{x_2} p_x(x) dx = \int_{g(x_1)}^{g(x_2)} p_S(S) dS

p_x(x) dx = p_S(S) dS

p_S(S_t) = p_x(x) {dx \over dS_t}

 We know the function S(x), but the easiest way to calculate this derivative is first to invert the function t make it ameanable to differentiation

x = {\ln{S_t \over S_0} - (\mu - {1 \over 2}\sigma^2)t \over \sigma \sqrt{t}}

{dx \over dS_t} = {1 \over \sigma \sqrt{t} S_t}

So the pdf of S expressed in terms of S is

 p_S(S_t) = {1 \over S_t \sigma \sqrt{2\pi t}} \exp{-\Bigl(\ln{S_t\over S_0} - (\mu-{1\over 2}\sigma^2)t \Bigr)^2\over 2 \sigma^2 t}

Well it’s a nasty looking function indeed! I’ve plotted it below for a few typical parameter sets and evolution times.

A lognormal PDF with typical parameter values. Of course, it is highly unlikely that parameter values will stay constant for 4 years – we’ll discuss time dependence of these another day
A more-volatile PDF. Note that even though the mode is falling over time, the mean (which is independent of vol) is still going up due to the fat tails at positive values.

This distribution is really central to a lot of what we do, so I’ll come back to it soon and discuss a few more of its properties. The one other thing to mention is that if we want to calculate an expected value over S (which will turn out to be something we do a lot), we have two approaches – either integrate over \inline p_S(S_t)

{\mathbb E}[f(S_t)] = \int_0^{\infty} f(S_t)p_S(S_t)dS

or,instead express the function in terms of x instead (using \inline S_t = S_0 e^{(\mu - {1 \over 2}\sigma^2)t + \sigma \sqrt{t} x}) and instead integrate over the normal distribution

{\mathbb E}[f(S_t(x))] = \int_{-\infty}^{\infty} f(x)p_x(x)dx

This is typically the easier option. I think it is called the Law of the Unconscious Statistician. On that note, we’ve certainly covered enough ground for the moment!


Stochastic Differential Equations Pt1: The BS Equation

SDEs are of fundamental importance in quantitative finance. Because we don’t know what is going to happen in the future, we need some kind of random process built into our equations to model this uncertainty.

By far the most common process used is Brownian motion, and in 1 dimension this is called the Weiner process, which we will label \inline W_t at time t. This is a diffusive process (basically, there are no ‘jumps’, and in small time intervals the distance that the particle has traveled is probably very small) with no memory, so that future motion is independent of past motion. If we choose any two times s and t with t > s, then the difference between the value of the process at these times is distributed normally, \inline W_t - W_s \sim {\mathbb N}(0,t-s).  This means that our uncertainty about a particle (or asset) following a Weiner process at some future time T can be characterised by a normal distribution with variance T (ie. standard deviation T^0.5) and expectation 0.

The Black-Scholes SDE for a stock price (for reasons that will become clear later, this is usually called geometric brownian motion, by the way) is

{dS \over S} = \mu dt + \sigma dW_t

What does this mean? Well, the first two terms are simple enough for anyone who has studied ordinary differential equations. S is an asset price, t is time and \inline \mu and \inline \sigma are both constants. Without the stochastic term, we would be able to solve the equation by a simple integration on both sides

\int_{S_0}^{S_T} {dS \over S} = \int_0^T \mu dt

S_T = S_0 e^{\mu T}

which is simply exponential growth of a stock with time. Later, we will introduce risk-free bonds that grow in this way, like compound interest in a bank account, but here we are interested in including an element of uncertainty as well. How do we deal with that?

We might be tempted to try the same thing and integrate the whole equation, using the result that \inline d \ln S = S^{-1} \cdot dS = \mu dt + \sigma d W_t as we implicitly did above. Unfortunately, for non-deterministic functions this doesn’t work any more. The reasons are rather deep, but for the moment the fix is that we need to use a central result of stochastic calculus, Ito’s Lemma, to resolve the conundrum. This is the stocastic version of the chain rule for a function of many variables, which says that if

dS = \mu (S,t) dt + \sigma (S,t) dW_t


d( f(S,t)) = \Bigr( {\partial f \over \partial t} + \mu(S,t) {\partial f \over \partial S} + {\sigma(S,t)^2 \over 2}{\partial^2 f \over \partial S^2} \Bigl) dt + \sigma(S,t) {\partial f \over \partial S} dW_t

That seems like quite a mouthful, but for the case that we’re considering, ln(S), it’s actually pretty straight-forward. Since ln(S) doesn’t depend explicitly on time, the time-derivative disappears. It’s quite easy to calculate the first and second derivatives of ln(S) with S, and remembering that we needed to multiply all of the terms in the BS equation above by S to turn the LHS into plain old dS (these were absorbed into our \inline \mu(S,t) and \inline \sigma(S,t)), we get:

d \ln S = ( \mu - {1 \over 2} \sigma^2 ) dt + \sigma dW_t

with the \inline \mu and \inline \sigma just plain old constants again. We have picked up an extra term here, and it has come from the second derivative term in Ito’s Lemma. We are now in a position to integrate both sides of the equation (\inline \int^T \sigma dW_t = \sigma W_T, by the way), and similar to the result in the deterministic case we get

S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma W_T}

This is the solution of the BS SDE, but what does it mean? There’s quite a bit to go along with above, so I’ll talk a bit more about what this means in the second part of this blog post.