Forward Rate Agreements and Swaps

For calibration of discount curves from swap rates, see my post on Bootstrapping the Discount Curve from Swap Rates.

In this post I’m going to introduce two of the fundamental interest rate products, Forward Rate Agreements (FRAs) and Swaps. FRAs allow us to ‘lock in’ a specified interest rate for borrowing between two future times, and Swaps are agreements to exchange a future stream of fixed interest payments for floating ones, or visa-versa. Both are model independent, which means we can statically hedge them today using just Zero Coupon Bonds (ZCBs) – so their prices won’t depend on our models for interest rates or underlying prices etc. ZCBs really are fundamental here – if you haven’t read my earlier post on them yet I recommend starting there!

First, a note about conventions. In other contexts, I’ve used r to be the continuously-compounding rate, such that if I start with $1, by time t it will be worth \exp \bigl( \int_0^t r(t')dt' \bigr) . In the rates world, this would be very confusing, and rates are always quoted as simple annualised rates, so that $1 lent for half a year at 5% would return $1.025 exactly in half a year’s time (note the additional factor of 0.5 coming from the year-fraction of the deposit), and this rates convention will be used throughout this post.

Each ZCB gives us a rate at which we can put money on deposit now for expiry at a specific time, and we can construct a discount curve from the collection of all ZCBs that we have available to us. In the case of a ZCB, we deposit the money now and receive it back at expiry. A Forward Rate Agreement extends the idea of putting money on deposit now for a fixed period of time to putting it on deposit at a future date for a specific period of time. We’ve picked up an extra variable here – our rate r(T) for deposit starting now depends only on time of expiry T, while the FRA rate f(t,T) will depend on the time that we put the money on deposit t as well as the time of expiry T.

An FRA, just like a deposit, involves two cash flows. We pay the counterparty a notional M at time t, and receive our notional M plus interest of (T-t)\cdot f(t,T) \cdot M at time T; where the first term is the Year Fraction, the second is the Forward Rate, and the final is the Notional. Since f will be fixed when we sign the contract, we can hedge these two cash flows exactly at t=0 using ZCBs. The value of the FRA is the value of receiving the second sum minus the cost of making the first payment:

    \[{\rm FRA}_{price} = Z(0,T)\cdot \Bigl[ (T - t)\cdot f(t,T) + 1\Bigr]\cdot M - Z(0,t)\cdot M\]

This agreement is at ‘fair value’ if the forward rate f(t,T) makes {\rm FRA}_{price} = 0, and re-arranging gives

    \[f(t,T) = {1 \over T-t}\ \Bigr( {Z(0,t) \over Z(0,T)} - 1 \Bigl)\]

An FRA allows us to ‘lock-in’ a particular interest rate for some time in the future – this is analogous in rates markets to the forward price of a stock or commodity for future delivery, which was discussed in an earlier post. Note that the price of all FRAs is uniquely determined from the discount curve [although in reality our discount curve will be limited in both temporal resolution and maximum date by the ZCBs or other products available on the market which we can use to build it].

A Swap is an agreement to exchange two cash flows coming from assets, but not the assets themselves. By far the most common is the Interest Rate Swap, in which two parties agree to swap a stream of fixed rate interest rate payments on a notional M of cash for a stream of floating rate payments on the same notional. Although the notional might be quite large, usually only the differences between the payments at each time are exchanged, so the actual payments will be very much smaller. The mechanics are probably best demonstrated by example:

A swap is written on a notional $100Mn, with periods starting in a year and continuing for three years and with payments at the end of each three month period; to pay fixed annualised 5% payments, and floating payments at the three-month deposit rate (fixed at the beginning of each period). What payments actually get made?

The swap starts in a year’s time, but the first payment is made at the end of the first 3-month period, in 15 months time. At this time, the fixed payment will be 5\% \cdot M \cdot (3{\rm m} / 12{\rm m}). The floating payment will be whatever the three-month deposit rate was at 1 year, multiplied by the same prefactors, r_{\rm 1 yr}(0,3{\rm m}) \cdot M \cdot (3{\rm m} / 12{\rm m}) – let’s say that r_{\rm 1 yr}(0,3{\rm m}) is 4% (although of course we won’t know what it is until the beginning of each period, and the payment will be delayed until the end of the period) – then the net cash flow will be $250,000, paid by the person paying fixed to the person paying floating.

The same calculation happens at each time, and a payment is made equal to the difference between the fixed and the floating leg cash flow. Although the notional is huge, we can see that the actual payments are much, much smaller [be alert for newspapers quoting ‘outstanding notional’ to make positions seem large and unsteady!]. The convention for naming Swaps is that if we are receiving fixed payments, we have a receiver’s swap; and if we are receiving floating payments, it is a payer’s swap.

What is the point of this product? Well, if we have a loan on which we are having to pay floating rate interest, using a swap we can exchange that to a fixed rate of interest by making fixed rate payments to the counterparty and receiving floating rate payments back, which match the payments that we’re making on the loan. This is of use to companies, who need to handle interest rate risk and might not want to be exposed to rates rising heavily on money they’re borrowing. A bank holding a large portfolio of fixed rate mortgage loans but required to pay interest to a central bank at floating rate might engage in a swap in the reverse direction to hedge it’s exposure.

How can we price this product? It’s easy to price the fixed payments – since each is known exactly in advance, we can hedge these out using ZCBs. The value of the sum of the fixed payments is

    \[V_{\rm fixed} = M \sum_{n=0}^{N-1} x \cdot Z(0,t_{n+1}) \cdot \tau\]

where x is the fixed rate (5% in our example above), and \tau is the relevant year-fraction for each payment (0.25 in our example about). What about the floating payments? We have to think a little harder here, but it turns out we can use FRA agreements to hedge these exactly as well. The problem is that we don’t know what the three-month deposit rate will be in a year’s time. But we can replicate it: if we borrow an amount N in a year’s time, and put that on deposit for three months, we’ll receive back M \cdot(r_{1 yr}(0,3{\rm m}) + 1)\cdot (3{\rm m}/12{\rm m}) in 15 months. We know from our discussion above that we can enter an FRA to borrow N in a year’s time, and we’ll need to pay M \cdot (f(12{\rm m},15{\rm m})+1) \cdot (3{\rm m}/12{\rm m}) in 15 months time – so we can guarantee to match the payment profile of the floating leg using the forward rates for the periods in question. The value of the floating payments is therefore

    \[V_{\rm floating} = M \sum_{n=0}^{N-1} f(t_n,t_{n+1}) \cdot Z(0,t_{n+1}) \cdot \tau\]

    \[= M \sum_{n=0}^{N-1} Z(0,t_{n+1}) \cdot ({Z(0,t_n)\over Z(0,t_{n+1})} - 1)\]

    \[= M \sum_{n=0}^{N-1} (Z(0,t_n) - Z(0,t_{n+1}))\]

    \[= M \bigl(Z(0,t_0) - Z(0,t_N)\bigr)\]

Since we can fix the price of the two legs exactly by arbitrage right now, we can value the swap by comparing the present value of each leg

    \[V_{\rm swap} = V_{\rm fixed} - V_{\rm floating}\]

    \[= M \sum_{n=0}^{N-1} (x - f(t_n,t_{n+1}))\cdot Z(0,t_n) \cdot\tau\]

    \[= M \Bigl( Z(0,t_N) - Z(0,t_0) + \sum_{n=0}^{N-1} x \cdot Z(0,t_{n+1})\cdot \tau \Bigr)\]

As with FRAs, swaps are said to be at fair value when the values of the fixed and the floating rate match, and the overall value is zero. This is the fixed rate at which we can enter a Swap for free, and occurs when x = X such that

    \[X = { Z(0,t_0) - Z(0,t_N) \over \sum_{n=0}^{N-1} \tau\cdot Z(0,t_{n+1}) }\]

This is called the Swap Rate. It is fully determined by the discount curve, and as we shall see the reverse is also true – the Swap Rate is in 1-to-1 correspondence with discount curve. Of course, after a swap is issued the Swap Rate will change constantly, in which case the actual fixed payment will no longer match X and the swap will have non-zero value. If K is the swap fixed coupon payment and X is the current swap rate, then

    \[V_{\rm swap}= M \Bigl( Z(0,t_N) - Z(0,t_0) + \sum_{n=0}^{N-1} K \cdot Z(0,t_{n+1})\cdot \tau \Bigr)\]

    \[= M \Bigl( {Z(0,t_N) - Z(0,t_0)\over \sum_{n=0}^{N-1} \cdot Z(0,t_{n+1})\cdot \tau} + K \Bigr) \cdot \sum_{n=0}^{N-1} \cdot Z(0,t_{n+1})\cdot \tau\]

    \[V_{\rm swap}= M \cdot (K - X)\cdot \sum_{n=0}^{N-1} Z(0,t_{n+1})\cdot \tau\]

    \[V_{\rm swap}= M \cdot (K - X)\cdot B\]

where B = \sum_{n=0}^{N-1} Z(0,t_{n+1})\cdot \tau is called the annuity of the swap. The value is proportional to the difference between the swap rate and the swap fixed coupon.

Because of the number of institutions that want to handle interest rate risk resulting from loans, IR Swaps are one of the most liquidly traded financial products. Although we’ve derived their price here from the Discount Curve, in practice it is often done the other way around – Swaps often exist up to much higher maturity dates than other products, and Discount Curves at long maturities are instead constructed from the swap rates quoted on the market at these dates. This is a very important procedure, but financially rather trivial. I’m not going to cover it here but will probably come back to it in a short post in the near future.

As well as being important in their own right, FRAs and Swaps (along with ZCBs, of course) are the foundation of the rich field of interest rate derivatives. The right (but not obligation) to enter into an FRA – a call option on an FRA – is called a caplet, and a portfolio of such options on FRAs across different time periods is called a cap, since you have guaranteed a cap on the maximum interest you will have to pay over the whole period to borrow money (a put on an FRA is called a floorlet, and a sequence of these forms a floor, for similar reasons). An option to enter a swap is called a swaption, and these are also heavily traded wherever a borrower might want to re-finance a loan at a later date if interest rates move sharply. The pricing of these products becomes dependent on the underlying model that we assume for interest rates, and I’ll start to deal with them in later posts.

Factoryised C++

In two of my most recent posts I’ve talked about Forward Contracts and an implementation of the Normal Quantile function. One strategic reason for this will become apparent from today’s post!

I’m going to be talking here about one of the design patterns used in the Monte Carlo code presented on my blog and discussed recently. It isn’t really my goal for this blog to dwell heavily on programming topics as there are many, many more competent people than me sharing their knowledge on the subject across the web. However, there are some topics that are quite interesting to discuss and make ideal blog topics, and a grounding in Object-Oriented programming (particularly in C++) is vital for the modern Quant.

As I stated before, the goal of the code presented to date in ‘Simple Option Pricer 1.0’ is to choose a fairly simple option payoff, choose a method for generating uniform random numbers, and then choose a method for generating gaussian variates from these (and of course to enter the relevant market parameters like discount factor and volatility). The first thing to do is to download all of the code presented on that page, compile it, and run it a few times to make sure you can get all of it to work. Check the answers correspond to the results you get from the analytical pricers for the same parameters, although don’t forget that there will be some Monte Carlo error on the answers coming for C++, and observe that answers get better (though time taken increases!) as the number of paths is increased.

A very straight-forward way of achieving selectivity would be to use a switch statement:

int option;
SimpleOption* theOptionPointer;

cout << "Select a Contract Type: Call [1], Put [2]";
cin >> option;

switch ( option ) {
 case 1 :
  theOptionPointer = new CallOption( ... );
 case 2 :
  theOptionPointer = new PutOption( ... );
 default :
  throw("You must enter [1] or [2]!");

where I’ve assumed that we’ve defined some suitable option classes elsewhere that all inherit from an abstract base class SimpleOption (if there is demand for it I might go over inheritance another time, but there are many sources on the web. This IS important for quants, but it’s rather dry to go over!).

The switch statement looks at the user-entered value and assigns the pointer to an option of the relevant type. This is a basic example of polymorphism – we’ve declared theOptionPointer to be a pointer to the abstract base class SimpleOption, but then given it the address of a specific implementation of the base class via a derived class, either CallOption or PutOption. If we were to call any functions of this pointer, they would have to be those functions defined in the base class itself (although they could be abstract and potentially over-loaded in the derived classes) – if, for example, CallOption had a method called getStrike(), there would need to be a [potentially abstract] prototype for this in the base class SimpleOption if we want to have access to it via the SimpleOption pointer..

However, the code isn’t that great as it stands. We’re going to need a switch statement for each of the choices we’re giving the user (currently Option type, RNG type and Gaussian converter type), which will lead to a bloated main(){} function. More annoyingly, every time we want to include a new option in any of these selections, as well as coding up the relevant classes for the new choice we’re going to need to go into the main function and alter the choices and control structures given by the switch statements. As well as being a bother, this will mean the whole project will need to be re-compiled, and for large projects this can be a significant time constraint. You can see roughly how much time this takes by comparing the speed of Build/Compiling your project (which checks to see if changes have been made, and then only rebuilds the files that are affected) with the time taken to Re-Build/Re-Compile your project (which throws away previously compiled files and starts again from the beginning). Give this a try!

I’ve got around this in ‘Simple Option Pricer 1.0’ by coding up a basic Factory for each of the three selections, based on the Factory presented in Chapter 10 of Joshi’s text, C++ Design Patterns and Derivatives Pricing (this is one of my favourite textbooks on the subject – I’ll do some book reviews of other gems in the future). You can see how this works by downloading the following four files:

Drop these into the same folder as the rest of the files in ‘Simple Option Pricer 1.0’, and then right-click on the Project in your development environment and add them to the project one-by-one [Nb. if you’re using Visual Studio, you may need to add

#include "stdafx.h"

to the two .cpp files before this will work – and make sure they’re in the same folder as the rest, this can cause problems!!].

Now, Build/Compile your project. If you’ve done everything correctly, it should happen very swiftly – only the forward.cpp and the gaussianCdfInverter.cpp will need to compile, and the linker will need to run, but the other files won’t need to be recompiled (this is the real strength of the method, by the way!). Now, re-run the programme and you should discover two new options available to you for the option pricer! How have we achieved this sorcery?!

In the files optionFactory.cpp and optionFactory.h (and the corresponding files for the RNG and gaussian converters), we’ve defined the factories for the different types of classes we want to use. I’ve used the Singleton Pattern (for brevity I’m not going to discuss it here, but do have a look at wikipedia) to create and access a single instance of each factory. The important point is that there will be a single instance of each factory, and we can get their addresses in memory using the following method at any point in the code


The factories themselves store a two column table (implemented with a map) which links a string (the option name) in the first column with a function that constructs an option of that type and returns a pointer to it in the second column (this will be a base class pointer, using polymorphism as discussed above).

The functions that will create these pointers are implemented using another type of polymorphism – templatisation. In the optionFactoryHelper.h header, I’ve defined a new class which is constructed by calling it with a string. It also has a method which builds a new instance of the <class T>, and returns a pointer to it – which is exactly what we said the factory table needs.

template <class T>
simpleOption* simpleOptionHelper<T>::buildOption(double strike) {
 return new T(strike);

The constructer for the helper function registers this method in the factory’s table, along with the string.

template <class T>
simpleOptionHelper<T>::simpleOptionHelper(std::string optionType){
 optionFactory& theFactoryAddress = optionFactory::factoryAddress();
 theFactoryAddress.registerOption(optionType, simpleOptionHelper<T>::buildOption);

Options are registered in their .cpp files, the simple code to register the forward is at the top of forward.cpp, creating a new instance of the helper class templatised on the specific instance of the option that we are registering through it. This will call the helper class constructor, which as we just saw contains code to register the corresponding option type with the factory

namespace {
 simpleOptionHelper<forward> registerForward("forward");

Since these are done in namespaces, they happen before main(){} starts running – and once registered, we can call the factory at any time in our code using a string and giving it the required parameters and it will create a new instance of the derived class with that name

simpleOption* thisOptionPointer = optionFactory::factoryAddress().createOption( optionType, strike );

I’ll use this pattern again and again in future as more and more options are added to the code. At some point in the future, I’ll look at a way of combining ALL of the factories into a single, templatised factory process using more polymorphism, but the complication will be that since different processes need different arguments for their constructors, I’ll need to build a generic container class first that could contain many different sorts of arguments inside it. The next thing I want to do, however, is look at easier ways of data input and output, as the current console application is very clunky and wouldn’t be suitable for a large amount of data input. I’ll discuss ways of dealing with this in future posts.

Inverse Normal CDF

Now that I’ve got some Monte Carlo code up, it’s inevitable that I will eventually need an implementation of the Inverse of the Normal Cumulative Density Function (CDF). The inverse of a CDF is called a Quantile function by the way, so I’ll often refer to this as the Normal Quantile function. The reason this is so important is because, as I discussed in my post on Random Number Generators, it can be used to convert uniformly distributed random numbers into normally distributed random numbers, which we will need for Monte Carlo simulations if we believe that underlyings are distributed according to a lognormal distribution.

As a reminder, I’ve repeated the graph from the previous post here to show the behaviour of the Normal Quantile:

The inverse CDF - called the quantile function
The standard Normal Quantile function. Simple transformations of the normal distribution will allow any non-standard normal quantile to be expressed using the standard version.


As with the implementation of an algorithm for the Normal CDF (discussed here), there were several possibilities for implementation. In the end I’ve decided to go with the analytic approximation due to Beasley, Springer and Moro as presented in Joshi’s text on computational derivative pricing:

a1 = 2.50662823884
a2 = -18.61500062529
a3 = 41.39119773534
a4 = -25.44106049637

b1 = -8.47351093090
b2 = 23.08336743743
b3 = -21.06224101826
b4 = 3.13082909833

c1 = 0.3374754822726147
c2 = 0.9761690190917186
c3 = 0.1607979714918209
c4 = 0.0276438810333863
c5 = 0.0038405729373609
c6= 0.0003951896511919
c7 = 0.0000321767881768
c8 = 0.0000002888167364
c9 = 0.0000003960315187


A polynomial form is used for the central region of the quantile, where \inline 0.08 < x < 0.92:

y = x - 0.5 \ ; \quad z = y^2

\Phi^{-1}(x) \simeq y \cdot { (a_1 + a_2 z + a_3 z^2 + a_4 z^3 ) \over (1 + b_1 z + b_2 z^2 + b_3 z^3 + b_4 z^4 ) }


And if \inline x \leq 0.08 or \inline x \geq 0.92 then:

y = x - 0.5 \ ; \quad z = \Bigl\{ \begin{matrix} x & y\leq 0\\ 1 - x & y > 0 \end{matrix} \ ; \quad \kappa = \ln (-\ln z)

\Phi^{-1}(x) = \pm (c_1 + c_2\kappa + c_3\kappa^2 + c_4\kappa^3 + c_5\kappa^4 + c_6\kappa^5 + c_7\kappa^6 + c_8\kappa^7 + c_9\kappa^8)

With a plus at the front if \inline y \geq 0 and a minus in front if \inline y < 0.

This is fairly swift as an algorithm, and the quantile method of generating normal variates will turn out to have some key advantages over the various other methods discussed before which will become apparent in future. In a future post I’ll show how to integrate this method of converting uniform variates to gaussian variates into the Monte Carlo code that I’ve made available here, thanks to the factory pattern that I’ve used it will turn out to be incredibly straight-forward!

Finally, if you are wondering why the numbers in these numerical algorithms seem so arbitrary, it’s because they [sort-of] are! Once I’ve implemented some optimisation routines (this might not be for a while…) I’ll have a look at how we can find our own expressions for these functions, and hopefully be able to back out some of the numbers above!!


I covered Forward Contracts in a post a week ago, and promised to look at the related Futures Contract soon. These contracts aim to do the same thing – provide a means of ensuring an underlying asset for future delivery at a price determined today – but their dynamics are rather different from those of the Forward Contract.

One of the main criticisms of Forward Contracts is that they are directly written between two parties. This means they are subject to credit risk – if one counterparty goes bankrupt, the contract may not be guaranteed any more. By contrast, a Futures Contract is written via an exchange. This is a central organisation that matches long and short Futures Contract counterparties, and oversees the transaction (and of course, charges a fee for its services). Futures Contracts are standardised in terms of expiry dates and contract terms, which increases liquidity in the market. The counterparty who wants to receive the underlying asset at expiry is said to be the long party, while the one providing it is the short party.

As with a Forward Contract, an underlying asset is specified. Each party must have an account with the exchange in order to enter transactions. Each day, the exchange will quote the current Futures Price, which is the price at which the transaction will happen; but to enter into the contract doesn’t cost either party (beyond the small exchange fee mentioned above). Instead, the counterparties must update their accounts daily to reflect the difference between the initial Futures Price quoted and the current Futures Price quoted. If it is above the initial price, the party who is short the contract must have this difference in their account; and if it is below the initial price, the party who is long must have this difference in their account.

This is probably best demonstrated by a simple example. Let’s say that I want to enter a Futures Contract on Day 0 for delivery of a barrel of oil on Day 5 (realistically this would be a much longer period of months or years, but in order to keep the number of steps in the example down I’m keeping it short here), the current Futures Price quoted for a contract expiring in 5 days is $100. I’m the long party as I want to receive the oil, my counterparty is short as he will be delivering.

Day 1: Futures Price quoted for expiry on Day 5 increases to $101.50. The short party has to deposit $1.50 in his account to cover this increase.

Day 2: Futures Price increases to $103.50. The short party has to deposit another $2 in his account to cover this increase (so it’s now $3.50 total).

Day 3: Futures Price falls to $101. The short party can withdraw $2.50 from his account to leave only $1, which covers the change from day 0.

Day 4: Futures Price for delivery on Day 5 (ie. tomorrow!) crashes to $96. The short party can take all of the money out of his account, and I need to deposit $4 to cover the difference between the current Futures Price and its initial value.

Day 5: Futures Price recovers to $97. I can take $1 out of my account to leave $3 total. Since this is the day the contract expires, we make the transaction today at the initially agreed Futures Price of $100, and the money in my account is returned to me (if the money was in the short party’s account, it would have been returned to him in the same way). Note also that the Futures Price on day 5 for expiry on day 5 must be equal to the spot rate at that time – so this contract obliges me to trade at a price above the current spot rate, just as a Forward Contract at these prices would have done.

There are several points to make. Firstly, I’ve described a Physically Settled Futures Contract here, since I actually took delivery of a barrel of oil. The standardised contracts will specify a grade of oil and a specific delivery location – many exchanges have physical warehouses for these transactions to happen. However, the large majority of Futures Contracts today are Cash Settled: in this case, instead of taking a delivery, I exchange money with the short party to achieve the same financial payoff. In the example above, since we were transacting at $100 while the market price was only $97, I would have to pay $3 to the short party (as we shall see, it’s no coincidence that this was the amount in my account!).

This contract more-or-less eliminates credit risk due to a counterparty defaulting. Consider, for example, that the oil company had defaulted on Day 2. This might seem like bad news for me – at that point, I was expecting to pay $100 for a commodity worth $103.50. The critical point is the account with the exchange. This amount is forfeited if the counterparty wants to leave the transaction or fails to keep topping it up as required, and delivered to the other party. In this case, the short party’s account of $3.50 would have been given to me, and the contract terminated – since the current price was $103.50, I could enter the same contract with a different counterparty without taking any financial loss.

On the other hand, if I’d decided on Day 4 that I no longer wanted to be part of the contract, as it looked like I’d be paying $100 while the contract is only worth $96, I might abscond. However, the exchange still has my $4 on account which it can take and give to the short counterparty, so once again they aren’t hurt by my reneging. The only risk to either party is that there is a large shift in price in a very short time, so that the counterparty is unable to credit a sufficient amount to his account before defaulting.

Although it more-or-less deals with credit risk, this contract does introduce liquidity risk. Considering again Day 2, the company had had to deposit money in their account to cover the position. Even though the contract eventually ends up profitably for the short party, it is conceivable that cash-flow concerns would make them unable to honour their intermediate commitments, in which case they would have to break the contract at that stage and take a loss. For a Forward Contract, as there is no payment until the expiry date this couldn’t happen.

Aside from these issues, this contract seems to be essentially the same as a Forward Contract, since it guaranteed a transaction on Day 5 at $100. However, so far we have ignored interest rates, which turn out to make a subtle difference to the final payoff (and which make the maths interesting!).

I’m going to to another post soon comparing these two contracts in the presence of interest rates, but I’ll tell you the answer right now to whet your appetite: in the presence of deterministic interest rates, the two payoffs are the same. This isn’t surprising – if we know what will happen in the future then we can hedge by replication both of the contracts (Forward Contracts in the way described in the previous post on them; Futures Contracts as I will describe next time – but have a go and try to construct a replication strategy yourself) so that their values are fixed today. However, this is actually quite unrealistic and most real models of interest rates include stochastic interest rates of some type – in this case, the Futures Contract becomes a path-dependent derivative and we can’t construct a static hedge for it at the current time, since we don’t know what the interest rate will look like at each stage! More to come soon.

New Section: Monte Carlo in C++

Good news, this blog has a new section – Monte Carlo pricers! I’m going to experiment to find the most convenient way of making it available for download, and would like it to be available both as a compiled .exe and also as uncompiled source code to allow readers to examine and alter it.

The first iteration can be found online now in the MONTE CARLO section. It is code for a straight-forward Monte Carlo pricer that will price calls and puts, with uniform random variates selected by park-miller and converted into gaussian variates using one of the box-muller processes (for more information on these, refer to the post on Random Numbers).

I’ve used a factory pattern so in principle it should be very straight forward to add new options and also RNG processes to the code, although there are a few more improvements that I will be making in the near future, which will extend the code to basic path-dependent options and allow easier data input by the user. I’ll be going over various aspects of the code in detail in future posts, and also putting together some instructions on how to compile the code yourself using Dev C++ (a free, open-source compiler) and how to add options and processes via the factory.

As always, if you have any problems getting the code to work, please tell me!


I’ve mentioned Forwards many times in the blog so far, but haven’t yet given any description of what they are, and it’s about time for a summary! Forwards were probably the first ‘financial derivative’ to be traded, and they still occupy a central role in the field today.

At the most basic level, a Forward Contract is a delayed sale between two parties. The contract is signed now and the sale and the price agreed to, but the transaction only happens at a later date. We can immediately see why these contracts would be popular, as they insulate the people who have signed the contract from risk due to price variations. For example, an airline can foresee fairly accurately its fuel requirements over the next year, so might wish to enter into a Forward Contract with an oil company for a certain amount of fuel to be delivered next year, in order not to be exposed to the risk of price rises. Meanwhile, the oil company knows its costs of production, and might well also decide that as long as the contract price is profitable it is locking in business for the future and avoiding the risk of price falls harming its operations.

The first thing to note is that, unlike options, the contract is binding – both parties are obliged to enter into the transaction at the given date, there is no optionality in this contract. The agreed price has different names, but for consistency with other products quants tend to call it the Strike Price, K.

Forward Contracts are the most simple form of derivative. A derivative is a financial instrument that derives it’s price from some other (‘underlying’) instrument. In the example above, the underlying was the cost of oil, but it could equally well have been foreign exchange rates, stock prices or interest rates – indeed, all of these are different sorts of risks that companies will in general want to hedge out. What is the price of a Forward Contract and how does it depend on the underlying? If the price of oil rises after the contract above had been signed, the airline company would breath a sign of relief – it has locked in the forward price for its oil via the Forward Contract so the contract is clearly a valuable asset for it. By contrast, the oil company will be miffed, as it could have sold the oil on the market for a better price. If prices fall, situations will be reversed. How can we express this mathematically?

As a simple example, I will consider a Forward Contract for me to buy a stock from a bank in a year’s time for $100. We can calculate the price of this contract using a technique called ‘replication’. What we need to do is construct a portfolio that exactly matches the payments implicit in the Forward Contract, and according to the ‘Law of One Price’, the price of these two matching portfolios must be the same. Call now t=0 and the expiry of the contract t=T.

Portfolio 1: 1 Forward Contract on S at strike price $100 and at time T. Transactions are: at time T, I pay $100, and receive one unit of stock. This will be worth S(T), the price of the underlying at that time. There are no net payments at time t=0.

How can we match this? Well, we will need to ensure the receipt of an amount of cash S(T) at time T, and an ideal way of ensuring this is by buying a unit of the stock at t=0 and holding it, since this is guaranteed to be worth S(T) at t=T. We also need to match the payment of the strike price, $100, at t=T. An ideal product to use for this is a Zero Coupon Bond maturing at t=T (this is equivalent to borrowing money and you can see it either way. We’re making all of the usual assumptions about being able to go long and short equally easily etc.). Recall that Zero Coupon Bonds are worth $1 at time t=T, and worth

    \[{\rm ZCB}(0,T) = {\mathbb E} \Bigr[ e^{-\int_0^T r(t')dt'} \Bigl] = e^{-rT}\]

at t=0, with the last equality holding only in the case of constant interest rates. So, we can short 100 ZCBs at t=0, these will mature at t=T when we will have to pay the holder $100. The holder will pay us the present value at t=0, which is 100 \cdot ZCB(0,T). We’ve constructed a portfolio that matches all of the payments of portfolio 1 at t=T:

Portfolio 2: Long 1 underlying stock S, short 100 ZCBs; close all positions at t=T.

How much does it cost to enter into portfolio 2? As we’ve said, this will identically be the price of portfolio 1. The cost is

    \[{\rm Fwd}(0,T) = S(0) - K \cdot {\rm ZCB}(0,T)\]

    \[= {\rm ZCB}(0,T) \cdot \Bigl({S(0) \over {\rm ZCB}(0,T)} - K \Bigr)\]

We can see that this price is a linear function of K – being long a Forward Contract is better if the strike is lower! We see that its price doesn’t depend on any assumptions about the underlying growth rate (the only assumption was that we can readily trade it on the market at the two times t=0 and t=T), and depends instead on the risk-free interest rate r(t). For this reason the contract is described as ‘Model Independent’ – its price doesn’t depend on any assumptions we make about the world, we can enforce it by no arbitrage using prices of things that are available to buy right now. Compare that to the hedging strategy for a vanilla option discussed here, which relied on continuous re-hedging in the future, and is hence open to risk if our model of future volatility and interest rate isn’t perfect (and it won’t be!).

The ‘Forward Price’ is defined as the strike which makes this contract valueless for both parties – the ‘fair price’ – looking at the equation above we can see that this must be

    \[{S(0)\over {\rm ZCB}(0,T)} = S(0)\cdot{\mathbb E}\Bigl[ e^{\int^T_0 r(t') dt'}\Bigr] = S(0)\cdot e^{rT}\]

with the second equality again only valid for constant rates. The Forward Price is exactly the expectation of the underlying stock S in the risk-neutral measure, it appears across quantitative finance and often simplifies algebra, consider for example the following two ways of expressing the BS equation for vanilla option prices:

    \[C_{\rm call}(K,T,\sigma,r,\delta) = S(0)\cdot \Phi(d_1) - \delta(0,T) K \cdot \Phi(d_2)\]

    \[C_{\rm call}(K,T,\sigma,r,\delta) = \delta(0,T) \Bigl( {\rm Fwd}(0,T)\cdot \Phi(d_1) - K \cdot \Phi(d_2) \Bigr)\]

with the various terms as defined in previous posts (nb. the discount factor here from t=0 to t=T is exactly the same as a ZCB expiring at T). I certainly prefer the second form – it is expressed in terms of quantities depending on the same time T, and all of the discounting is handled outside the brackets, so it is much more easily modularised when coding it up.

Compare the payoff of a forward at expiry with that of a call and a put option. How are they related? It turns out that

    \[C_{\rm call}(K,T,\sigma,r,\delta) - C_{\rm put}(K,T,\sigma,r,\delta) = \delta(0,T)\Big({\rm Fwd}(K,T,r,\delta) - K\Big)\]

This result is called put-call parity. It comes about because if we’re long a call and short a put, we will exercise the call only if the price goes up and the counterparty will exercise the put only if the price falls – we’ve effectively locked in a sale at the strike price K at time T – such a strategy is called a synthetic forward. Note this is actually quite a deep result – although the price of both a call and a put option will depend on the model that we assume for the underlying, the difference in the two prices is model-independent and enforceable by arbitrage! If we assume an increased level of volatility over the option lifetimes, it will increase both prices by the same amount.

Payoffs at expiry for a Forward Contract compared to vanilla calls and puts.
Payoffs at expiry for a Forward Contract compared to vanilla calls and puts. Since a Forward Contract obliges to enter a transaction at expiry at strike K, the payoff can now be negative. If strike is $100 as shown above, but spot has fallen to $70, then the contract is worth -$30, as we could have bought the underlying from the market at $70 but are instead obliged to pay $100 to honour our Forward Contract. Similarly, if the price rises, our contract payoff will be positive. By comparing to the vanilla payoffs at expiry, we can see that a long call and a short put will have the same payoff as a Forward Contract, a result called put-call parity.

Up to this point, I’ve made the implicit assumptions that the underlying asset doesn’t pay any income, it’s free to hold it, and we’re able to go long or short with equal ease. Of course, in general, none of these are true. If the asset pays income like dividends, or if there are costs of storage, it is fairly easy to adjust our replication argument above to find a new price for the Forward Contract that takes into account the extra cash flows associated with holding the underlying over the period 0 \textless t \textless T. However, we can’t always go short the underlying asset – a particular case of this will be commodities, where the asset in question may not exist at the current time (ie. if you haven’t dug it out of the ground or transported it to the delivery point at t=0!), in which case there is no reason that the forward price should be related to the spot price in the deterministic way described above – this is because we have violated our assumption about being able to trade the underlying at the two times t=0 and t=T, which was implicitly vital! However, the price of the Forward Price will still converge towards the expected future spot price as we approach t=T even in this case, since a Forward Contract for transaction at the current time is simply a spot transaction!

One final word about Forward Contracts – they are ‘Over The Counter’, which means they are signed between two counterparties directly and thus carry Credit Risk – there is a chance that your counterparty might go bankrupt before the contract expiry, in which case the transaction won’t be completed. If your contract was out-of-the-money, you will still be expected to settle your position with them, while if it was in-the-money they may well not be able to pay you your full dues, so credit risk will decrease the actual value of the contract. There are advanced ways of accounting for this that I deal with in a later post (CVA – Credit Valuation Adjustment); alternatively there is a similar product called the Futures Contract, which I will deal with very soon in another post, which is superficially similar to the Forward Contract (leading to MUCH confusion in the industry!) but also attempts to deal with this credit risk.

BS from Delta-Hedging

Today I’m going to look at another method of getting to the BS equations, by constructing a delta-hedge. This is the way that the equation was in fact first reached historically, and it’s a nice illustration of the principle of hedging. All of the same assumptions are made as in the post that derived the BS equation via Risk Neutral Valuation.

The principle is that because the price of some derivative C(S,t) is a function of the stochastic underlying S, then all of the uncertainty in C comes from the same source as the uncertainty in S. We try to construct a risk-free portfolio made up of the two of these that perfectly cancels out all of the risk. If the portfolio is risk-free, we know it must grow at the risk free rate r(t) or else we have an arbitrage opportunity.

Our model for S is the geometric brownian motion, note that we allow the rate of growth \mu in general to be different from r

    \[dS = \mu Sdt + \sigma SdW_t\]

We can express dC in terms of its derivatives with respect to t and S using Ito’s lemma, which I discussed in a previous post,

    \[dC = \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} \Bigl] dt + \sigma S{\partial C \over \partial S} dW_t\]

Our portfolio is made up of one derivative worth C(S,t) and a fraction \alpha of the underlying stock, worth \alpha \cdot S; so the net price is C(S,t) + \alpha S. We combine the above two results to give

    \begin{eqnarray*} d(C+\alpha S) &=& \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + \mu S\alpha \Bigl] dt \\ \nonumber \\ &   & + \, \sigma S \bigl( {\partial C \over \partial S} +\alpha \bigr) dW_t \nonumber \end{eqnarray}

We are trying to find a portfolio that is risk-free, which means we would like the stochastic term to cancel. We see immediately that this happens for \alpha = -{\partial C \over \partial S}, which gives

    \[d(C+\alpha S) = \Bigr[ {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2}\Bigl] dt\]

Since this portfolio is risk-free, to prevent arbitrage it must grow deterministically at the risk free rate

    \[d ( C + \alpha S) = r ( C + \alpha S) dt\]

and so

    \[rC= {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

This is the BS partial differential equation (pde). Note that despite the fact that the constant growth term for the underlying had a rate \mu, this has totally disappeared in the pde above – we might disagree with someone else about the expected rate of growth of the stock, but no-arbitrage still demands that we agree with them about the price of the option [as long as we agree about \sigma, that is!]

As for any pde, we can only solve for a specific situation if we have boundary conditions – in this case, given by the payoff at expiry t=T. At that point we know the exact form the value that C(S,T) must take

    \[C_{\rm call}(S,T) = \max(K-S,0)\]

Our job is to use the pdf to evolve the value of C(S,t) backwards to t=0. In the case of vanilla options this can be done exactly, while for more complicated payoffs we would need to discretise and solve numerically. This gives us another way of valuing options that is complementary (and equivalent) to the expectations approach discussed previously.

To solve the equation above, it is useful to first make some substitutions. As we are interested in time-to-expiry only, we make the change of variables \tau = T - t which yields

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

We can eliminate the S terms by considering change-of-variables M = \ln S. This means that

    \[{\partial C \over \partial S} = {\partial C \over \partial M}{\partial M \over \partial S} = {1 \over S}{\partial C \over \partial M}\]

    \begin{eqnarray*} \partial^2 C \over \partial S^2} = {\partial \over \partial S} \Bigl({\partial C \over \partial S}\Bigr) & = & {\partial \over \partial S} \Bigl({1 \over S}{\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}{\partial \over \partial S} \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}\bigl({\partial M \over \partial S}{\partial \over \partial M}\bigr) \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S^2}{\partial^2 C \over \partial M^2 \nonumber \end{eqnarray}

Combining these the BS equation becomes

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 C \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial C \over \partial M}\]

The linear term in C can be removed by another transformation D = C e^{-r\tau} so that

    \begin{eqnarray*} {\partial D \over \partial \tau} &=& {\partial C \over \partial \tau}e^{-r\tau} - rCe^{-r\tau} \nonumber \\ \nonumber \\ {\partial^n D \over \partial M^n} &=& {\partial^n C \over \partial M^n}e^{-r\tau} \nonumber \end{eqnarray}

The exponential terms cancel throughout, and we are left with

    \[0 = -{\partial D \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 D \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial D \over \partial M}\]

One final transformation will be needed before putting in boundary conditions. The transformation will be

    \[y = M + \Bigr( r -{1\over 2 }\sigma^2\Bigl)\tau\]

But unlike the other transformations I’ve suggested so far, this one mixes the two variables that we are using, so a bit of care is required about what we mean. When I most recently wrote the BS equation, D was a function of M and \tau – this means that the partial differentials with respect to \tau were implicitly holding M constant and vise versa. I’m now going to write D as a function of y and \tau instead, and because the relationship features all three variables we need to take a bit of care with our partial derivatives:

    \[dD(M,\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} dM\]

where vertical lines indicate the variable that is being held constant during evaluation. Now, to move from M to y, we expand out the dM term in the same way as we did for dD above

    \[dD(M(y,\tau),\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} \Bigr({\partial M \over \partial \tau}\bigg|_{y} d\tau + {\partial M \over \partial y}\bigg|_{\tau}dy\Bigl)\]

    \[= dD(y,\tau) = {\partial D \over \partial \tau}\bigg|_{y} d\tau + {\partial D \over \partial y}\bigg|_{\tau} dy\]

We can compare these last two equations to give expressions for the derivatives that we need after the transformation by comparing the coefficients of d\tau and dy

    \begin{eqnarray*} {\partial D \over \partial \tau}\bigg|_y &=& {\partial D \over \partial \tau}\bigg|_M + {\partial D \over \partial M}\bigg|_{\tau} {\partial M\over \partial \tau}\bigg|_y \nonumber \\ \nonumber \\ {\partial D \over \partial y}\bigg|_{\tau} &=& {\partial D \over \partial M}\bigg|_{\tau}{\partial M \over \partial y}\bigg|_{\tau} \nonumber \end{eqnarray}

Computing and inserting these derivatives [I’ve given a graphical representation of the first of these equations below, because the derivation is a little dry at present!] into the BS equation gives

    \[0 = -{\partial D \over \partial \tau}\bigg|_y + {1 \over 2}\sigma^2 {\partial^2 D \over \partial y^2}\bigg|_{\tau}\]

This is the well-known Heat Equation in physics. For the sake of brevity I won’t solve it here, but the solution is well known – see for example the wikipedia page – which gives the general solution:

    \[D(x,\tau) = {1 \over \sqrt{2\pi \sigma^2 \tau}}\int^{\infty}_{-\infty} e^{-(x-y)^2 \over 2\sigma^2 \tau} p(y) dy\]

Where p(y) is the payoff condition (it’s now an initial condition, as expiry is at \tau = 0). The algebra is quite involved so I give the solution its own post, and you can show by substitution that the BS option formulae given previously is a solution to the equation.

An illustration of the difference between partial differentials when a change of variables involving both current varibles is used
An illustration of the difference between partial differentials when a change of variables involving both current variables is used. This should be thought of as a contour plot with the value of D on the out-of-plane axis. The amount D changes when moving a small amount dt depends on which direction you are moving in, as shown above.

As an aside, what was the portfolio that I was considering all of the way through? Comparing \alpha to the vanilla greeks, we recognise it as the option delta – the hedging portfolio is just the portfolio of the option with just enough stock to hedge out the local delta risk. Of course, as time goes by this value will change, and we need to constantly adjust our hedge to account for this. This shows the breakdown caused by one of our assumptions – that we could trade whenever we want and without transaction costs. In fact, because we need to re-hedge at every moment to enforce this portfolio’s risk free nature, in the presence of transaction costs the hedging costs in this strategy will be infinite! This demonstrates a significant failing of one of our assumptions, I’ll come back again to the effect of this in the real world in future posts.

Time Varying Parameters

When I discussed the BS equation here, one of the assumptions was that r and \inline \sigma were constant parameters. In reality, neither of these will be constant: how much of a problem is this for us? In general, they will both be stochastic and hence unpredictable in the future. In this post however I’m going to stick to deterministic quantities and demonstrate that BS can be readily extended to time-varying rates and vols. This will enable us to price at-the-money options correctly, but still won’t help us with the vol smile effect that I discussed here.

If r and \inline \sigma are time varying, the stochastic differential equation describing the underlying spot price in BS is

{ dS \over S} = r(t)dt + \sigma(t)dW_t

Using Ito’s Lemma as discussed before, this can be re-written in terms of the log as

d(\ln S) = \Bigl( r(t) - {1 \over 2}\sigma^2(t)\Bigr) dt + \sigma(t) dW_t

And hence

S(t) = S(0).\exp{ \Bigr[ \Bigr( \bar{r} - {1 \over 2}\bar{\sigma}^2 \Bigl)t + \bar{\sigma} \sqrt{t} z \Bigl] }

where \inline z \sim {\mathbb N}(0,1). This is still lognormal as before, but we now have an effective rate and an implied volatility defined by (you can see these by comparing to the distribution coming from constant r and \inline \sigma):

\bar{r}(t) = \int_0^t r(t')dt'

\bar{\sigma}(t) = \sqrt{ {1 \over t} \int_0^t \sigma^2(t')dt'}

\inline \sigma(t) is called the instantaneous vol, but the relevant quantity for option pricing is always the implied vol \inline \bar{\sigma}(t). Assuming that we have a discount curve constructed from traded bonds we can calculate r(t) from that as I described here, and if we can see liquid at-the-money (ATM) options on the market at different times we can also calculate the \inline \sigma(t) function consistent with them from their implied volatilities, which we will need to simulate paths in Monte Carlo, via the following procedure

  • Take all available ATM options, label their times in order \inline \{ t_1, t_2, \cdots , t_n \}
  • Calculate their implied vols using a vol solver (eg. the one on my PRICERS page). These correspond to \inline \{ \bar{\sigma}(t_1), \bar{\sigma}(t_2), \cdots , \bar{\sigma}(t_n) \}
  • As a first approximation, we will assume \inline \sigma(t) is constant between time windows (although you could make approximations arbitrarily complicated). In this case, for \inline 0 < t < t_1 we have

\bar{\sigma}(t_1) = \sqrt{ {1 \over t_1} \int_0^{t_1} \sigma^2dt'} \qquad 0 < t' < t_1

  • Which works out to simply \inline \sigma(t) = \bar{\sigma}(t) in this region
  • For later vols, the procedure is a little more complicated:

\bar{\sigma}(t_2) = \sqrt{ {1 \over t_2} \int_0^{t_2} \sigma^2(t')dt'} = \sqrt{ {1 \over t_2} \int_{t_1}^{t_2} \sigma^2(t')dt' + \bar{\sigma}^2(t_1)}

t_2 \cdot\bar{\sigma}^2(t_2) - t_1 \cdot\bar{\sigma}^2(t_1) = \int_{t_1}^{t_2} \sigma^2(t')dt' = (t_2 - t_1) \sigma^2(t_1)

\sigma(t_1) = \sqrt{t_2 \cdot\bar{\sigma}^2(t_2) - t_1 \cdot\bar{\sigma}^2(t_1) \over (t_2 - t_1) }

  • And for general time window \inline t_n < t < t_{n+1}:

\sigma(t) = \sqrt{t_{n+1} \cdot\bar{\sigma}^2(t_{n+1}) - t_n \cdot\bar{\sigma}^2(t_n) \over (t_{n+1} - t_n) } \qquad t_n < t < t_{n+1}

  • And this final expression can be used to calculate the value of the instantaneous vol required between each time window to give the correct ATM implied vol

Note that this will fail if \inline t_{n+1} \cdot\bar{\sigma}^2(t_{n+1}) < t_n \cdot\bar{\sigma}^2(t_n) – this is because we expect the distribution variance \inline t \cdot\sigma^2(t) to be an increasing function of time. If it didn’t hold for any time windows we’d have an arbitrage opportunity, selling the first option and buying the second to lock in a risk-free profit.

As I mentioned, this extension allows BS to correctly price options and forwards at different times by matching to market observables, but still doesn’t predict any volatility smile. Can we take it any further? In fact we can: in a later post I will discuss the local vol model which extends \inline \sigma(t) to be \inline \sigma(S,t) and will allow us to match any set of arbitrage-free market prices.

Random Number Generators

I’m in the process of preparing some C++ code for the blog to allow user-downloadable and -compilable Monte Carlo pricers that won’t overload my server. That’s got me thinking about random number generators, and I’m going to talk a bit about them here [In fact, all of the algorithms discussed below are entirely deterministic, so I should really call them pseudo-random number generators. Whether or not particular algorithms are suitable for use in applications requiring random numbers is an area of active research and occasional catastrophic failure].

Uniform RNGs

Generally, in Monte Carlo simulations we want to be able to generate many ‘sample paths’, each of which is a realisation of a possible evolution of the underlying price, and evaluate the payoff for that path. As I discussed in an earlier post, we usually model these quantities as depending on brownian motions and results are normally distributed, which means we need to generate random variables that are also normally distributed (so, for example, we are more likely to get values near to zero than at large values).

Generating these directly is quite tricky, so the first step is usually to generate uniform random variables and then transform these to gaussian variates. A uniform random variable is a variable with a constant probability of being anywhere within a certain range, typically [0,1] or [-1,1] – for the rest of this post, I denote a uniform variable as U[…], and define u~[0,1] and v~[-1,1]. The pdf of u and v are shown here:

The Pdfs of two uniform random variables
The Pdfs of two uniform random variables, over ranges [0,1] and [-1,1]

A straight-forward method of generating these numbers uses modular multiplication [this is variously called a Park-Miller generator, a linear congruential generator and a Lehmer generator]:

x_{k+1} = g \cdot x_k \mod n

For good choices of the variables, this will generate a sequence with a period of n that is uniformly distributed across the interval [0,n], so dividing by n rescales the variables to be distributed roughly according to u. The choices of g, n and \inline x_0 (the ‘seed’) are non-trivial and several are given in the literature, as usual I’ve followed the advice given on wikipedia and lumped for n = 2^32 – 5, g = 279470273, and the choice of seed is left to the user but should be coprime to modulus n. Note that this will have a period of about 2^32, so the upper limit on the number of monte-carlo evaluations that can meaningfully be done is around 4 billion steps. It is also easy to see how the generator could be hampered by a poor choice of parameters – if the multiplier and the modulus are not co-prime, for example, the actual period of the sequence can be much less than n.

The more advanced Marsenne Twister algorithm aims to correct many of the problems with the Park-Miller generator, such as increasing the period length of the sequence and improving the statistical randomness of the values produced. It’s too complicated to go through here (it’s a full post in itself!) in full detail but many good summaries of the theory and performance of the twister algorithm can be found online.

One other thing to point out is that if initilised in the same way, each time one of the above algorithms is run it will return the same sequence of numbers. At first this might seem like an unwanted property as it deviates from ‘true’ randomness. However, in our context it will be extremely valuable as it allows us to test code using the same sequence of numbers each time. This is vital for calculating greeks by bump-and-revalue using monte carlo as otherwise monte carlo error in price will usually swamp small changes due to greeks, and it also helpful when optimising code to make sure that results aren’t being affected.

Gaussian RNGs

Once we have a source of uniform random variates, we need to consider methods for converting them to gaussian variates. The simplest and most direct is the inversion method, which involves finding a mapping between uniform random variate u and the standard normal random variate z. Consider first the CDF of the normal variable (I have discussed this before here), shown below:

Cdf of the normal distribution
CDF of the normal distribution, which gives the probability of the variable being less-than-or-equal-to the value of the CDF at each point

This shows us the likelihood that z is below a certain value X, and is equal to the integral of the PDF from \inline -\infty to X. Because the PDF is normalised to integrate to 1, the CDF maps each possible outcome of z to a value between 0 and 1, with the most likely outcomes being where the CDF is steepest (since these cover the y-axis values more quickly) which is exactly where the PDF is largest. This is exactly the opposite of what we want, so we consider instead the inverse CDF shown here:

The inverse CDF - called the quantile function
The inverse CDF – called the quantile function – of the normal distribution (as usual, the inverse function is the function reflected in the line x=y). This function allows us to map a uniform variate onto the distribution in question. For example, the graph shows a uniform variate draw of 0.8 would map to a value of z = 0.8416 for z distributed according to the standard normal distribution

This mapping is exactly what we’re after [although there are others – it’s not unique]. It maps uniformly distributed variates to normally distributed variates and they’re in a 1-to-1 correspondence. This procedure is robust and in principle it is exact. However, it is not always the fastest method, and it requires us to know the inverse CDF, which to evaluate in a reasonable time requires us to make approximations (this technique works for any distribution – although there is no closed form CDF for the normal distribution, for many other distributions the inverse CDF can be expressed in closed form, in which case this is almost certainly the best method).

There are several other techniques for obtaining normal distributions from uniform variates, a few of which I have implemented and describe them here because they’re quite interesting. The first is called the Box-Muller technique, which converts two independent uniform variates into two independent normal variates by combining the inversion technique described above, the trick that we use to calculate a gaussian integral and the change of variables law for Pdfs that I discussed here.

Consider two independent normally-distributed variables x and y. The joint PDF of the two is

p(x,y)dxdy = p_z(x)p_z(y)dxdy = {1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy

As for gaussian integration, we can re-express this in polar co-ordinates remembering that \inline dxdy = rdrd\theta

{1 \over 2 \pi}\exp{ \Bigl( -{1 \over 2}(x^2 + y^2)\Bigr) }dxdy = {1 \over 2 \pi}e^{-{1\over 2}r^2} rdrd\theta = p_R(r)p_{\theta}(\theta)rdrd\theta

If we can simulate a value of r and theta using uniforms that obeys these densities, we will be able to transform them to normals x and y straightforwardly. There is radial symmetry so we can integrate over \inline d\theta to give us the radial cumulative probability density (ie. radial CDF)

P(r\leq R) = \int^R_0 r' e^{-{1\over 2}r'^2} dr'

Unlike x or y, we can calculate this in closed form and invert it

\int^R_0 r' e^{-{1\over 2}r'^2} dr' = \Bigl[ -e^{-{1\over 2}r'^2} \Bigr]^R_0 = 1 - e^{-{1\over 2}R^2}

P^{-1}(u) = \sqrt{-2 \ln(1 - u)}

This quantile function allows us to select a point with the correct radial distribution using a first uniform variate, and we then need to select a theta from a uniform distribution around angles – this can be done by multiplying a second uniform variate by \inline 2\pi [nb. we can simplify the radial quantile further – note that if u is uniformly distributed across [0,1], then so is (1-u), so we can substitute u for (1-u)]. The values of normally distributed variates are given by the x- and y-coordinates of the resulting point, ie.

z_1 = \sqrt{-2 \ln u_1}\cdot \sin(2\pi u_2)

z_2 = \sqrt{-2 \ln u_1}\cdot \cos(2\pi u_2)

This is a really useful trick, quickly converting two uniform variates into two gaussian variates. We can improve the speed still further if we can get rid of the computationally expensive sine and cosine functions, for which there is another nifty trick using rejection. In the final step, we inserted two uniform variates to produce \inline z_1 and \inline z_2. We can use the following transformation to turn our initial uniform variates into two new variates that avoid the need for a sine or cosine:

Instead of taking \inline u_1 and \inline u_2, consider instead two uniform variables \inline v_1 and \inline v_2 across [-1,1]. Let \inline s = r^2 = v_1^2 + v_2^2, and if s > 1 then reject the values and start again. The variables now lie within the unit sphere as shown below:

Polar Box Muller procedure
Polar Box Muller procedure. Two alternative values s and theta are generated from the initial random variables U1 and U2. If these fall inside the unit circle, it can be shown that they are also uniformly distributed, so are equally suitable for use in the Box-Muller formulae. The advantage is algebraic simplicity – because theta is an angle, the ratio of s and the initial variates is already a sine or cosine, so we don’t need to calculate these explicitly

We can use change of variables to show that after rejecting external points, both theta and s are also uniform on [0,1]:

p(s)ds = p(r)rdr = 2rdr

p(s) = 2r{dr \over ds} = 2r \cdot {1 \over 2r} = 1

Above, we used \inline u_1 and \inline u_2 to generate a value of r and angle, but since the s and \inline \theta that we’ve generated here are also uniform on [0,1] they can be used just as well. Because of our definitions, \inline \sin(2\pi \theta) = {v_1 \over r}, so new expressions are:

z_1 = \sqrt{-2 \ln s}\cdot {v_1 \over r}

z_2 = \sqrt{-2 \ln s}\cdot {v_2 \over r}

We’ve got an even more efficient set of formulae, at the cost of having to throw out around 20% of our values. It will turn out that this is often quicker than the basic version of the algorithm, but is totally useless if we move to pseudo-random numbers, as I will discuss another time.

The final technique I want to talk about is a generalisation of acceptance-rejection, which we’ve just touched on and was also used implicitly in my first post on Monte Carlo.

The idea here is once again to simulate a distribution f(x) by sampling instead from another distribution g(x) that overlaps it everywhere (we can rescale g(x) by a constant A to ensure this), and rejecting values that fall inside A.g(x) but outside of f(x). This is usually used when we can simulate g(x) directly from uniforms by using its inverse CDF – perhaps the best way to demonstrate how this works is through another example. We let f(x) be the normal PDF, and g(x) the exponential distribution, such that

g(x) = \lambda e^{-\lambda x}

This only has support on positive x, so we redefine it to cover positive and negative x, and rescale it so that it is everywhere larger than the normal pdf

A\cdot g(x) = e^1 \cdot e^{-|x|}

and calculating the inverse CDF gives

G^{-1}(u) = \begin{matrix} -\ln{2u} & 0.5 \leq u < 1 \\ \ln{2u} & 0 \leq u < 0.5 \end{matrix}

We generate two independent uniform draws \inline u_1 and \inline u_2. The first is used to simulate a variate distributed according to A.g(x) via the inverse CDF given here. Then we compare the values of A.g(x) at that point to the value of f(x) evaluated at that point. The second uniform variate is compared to the ratio of the two – if it is GREATER than the ratio, our point has fallen ‘outside’ of f(x), so it is rejected and we start again. If it is LOWER than the ratio, we accept the value of \inline G^{-1}(u) as a realisation of our target distribution f(x).

An example of Acceptance Sampling
An example of Acceptance Sampling. Because of the form of g(x), we are able to write the inverse CDF in closed form and use it to generate variates distributed according to g(x) from uniform variates. For each variable, we compare the ratio of f(x) and g(x), and take another uniform variate to compare to. If it is greater than the ratio, all of the variables are discarded, while if it is less, we include the original value taken from g(x) in our set of random numbers. This set is distributed according to f(x), but we may need to sample many times if the functions f(x) and g(x) aren’t similar (in which case we will have numerous rejections)

An obvious problem with this technique is that if the distributions don’t overlap well, many variates will be rejected, so we need a good uniform generator to provide enough independent variables for the algorithm to work.

The algorithm can be made arbitrarily complex by designing trial distributions g(x) that match f(x) as closely as possible, such as the ‘Ziggaraut algorithm’ which optimises the approach by breaking g(x) into many parts matched to different sections of f(x), which and is often used to generate normal random variates. As with the Marsenne Twister, there are many good guides to this algorithm available online.