Interview Questions II

A friend visited last weekend and after several beers challenged me to a numbers game. I liked it and thought it would make a good post – apparently he read it in a book of interview questions so there’s a good excuse for me to put it on here! The variant of the game we played went as follows, although there is some scope for changing the various numbers.

The first player says any number between 1 and 10. The second player then follows with a larger number, but it can be no more than 10 larger. So, if player one had said 4, player two could say any number between 5 and 14. Play continues in turn until someone gets to exactly 50, at which point they are the winner and play finishes.

There is actually a simple rule to play by to ensure victory, but the best way to proceed is to play a few games with someone first to get the feel of play. The solution is below, along with the rest of this post.





Solution: The winner is the first person to say 50. But, after playing a few times you’ll rapidly notice that another good number to arrive at is 39. If I say 39, the other player has to say a number between 40 and 49, any of which allow me to say 50 on the next move and win; so 39 is a winning number as well. But of course, I can ensure that I am the person who says 39 if I am also the player who says 28, by the same logic. I can ensure that I say 28 if I also say 17, and 6. So, if I can be the player that says 6, I can be certain of victory – and since this is possible on the first move, I can always win if I go first. Is this always the case? Have a think about the case where the victor is the first person to say 55, instead of 50!
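The backwards reasoning above can also be automated. The following is only a sketch: the function names are made up for illustration, and it assumes that nobody is allowed to say a number larger than the target.

```python
def winning_numbers(target=50, max_step=10):
    """Numbers which, once said, guarantee a win with best play afterwards."""
    wins = {target: True}                      # saying the target wins outright
    for n in range(target - 1, 0, -1):
        replies = range(n + 1, min(n + max_step, target) + 1)
        # n is a winning number only if every legal reply is a losing one
        wins[n] = not any(wins[m] for m in replies)
    return sorted(n for n, good in wins.items() if good)


def first_player_wins(target=50, max_step=10):
    """True if some legal opening number (1..max_step) is a winning number."""
    return any(n <= max_step for n in winning_numbers(target, max_step))


print(winning_numbers())        # [6, 17, 28, 39, 50]
print(first_player_wins())      # True
```

Calling first_player_wins(55) is one way to check your answer to the closing question.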

This backwards-propagation of the result is a lot like option pricing (the link seems slightly tenuous, but they ‘feel’ rather similar to me). With an option, we know what the payoff will look like as a function of spot at expiry, but we don’t know what it looks like before then – the whole challenge is to propagate the final payoff backwards in whatever model we are using to the current date, which can be done using backwards pdes or expectations.

As an extra, if you are interested in a slightly more complicated variant, try the following game (again for two people). Arrange 15 matchsticks in rows to form a pyramid as follows:


On each turn, remove some matchsticks from a single row. You can remove as few or as many as you like, but they have to come from the same row. The winner is the person who removes the last match/matches.

[The game has been thoroughly solved; it’s called Nim and a quick internet search should tell you everything you need to know, but I strongly recommend having a play with a friend first to try and get a feel for how it works!]

The Discount Curve, ZCBs and the Time Value of Money

This is a foundational piece about the time value of money. It will feel a bit more like accountancy rather than mathematical finance, but it’s the absolute bedrock of what we do so definitely worth spending a bit of time on!

$100 today is worth more than $100 in a year’s time. This should be obvious – if I have $100 today, I can put it into a bank account and earn (back in the good old days!) perhaps 3% on it – so in a year’s time I have $103.

For the rest of this post (and in much of finance), I will assume that I can put money on deposit at a risk-free rate r(t), and that it will grow according to the following differential equation:

dB = r(t) B dt

Note that this has no stochastic component – its solution is risk-free, exponential growth, so that if I start with \inline B(0) at time 0, by time T it has grown to \inline B(T) = B(0)\exp {\int^T_0 r(t) dt} – compound interest with ‘short rate’ r(t).

Time varying interest rates can be troublesome to work with; a more convenient – but closely related – concept is the Zero Coupon Bond (ZCB). This is a bond that I buy today with a specific maturity date, at which I will get paid exactly $1. It doesn’t pay any interest in between, and since I’ve said that money is more valuable now than later, I expect to pay less than $1 for the bond now to reflect this. How much less? Well, thinking about the bank account example above, I locked away my $100 to receive $103 a year later – this is just the same as buying 103 ZCBs maturing in a year – so the price of each bond must be 100/103 = $0.9709 or there is an arbitrage opportunity. As we can see, there is a 1-to-1 correspondence between interest rate curves and ZCB prices over a period; knowing one allows us to calculate the other in a rather straightforward way,

\delta(0,T) = e^{-\int^T_0 r(t)dt}

where \inline \delta is the price of a ZCB bought now with maturity at T [the above formula is only strictly true in the context of deterministic rates – more on this later]. I’ve added a little doodle here to allow conversion between a (constant) interest rate and time period and a discount factor and also on the PRICERS page – I’ll add functionality for time-varying rates another time.
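For a constant rate the conversion is a one-liner in each direction. This is a minimal sketch rather than the code behind the pricer; the function names are illustrative and the rate is assumed to be continuously compounded, as in the formula above.

```python
import math


def discount_factor(rate, years):
    """Price today of $1 paid in `years`, for a constant short rate."""
    return math.exp(-rate * years)


def implied_rate(discount, years):
    """Constant continuously compounded rate implied by a discount factor."""
    return -math.log(discount) / years


# The 3% bank account above compounds once a year, so the equivalent
# continuously compounded rate is ln(1.03), slightly below 3%.
print(discount_factor(math.log(1.03), 1.0))   # same as 100/103
print(implied_rate(100.0 / 103.0, 1.0))       # same as ln(1.03)
```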



The collection of all of the ZCB prices over a period is called a discount curve. This tells us how much a payment at a particular time is worth in today’s money. As a bank, if a customer comes to me and asks to borrow $10,000 for five years at 8\% interest, the payments I will receive from him will be $800 per year for the next five years and then the return of the principal. Assuming he doesn’t default on the loan [this is another problem altogether… another day!] then I can work out the ‘Net Present Value’ of this loan by taking the price for a ZCB expiring at each of the five payment dates from the discount curve, multiplying by the payment amount, and summing them together. As long as this is over $10,000 I should make a profit – if this profit is enough to compensate me for the credit risk that I’m taking, I will probably go ahead and give him the loan. By way of example, the discount curve for several different constant interest rates over 20 years is shown here:

Example Discount Curves
Examples of discount curves generated by constant interest rates of 3%, 5% and 10% over a 20 year period. Any pre-determined future cash flow can be hedged precisely by using zero coupon bonds at the same times, giving the Net Present Value (NPV) of the cash flow
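The NPV calculation for the loan described above can be sketched as follows. The flat, continuously compounded discount curve is an assumption made purely for illustration, as is the function name.

```python
import math


def loan_npv(principal, coupon_rate, years, flat_rate):
    """NPV of yearly interest payments plus the principal repaid at maturity.

    Each payment is multiplied by the ZCB price exp(-flat_rate * t) for its
    payment date (flat curve assumed) and the results are summed.
    """
    coupon = principal * coupon_rate
    npv = 0.0
    for t in range(1, years + 1):
        cash_flow = coupon + (principal if t == years else 0.0)
        npv += cash_flow * math.exp(-flat_rate * t)
    return npv


# The $10,000 loan at 8% for five years, discounted at a flat 5%
print(loan_npv(10_000, 0.08, 5, 0.05))
```

If the curve is flat at ln(1.08), which is 8% compounded yearly, the NPV comes out at exactly the $10,000 lent, so the loan only shows a profit when the risk-free curve sits below the rate charged.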

One of the reasons that these rates are so important is that as I said in Risk Free Valuation, to price derivatives we assume that they grow at the risk-free rate r(t) in the RN measure. This is just the r(t) that we’ve worked with above, and we can back it out from the discount curve:

\delta(0,T) = e^{-\int^T_0 r(t)dt}

-\ln[\delta(0,T)] = \int^T_0 r(t)dt

-{\partial \ln[\delta(0,T)] \over \partial T} = r(T)

-{1 \over \delta(0,T)}{\partial \delta(0,T) \over \partial T} = r(T)

so we can calculate the instantaneous risk-free rate given a discount curve fairly straightforwardly by taking the local gradient and dividing by the negative of the local value (in general the gradient will be negative, so this should be a positive quantity).
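Numerically, the gradient can be estimated with a small central difference. The sketch below assumes the discount curve is available as a smooth function of maturity; the names and the step size h are illustrative choices.

```python
import math


def short_rate(discount, T, h=1e-4):
    """Estimate r(T) = -(1/delta) * d(delta)/dT using a central difference."""
    slope = (discount(T + h) - discount(T - h)) / (2.0 * h)
    return -slope / discount(T)


# Toy curve built from r(t) = 0.02 + 0.01 t, so the integral is 0.02 T + 0.005 T^2
curve = lambda T: math.exp(-(0.02 * T + 0.005 * T ** 2))
print(short_rate(curve, 2.0))   # close to 0.04
```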

In reality, ZCBs don’t exist. In this context all of the above seems a bit academic! However, they can be built up out of a combination of coupon-bearing bonds. Typically, government bonds will pay a fixed interest rate each year, and have a set maturity year in which they pay both an interest payment and the original principal. The rates won’t be the same – typically longer maturity bonds will have higher rates to compensate for the added risk to the principal for locking it away for so long. Further, there is a secondary market for government debt, so we can see the current prices of these bonds for differing maturities on the market. Imagine we see a bond maturing in a year’s time that will pay 5% interest (it doesn’t matter how old it is – it might have been issued 3 or 20 years ago without affecting us), and because it is due for maturity then, it will also return the $1 principal. So it will pay a single payment, in a year’s time, of $1.05. This is already effectively a ZCB because all but one of the payments have already been made. We can see its market price P (let’s say for concreteness it is $1.03) and from that calculate the market-implied discount factor

\delta(0,1) = {1.03 \over 1.05} = 0.981

So far, so good. But how can we create a ZCB with longer maturities? The trick here is to combine multiple coupon-bearing bonds. Let’s say we can see another bond on the market with maturity in 2 years, that pays yearly 4.5% interest and returns its principal at the end. This will make two payments, $0.045 in a year’s time, then $1.045 in two years. The trick is to buy one of these bonds, and simultaneously sell (0.045/1.05 = 0.043) of the original 1 year bonds [I’m assuming we can both buy and sell fractional amounts of bonds, and that we can short sell bonds. The quantities that banks work with mean the first isn’t usually a problem, the second is also probably ok but more on this later]. This fractional bond will exactly match the payout of the second bond in the first year, so we receive an interest payment on that but need to make a payment for the fraction of the first bond. On the second year, we receive our interest and principal on the second bond – so we’ve paid money initially to set up the two-bond portfolio, we receive a payment at the end of the second year, but all of the intermediate cash flows cancel out: we’ve created a synthetic ZCB with two year maturity. To illustrate how to back out the ZCB price, let’s say the second bond price was $1.06. Cash flows are:

  • We paid $1.06 to buy the second bond at t=0
  • We received (0.045/1.05)*$1.03 for shorting the first bond at t=0
  • Net payment at t=0 is $1.016
  • All cash flows at t=1 cancel out, as discussed above
  • We receive $1.045 in two years from the second bond

Since we paid $1.016 at t=0 and receive $1.045 in two years’ time, we repeat the above calculation for the two-year discount factor

\delta(0,2) = {1.016 \over 1.045} = 0.972

We can see from this how in general you can put together a full discount curve given a sequence of coupon-bearing bonds. We can formalise this process into a matrix equation involving the bond prices, cash flows and discount factors. First define the following quantities

\tilde{P} = \begin{bmatrix} P_1\\ P_2\\ \vdots\\ P_T \end{bmatrix}; \quad \tilde{D} = \begin{bmatrix} \delta_1\\ \delta_2\\ \vdots\\ \delta_T \end{bmatrix}; \quad \tilde{C} = \begin{bmatrix} C_{11} & 0 & \cdots & 0\\ C_{21} & C_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ C_{T1} & C_{T2} & \cdots & C_{TT} \end{bmatrix}

where \inline \tilde{P} is the price of the bonds expiring at each time, \inline \tilde{D} is the relevant discount factor, and \inline \tilde{C} is the matrix of cash flows for each bond at each time. These must obey

\tilde{C} \circ \tilde{D} = \tilde{P}

and as long as we have matrix inversion code, we can calculate the discount factors via the inverse relationship

\tilde{D} = \tilde{C}^{-1} \circ \tilde{P}

In reality, in large institutions these discount curves are formed considering as many bonds as possible, not just those at yearly intervals, in order to minimise the interpolation required between time points, but this is a mechanical problem rather than a theoretical one which I overlook here for brevity.
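Because the cash flow matrix is lower triangular, a full matrix inversion isn’t strictly needed: the system can be solved one maturity at a time by forward substitution, which is exactly the two-bond argument above applied repeatedly. The sketch below uses that swapped-in technique with the two bonds from the worked example; the function and variable names are illustrative.

```python
def bootstrap_discount_factors(prices, cash_flows):
    """Solve C D = P for D by forward substitution.

    cash_flows[i][j] is the payment made by bond i at time j. The matrix is
    lower triangular because bond i pays nothing after its own maturity.
    """
    discounts = []
    for i, price in enumerate(prices):
        # value today of the coupons paid before bond i matures
        known = sum(cash_flows[i][j] * discounts[j] for j in range(i))
        discounts.append((price - known) / cash_flows[i][i])
    return discounts


prices = [1.03, 1.06]                 # one-year and two-year bond prices
cash_flows = [[1.05, 0.0],            # one-year bond: $1.05 at t=1
              [0.045, 1.045]]         # two-year bond: $0.045 at t=1, $1.045 at t=2
discounts = bootstrap_discount_factors(prices, cash_flows)
print([round(d, 3) for d in discounts])   # [0.981, 0.972]
```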

The Greeks

I’ve already discussed how to price vanilla options in the BS model. But options traders need to know more than just the price: they also want to know how price changes with the various other parameters in their models.

The way traders make money is just the same way that shop-keepers do – by selling options to other people for a little bit more than they buy them for. Once they sell an option, they have some money but they also have some risk, since if the price of the underlying moves in the wrong direction, they stand to lose a large amount of money. In the simplest case, a trader might be able to buy a matching option on the market for less than she sold the original option to her client for. This would cancel (‘hedge’) all of her risk, and generate a small positive profit (‘PnL’ or Profit & Loss) equal to the difference in the two prices.

This might be difficult to do, however, and it won’t generate as much profit as the trader would like, because whoever she buys the hedging option from will also be trying to charge a premium over the actual price. Another possibility is to try and create a hedged portfolio consisting of several options and the underlying stock as well, so as to minimise the net risk of the portfolio.

Since she has sold an option on a stock (for concreteness, let’s say she has sold a call option expiring at time T on a stock with a spot price S(t) and the option has strike K – because she has sold it, we say she is ‘short’ a call option), the trader will have to pay the larger of zero or ( S(T) – K ) at the option expiry to the client. Clearly, if the stock price goes up too high, she will lose more money than she received for selling the option. One possibility might be for her to buy the stock for S(t). Since she will have to pay a maximum of S(T) – K, but would be able to sell the stock for S(T), she would cover her position in the case that the stock price goes very high (and actually guarantee a profit in this case). But she has over-hedged her position – in the case that the stock falls in price, she will lose S(t) – S(T) on the stock. This is shown in the graph below.

The payoff at expiry of the three portfolios shown in the text. The unhedged option makes the trader money if spot doesn’t rise by more than the premium. The overhedged option is the reverse – now the trader loses money if the stock falls too far below the strike price – this is called a covered call, the payoff is the same as the payoff for an uncovered put option. Finally, a delta hedged option will make money as long as the spot does not move too far in either direction – the trader has taken a directionless bet, she is instead betting on volatility remaining low 

In fact, she can ‘delta-hedge’ the option by buying a fraction \inline \Delta of the stock. One way of deriving the BS equation (which I’ll get to at some point) is to construct a portfolio consisting of one option and a (negative) fraction of the underlying stock where the movement of the option price due to the underlying moving is exactly cancelled out by the movement of the underlying in the portfolio – any small increase in S(t) will increase the price of the option but decrease the value of the negative stock position so that the net portfolio change is zero. The fraction \inline \Delta is called the Delta of the option; mathematically it is the derivative of the option price C with respect to the stock price S

The price of a vanilla call is roughly the same as the payoff at expiry for very high spots and very low spots. Near-the-money, the difference between the option price and its payoff at expiry is greatest as the implicit insurance provided by the option is most useful. The delta of this option is the local gradient of the call price with spot. This is the amount of the underlying that would be required to delta-hedge the portfolio, so that its value is unaffected by small changes in the spot price.

\Delta = {\partial C \over \partial S}

For a call option this will be positive and for a put it will be negative, and the magnitude of both will be between 0 when far out-of-the-money and 1 when far in-the-money.

This graph shows the instantaneous change in PnL due to changes in spot for the three portfolios discussed above. Both the covered and the uncovered calls have some delta – a change in the spot price will have a direct effect in the value of the portfolio. By contrast, the value of the delta hedged portfolio is insensitive to the value of spot for small moves. Unfortunately, it is short gamma, so large moves in either direction will reduce portfolio value, so the trader must be careful to re-hedge frequently. As all of these portfolios are short an option, they are long theta – that is, the value of the portfolio will INCREASE with time if other factors remain constant, as the time value of the option is decaying towards expiry.

Similarly, if the market changes its mind about the implied volatility of an option, this will increase or decrease the price of the trader’s current option portfolio. This exposure can also be hedged, but now she will need to do it by trading options in the stock as the stock price itself is independent of volatility. This time the relevant quantity is the ‘vega’ of the option, the rate of change of price with respect to vol.

These sensitivities of the derivative price are called the Greeks, as they tend to be represented with various greek letters. Some examples are delta (variation with spot), vega (variation with vol), theta (variation with time); and second-order greeks like gamma (sensitivity of delta to spot), vanna (sensitivity of delta to vol, or equivalently sensitivity of vega to spot), and volga (sensitivity of vega to vol).

For vanilla options in the BS model, there are simple expressions for the greeks (see, for example, the Wikipedia page). I’ve updated the PRICERS page to give values for a few of the vanilla greeks of these options along with the price, and there are some graphs of typical greeks below.

The deltas of a long call and long put position – note that these are opposite sign and everywhere separated by a constant value (here 1, but in general the discount factor at the option expiry). Vega is always positive – so increased vol will always increase the price. Increasing spot tends to increase the value of a call, while it decreases the value of a put, but by a progressively smaller amount as spot increases. For these options (and the graph below), spot = fwd = strike = 100; vol = 0.1, expiry = 1 and r = 0.
The Gamma, Vanna and Volga of long vanilla options (these are the same for a call and a put). Gamma is always positive for long options – this means that price is a convex function of spot. Vanna and volga tell us the sensitivity of other greeks to volatility, and are useful in hedging portfolios if vol is changing rapidly.

For exotic options greeks are often intractable analytically, so typically they will be calculated by ‘bump and revalue’, where input parameters are varied slightly and the change in price is observed. For example, a derivative’s \inline \Delta at spot price S could be calculated from its price by ‘bumping’ spot \inline S by a small amount \inline \delta S:

\begin{matrix} C(S+\delta S,\sigma) = C(S,\sigma) + {\partial C \over \partial S}\delta S + {1 \over 2}{\partial^2 C \over \partial S^2} (\delta S)^2 + O[(\delta S)^3] \\ C(S-\delta S,\sigma) = C(S,\sigma) - {\partial C \over \partial S}\delta S + {1 \over 2}{\partial^2 C \over \partial S^2} (\delta S)^2 + O[(\delta S)^3] \end{matrix}

{C(S+\delta S, \sigma) - C(S-\delta S, \sigma) \over 2 (\delta S) } = {\partial C \over \partial S} + O[(\delta S)^2] \simeq \Delta

which is the derivative delta to a very good approximation for small \inline \delta S.
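As a rough illustration of bump and revalue, the sketch below bumps the spot of a vanilla BS call, where the analytic delta is known, so the two can be compared. The parameters are the ones quoted for the graphs above (spot = strike = 100, vol = 0.1, expiry = 1, r = 0); the function names and the bump size are illustrative choices.

```python
import math


def norm_cdf(x):
    """Standard normal cdf written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, vol, expiry, rate=0.0):
    """Black-Scholes price of a vanilla call."""
    sd = vol * math.sqrt(expiry)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * expiry) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * expiry) * norm_cdf(d2)


def bumped_delta(pricer, spot, bump=0.01):
    """Central-difference delta: reprice at spot + bump and spot - bump."""
    return (pricer(spot + bump) - pricer(spot - bump)) / (2.0 * bump)


pricer = lambda s: bs_call(s, strike=100.0, vol=0.1, expiry=1.0)
numerical = bumped_delta(pricer, 100.0)
analytic = norm_cdf(0.05)        # N(d1), with d1 = 0.05 for these inputs
print(numerical, analytic)
```

The same bumped_delta function works unchanged for any pricer that takes spot as its argument, which is the point of the technique: it needs no analytic knowledge of the derivative being priced.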

Since banks will have large portfolios and want to calculate their total exposure fairly frequently, pricing procedures will typically need to be fairly fast so that these risk calculations can be done in a reasonable amount of time, which usually rules out Monte Carlo as a technique here.

These hedges will only work for small changes in the underlying price (for example, delta itself changes with the underlying price according to the second-order greek, gamma). What this means is that the trader will need to re-hedge from time to time, which will cost her some money to do – one of the main challenges for a trader is to balance the need to hedge her portfolio with the associated costs of doing so. Hopefully by buying and selling a wide variety of options to various clients she will be able to minimise many of her greek exposures naturally – this ‘warehousing of risk’ is one of the main functions that banks undertake and a key driver of their profits.

Importance of the Vol Smile

After going through the BS model and deriving an equation for vanilla options, it is tempting to believe all of the assumptions that have gone into it. This post will be the first in a series examining some of those assumptions, extending them where possible, and looking at how they fail in some other cases.

Today I’m going to write about the vol smile. If you go to the market and examine quoted vanilla option prices at a given expiry, and put these into an implied vol solver (for example the one on my page!!), if the market believed the BS model was correct, we’d expect to get the same value for each option, no matter what the strike. Alas, not so! In fact, what we will see is that options that have strikes further away from the forward price will usually have higher implied vols than those near the forward price (ie. ‘at-the-money’). This is called the ‘vol smile’ because as it increases away from the money in either direction, it looks something like a smile! In the picture below I’ve shown some toy vol smiles and the sort of evolution that is typically seen with time-to-expiry.

What does this mean? A higher implied vol means a higher price for the option, so we’re saying options far in-the-money or out-of-the-money cost more in real life than expected by the BS model. In the BS model, the log of the stock price was normally distributed at expiry, but more expensive options at distant strikes means that the real distribution has a higher-than-expected probability of ending up at extreme strikes, so the real probability has ‘fat tails’ relative to a normal distribution.

As described in the post on Risk-Free Valuation, we can go a step further and back out the market-implied distribution from the observed call prices from the equation

p_S(K) = {1\over \delta(t)}{\partial^2 C\over \partial K^2}

Taking the smile shown above at \inline t_1, and calculating prices at each strike (using the standard BS vanilla equation, but with the implied vol given by the smile as the vol input), and taking the second derivative with respect to strike, gives the following risk-neutral distribution

The risk-neutral distribution for the smile shown above, compared to a lognormal Black-Scholes distribution. Note the central peak and fat tails of the smiley distribution

Note that the real distribution does indeed have fat tails at distant strikes. Since they have the same expectation and variance, this means that it also has a central peak, and intermediate values suppressed relative to the normal. The graph below shows the tails in more focus

A comparison of the tails of the distribution. The smile distribution leads to an increased probability of extreme events with large spot moves.
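The recipe described above (price each strike with its smile vol, then take the second derivative with respect to strike) can be sketched as follows. The parabolic smile here is a made-up toy, not the one used for the graphs, and rates are taken to be zero so that the discount factor is 1.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(forward, strike, vol, expiry):
    """Black-Scholes call price with r = 0, so the discount factor is 1."""
    sd = vol * math.sqrt(expiry)
    d1 = math.log(forward / strike) / sd + 0.5 * sd
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - sd)


def toy_smile(strike, forward=100.0):
    """Hypothetical smile: 10% at the money, rising away from it."""
    return 0.1 + 0.1 * math.log(strike / forward) ** 2


def implied_density(strike, smile=toy_smile, forward=100.0, expiry=1.0, h=0.01):
    """Second finite difference of the call price with respect to strike."""
    price = lambda k: bs_call(forward, k, smile(k), expiry)
    return (price(strike + h) - 2.0 * price(strike) + price(strike - h)) / h ** 2


print(implied_density(100.0))                        # with the toy smile
print(implied_density(100.0, smile=lambda k: 0.1))   # flat vol: plain lognormal
```

With a flat smile this reproduces the lognormal BS density, which makes a useful sanity check before trusting the result for a real smile.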

How can we alter the BS model to accommodate a vol smile? One possibility is to allow the vol parameter to vary deterministically with spot and time. This approach can indeed match observed vol smiles, but it also has several weaknesses. I’ll explore it in depth in a later post (it’s called the Local Volatility model, by the way).

A more interesting idea is to allow the volatility itself to be a random variable. This seems intuitive – the volatility, as well as the stock price, responds to information arriving randomly and unpredictably and thus probably should be stochastic. Why would this give us a vol smile? Well – option prices can be seen as the average payoff over all of the different paths that the spot might take. For paths in which the vol stays low, the price won’t go very far. On the other hand, if the vol increases lots there’s a much higher chance that we will end up far away from the money. Looking at this in reverse, if at the expiry date we’re far from the money, it’s much more likely that we followed a higher volatility path to get here, so the implied vol away from the money will be higher.

Stochastic vol models are widely used by practitioners, and there are many different types and models used with many different strengths and weaknesses. I will return to this topic again, most likely repeatedly! The take-home lesson for today though is that vol smiles are important: they imply fat-tailed distributions relative to a lognormal, and they are significant, real features of markets – we need our models to match them or we will lose lots of money to other people whose models do!

[An interesting historical point is that before the market crash in 1987, there was no vol smile – options indeed tended to have the same vol regardless of strike – people believed the BS model more than they do now. Could the crash have made people realise that large moves were much more likely in real life than BS suggested, and adjust accordingly? Or do higher prices at extreme strikes represent traders insuring themselves against the possibilities of more market crashes? There are parallels with the present – another assumption of BS is that it is possible to borrow unlimited amounts at the risk-free borrowing rate. This was almost true for big banks before the 2007 crash, but not so any more, and once again a lot of what we do now is trying to understand how to price options correctly in a world where there isn’t really such a thing as the risk-free rate. Each crash seems to lead to belated better understanding of the BS model weaknesses, and because markets often follow the models that participants are using to model them, this improved understanding itself has an effect on the market!]

Interview Questions I

I thought it would be fun to have a section looking at some of the interview questions that I’ve faced, or heard of, during my time as a quant, and to discuss some ways of approaching them. I’ll start with a fairly simple one that I got some time ago.

One of the traders at your bank has agreed to sell a product to a client, where a coin is tossed. If it’s a tails, the client is paid $1, if it’s a head the coin is tossed again. If the second flip is a tails, the payoff is $2. Otherwise, we continue. On the third flip, if it’s a tails the payoff is $4, or $8 on the fourth flip and so on, with the pot doubling each time. The trader asks you what price he should charge his client to play the game.

The first part of this problem is to calculate the expected payout of the game. Let’s say X is the number of heads flipped before the first tails comes up. It’s a discrete game, so we calculate the expectation by summing the payout, C(X), over the probability distribution p(X).

If the first flip is a tails, X=0 and C(X)=$1. If there’s one head then one tails, X=1 and C(X)=$2, and so on, so that

C(X) = \$(2^X)

The probability of getting tails on the first flip is 0.5, so p(X=0)=0.5. The probability of getting exactly one head and then one tails is 0.25, so p(X=1)=0.25, and so on, so p(X) is given by

p(X) = \Bigl({1\over 2}\Bigr)^{X+1}

To calculate the expectation of the payoff, we need to sum over all possible values of X. There are an infinite number of possibilities, as we could have any number of heads before the first tails, although of course the probability gets very small so the higher X values shouldn’t contribute too much. Let’s have a look:

{\mathbb E}[C(X)] = \sum^\infty_{X=0} C(X)p(X)

= \$ \sum^\infty_{X=0} 2^X \Bigl({1\over 2}\Bigr)^{X+1}

= \$ \sum^\infty_{X=0} {1\over 2}

What is this sum? It’s the value 1/2 summed over all possible states of X – so each state adds the same amount to the expectation… but as we’ve already said, there are an infinite number of possible states of X, so the sum must be

=\$ \infty

Oh dear – what does this mean?! Although the probability of getting a very large number of heads is very small, the payoff in these cases is correspondingly large, so although unlikely they still add a constant amount to the expectation. This is difficult – our expected loss from this game is infinite! How can we find a reasonable price to charge the client?

Probably the best way to deal with this is to notice that most of the weight of the expectation is coming from rather unlikely payoffs involving huge sums of money and very tiny probabilities. We could artificially cut these off beyond a certain point, but in fact there’s a very natural cutoff – the bank’s solvency. Beyond a certain point, we simply can’t pay any more because we don’t have it! Let’s say we’re a particularly well capitalised bank, and could afford to lose $100bn before collapsing (by way of comparison, the modest loss of $35bn by the RBS in 2008 was enough to topple that bank, at the time the World’s largest). The log base 2 of 100bn is about 36.5, so after 37 heads in a row we’re already bankrupt. In fact, the payout function looks something more like this:

C(X) = \Bigl\{ \begin{matrix} \$ (2^X )\\ \$ (10^{11}) \end{matrix} \quad \begin{matrix} X < 37 \\ X \geq 37 \end{matrix}

Now, the 37 states X=0 to X=36 each contribute 1/2 to the expectation, giving $18.50 in total, but the contribution of the remaining states declines with the probability since payoff is capped at our capital reserves. The total contribution of the remaining states will then be

\$ \sum_{X=37}^\infty \Bigl({1\over 2}\Bigr)^{X+1} \cdot 10^{11}

= \$ \Bigl( {1\over 2}\Bigr)^{37} \cdot 10^{11}

Which is around $0.73. So, a more realistic consideration of our capital position gives a price of around $19.23, much more useful than our initial price of infinity. Of course, the trader would be sensible to charge quite a bit more than this, both to make a profit but also to prevent giving away potentially market-sensitive information about our capital reserves!
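The capped expectation is easy to check numerically. The sketch below assumes p(X) = (1/2)^(X+1), so that p(X=0) = 0.5 and p(X=1) = 0.25 as stated earlier, and a payout capped at the bank’s capital; the function name is illustrative.

```python
def capped_expected_payout(capital=1e11):
    """Expected payout when the bank cannot pay more than `capital`."""
    expectation = 0.0
    heads = 0
    # states in which the full payout 2^heads can still be paid
    while 2.0 ** heads < capital:
        probability = 0.5 ** (heads + 1)     # `heads` heads, then one tails
        expectation += probability * 2.0 ** heads
        heads += 1
    # every remaining state pays the cap; their total probability is 0.5^heads
    expectation += capital * 0.5 ** heads
    return expectation


print(capped_expected_payout())   # about 19.23 for $100bn of capital
```

Because each doubling of capital only adds about 50 cents, the price is remarkably insensitive to how well capitalised the bank is assumed to be.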

Some Results for Common Distributions

[NB. I try to make note of technical conditions where relevant, from now on I’ll put these in square brackets. On a first reading these can probably be ignored]


I review here a few commonly used results for normal and lognormal distributions. They get used over and over across quantitative finance, so I wanted to have lots of them in one place where I can refer back to them later.

Univariate Normal Distribution

The probability density function for X is

p(X=x) = {1\over \sigma\sqrt{2\pi}}e^{-{{(x-\mu)^2}\over 2 \sigma^2}}

with mean \inline \mu and standard deviation \inline \sigma. I tend to use the notation \inline X\sim{\mathbb N}(\mu,\sigma^2) to describe a normally distributed variable with these parameters.

As already referred to, we can take the location and the scale out of the variable X as follows:

X = a + bz

where a and b are constants and z \sim {\mathbb N}(0,1) is called a ‘standard normal variable’.

This comes from the result that the sum of any two normal variables is also normal – for \inline X_1\sim{\mathbb N}(\mu_1, \sigma_1^2) and \inline X_2\sim{\mathbb N}(\mu_2, \sigma_2^2), then [as long as \inline X_1 and \inline X_2 are independent]

aX_1 + bX_2 = X_3\sim{\mathbb N}(a\mu_1 + b\mu_2, a^2 \sigma_1^2 + b^2\sigma_2^2)

The cdf of the normal distribution isn’t analytically tractable, but I’ve already discussed a few numerical approximations to it here. I include a doodle that re-implements this function for the standard normal cdf for you to play with:


The normal distribution comes up time and time again due to the central limit theorem. This says that [usually], as we take the mean value of a sequence of [independent, identically distributed] random variables, it will tend to a normally distributed variate regardless of the distribution of the underlying variables:

\lim_{n\rightarrow \infty} \Bigl( {1 \over n}\sum_{i=1}^n X_i\Bigr) \sim{\mathbb N}(\mu,{\sigma^2\over n})

This is of very broad importance. For example, it is the basis of Monte Carlo and the square root N convergence rate, since by taking many simulations, we are sampling the distribution of the mean of N realisations of the payoff of the option. Although the payoff probably isn’t normally distributed across the r-n probability distribution of the underlying, a very large number of payoffs will approach a normal distribution, and its mean is an estimator for the payoff mean. The variance also gives us error bounds, as variance should decrease with increasing number of samples as 1/n.
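The 1/n decay of the variance (and so the square root N convergence of the error) can be seen in a toy experiment. This is only a sketch: the payoff max(z, 0) on a standard normal draw is a stand-in chosen because its exact mean, 1/sqrt(2 pi), is known, and the names are illustrative.

```python
import math
import random


def mc_mean(payoff, n_paths, rng):
    """Monte Carlo mean of payoff(z), z ~ N(0,1), with its standard error."""
    samples = [payoff(rng.gauss(0.0, 1.0)) for _ in range(n_paths)]
    mean = math.fsum(samples) / n_paths
    variance = math.fsum((s - mean) ** 2 for s in samples) / (n_paths - 1)
    return mean, math.sqrt(variance / n_paths)


rng = random.Random(42)                      # fixed seed for repeatability
payoff = lambda z: max(z, 0.0)               # toy payoff, exact mean 1/sqrt(2*pi)
mean_small, err_small = mc_mean(payoff, 1_000, rng)
mean_large, err_large = mc_mean(payoff, 100_000, rng)
print(mean_small, err_small)
print(mean_large, err_large)
```

Using 100 times as many paths should shrink the standard error by a factor of roughly 10, which is the square root N behaviour described above.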

Lognormal Distribution

A lognormal variable is one whose log is distributed normally – so if \inline X\sim {\mathbb N}(\mu,\sigma^2) then \inline S = e^{a + bX} is lognormally distributed.

The pdf for a lognormal distribution is

p(X=x) = {1\over x\sigma\sqrt{2\pi}}e^{-{{(\ln x-\mu)^2}\over 2 \sigma^2}}; x>0

where \inline \mu and \inline \sigma are once again the parameters of the distribution [Exercise: \inline \mu and \inline \sigma are actually the mean and std. dev. for the normal distribution in the exponent, NOT for the lognormal itself – calculate their true values for the lognormal distribution]. I tend to use the notation \inline X\sim {\mathbb L}{\mathbb N}(\mu,\sigma^2) to describe a lognormally distributed variable with these parameters.

Special properties of lognormal variables are mostly due to the properties of normal in the exponent. The two most important are related to the products of lognormal variates [here I am still assuming independence – I’ll generalise this another time], if X_1 \sim {\mathbb L}{\mathbb N}(\mu_1, \sigma_1^2) and X_2 \sim {\mathbb L}{\mathbb N}(\mu_2, \sigma_2^2) then:

X_1 \cdot X_2 = X_3 \sim {\mathbb L} {\mathbb N}(\mu_1 + \mu_2, \sigma_1^2+\sigma_2^2)

(X_1)^n \sim {\mathbb L}{\mathbb N}(n \mu_1, n^2\sigma_1^2)

{1\over X_1} = (X_1)^{-1}\sim{\mathbb L}{\mathbb N}(-\mu_1,\sigma_1^2)

Although the third of these is a special case of the second, it is worth taking a little bit of time to think about. It says that the distribution of the inverse of a lognormal variable is also lognormal. This will be useful in areas like foreign exchange, since it says that if the future exchange rate is randomly distributed in this way, then the inverse of the exchange rate (which is of course just the exchange rate from the opposite currency perspective) is also lognormal. Secondly, what does \inline -\mu mean for the distribution? Well, don’t forget that this isn’t the distribution mean but the mean of the normal in the exponent – even when this is negative, the lognormal is positive, so this is an acceptable value.

Unlike the normal case, there is no closed form expression for the sum of two lognormal variables. Two approaches are typically used in this case – for a large, independent selection of variables with the same mean and variance the CLT implies that the distribution of the average will be roughly normal, while for small, highly correlated sequences with similar means the sum is still roughly lognormal with an adjusted mean and variance. The second technique is an example of ‘moment matching’; I’ll discuss it later in more detail.

Multivariate Normal Distribution

In the case that we have more than one normally distributed random variable [we assume here that they are ‘jointly normally distributed’], we need to build into our calculations the possibility that they might not be independent, which will lead to the multivariate normal distribution (a trivial example of the failure of our above expressions for non-independent variables is if \inline X_2 = -X_1, in which case \inline X_2 + X_1 = 0, and it is certainly NOT true that \inline 0\sim {\mathbb N}(\mu_1+\mu_2,\sigma_1^2 + \sigma_2^2)!).

To measure the dependence of two normal variables, our starting point is their covariance. Similarly to variance, this measures the difference between the expectation of the product of the two variables and the product of their individual expectations

{\rm Cov}[X_1,X_2] = {\mathbb E}[X_1\cdot X_2] - {\mathbb E}[X_1]{\mathbb E}[X_2]

which is 0 for independent variables. A scaled version of the covariance is called the correlation

\rho(X_1,X_2) = {{\rm Cov}[X_1,X_2] \over \sqrt{{\rm Var}[X_1]{\rm Var}[X_2]}}

so that \inline \rho \in [-1,1]. The combined pdf for two or more variables becomes rapidly more complicated, I’ll look at ways of thinking about it another time but state here the case for two variables each with a standard normal distribution and correlation \inline \rho:

P(X=x, Y=y) = {1 \over 2\pi \sqrt{1-\rho^2}}\exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}

[Exercise: derive the product rule for two normal and for two lognormal variables when the normals involved are allowed to be correlated]

Finally, for normal variables only, we have the important result that if X and Y are standard normal variables with correlation \inline \rho, we can re-write X as the sum of Y and Z as

X = \rho \cdot Y + \sqrt{1-\rho^2}\cdot Z

where Z is also a standard normal variable, and Y and Z are uncorrelated. I’ll be using this expression lots in the future!
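To see the decomposition of X into Y and Z at work, here is a rough sketch that builds X from two independent standard normal draws and then measures the sample correlation between X and Y. The generator, the Box-Muller helper and every name here are illustrative assumptions on my part.

```javascript
// Small seeded uniform generator (mulberry32-style) so that runs are repeatable.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform; 1 - rand() avoids log(0).
function standardNormal(rand) {
  const u1 = 1 - rand();
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Build X = rho * Y + sqrt(1 - rho^2) * Z and return the sample correlation of X and Y.
function sampleCorrelation(rho, n, rand) {
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    const y = standardNormal(rand);
    const z = standardNormal(rand);           // independent of y
    const x = rho * y + Math.sqrt(1 - rho * rho) * z;
    sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
  }
  const cov = sxy / n - (sx / n) * (sy / n);
  const varX = sxx / n - (sx / n) * (sx / n);
  const varY = syy / n - (sy / n) * (sy / n);
  return cov / Math.sqrt(varX * varY);
}

console.log(sampleCorrelation(0.7, 100000, mulberry32(42)));  // should land near 0.7
```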

Risk Neutral Valuation

There are a few different but equivalent ways of viewing derivatives pricing. The first to be developed was the partial differential equations method, which was how the Black Scholes equation was originally derived. There’s lots of discussion of this on the wikipedia page, and I’ll talk about it at some stage – it’s quite intuitive and a lot of the concepts fall out of it very naturally. However, probably the more powerful method, and the one that I use almost all of the time in work, is the risk-neutral pricing method.

The idea is quite simple, although the actual mechanics can be a little intricate. Since the distribution of the underlying asset at expiry is known, it makes sense that the price of a derivative might be the expected value of the payoff of the option at expiry (eg. (S_t-K)^+ for a call option, where a superscript ‘+’ means “The greater of this value or zero”) over the underlying distribution. In fact, it turns out that this isn’t quite right: due to the risk aversion of investors, this will usually produce an overestimate of the true price. However, in arbitrage-free markets, there exists another probability distribution under which the expectation does give the true price – this is called the risk-neutral distribution of the asset. Further, [as long as the market is complete] any price other than the risk-neutral price allows the possibility of arbitrage. Taken together, these are roughly a statement of the Fundamental Theorem of Asset Pricing. In the case of vanilla call options, a portfolio of the underlying stock and a risk-free bond can be constructed that exactly replicates the option and could be used in such an arbitrage.

In this risk-neutral distribution, all risky assets grow at the risk-free rate, so that the ‘price of risk’ is exactly zero. Let’s say a government bond – which we’ll treat as risk-free – exists, has a price B and pays an interest rate of r, so that

dB = r B dt

Then, the stochastic process for the underlying stock that we looked at before

dS = \mu S dt + \sigma S dW_t

is modified so that mu becomes r, and the process is

dS = r S dt + \sigma S dW_t

 so the risk-neutral distribution of the asset is still lognormal, but with mu’s replaced by r’s:

S_t = S_0 e^{(r - {1\over 2}\sigma^2)t + \sigma\sqrt{t}z}


I’ve not provided the explicit formula yet, so I’ll demonstrate here how this can be used to price vanilla call options

C(F,K,\sigma,t,\phi) = \delta(t){\mathbb E^{S_t}}[(S_t-K)^+]

= \delta(t)\int^\infty_{0} (S_t - K)^+ p_S(S_t)dS_t

= \delta(t)\int^\infty_{K} (S_t - K) p_S(S_t)dS_t

= {\delta(t) \over \sqrt{2\pi}}\int^\infty_{x_K} (S_0 e^{(r-{1\over 2}\sigma^2)t + \sigma \sqrt{t}x} - K) e^{-{1\over 2}x^2}dx

= {\delta(t) \over \sqrt{2\pi}}\Bigl[S_0 e^{(r-{1\over 2}\sigma^2)t} \int^\infty_{x_K} e^{\sigma \sqrt{t}x -{1\over 2}x^2}dx - \int^\infty_{x_K} K e^{-{1\over 2}x^2} dx \Bigr]

= {\delta(t) \over \sqrt{2\pi}}\Bigl[S_0 e^{(r-{1\over 2}\sigma^2)t} \int^\infty_{x_K} e^{-{1\over 2}(x-\sigma\sqrt{t})^2 + {1\over 2}\sigma^2t}dx - \sqrt{2\pi} K\Phi(-x_K)\Bigr]

= {\delta(t) \over \sqrt{2\pi}}\Bigl[S_0 e^{rt} \int^\infty_{x_K - \sigma \sqrt{t}} e^{-{1\over 2}x^2} dx - \sqrt{2\pi} K \Phi(-x_K)\Bigr]

= \delta(t)\Bigl[S_0 e^{rt} \Phi(-x_K + \sigma \sqrt{t}) - K \Phi(-x_K)\Bigr]

= \delta(t)\Bigl[F \Phi(d_1) - K \Phi(d_2)\Bigr]

which is the celebrated BS formula! In the above, F = forward price = \inline S_0 e^{rt}, \inline \delta(t) is the discount factor to expiry, \inline \Phi(x) is the standard normal cumulative distribution function of x, and \inline x_K is the value of x corresponding to strike S=K, ie.

x_K = {\ln{K \over F} + {1\over 2}\sigma^2t \over \sigma \sqrt{t}}

It is typical to use the variables d1 and d2 for the values in the cdfs, such that

d_1 = {\ln{F \over K} + {1\over 2}\sigma^2t \over \sigma \sqrt{t}}

d_2 = {\ln{F \over K} - {1\over 2}\sigma^2t \over \sigma \sqrt{t}} = d_1 - \sigma \sqrt{t}
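The final line of the derivation translates almost directly into code. The sketch below is my own illustration rather than the pricer's actual code: `blackScholesCall` and `normalCdf` are hypothetical names, the cdf uses the Abramowitz & Stegun 26.2.17 approximation, and the discount factor is passed in as a plain number.

```javascript
// Standard normal cdf (Abramowitz & Stegun 26.2.17, error below roughly 7.5e-8).
function normalCdf(x) {
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
             + t * (-1.821255978 + t * 1.330274429))));
  const upperTail = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI) * poly;
  return x >= 0 ? 1 - upperTail : upperTail;
}

// Forward-form call price: delta * (F * Phi(d1) - K * Phi(d2)).
// discountFactor plays the role of delta(t) in the formulas above.
function blackScholesCall(fwd, strike, vol, time, discountFactor) {
  const totalVol = vol * Math.sqrt(time);
  const d1 = (Math.log(fwd / strike) + 0.5 * vol * vol * time) / totalVol;
  const d2 = d1 - totalVol;
  return discountFactor * (fwd * normalCdf(d1) - strike * normalCdf(d2));
}

// At-the-money-forward call, 20% vol, one year, no discounting.
console.log(blackScholesCall(100, 100, 0.2, 1, 1));  // roughly 7.97
```

Working in terms of the forward keeps r out of the function entirely; it only enters through F and the discount factor.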

In deriving this, we have made certain assumptions that aren’t justified in reality. Some of these are:

1. No arbitrage – we assume that there is no opportunity for a risk-free profit

2. No transaction costs – we can freely buy and sell the underlying at a single price

3. Can go long/short as we please – we have no funding constraints, and can buy/sell an arbitrarily large amount of stock/options and balance it with an opposite position in bonds

4. Constant vol and r – we assume that vol and r are constant and don’t vary with strike. In fact, it’s an easy extension to allow them to vary with time, I’ll come back to this later

I’ll look at the validity of these and other assumptions in a future post.

If prices of vanillas have non-constant vols that vary with strike, doesn’t that make all of the above useless? Not at all – but we do need to turn it on its head! Instead of using the r-n distribution to find prices, we use prices to find the r-n distribution! Let’s assume that we have access to a liquid market of vanilla calls and puts that we can trade in freely. If we look at their prices and apply the Fundamental Theorem of Calculus twice

C(t)= \delta(t) \int^\infty_{K} (S_t - K)p_S(S_t)dS_t

{\partial C \over \partial K} = -\delta(t) \int^\infty_K p_S(S_t)dS_t

{1\over \delta(t)}{\partial^2 C \over \partial K^2} =p_S(K)

So the curvature of call prices wrt. strike tells us the local risk neutral probability! This means that for each expiry time at which we can see vanilla option prices, we can calculate the market-implied r-n distribution (which probably won’t be lognormal, telling us that the market doesn’t really believe the BS assumptions as stated above either). Once we know this, we can use it to calibrate our choice of market model and to price other, more exotic options.
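As a sanity check on the curvature result, here is a sketch that takes a central second difference of call prices in strike and compares it with the lognormal pdf. It assumes a flat-vol Black-Scholes "market" purely for illustration (so the recovered density should be lognormal); with real quotes, `priceAtStrike` would be an interpolation of market prices. All function names are hypothetical.

```javascript
// Standard normal cdf (Abramowitz & Stegun 26.2.17 approximation).
function normalCdf(x) {
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
             + t * (-1.821255978 + t * 1.330274429))));
  const upperTail = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI) * poly;
  return x >= 0 ? 1 - upperTail : upperTail;
}

function blackScholesCall(fwd, strike, vol, time, discountFactor) {
  const totalVol = vol * Math.sqrt(time);
  const d1 = (Math.log(fwd / strike) + 0.5 * vol * vol * time) / totalVol;
  return discountFactor * (fwd * normalCdf(d1) - strike * normalCdf(d1 - totalVol));
}

// Second derivative of call price wrt. strike by central differences,
// divided by the discount factor, as in the last equation above.
function impliedDensity(priceAtStrike, strike, step, discountFactor) {
  const up = priceAtStrike(strike + step);
  const mid = priceAtStrike(strike);
  const down = priceAtStrike(strike - step);
  return (up - 2 * mid + down) / (step * step * discountFactor);
}

// Illustrative flat-vol market (assumed values).
const fwd = 100, vol = 0.2, time = 1, discountFactor = 1;
const priceAtStrike = (k) => blackScholesCall(fwd, k, vol, time, discountFactor);

// Risk-neutral lognormal pdf of S_t evaluated at k, for comparison.
function lognormalDensity(k) {
  const x = (Math.log(k / fwd) + 0.5 * vol * vol * time) / (vol * Math.sqrt(time));
  return Math.exp(-0.5 * x * x) / (k * vol * Math.sqrt(2 * Math.PI * time));
}

const density = impliedDensity(priceAtStrike, 100, 1, discountFactor);
console.log(density, lognormalDensity(100));  // the two should agree closely
```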

[Post script: It is worth noting that although this looks like alchemy, we haven’t totally tamed the distribution, because although we know the underlying marginal distribution at each expiry time, we still don’t know anything about the correlations between them. That is, we know the marginal distributions of the underlying at each time, but not the full distribution. For concreteness, consider two times \inline t_1 and \inline t_2. We know \inline P(S_{t_1}) and \inline P(S_{t_2}) but not \inline P(S_{t_1},S_{t_2}). To price an option on the average of the spot at these two times (eg. a call on the average struck at K), knowing the first two isn’t enough, we need to know the joint distribution, as the expectation is \inline \int^\infty_0 \int^\infty_0 \bigl({1\over 2}(S_{t_1} + S_{t_2}) - K\bigr)^+ P(S_{t_1},S_{t_2}) dS_{t_1}dS_{t_2}. To see the difference, from Bayes’ Theorem we have that \inline P(S_{t_1},S_{t_2}) = \inline P(S_{t_1}).\inline P(S_{t_2}|S_{t_1}). So, although we know how the spot will be distributed at each time, we don’t know how each distribution is conditional on the times before, which we’d need to know to price exactly – our modelling challenge will be to choose some sensible process for this that is consistent with the marginal distributions.]

Root Finders

One of the nice things about this job is the variety. There’s lots of maths, but the topics aren’t limited to set areas. So one day I might be working with stochastic calculus or statistics, and the next I have to delve into numerical techniques to solve a problem and then code it up.

I hinted about this briefly in my last post about implied vol. In order to calculate an implied vol, I had to solve an equation of the form

C_{BS}({\rm Fwd},K,\sigma,t,\phi) - {\rm Market\_ Price} = 0

by varying sigma, and because the form of C in general isn’t tractable, numerical techniques must be used.

The easiest solver to implement is a bisection solver. This needs to be given two points on either side of the root, so that C_{BS}(...,\sigma_1,...) - {\rm Market\_ Price} is negative and C_{BS}(...,\sigma_2,...) - {\rm Market\_ Price} is positive (for vols, sensible values might be \inline \sigma_1 = 0.01% and \inline \sigma_2 = 200%). Then it simply takes the mid-point of these two (call it \inline \sigma_3) and evaluates the function there. If it’s positive, \inline \sigma_3 becomes the new \inline \sigma_2, and if negative \inline \sigma_3 becomes the new \inline \sigma_1. At each iteration this halves the remaining interval between the values. It’s very robust and will always home in on the root, but is considered to be relatively slow. The difference between the n-th value \inline c_n and the true value \inline c obeys

| c_n - c | \leq {| b - a| \over 2^n}

where b and a are the initial values. If we want this difference to be less than \inline 1\times 10^{-d} we need

{|b-a|\over 2^n} \leq 1\times 10^{-d}

n \geq \log_2 |b-a| + 3.32\cdot d

since \inline \log_2 10 \simeq 3.32. This means that for each additional decimal place of accuracy, a further 3.32 iterations will be needed on average.
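The bisection procedure described above is short enough to sketch in full. This is an illustrative version with names of my own choosing (`bisectionSolver`, `maxIterations`), not the solver used in the pricer; it returns the iteration count so it can be compared with the bound.

```javascript
// Bisection: f(lower) and f(upper) must have opposite signs, with lower < upper.
function bisectionSolver(f, lower, upper, tolerance, maxIterations = 200) {
  let a = lower, b = upper;
  let fa = f(a);
  const fb = f(b);
  if (fa === 0) return { root: a, iterations: 0 };
  if (fb === 0) return { root: b, iterations: 0 };
  if (fa * fb > 0) throw new Error("Root is not bracketed");
  let iterations = 0;
  while (b - a > tolerance && iterations < maxIterations) {
    const mid = 0.5 * (a + b);
    const fMid = f(mid);
    iterations++;
    if (fMid === 0) return { root: mid, iterations };
    if (fa * fMid < 0) {
      b = mid;              // sign change is in the lower half
    } else {
      a = mid;              // sign change is in the upper half
      fa = fMid;
    }
  }
  return { root: 0.5 * (a + b), iterations };
}

// Find sqrt(2) as the root of x^2 - 2 on [0, 2] to within 1e-8.
const result = bisectionSolver((x) => x * x - 2, 0, 2, 1e-8);
console.log(result.root, result.iterations);
```

For this example the initial interval has width 2 and d = 8, so the bound gives n of at least 1 + 3.32 × 8 ≈ 27.6, and the loop does indeed stop after 28 halvings.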

Faster ways are less reliable. For example, the secant method is a finite difference version of Newton-Raphson. The first points chosen don’t need to be on opposite sides of the root, but they should be near to the root. The expression used for successive approximations to the root is

x_n = x_{n-1} - f(x_{n-1}){x_{n-1}-x_{n-2} \over f(x_{n-1}) - f(x_{n-2})}

This is an interpolation technique; it works roughly as shown in the figures below. Its speed of convergence is much faster than the bisection technique when close to the root, but like Newton-Raphson, it is not guaranteed to converge. This isn’t much use to us – we’d like to combine the benefits of both techniques.

A successful setup for the secant method
Here we can see the secant method working. The first step interpolates between the function values f(x1) and f(x2) and finds a better approximation x3. The next step interpolates f(x2) and f(x3), leading to a new approximation x4 outside the interval [x2,x3]. From this point the procedure proceeds rapidly – it takes 14 steps to achieve 16 decimal places of accuracy (cf. about 55 steps for similar accuracy from the bisection method).
Here we see the secant method failing, due to the curvature of the function and the poor initial choice of x1 and x2. f(x2) and f(x3) are interpolated and lead to a chord that may or may not cross the function (depending on its behaviour at negative x). Clearly secant is not robust to poor choices of initial values or pathological function behaviour.
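The secant iteration can be sketched in a few lines. Again this is an illustration under my own naming (`secantSolver`, `maxIterations`) rather than code from the pricer, and it simply gives up with an error if the chord goes flat or the iteration limit is reached.

```javascript
// Secant method: x_n = x_{n-1} - f(x_{n-1}) (x_{n-1} - x_{n-2}) / (f(x_{n-1}) - f(x_{n-2})).
// The two starting points need not bracket the root.
function secantSolver(f, x0, x1, tolerance, maxIterations = 50) {
  let prev = x0, curr = x1;
  let fPrev = f(prev), fCurr = f(curr);
  for (let n = 1; n <= maxIterations; n++) {
    const denominator = fCurr - fPrev;
    if (denominator === 0) throw new Error("Flat chord: secant step is undefined");
    const next = curr - fCurr * (curr - prev) / denominator;
    if (Math.abs(next - curr) < tolerance) return { root: next, iterations: n };
    prev = curr; fPrev = fCurr;
    curr = next; fCurr = f(next);
  }
  throw new Error("Secant method did not converge");
}

// Find sqrt(2) as the root of x^2 - 2, starting from 1 and 2.
const result = secantSolver((x) => x * x - 2, 1, 2, 1e-12);
console.log(result.root, result.iterations);
```

On a well-behaved function like this one it needs far fewer steps than bisection for the same accuracy, but nothing in the loop protects it against the kind of failure shown in the second figure.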

The industry standard technique for this is the Brent Method. The algorithm is a little complicated but it essentially combines these two approaches, using the secant method but checking that it performs better than the bisection method, and falling back on that if not. It also includes a few extra provisions for added speed and stability – once again, Wikipedia is a great reference for this. I’ve coded up a version of this for the pricer, and there’s a demonstration of how this outperforms the bisection method below. Choose any of the functions in the list, enter some points and see how many iterations each one takes to find the root!

Just a brief note on the coding as well, since I said I’d cover everything here. I’m using javascript at the moment, mostly for speed. Javascript is very forgiving (although consequently a pain to debug!), it allows variables to be defined as functions and passed around as parameters. I created a function

 brentSolver( functionToSolve, lowerBound, upperBound, tolerance )

which takes a function of one argument functionToSolve(z) that returns the value of the function at z; and finds a root of that function between the lower and upper bounds. Usually we will need this function to have more than one parameter, but again javascript is kind to us – in the block of code that calls it, we call the parameters we need, and define a new function dynamically:

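// Inputs for the option whose implied vol we want
const price = 5;
const fwd = 100;
const strike = 100;
const time = 1;
const optionStyle = -1;

// Wrap the pricer as a function of vol alone; the other inputs are captured from the enclosing scope
const functionToSolve = function ( vol ) {
  return vanillaPrice( fwd, strike, vol, time, optionStyle ) - price;
};

// Search for the vol at which the model price matches the quoted price
brentSolver( functionToSolve, 0.001, 200, 0.00000001 );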

Much easier than in C++!! Below is a basic doodle to let you try out the different solvers on fairly well-known functions – have a play and see what happens.



Price vs. Implied Vol

Something people often comment on when they start out in quantitative finance is that it’s odd that prices tend to be quoted in terms of implied vol instead of… well, price! This seems a bit strange, surely price is both more useful and more meaningful, given that implied vol is based on a model which isn’t really correct?

Briefly, when a vanilla option is priced in the Black-Scholes model, its price is given by the following formula

C(F,K,\sigma,\tau,\phi) = \delta(\tau) \phi \Bigl( F \Phi(\phi \cdot d_1) - K \Phi(\phi \cdot d_2) \Bigr)

d_1 = {\ln{F \over K} + {1 \over 2}\sigma^2 \tau \over \sigma \sqrt{\tau}}

d_2 = d_1 - \sigma \sqrt{\tau}

with \inline \tau the time to expiry, \inline \Phi(x) the standard normal cumulative distribution function of x, \inline \delta(\tau) the discount factor to expiry, \inline \phi = +1 for a call and -1 for a put, F the forward to expiry, K the strike, and \inline \sigma the Black-Scholes volatility (there are a few different ways of expressing this formula, I’ll come back to it another time).

Importantly, for both puts and calls there is a 1-to-1 correspondence between price and vol – in both cases, increased vol means increased price, since more vol means a higher chance of greater returns, while our losses are capped at zero. However, the BS price is derived by assuming that vol is a constant parameter (or at least that it only varies with time), but we know that in reality it also varies with strike (this is called the vol smile, and it is a VERY important phenomenon which I’ll talk about LOTS in these posts!). What vol should we put into the equation to get a sensible price?

Actually, we usually think about this in reverse – prices are quoted on the market, and we can invert the BS price to give us instead an implied vol. In fact, usually even the quotes that we receive will be given in terms of implied vol!

There are a few reasons for this. Firstly, a price varies depending on the notional of an option – in physics we’d call it an extrinsic variable, while imp vol is an intrinsic one. But it’s more that price doesn’t really give us as much information about where the option is as vol does. Have a look at the graphs below:

Two graphs of price variation with strike, for options with a flat BS vol (grey) and from a vol smile (red). On the left, their comparative prices are plotted with the forward price for reference. On the right, the corresponding BS implied vols are plotted. In all cases, the forward price is 100, time to expiry is 1 year, the flat vol is a constant 0.1 while the SABR parameters are instant vol = 0.1, vol of vol = 0.5 and rho = -0.2.

These graphs show the price variation with strike for two vol surfaces. Although they come from very different vol surfaces, we really can’t see that from the price graphs. Because the scale is so large, the relatively small price differences are overwhelmed. On one end they look like essentially forwards, while on the other end they are effectively zero.

But when we look at the implied vols instead, we see that they’re in fact very different options. One set has an (unrealistic) constant vol of 10%, while the other set shows higher vols away from the money (ie. at high and low strikes), which is what we typically see in the market. If we didn’t take these into account and priced them using the same vols, we’d be exposing ourselves to significant arbitrage opportunities (incidentally, this vol smile comes from a model commonly used to model and interpolate vol smiles called SABR – we’ll be seeing a lot more of this in the future).

Finally, implied vols give us a feeling for what is happening – since vol is annualised, this is the same order as the percentage change that we would expect in the underlying in a typical year. This gives us an important intuition check on our results that could easily be forgotten in the decimal points or trailing zeroes of a price given in dollars or euros.

As an aside, I’m in the process of upgrading the vanilla pricer to do implied vol calculations as well – so you will be able to either enter a vol and calculate the price of the option, or else enter a price and work out the corresponding vol. Have fun!

[This requires some root-finding (once again, no closed form for the normal cdfs…), and once again I’m taking the path of least resistance for the moment and coding a bisection solver. Since this involves many, many calls to the normal cdf code I used before, I should probably use a quicker method eventually, so I’ll be coding a brent solver soon, which will probably be a post in itself]
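The inversion from price to implied vol described in this post can be sketched with a plain bisection search. This is only an illustration: `vanillaPrice` below is my own stand-in for the pricer (discount factor fixed at 1, and \phi = +1 giving a call in the pricing formula as written above), the vol bounds are assumed values expressed as decimals, and the normal cdf is the Abramowitz & Stegun 26.2.17 approximation.

```javascript
// Standard normal cdf (Abramowitz & Stegun 26.2.17 approximation).
function normalCdf(x) {
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
             + t * (-1.821255978 + t * 1.330274429))));
  const upperTail = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI) * poly;
  return x >= 0 ? 1 - upperTail : upperTail;
}

// Stand-in pricer: phi * (F * Phi(phi * d1) - K * Phi(phi * d2)), discount factor taken as 1.
function vanillaPrice(fwd, strike, vol, time, phi) {
  const totalVol = vol * Math.sqrt(time);
  const d1 = (Math.log(fwd / strike) + 0.5 * vol * vol * time) / totalVol;
  const d2 = d1 - totalVol;
  return phi * (fwd * normalCdf(phi * d1) - strike * normalCdf(phi * d2));
}

// Bisection on vol: price is increasing in vol, so the root is unique.
function impliedVol(marketPrice, fwd, strike, time, phi) {
  const objective = (vol) => vanillaPrice(fwd, strike, vol, time, phi) - marketPrice;
  let lower = 1e-4, upper = 2.0;            // 0.01% and 200% vol (assumed bounds)
  if (objective(lower) * objective(upper) > 0) {
    throw new Error("Price is outside the range attainable between the vol bounds");
  }
  while (upper - lower > 1e-10) {
    const mid = 0.5 * (lower + upper);
    if (objective(mid) > 0) upper = mid; else lower = mid;
  }
  return 0.5 * (lower + upper);
}

// Round trip: price a call at 25% vol, then recover the vol from the price.
const quotedPrice = vanillaPrice(100, 110, 0.25, 1, +1);
console.log(quotedPrice, impliedVol(quotedPrice, 100, 110, 1, +1));
```

Each bisection step costs one pricer call, which is the motivation given above for moving to a faster solver.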


Stochastic Differential Equations Pt2: The Lognormal Distribution

This post follows from the earlier post on Stochastic Differential Equations.

I finished last time by saying that the solution to the BS SDE for terminal spot at time T was

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma W_T}

When we solve an ODE, it gives us an expression for the position of a particle at time T. But we’ve already said that we are uncertain about the price of an asset in the future, and this expression expresses that uncertainty through the \inline \sigma W_T term in the exponent. We said in the last post that the difference between this quantity at two different times s and t was normally distributed, and since this term is the distance between t=0 and t=T (we have implicitly ignored a term W_0, but this is ok because we assumed that the process started at zero) it is also normally distributed,

W_T \sim {\mathbb N}(0,T)

It’s a well-known property of the normal distribution (see the Wikipedia entry for this and many others) that if X \sim {\mathbb N}(0,1) then aX \sim {\mathbb N}(0,a^2) for constant a. We can use this in reverse to reduce W_T to a standard normal variable x, by taking a square root of time outside of the distribution so W_T \sim \sqrt{T}\cdot{\mathbb N}(0,1) and we now only need standard normal variables, which we know lots about. We can repeat our first expression in these terms

\inline S_T = S_0 e^{(\mu - {1 \over 2}\sigma^2)T + \sigma \sqrt{T} X}

What does all of this mean? In an ODE environment, we’d be able to specify the exact position of a particle at time T. Once we try to build in uncertainty via SDEs, we are implicitly sacrificing this ability, so instead we can only talk about expected positions, variances, and other probabilistic quantities. However, we certainly can do this, the properties of the normal distribution are very well understood from a probabilistic standpoint so we expect to be able to make headway! Just as X is a random variable distributed across a normal distribution, S(t) is now a random variable whose distribution is a function of random variable X and the other deterministic terms in the expression. We call this distribution the lognormal distribution since the log of S is distributed normally.

The random nature of S is determined entirely by the random nature of X. If we take a draw from X, that will entirely determine the corresponding value of S, since the remaining terms are deterministic. The first things we might want to do are calculate the expectation of S, its variance, and plot its distribution. To calculate the expectation, we integrate over all possible realisations of X weighted by their probability, complete the square and use the gaussian integral formula with a change of variables

{\mathbb E}[S_t] = \int^{\infty}_{-\infty} S_t(x) p(x) dx

={S_0 \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{(\mu-{1\over 2}\sigma^2)t + \sigma \sqrt{t}x} e^{-{1\over 2}x^2} dx

={ {S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2}x^2 + \sigma \sqrt{t} x} dx

={{S_0 e^{(\mu-{1\over 2}\sigma^2)t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} (x - \sigma \sqrt{t})^2} e^{{1\over 2}\sigma^2 t} dx

={{S_0 e^{\mu t}} \over \sqrt{2\pi}}\int^{\infty}_{-\infty} e^{-{1\over 2} y^2} dy

=S_0 e^{\mu t}

which is just the linear growth term acting over time [exercise: calculate the variance in a similar way]. We know what the probability distribution of X looks like (it’s a standard normal variable), but what does the probability distribution of S look like? We can calculate the pdf using the change-of-variables technique, which says that if S = g(x), then the area under each curve in corresponding regions must be equal:

\int_{x_1}^{x_2} p_x(x) dx = \int_{g(x_1)}^{g(x_2)} p_S(S) dS

p_x(x) dx = p_S(S) dS

p_S(S_t) = p_x(x) {dx \over dS_t}

We know the function S(x), but the easiest way to calculate this derivative is first to invert the function to make it amenable to differentiation

x = {\ln{S_t \over S_0} - (\mu - {1 \over 2}\sigma^2)t \over \sigma \sqrt{t}}

{dx \over dS_t} = {1 \over \sigma \sqrt{t} S_t}

So the pdf of S expressed in terms of S is

 p_S(S_t) = {1 \over S_t \sigma \sqrt{2\pi t}} \exp{-\Bigl(\ln{S_t\over S_0} - (\mu-{1\over 2}\sigma^2)t \Bigr)^2\over 2 \sigma^2 t}

Well it’s a nasty looking function indeed! I’ve plotted it below for a few typical parameter sets and evolution times.

A lognormal PDF with typical parameter values. Of course, it is highly unlikely that parameter values will stay constant for 4 years – we’ll discuss time dependence of these another day
A more-volatile PDF. Note that even though the mode is falling over time, the mean (which is independent of vol) is still going up due to the fat tails at positive values.

This distribution is really central to a lot of what we do, so I’ll come back to it soon and discuss a few more of its properties. The one other thing to mention is that if we want to calculate an expected value over S (which will turn out to be something we do a lot), we have two approaches – either integrate over \inline p_S(S_t)

{\mathbb E}[f(S_t)] = \int_0^{\infty} f(S_t)p_S(S_t)dS_t

or, express the function in terms of x (using \inline S_t = S_0 e^{(\mu - {1 \over 2}\sigma^2)t + \sigma \sqrt{t} x}) and instead integrate over the normal distribution

{\mathbb E}[f(S_t(x))] = \int_{-\infty}^{\infty} f(S_t(x))p_x(x)dx

This is typically the easier option. I think it is called the Law of the Unconscious Statistician. On that note, we’ve certainly covered enough ground for the moment!
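To check that the two approaches really do agree, here is a small numerical sketch that computes the expectation of S both ways with a trapezoid rule and compares the answers with \inline S_0 e^{\mu t}. The parameter values, integration limits and function names are all illustrative assumptions of mine.

```javascript
// Illustrative parameters (assumed values, not taken from the plots above).
const S0 = 100, mu = 0.05, sigma = 0.2, t = 1;

function normalPdf(x) {
  return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
}

// Terminal spot as a function of the standard normal draw x.
function spotFromX(x) {
  return S0 * Math.exp((mu - 0.5 * sigma * sigma) * t + sigma * Math.sqrt(t) * x);
}

// Lognormal pdf of S_t, obtained from p_x(x) dx/dS as in the text.
function lognormalPdf(s) {
  const x = (Math.log(s / S0) - (mu - 0.5 * sigma * sigma) * t) / (sigma * Math.sqrt(t));
  return normalPdf(x) / (s * sigma * Math.sqrt(t));
}

// Composite trapezoid rule on [a, b] with n panels.
function trapezoid(g, a, b, n) {
  const h = (b - a) / n;
  let total = 0.5 * (g(a) + g(b));
  for (let i = 1; i < n; i++) total += g(a + i * h);
  return total * h;
}

const f = (s) => s;  // f(S) = S, so the exact answer is S0 * exp(mu * t)

// Approach 1: integrate f(S) against the lognormal pdf (truncated to a wide range of S).
const overS = trapezoid((s) => f(s) * lognormalPdf(s), 1e-6, 1000, 200000);
// Approach 2: integrate f(S(x)) against the standard normal pdf.
const overX = trapezoid((x) => f(spotFromX(x)) * normalPdf(x), -10, 10, 20000);

console.log(overS, overX, S0 * Math.exp(mu * t));  // all three should agree closely
```

Swapping `f` for a payoff such as `(s) => Math.max(s - 100, 0)` gives an undiscounted call value by the same two routes.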