Solving the Heat Equation

This post follows on from my earlier post on the BS Equation from Delta Hedging. In that post, I showed how delta hedging arguments within the BS model lead to the heat equation

{1\over 2}\sigma^2 {\partial^2 D\over \partial y^2} - {\partial D \over \partial \tau} = 0

where \inline D = C e^{-r \tau}, C(S,t) is the derivative price, \inline y = \ln S + \big( r - {1\over 2}\sigma^2 \big)\tau, and \inline \tau = T - t. In this post I’m going to solve this equation to give the BS option price for a vanilla call. S can only take positive values, but due the the logarithm y runs from \inline -\infty to \inline \infty.

In physics, this equation is usually encountered with periodic boundary conditions coming from the boundaries of the region of interest, but here we don’t have that luxury so will need to be more careful. In this post I’m going to solve the equation using the method of separation of variables, in a later post I’ll re-solve it using the method of Fourier transforms.

The heat equation is linear, which means that if \inline f(y,\tau) and \inline g(y,\tau) are both solutions, then so is the sum \inline \big( f(y,\tau) + g(y,\tau) \big). This means that if we can find a selection of possible solutions, we can combine them to match the boundary condition given by the known form of \inline D(y,\tau = 0) at expiry (coming from \inline C(S,\tau = 0) = \big( S - K \big)^+). We look for separable solutions, which are those of the form \inline f(y,\tau) = Y(y)T(\tau), in which case the derivatives in the heat equation only act on their respective terms:

{1\over 2}\sigma^2 T(\tau){\partial^2 Y(y)\over \partial y^2} - Y(y){\partial T(\tau) \over \partial \tau} = 0

{1\over 2}\sigma^2 Y''T = Y \dot{T}

{1\over 2}\sigma^2 {Y''\over Y} = {\dot{T} \over T} = \lambda^2

where a dash denotes differentiation by y and a dot differentiation by tau, In the final equation, both sides depend only on independent variables y and tau respectively – the only way this can be true is if both sides are equal to some constant, which we call \inline \lambda^2.

In general, we should consider the possibilities that \inline \lambda^2 is zero or negative, but it turns out that these won’t matter in this example so I ignore them for brevity. Two equations can be extracted, linked by \inline \lambda^2

Y'' - 2 {\lambda^2 \over \sigma^2} Y = 0

\dot{T} - \lambda^2 T = 0

Solving these equations,

Y_\lambda(y) = Y_{0,\lambda} e^{\sqrt{2}{\lambda \over \sigma }y}

T_\lambda (\tau) = e^{\lambda^2 \tau}

with any positive value of lambda allowed. We first try to find the fundamental solution of the heat equation, which is the solution that satisfies \inline f(y,\tau = 0) = \delta(y-a) where \inline \delta is the Dirac delta function, and I’ll use this to find the particular solution for a vanilla call at the end. At \inline \tau = 0 all of the functions T are 1, so these can be neglected, and we simply sum up the right number of Y terms, each with a different value of \inline \lambda which we index with n, to produce the delta function:

\sum_n Y_{0,\lambda_n} e^{\sqrt{2}{\lambda_n \over \sigma }y} = \delta(y-a)because \inline \lambda can take any value on the continuous interval \inline -\infty < \lambda < \infty, we change the sum to an integral, and also express the delta function using its integral representation and compare co-efficient to equate the two expressions

\int_{-\infty}^{\infty} Y_{0,\lambda_n} e^{\sqrt{2}{\lambda_n \over \sigma }y}dn = {1 \over 2\pi} \int_{-\infty}^{\infty} e^{in(y-a)}dn


Y_{0,n} = {1 \over 2\pi}e^{-in a} \quad ; \quad \lambda_n = {1 \over \sqrt{2}}\sigma i n

And plugging these into our earlier expressions for Y and T, and combining these in the same way for \inline \tau > 0 to provide the temporal behavior, we arrive at

f(y,\tau) = \int_{-\infty}^{\infty} Y_{\lambda_n}(y) T_{\lambda_n}(\tau) dn

= {1\over 2\pi} \int_{-\infty}^{\infty} e^{in(y-a)} e^{-{\sigma^2 n^2 \tau \over 2}}dn

and once again, this is a gaussian integral that we’ll tackle by completing the square…

= {1\over 2\pi} \int_{-\infty}^{\infty} \exp\Big\{ -{1 \over 2} \sigma^2 \tau \Big( n^2 - 2 {i(y-a)\over \sigma^2 \tau} + \big( {i(y-a)\over \sigma^2 \tau}\big)^2-\big( {i(y-a)\over \sigma^2 \tau}\big)^2 \Big) \Big\} dn

= {1\over 2\pi} \int_{-\infty}^{\infty} \exp\Big\{ -{1 \over 2} \sigma^2 \tau \Big(n- \big( {i(y-a)\over \sigma}\big) \Big)^2 \Big\} \cdot \exp{-(y-a)^2 \over 2 \sigma^2 \tau} dn

= {1\over 2\pi}\cdot\sqrt{2\pi \over \sigma^2 \tau}\cdot \exp{-(y-a)^2 \over 2 \sigma^2 \tau}

= {1\over \sqrt{2\pi \sigma^2 \tau}}\cdot \exp{-(y-a)^2 \over 2 \sigma^2 \tau}

and this is the fundamental solution given at the end of the first post and also at wikipedia. Note that although we used separation of variables to find the forms of the functions Y and T, the separability hasn’t been carried through the summation of many different T-Y combinations – the fundamental solution that we’ve arrived at ISN’T separable!

The fundamental solution is the time and space evolution of a function that is initially a delta-function, such that the evolution always obeys the heat equation. We can use the following property of the delta function to construct specific solutions for particular payoff functions:

f(y) = \int_{-\infty}^{\infty} \delta(y-a) f(a) da

So, because the fundamental solution \inline f(y,\tau=0) = \delta(y-a), we can construct the payoff function using this integral at \inline \tau = 0

D(y,\tau=0) = \int_{-\infty}^{\infty} f(a,\tau=0) D(a,\tau=0) da

and consequently, at later times

D(y,\tau) = \int_{-\infty}^{\infty} f(a,\tau) D(a,\tau=0) daand we can solve this for the specific case that \inline C(S,\tau = 0) = \big( S(\tau=0) - K \big)^+ as follows (denoting a as y’)

D(y,\tau) = {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{-\infty}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big)^+ dy'

The second factor is only non-zero when \inline y' > \ln K = y_L, so we can absorb this condition into the lower limit

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big) dy'and splitting this into two integrals, which we do in turn

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} K\exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} dy' = e^{-r\tau}\cdot K\cdot \Phi\Big({y - y_L \over \sigma \sqrt{\tau}}\Big)


{e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot e^{y'} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( y'^2 + y^2 - 2y'y -2y'\sigma^2\tau \pm ( y + \sigma^2\tau )^2 \Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( \big(y'-(y+\sigma^2 \tau)\big)^2 - \big(y + \sigma^2 \tau\big)^2 + y^2\Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {1\over 2\sigma^2 \tau}\Big( \big(y'-(y+\sigma^2 \tau)\big)^2 - \sigma^2\tau\big(2y + \sigma^2 \tau\big)\Big) \Big\} dy'

= {e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {\big(y'-(y+\sigma^2 \tau)\big)^2\over 2\sigma^2 \tau}\Big\}\cdot S e^{r\tau} dy'

= e^{-r\tau} \cdot F \cdot \Phi \Big( {y - y_L + \sigma^2 \tau \over \sigma \sqrt{\tau}} \Big)

where as usual, \inline \Phi(x) is the cumulative normal distribution; and we’ve used the expression \inline y = \ln S + (r - {1\over 2}\sigma^2)\tau and also \inline y + {1 \over 2}\sigma^2 \tau = S e^{r\tau} = F. Putting all of this together, we finally arrive at the Black-Scholes expression for a vanilla call price,

{e^{-r\tau} \over \sqrt{2\pi\sigma^2\tau}}\int_{y_L}^{\infty} \exp\Big\{ - {(y'-y)^2 \over 2\sigma^2 \tau} \Big\} \cdot \big( e^{y'} - K \big) dy'

= e^{-r\tau} \Big( F \cdot \Phi(d_1) - K\cdot \Phi(d_2)\Big)



Wow! Finally! Hopefully this will convince you that the method of risk-neutral valuation is a bit more straight-forward!

BS from Delta-Hedging

Today I’m going to look at another method of getting to the BS equations, by constructing a delta-hedge. This is the way that the equation was in fact first reached historically, and it’s a nice illustration of the principle of hedging. All of the same assumptions are made as in the post that derived the BS equation via Risk Neutral Valuation.

The principle is that because the price of some derivative C(S,t) is a function of the stochastic underlying S, then all of the uncertainty in C comes from the same source as the uncertainty in S. We try to construct a risk-free portfolio made up of the two of these that perfectly cancels out all of the risk. If the portfolio is risk-free, we know it must grow at the risk free rate r(t) or else we have an arbitrage opportunity.

Our model for S is the geometric brownian motion, note that we allow the rate of growth \mu in general to be different from r

    \[dS = \mu Sdt + \sigma SdW_t\]

We can express dC in terms of its derivatives with respect to t and S using Ito’s lemma, which I discussed in a previous post,

    \[dC = \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} \Bigl] dt + \sigma S{\partial C \over \partial S} dW_t\]

Our portfolio is made up of one derivative worth C(S,t) and a fraction \alpha of the underlying stock, worth \alpha \cdot S; so the net price is C(S,t) + \alpha S. We combine the above two results to give

    \begin{eqnarray*} d(C+\alpha S) &=& \Bigr[ {\partial C \over \partial t} + \mu S{\partial C \over \partial S} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + \mu S\alpha \Bigl] dt \\ \nonumber \\ &   & + \, \sigma S \bigl( {\partial C \over \partial S} +\alpha \bigr) dW_t \nonumber \end{eqnarray}

We are trying to find a portfolio that is risk-free, which means we would like the stochastic term to cancel. We see immediately that this happens for \alpha = -{\partial C \over \partial S}, which gives

    \[d(C+\alpha S) = \Bigr[ {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2}\Bigl] dt\]

Since this portfolio is risk-free, to prevent arbitrage it must grow deterministically at the risk free rate

    \[d ( C + \alpha S) = r ( C + \alpha S) dt\]

and so

    \[rC= {\partial C \over \partial t} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

This is the BS partial differential equation (pde). Note that despite the fact that the constant growth term for the underlying had a rate \mu, this has totally disappeared in the pde above – we might disagree with someone else about the expected rate of growth of the stock, but no-arbitrage still demands that we agree with them about the price of the option [as long as we agree about \sigma, that is!]

As for any pde, we can only solve for a specific situation if we have boundary conditions – in this case, given by the payoff at expiry t=T. At that point we know the exact form the value that C(S,T) must take

    \[C_{\rm call}(S,T) = \max(K-S,0)\]

Our job is to use the pdf to evolve the value of C(S,t) backwards to t=0. In the case of vanilla options this can be done exactly, while for more complicated payoffs we would need to discretise and solve numerically. This gives us another way of valuing options that is complementary (and equivalent) to the expectations approach discussed previously.

To solve the equation above, it is useful to first make some substitutions. As we are interested in time-to-expiry only, we make the change of variables \tau = T - t which yields

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 S^2{\partial^2 C \over \partial S^2} + rS{\partial C \over \partial S}\]

We can eliminate the S terms by considering change-of-variables M = \ln S. This means that

    \[{\partial C \over \partial S} = {\partial C \over \partial M}{\partial M \over \partial S} = {1 \over S}{\partial C \over \partial M}\]

    \begin{eqnarray*} \partial^2 C \over \partial S^2} = {\partial \over \partial S} \Bigl({\partial C \over \partial S}\Bigr) & = & {\partial \over \partial S} \Bigl({1 \over S}{\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}{\partial \over \partial S} \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S}\bigl({\partial M \over \partial S}{\partial \over \partial M}\bigr) \Bigl({\partial C \over \partial M}\Bigr) \nonumber \\ \nonumber \\ & = & {-1 \over S^2} {\partial C \over \partial M} + {1 \over S^2}{\partial^2 C \over \partial M^2 \nonumber \end{eqnarray}

Combining these the BS equation becomes

    \[rC= -{\partial C \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 C \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial C \over \partial M}\]

The linear term in C can be removed by another transformation D = C e^{-r\tau} so that

    \begin{eqnarray*} {\partial D \over \partial \tau} &=& {\partial C \over \partial \tau}e^{-r\tau} - rCe^{-r\tau} \nonumber \\ \nonumber \\ {\partial^n D \over \partial M^n} &=& {\partial^n C \over \partial M^n}e^{-r\tau} \nonumber \end{eqnarray}

The exponential terms cancel throughout, and we are left with

    \[0 = -{\partial D \over \partial \tau} + {1 \over 2}\sigma^2 {\partial^2 D \over \partial M^2} + \bigr( r - {1 \over 2} \sigma^2 \bigl) {\partial D \over \partial M}\]

One final transformation will be needed before putting in boundary conditions. The transformation will be

    \[y = M + \Bigr( r -{1\over 2 }\sigma^2\Bigl)\tau\]

But unlike the other transformations I’ve suggested so far, this one mixes the two variables that we are using, so a bit of care is required about what we mean. When I most recently wrote the BS equation, D was a function of M and \tau – this means that the partial differentials with respect to \tau were implicitly holding M constant and vise versa. I’m now going to write D as a function of y and \tau instead, and because the relationship features all three variables we need to take a bit of care with our partial derivatives:

    \[dD(M,\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} dM\]

where vertical lines indicate the variable that is being held constant during evaluation. Now, to move from M to y, we expand out the dM term in the same way as we did for dD above

    \[dD(M(y,\tau),\tau) = {\partial D \over \partial \tau}\bigg|_{M} d\tau + {\partial D \over \partial M}\bigg|_{\tau} \Bigr({\partial M \over \partial \tau}\bigg|_{y} d\tau + {\partial M \over \partial y}\bigg|_{\tau}dy\Bigl)\]

    \[= dD(y,\tau) = {\partial D \over \partial \tau}\bigg|_{y} d\tau + {\partial D \over \partial y}\bigg|_{\tau} dy\]

We can compare these last two equations to give expressions for the derivatives that we need after the transformation by comparing the coefficients of d\tau and dy

    \begin{eqnarray*} {\partial D \over \partial \tau}\bigg|_y &=& {\partial D \over \partial \tau}\bigg|_M + {\partial D \over \partial M}\bigg|_{\tau} {\partial M\over \partial \tau}\bigg|_y \nonumber \\ \nonumber \\ {\partial D \over \partial y}\bigg|_{\tau} &=& {\partial D \over \partial M}\bigg|_{\tau}{\partial M \over \partial y}\bigg|_{\tau} \nonumber \end{eqnarray}

Computing and inserting these derivatives [I’ve given a graphical representation of the first of these equations below, because the derivation is a little dry at present!] into the BS equation gives

    \[0 = -{\partial D \over \partial \tau}\bigg|_y + {1 \over 2}\sigma^2 {\partial^2 D \over \partial y^2}\bigg|_{\tau}\]

This is the well-known Heat Equation in physics. For the sake of brevity I won’t solve it here, but the solution is well known – see for example the wikipedia page – which gives the general solution:

    \[D(x,\tau) = {1 \over \sqrt{2\pi \sigma^2 \tau}}\int^{\infty}_{-\infty} e^{-(x-y)^2 \over 2\sigma^2 \tau} p(y) dy\]

Where p(y) is the payoff condition (it’s now an initial condition, as expiry is at \tau = 0). The algebra is quite involved so I give the solution its own post, and you can show by substitution that the BS option formulae given previously is a solution to the equation.

An illustration of the difference between partial differentials when a change of variables involving both current varibles is used
An illustration of the difference between partial differentials when a change of variables involving both current variables is used. This should be thought of as a contour plot with the value of D on the out-of-plane axis. The amount D changes when moving a small amount dt depends on which direction you are moving in, as shown above.

As an aside, what was the portfolio that I was considering all of the way through? Comparing \alpha to the vanilla greeks, we recognise it as the option delta – the hedging portfolio is just the portfolio of the option with just enough stock to hedge out the local delta risk. Of course, as time goes by this value will change, and we need to constantly adjust our hedge to account for this. This shows the breakdown caused by one of our assumptions – that we could trade whenever we want and without transaction costs. In fact, because we need to re-hedge at every moment to enforce this portfolio’s risk free nature, in the presence of transaction costs the hedging costs in this strategy will be infinite! This demonstrates a significant failing of one of our assumptions, I’ll come back again to the effect of this in the real world in future posts.