Cumulative Normal Distribution

As discussed in my last post, I had some issues with an implementation of the standard normal cdf function, which I address here. This post will be my first test of the equation editor that I’ve just installed!

The standard normal cdf looks like this:

\Phi(x) = {1 \over \sqrt{2\pi}}\int ^{x}_{-\infty} e^{-{1 \over 2} y^2 } dy

I had a few thoughts as to how to go about this. It’s symmetrical around x = 0, so we only need to worry about solving for positive x, and then we can calculate negative x values by reflection. Possibilities were:

  • Series Approximation

e^x = \sum_{n = 0}^{\infty} {1 \over n!} x^n

{1\over \sqrt{2\pi}}\int_{-\infty}^x e^{-{1 \over 2}y^2} dy = {1 \over 2} + \sum_{n = 0}^{\infty} \int_{0}^x {-1^n \over 2^n n!} {y^{2n}} dy = {1 \over 2} + \sum_{n = 0}^{\infty} {-1^n \over (2n+1) 2^n n!} {x^{2n+1}}

The problem with this expression is that the terms oscillate between positive and negative, which means that for large enough x it will explode to either very big positive or negative values. We could include a fixing point and build in some sort of curve above a certain x, but I’m looking for a fairly robust solution that doesn’t involve too much testing for the moment, and this sounds like a lot of work!

  • Lookup Table

Again, this solution would be a lot of work to set up, but would have a guaranteed level of accuracy and would find the answer in a fixed amount of time. A vector could be set up containing the value of x that gives cdf(x) = 0.5+n*delta; delta might be one millionth or so, the array would have 2^19 members and a binary search would locate the closest value to x in approximately 18 operations (assuming constant time to check a vector). The index of this value would then give the value of the cdf, and an interpolation scheme between adjacent values could improve accuracy further.

Pros of this solution are efficiency and accuracy, but there would be a large setup cost and possibly memory cost to load up the look up table. If we had to run the function inside a Monte Carlo regime or numerical integration, this might be justified by the runtime savings, but for the moment it’s a bit much work!

  • Analytic Approximation

To get the pricer described in my last post running, I implemented the following approximation to the error function:

a1 = 0.278393
a2 = 0.230389
a3 = 0.000972
a4 = 0.078108

{\rm erf}(x) = 1 - {1 \over (a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4)^4}

\Phi(x) = 0.5\Bigl[ 1 + {\rm erf}\Bigl({x \over \sqrt{2}}\Bigr) \Bigr]

Well, it got the pricer working but turns out to be not too accurate. For example, I got this running on excel and compared it to the inbuild NORMSDIST implementation, for some typical numbers (forward = 100, strike = 90, vol=10%, dcf = 1, time = 1). For NORMSDIST(d1) in excel, I get 0.86512, while for my Phi(d1) I get 0.86489, so the error is in the 4th sf. This leads to a price difference of 10.7124 for excel and 10.7095 for my implementation, and I’d like to do a bit better. Since I’d ruled out the possibilities described above, I went back to wikipedia and implemented the following version of the cdf, which gave a much more agreeable price (similar to excel’s now to 7sf):

b0 = 0.2316419;
b1 = 0.31938153;
b2 = -0.356563782;
b3 = 1.781477937;
b4 = -1.821255978;
b5 = 1.330274429;

t = {1 \over 1+b_0 x}

\phi(x) = {1 \over \sqrt{2\pi}} e^{-{1 \over 2}x^2}

\Phi(x) = 1 - \phi(x)\cdot(b_1t + b_2t^2 +b_3t^3+b_4t^4+b_5t^5)

It’s worth noting that although this implementation seems to give better accuracy for most positive x, it relies on some library functions (square root and exponential) itself, so we’re leaving ourselves slightly at the mercy of how those are programmed. It’s also going to take much longer than the previous implementation, which only used polynomial arithmetic. Since we only need a couple of evaluations in a vanilla pricer, it’s not a big concern, but for applications that call it repeatedly we might want to do better.

That’s all for today!

-QuantoDrifer

Leave a Reply

Your email address will not be published. Required fields are marked *