Some Results for Common Distributions

[NB. I try to make note of technical conditions where relevant; from now on I'll put these in square brackets. On a first reading these can probably be ignored.]

 

I review here a few commonly used results for normal and lognormal distributions. They get used over and over across quantitative finance, so I wanted to have lots of them in one place that I can refer back to later.

Univariate Normal Distribution

The probability density function for X is

p(X=x) = {1\over \sigma\sqrt{2\pi}}e^{-{{(x-\mu)^2}\over 2 \sigma^2}}

with mean \inline \mu and standard deviation \inline \sigma. I tend to use the notation \inline X\sim{\mathbb N}(\mu,\sigma^2) to describe a normally distributed variable with these parameters.

As mentioned previously, we can take the location and the scale out of the variable X as follows:

X = a + bz

where a and b are constants (taking a = \inline \mu and b = \inline \sigma recovers \inline X\sim{\mathbb N}(\mu,\sigma^2)) and z \sim {\mathbb N}(0,1) is called a 'standard normal variable'.
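This is easy to see numerically. A minimal Python sketch (the parameter values are arbitrary, chosen just for illustration):

import numpy as np

rng = np.random.default_rng(seed=42)

mu, sigma = 1.5, 0.5                  # arbitrary target location and scale
z = rng.standard_normal(1_000_000)    # z ~ N(0, 1)
x = mu + sigma * z                    # x ~ N(mu, sigma^2)

print(x.mean(), x.std())              # close to 1.5 and 0.5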

This comes from the result that the sum of any two normal variables is also normal – if \inline X_1\sim{\mathbb N}(\mu_1, \sigma_1^2) and \inline X_2\sim{\mathbb N}(\mu_2, \sigma_2^2), then [as long as \inline X_1 and \inline X_2 are independent]

aX_1 + bX_2 = X_3\sim{\mathbb N}(a\mu_1 + b\mu_2, a^2 \sigma_1^2 + b^2\sigma_2^2)
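We can sanity-check the sum rule by simulation. A quick Python sketch, again with arbitrary parameters:

import numpy as np

rng = np.random.default_rng(seed=0)
a, b = 2.0, -1.0
mu1, sig1, mu2, sig2 = 0.5, 1.0, -0.3, 2.0

x1 = rng.normal(mu1, sig1, 1_000_000)
x2 = rng.normal(mu2, sig2, 1_000_000)             # independent of x1
x3 = a * x1 + b * x2

print(x3.mean(), a * mu1 + b * mu2)               # means agree
print(x3.var(), a**2 * sig1**2 + b**2 * sig2**2)  # variances agree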

The cdf of the normal distribution isn't analytically tractable, but I've already discussed a few numerical approximations to it here.

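If you want to experiment with it yourself, the standard normal cdf can be written exactly (up to floating point) in terms of the error function, \inline \Phi(x) = {1\over 2}\bigl(1 + {\rm erf}(x/\sqrt{2})\bigr). A minimal Python version:

from math import erf, sqrt

def std_normal_cdf(x):
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

print(std_normal_cdf(0.0))     # 0.5
print(std_normal_cdf(1.96))    # ~0.975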

The normal distribution comes up time and time again due to the central limit theorem. This says that [under fairly weak conditions] the mean of a sequence of [independent, identically distributed] random variables, each with mean \inline \mu and variance \inline \sigma^2, tends to a normally distributed variable regardless of the distribution of the underlying variables – for large n, approximately

{1 \over n}\sum_{i=1}^n X_i \sim{\mathbb N}\Bigl(\mu,{\sigma^2\over n}\Bigr)

This is of very broad importance. For example, it is the basis of Monte Carlo pricing and its square-root-N convergence rate: by running many simulations, we are sampling the distribution of the mean of N realisations of the option payoff. Although the payoff probably isn't normally distributed under the risk-neutral probability distribution of the underlying, the mean of a very large number of payoffs will approach a normal distribution, and it is an estimator for the true expected payoff. The variance also gives us error bounds, since the variance of the mean decreases as 1/N with an increasing number of samples.
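To see the square-root-N convergence concretely, here is a small sketch (not a real option price; it just estimates \inline {\mathbb E}[\max(Z,0)] for standard normal Z, whose true value is \inline 1/\sqrt{2\pi}):

import numpy as np

rng = np.random.default_rng(seed=1)
true_value = 1.0 / np.sqrt(2.0 * np.pi)     # E[max(Z,0)] for Z ~ N(0,1)

for n in [10**3, 10**4, 10**5, 10**6]:
    z = rng.standard_normal(n)
    payoff = np.maximum(z, 0.0)
    estimate = payoff.mean()
    std_error = payoff.std() / np.sqrt(n)   # CLT error bound, shrinks as 1/sqrt(n)
    print(n, estimate, abs(estimate - true_value), std_error)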

Lognormal Distribution

A lognormal variable is one whose log is distributed normally – so if \inline X\sim {\mathbb N}(\mu,\sigma^2) then \inline S = e^{a + bX} is lognormally distributed (since \inline a + bX is itself normal, the canonical case is simply \inline S = e^{X}).

The pdf for a lognormal distribution is

p(X=x) = {1\over x\sigma\sqrt{2\pi}}e^{-{{(\ln x-\mu)^2}\over 2 \sigma^2}}; x>0

once again with mean \inline \mu and standard deviation \inline \sigma [Exercise: \inline \mu and \inline \sigma are actually the mean and std. dev. for the normal distribution in the exponent, NOT for the lognormal itself – calculate their true values for the lognormal distribution]. I tend to use the notation \inline X\sim {\mathbb L}{\mathbb N}(\mu,\sigma^2) to describe a lognormally distributed variable with these parameters.
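A quick way to check your answer numerically (a Python sketch, with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma = 0.0, 0.5                  # parameters of the normal in the exponent

x = rng.normal(mu, sigma, 1_000_000)
s = np.exp(x)                         # s is lognormal

print(s.mean(), s.std())              # noticeably different from mu and sigma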

Special properties of lognormal variables mostly come from the properties of the normal in the exponent. The most important relate to products of lognormal variates [here I am still assuming independence – I'll generalise this another time]: if X_1 \sim {\mathbb L}{\mathbb N}(\mu_1, \sigma_1^2) and X_2 \sim {\mathbb L}{\mathbb N}(\mu_2, \sigma_2^2) then:

X_1 \cdot X_2 = X_3 \sim {\mathbb L} {\mathbb N}(\mu_1 + \mu_2, \sigma_1^2+\sigma_2^2)

(X_1)^n \sim {\mathbb L}{\mathbb N}(n \mu_1, n^2\sigma_1^2)

{1\over X_1} = (X_1)^{-1}\sim{\mathbb L}{\mathbb N}(-\mu_1,\sigma_1^2)
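These rules are easy to confirm by simulation, since taking logs reduces them to the normal results above. A sketch with arbitrary parameters, assuming independence:

import numpy as np

rng = np.random.default_rng(seed=3)
mu1, sig1, mu2, sig2 = 0.1, 0.2, -0.3, 0.4

x1 = np.exp(rng.normal(mu1, sig1, 1_000_000))   # X1 ~ LN(mu1, sig1^2)
x2 = np.exp(rng.normal(mu2, sig2, 1_000_000))   # X2 ~ LN(mu2, sig2^2), independent

log_prod = np.log(x1 * x2)
print(log_prod.mean(), mu1 + mu2)               # means of the exponents add
print(log_prod.var(), sig1**2 + sig2**2)        # variances of the exponents add
print(np.log(1.0 / x1).mean(), -mu1)            # the inverse rule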

Although the third of these is a special case of the second, it is worth taking a little bit of time to think about. It says that the distribution of the inverse of a lognormal variable is also lognormal. This will be useful in areas like foreign exchange, since it says that if the future exchange rate is randomly distributed in this way, then the inverse of the exchange rate (which is of course just the exchange rate from the opposite currency's perspective) is also lognormal. Secondly, what does \inline -\mu mean for the distribution? Don't forget that this isn't the distribution mean but the mean of the normal in the exponent – even when this is negative, the lognormal variable itself is positive, so this is a perfectly acceptable value.

Unlike the normal, there is no closed-form expression for the sum of two lognormal variables. Two approaches are typically used in this case: for a large, independent collection of variables with the same mean and variance, the CLT implies that the distribution of the average will be roughly normal, while for small, highly correlated collections with similar means, the sum is still roughly lognormal with an adjusted mean and variance. The second technique is an example of 'moment matching', which I'll discuss in more detail later.
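As a preview, here is a minimal sketch of the moment-matching idea for independent lognormals (the correlated case needs covariance terms as well; the function name is just for illustration): compute the true mean and variance of the sum, then back out the parameters of the single lognormal sharing those moments.

import numpy as np

def moment_matched_lognormal(mus, sigmas):
    # Mean and variance of a sum of independent lognormals, summed term by term
    mus, sigmas = np.asarray(mus), np.asarray(sigmas)
    mean = np.sum(np.exp(mus + 0.5 * sigmas**2))
    var = np.sum((np.exp(sigmas**2) - 1.0) * np.exp(2.0 * mus + sigmas**2))
    # Parameters of the single lognormal with this same mean and variance
    sig2_hat = np.log(1.0 + var / mean**2)
    mu_hat = np.log(mean) - 0.5 * sig2_hat
    return mu_hat, np.sqrt(sig2_hat)

print(moment_matched_lognormal([0.1, 0.2, 0.15], [0.2, 0.25, 0.3]))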

Multivariate Normal Distribution

In the case that we have more than one normally distributed random variable [we assume here that they are 'jointly normally distributed'], we need to build into our calculations the possibility that they might not be independent, which leads to the multivariate normal distribution. (A trivial example of the failure of our above expressions for non-independent variables is \inline X_2 = -X_1, in which case \inline X_1 + X_2 = 0, and it is certainly NOT true that \inline 0\sim {\mathbb N}(\mu_1+\mu_2,\sigma_1^2 + \sigma_2^2)!)

To measure the dependence of two normal variables, our starting point is their covariance. Similarly to the variance, this measures the difference between the expectation of the product of the two variables and the product of their individual expectations

{\rm Cov}[X_1,X_2] = {\mathbb E}[X_1\cdot X_2] - {\mathbb E}[X_1]{\mathbb E}[X_2]

which is 0 for independent variables. A scaled version of the covariance is called the correlation

\rho(X_1,X_2) = {{\rm Cov}[X_1,X_2] \over \sqrt{{\rm Var}[X_1]{\rm Var}[X_2]}}
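Both estimators are available directly in numpy, for example:

import numpy as np

rng = np.random.default_rng(seed=4)
y = rng.standard_normal(100_000)
x = 0.7 * y + rng.standard_normal(100_000)   # constructed to be correlated with y

print(np.cov(x, y)[0, 1])          # sample covariance, ~0.7 here
print(np.corrcoef(x, y)[0, 1])     # sample correlation, always in [-1, 1]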

so that \inline \rho \in [-1,1]. The combined pdf for two or more variables becomes rapidly more complicated; I'll look at ways of thinking about it another time, but state here the case for two variables, each with a standard normal distribution and correlation \inline \rho:

P(X=x, Y=y) = {1 \over 2\pi \sqrt{1-\rho^2}}\exp{\Bigl(-{1 \over 2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Bigr)}

[Exercise: derive the product rule for two normal and for two lognormal variables when the normals involved are allowed to be correlated]

Finally, for normal variables only, we have the important result that if X and Y are standard normal variables with correlation \inline \rho, we can re-write X as a combination of Y and a third variable Z:

X = \rho \cdot Y + \sqrt{1-\rho^2}\cdot Z

where Y and Z are uncorrelated standard normal variables. I'll be using this expression lots in the future!
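This is also how correlated normal draws are typically generated in practice. A minimal sketch:

import numpy as np

rng = np.random.default_rng(seed=5)
rho = 0.8

y = rng.standard_normal(1_000_000)
z = rng.standard_normal(1_000_000)           # independent of y
x = rho * y + np.sqrt(1.0 - rho**2) * z      # standard normal with corr(x, y) = rho

print(x.mean(), x.std())                     # ~0 and ~1
print(np.corrcoef(x, y)[0, 1])               # ~0.8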
