In this post I’m going to give a little introduction to Monte Carlo as a method for integration, and try to get server-side scripting working via WordPress!
Monte Carlo is fundamentally a technique for doing numerical integration. If we’re confronted with an integral that we can’t solve analytically, we have an array of possible techniques available to us. For 1d integrals, probably the easiest thing for well-behaved functions is to use Simpson’s rule and integrate across the part we’re interested in. But to do this in many dimensions can be tricky – for each dimension we probably want the same number of partitions, so the overall number of function evaluations scales exponentially with dimension – very bad news! By contrast, Monte Carlo essentially samples the function at random points and takes an average. The higher value points contribute more to the average than lower value points, and the overall error in the average scales as , giving a good approximation relatively quickly for high dimensional problems.
The quintessential example of a Monte Carlo experiment is a simple process to approximate pi. Consider drawing a circle inscribed into a square, just as shown in the following image:
Now, imagine scattering dried cous-cous over the whole area. They will land randomly all over the surface, and first of all we will sweep away any that fall outside of the square.
What next? Well, if we count the number of grains that fell inside the circle, and compare that to the total amount that fell inside the square, what do we expect the ratio to be? As long as they’re not too clustered, the expectation is that it will be , which is of course just the ratio of the areas.
Rather than actually counting grains, we can simulate this process on the computer much more quickly. To represent each grain, we simulate two uniform variables, each on the interval [-0.5,0.5]. We treat these as the grain’s x-coordinate and y-coordinate, and calculate the distance from the origin (0,0) by taking the sum of the squares of these. If the sum is less than the square of the radius of the circle (ie. 0.25) then the grain is ‘inside’ the circle, if it is greater then the grain is ‘outside’.
Here is the simulation run for 1000 grains of cous-cous (refresh for a repeat attempt):
Grains Simulated: 1000
Grains Inside Circle: 780
Our estimate of pi is 3.12
The estimate here is probably fairly close – you may have been lucky (or unlucky!), but we claimed that the estimate will converge to the real value of pi with a progressively larger number of grains [exercise: can you demonstrate this using the central limit theorem?]. Well, below you’ll find another simulation, this time going up to a little over a million grains but making estimates each time the number of grains is doubled, and the error in the estimate is compared to the number of grains so far at each step.
|number of steps||estimate||error|
It should b fairly straight-forward to copy-paste these figures into excel – does the error fall like as claimed? In fact, the central limit theorem tells us that the mean of the ratio of grains inside should go to a normal distribution with standard deviation proportional to this quantity, so we should expect it to be outside the bound about 30% of the time (if you have calculated the right constant!).
Of course, in the 2 dimensional circle case, a better idea might be to try and put points evenly over the square and count how many of these fall inside/outside. As mentioned before, the benefits of Monte Carlo are most pronounced when there are many dimensions involved. But, this is something like the procedure involved in quasi-Monte Carlo, a procedure that I’ll talk about some other time that doesn’t use random numbers at all…