Addressing Uncertainty Numerically

Casino Monte Carlo, Monaco.  Image credit: Positiv (CC-BY-SA).
Casino Monte Carlo, Monaco. Image credit: Positiv (CC-BY-SA).

Recently I wrote about what scientific uncertainty is, and why it might be important. There are a few things which can be done to understand the “fuzziness” of a number. However, the real world can get complicated in a way which most introductory courses don’t deal with.

For instance, a standard introduction would help you through what to do if you are adding or multiplying two numbers. In the former case, you can add the uncertainties together, and in the latter you would take the root mean square uncertainty. This can be a little bit of math, but in general it’s fairly straightforward: plug in the numbers and out pops an answer.

Sometimes, though, figuring out the proper formula can be time-consuming or nearly impossible. In these cases, there is still a way to get a satisfactory answer: a Monte Carlo simulation, so named for the iconic casino in Monaco (pictured above).

The Monte Carlo simulation is effectively an exercise in rolling dice. Rather than handle the uncertainties analytically, random numbers are used (typically a Gaussian distribution of appropriate mean and width) and combined together in the equation in question—with no uncertainties attached. After a large number of repetitions (for my purposes around 104–107), the statistical distribution can be evaluated. In many cases that evaluation means taking a mean and standard deviation, although it is very informative to look at the distribution to be sure it is at least roughly Gaussian. If the distribution is bimodal or highly skewed, a standard deviation may not be an appropriate metric.

There is another place where Monte Carlo simulations are useful. Suppose you want to know the average (and distribution) produced when you roll four 6-sided dice, and sum the three highest. Is there a way to solve it analytically? Probably, but I can code (and run) the Monte Carlo simulation much more quickly.*

Here’s how to do that in the statistics program R:

my_repetitions <- 10000; # Set the number of trials
# Use a variable for this so when you change N, you only have to do so once.
# Tracking down every place you entered 10000 by hand is no fun.

my_roll <- function(){ # Declare a function to roll four dice and sum the highest three
# returns: sum of highest three of four six-sided dice rolls (integer)

roll_four <- sample(seq(1, 6), size=4, replace=TRUE); # Create a four-element vector representing four 6-sided dice
return(sum(roll_four) - min(roll_four)) # Sum the four dice, and subtract the minimum roll
}

my_results <- replicate(my_repetitions, my_roll()); # Create a vector of dimension my_repetitions, with each element the result of my_roll()

summary(my_results); # Show a statistical summary of the results
hist(my_results); # Plot a quick-and-dirty histogram of the results

Monte Carlo results (N=10,000) for rolling four 6-sided dice and summing the highest three.
Monte Carlo results (N=10,000) for rolling four 6-sided dice and summing the highest three.

Monte Carlo simulations are very handy when there are non-linear interactions between parameters with uncertainties. There may be no analytical solution, but creating a Monte Carlo procedure for determining the final distribution is relatively straightforward. In fact, this is what many weather models and climate models do. An ensemble of different numerical predictions will be put together, and the result gives a statistical likelihood of different events. With weather models, the trick is to keep the code running quickly enough that when it’s done the result is still a prediction—predicting what the weather would have been like a week ago is not useful in the way that knowing what tomorrow may have in store is.

This post is not meant to be an introduction to R (which I used), or to good programming practices (which I hope I used). Many of these are available online, or at colleges and universities in facilities like UC Berkeley’s D-Lab. I may also write about these topics at a later point.

* For those keeping score at home, the mean of this dice rolling scheme is 12.25, median is 12.00 with the middle 50% ranging from 10 to 14.

Advertisements