Addressing Uncertainty Numerically

Casino Monte Carlo, Monaco. Image credit: Positiv (CC-BY-SA).

Recently I wrote about what scientific uncertainty is, and why it might be important. There are a few things which can be done to understand the “fuzziness” of a number. However, the real world can get complicated in a way which most introductory courses don’t deal with.

For instance, a standard introduction would help you through what to do if you are adding or multiplying two numbers. In the former case, you combine the absolute uncertainties, and in the latter you combine the relative (fractional) uncertainties; when the uncertainties are independent, each combination is done in quadrature (the square root of the sum of the squares). This can be a little bit of math, but in general it’s fairly straightforward: plug in the numbers and out pops an answer.
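As a rough sketch of those textbook rules in R, assuming the two uncertainties are independent: the values of a, b, sigma_a, and sigma_b below are made up purely for illustration.

```r
# Hypothetical measurements (illustrative values, not from any dataset)
a <- 10.0; sigma_a <- 0.3
b <- 4.0;  sigma_b <- 0.4

# Sum: absolute uncertainties combine in quadrature
sum_ab <- a + b
sigma_sum <- sqrt(sigma_a^2 + sigma_b^2)

# Product: relative (fractional) uncertainties combine in quadrature
prod_ab <- a * b
sigma_prod <- abs(prod_ab) * sqrt((sigma_a / a)^2 + (sigma_b / b)^2)

cat(sprintf("a + b = %.2f +/- %.2f\n", sum_ab, sigma_sum))
cat(sprintf("a * b = %.1f +/- %.1f\n", prod_ab, sigma_prod))
```

If the uncertainties are correlated, these formulas need extra covariance terms, which is one of the places where the analytical route starts to get tedious.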

Sometimes, though, figuring out the proper formula can be time-consuming or nearly impossible. In these cases, there is still a way to get a satisfactory answer: a Monte Carlo simulation, so named for the iconic casino in Monaco (pictured above).

The Monte Carlo simulation is effectively an exercise in rolling dice. Rather than handle the uncertainties analytically, random numbers are drawn (typically from a Gaussian distribution with the appropriate mean and width) and combined together in the equation in question, with no uncertainties attached. After a large number of repetitions (for my purposes around 10^4 to 10^7), the statistical distribution of the results can be evaluated. In many cases that evaluation means taking a mean and standard deviation, although it is very informative to look at the distribution to be sure it is at least roughly Gaussian. If the distribution is bimodal or highly skewed, a standard deviation may not be an appropriate metric.
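A minimal sketch of that procedure in R might look like the following. The two measurements (10.0 with a 1-sigma uncertainty of 0.3, and 4.0 with a 1-sigma uncertainty of 0.4) are hypothetical, and the ratio is just a stand-in for whatever equation is actually of interest.

```r
set.seed(42)      # fix the random seed so the run is repeatable
n_trials <- 1e5   # number of Monte Carlo repetitions

# Hypothetical measurements: Gaussian draws with the stated mean and 1-sigma width
a <- rnorm(n_trials, mean = 10.0, sd = 0.3)
b <- rnorm(n_trials, mean = 4.0, sd = 0.4)

# Combine the draws in the equation of interest; no uncertainties attached
ratio <- a / b

# Evaluate the resulting distribution
ratio_mean <- mean(ratio)
ratio_sd <- sd(ratio)
print(c(mean = ratio_mean, sd = ratio_sd))
print(quantile(ratio, c(0.025, 0.5, 0.975)))

# Look at the shape before trusting the mean and standard deviation
hist(ratio, breaks = 100)
```

The histogram is the sanity check: if it looks lopsided or has two humps, percentiles are usually a safer summary than the mean and standard deviation.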

There is another place where Monte Carlo simulations are useful. Suppose you want to know the average (and distribution) produced when you roll four 6-sided dice, and sum the three highest. Is there a way to solve it analytically? Probably, but I can code (and run) the Monte Carlo simulation much more quickly.*

Here’s how to do that in the statistics program R:

my_repetitions <- 10000; # Set the number of trials
# Use a variable for this so when you change N, you only have to do so once.
# Tracking down every place you entered 10000 by hand is no fun.

my_roll <- function(){ # Declare a function to roll four dice and sum the highest three
    # returns: sum of highest three of four six-sided dice rolls (integer)

    roll_four <- sample(seq(1, 6), size=4, replace=TRUE); # Create a four-element vector representing four 6-sided dice
    return(sum(roll_four) - min(roll_four)) # Sum the four dice, and subtract the minimum roll
}

my_results <- replicate(my_repetitions, my_roll()); # Create a vector of dimension my_repetitions, with each element the result of my_roll()

summary(my_results); # Show a statistical summary of the results
hist(my_results); # Plot a quick-and-dirty histogram of the results

Monte Carlo results (N=10,000) for rolling four 6-sided dice and summing the highest three.
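For a problem this small, the simulation can be cross-checked by brute force, since four dice have only 6^4 = 1296 equally likely outcomes. The sketch below is not part of the original analysis, and the names all_rolls, exact_totals, exact_mean, and exact_probs are my own.

```r
# All 6^4 = 1296 equally likely outcomes of four six-sided dice
all_rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6, d4 = 1:6)

# Sum of the highest three dice = total of all four minus the lowest die
exact_totals <- rowSums(all_rolls) - apply(all_rolls, 1, min)

# Exact mean and exact probability of each possible total (3 through 18)
exact_mean <- mean(exact_totals)
exact_probs <- table(exact_totals) / nrow(all_rolls)

print(exact_mean)
print(round(exact_probs, 4))
```

The exact mean works out to about 12.24, so a Monte Carlo run with N=10,000 should land within a few hundredths of it. Enumeration stops being practical once the number of possible outcomes gets large, which is where the random-sampling approach earns its keep.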

Monte Carlo simulations are very handy when there are non-linear interactions between parameters with uncertainties. There may be no analytical solution, but creating a Monte Carlo procedure for determining the final distribution is relatively straightforward. In fact, this is what many weather models and climate models do. An ensemble of different numerical predictions will be put together, and the result gives a statistical likelihood of different events. With weather models, the trick is to keep the code running quickly enough that when it’s done the result is still a prediction—predicting what the weather would have been like a week ago is not useful in the way that knowing what tomorrow may have in store is.
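To illustrate the non-linear case, here is a small sketch in R. The input (2.0 with a 1-sigma uncertainty of 0.5) and the choice of an exponential are assumptions made only for the sake of example; a real problem would substitute its own equation.

```r
set.seed(1)       # fix the random seed so the run is repeatable
n_trials <- 1e5

# Hypothetical input: x = 2.0 +/- 0.5 (1-sigma), assumed Gaussian
x <- rnorm(n_trials, mean = 2.0, sd = 0.5)

# A deliberately non-linear function of the input
y <- exp(x)

# The output is right-skewed, so the mean sits above the median
print(c(mean = mean(y), median = median(y), sd = sd(y)))

# Percentiles describe a skewed result better than mean +/- sd
print(quantile(y, c(0.16, 0.5, 0.84)))
```

Even though the input is symmetric, the output is not: the interval between the 16th and 84th percentiles is lopsided about the median, which a single "plus or minus" number would hide.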

This post is not meant to be an introduction to R (which I used), or to good programming practices (which I hope I used). Many of these are available online, or at colleges and universities in facilities like UC Berkeley’s D-Lab. I may also write about these topics at a later point.

* For those keeping score at home, the mean of this dice-rolling scheme is about 12.25, and the median is 12, with the middle 50% ranging from 10 to 14.

Geoscientist’s Toolkit: Understanding Uncertainties

Tyrannosaurus rex skull, at the University of California Museum of Paleontology, University of California, Berkeley. Image credit: EncycloPetey (CC-BY-SA).

Uncertainty, in the analytical or numeric sense, is a topic many people approach with trepidation. However, it is much more approachable and intuitive than it may seem.

Consider the following examples.

First, the local meteorologist has forecast a high temperature tomorrow of 45 °F (7 °C). Suppose I take my thermometer out tomorrow afternoon, let it sit in the shade for a while, and it reads 43 °F (6 °C). Was the forecast wrong?
What if it were 37 °F (3 °C)?

Perhaps a different example may be more familiar. You are on your way to meet a friend for lunch, and you tell them that you’ll be there in fifteen minutes. Seventeen minutes later, you arrive. Were you on time? Were you late?

Finally, here is yet another example, and one I struggle with: you are in a science museum, and the sign next to the T. rex says that it went extinct around 65 Ma. However, there was a paper published two years ago which determined that the Chicxulub impact occurred at or slightly before 66.043 Ma (+/-22 ka, 2-sigma).[1] Is the sign on the exhibit wrong?

I hope that from these examples, you can see that uncertainty is something you encounter more often than you necessarily think about. You may also find you have some intuitive sense for what it means and where it might be important.

Here are my interpretations of the above scenarios.

The meteorologist’s forecast is correct. Although it is not usually mentioned, meteorologists in temperate climes expect their one-day forecasts to be accurate to about 2 °F. They have a fairly large area to cover, which may have some temperature differences within it, and there is some uncertainty in the reading given by the thermometer.

When meeting a friend for lunch, being off by two minutes in fifteen is probably fine. Most people understand that such an arrangement may be accurate only to within five minutes. For some people, even five minutes may be well within the understood uncertainty.

As for the museum, I would think it is a great opportunity to teach about uncertainty! On one hand, the casual use of 65 Ma (implying +/-2.5 Ma) is correct. On the other hand, that figure is quite imprecise compared to how well we know when the event happened. Even to the nearest million years, 66 Ma would be more appropriate. This brings up another point: our understanding of geochronology is based on the latest research. Advances in the field, analysis of different materials, or even different interpretations of previously existing data can lead to changes. Despite the great amount of effort put into compiling the best science and geochronology for the latest Geologic Time Scale (mostly not open access, sadly), parts became obsolete even as it went to press. Mostly, these changes are small, and primarily concern scientists who are studying events near these boundaries. If the Cretaceous-Paleogene boundary were to be moved from 66 Ma to 90 Ma, that radical change would need to be backed up with a lot of data, and would necessitate a new sign for T. rex or an acknowledgement that dinosaurs lasted well into the Paleogene (there is no widely accepted evidence that they did).
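The arithmetic behind these three judgments can be written down as a very simple check. The function name within_uncertainty is hypothetical, and treating the uncertainty as a hard cutoff is a simplification made only for illustration; the tolerances are the ones suggested above (2 °F, five minutes, 2.5 Ma).

```r
# TRUE if the observed value falls within stated +/- tolerance (a hard cutoff,
# which is a simplification of how uncertainties are normally interpreted)
within_uncertainty <- function(observed, stated, tolerance) {
  abs(observed - stated) <= tolerance
}

# Forecast of 45 deg F, taken as good to about 2 deg F
forecast_43 <- within_uncertainty(43, stated = 45, tolerance = 2)
forecast_37 <- within_uncertainty(37, stated = 45, tolerance = 2)

# "Fifteen minutes", understood to within about five minutes
lunch_17 <- within_uncertainty(17, stated = 15, tolerance = 5)

# Museum sign of 65 Ma (implying +/- 2.5 Ma) versus 66.043 Ma
sign_65 <- within_uncertainty(66.043, stated = 65, tolerance = 2.5)

print(c(forecast_43 = forecast_43, forecast_37 = forecast_37,
        lunch_17 = lunch_17, sign_65 = sign_65))
```

Under these assumptions only the 37 °F reading falls outside the stated range. A more careful comparison would also fold in the uncertainty on the observation itself, such as the thermometer reading or the +/-22 ka on the impact age.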

Hopefully these examples have illustrated that you already have an intuitive understanding of what analytical or numeric uncertainties are and why they are important. In science, where measurements are being made, and calculations based on those measurements are being computed, understanding the uncertainties becomes increasingly important. There are many resources available which outline the process for determining and reporting uncertainties, ranging from a ten-page worksheet to a [very handy and accessible] small book to a comprehensive tome (open access!).

***
[1] Renne, P. R., Deino, A. L., Hilgen, F. J., Kuiper, K. F., Mark, D. F., Mitchell, W. S., III, Morgan, L. E., Mundil, R., Smit, J. (2013) Time Scales of Critical Events Around the Cretaceous-Paleogene Boundary. Science 339: 684-687, doi: 10.1126/science.1230492.