When Counting Gets Difficult, Part 2

Prion sp., March 22, 2016, seen just west of Heard Island.  Image credit: Bill Mitchell.
Prion sp., March 22, 2016, seen just west of Heard Island. Image credit: Bill Mitchell.

Earlier I posed a question: suppose a group of 40 birds are identified to genus level (prion sp.). Four photographs of random birds are identified to species level, all of one species that was expected to be in the minority (fulmar prion) and likely would be present in mixed flocks. How many birds of the 40 should be upgraded from genus-level ID to species-level ID?

Clearly there is a minimum of one fulmar prion present, because it was identified in the photographs. With four photographs and 40 birds, the chance of randomly catching the same bird all four times is quite small, so the number of fulmar prions is probably much higher than 1. At the same time, it would not be reasonable from a sample of our photographs to say all 40 were fulmar prions.

If we have four photographs of fulmar prions (A), what is the minimum number of non-fulmar prions (B) needed in a 40-prion flock to have a 95% chance of photographing at least one non-fulmar prion?

To answer this question, I used a Monte Carlo simulation, which I wrote in R. I generated 40-element combinations of A and B ranging from all A to all B. Then for each of those populations, I ran 100,000 trials, sampling 4 random birds from each population (with replacement). By tracking the proportion of trials for each population that had at least one B, it becomes possible to find the 95% confidence limit.

pop_size <- 40  # Set the population size
sample_n <- 4  # Set the number of samples (photographs)
n_trials <- 100000  # Set the number of trials for each population

x <- seq(0:pop_size)  # Create a vector of the numbers from 0 to pop_size (i.e. how many B in population)

sample_from_pop <- function(population, sample_size, n_trials){
	# Run Monte Carlo sampling, taking sample_size samples (with replacement)
                # from population (vector of TRUE/FALSE), repeating n_trials times
	# population: vector of TRUE/FALSE representing e.g. species A (TRUE) and B (FALSE)
	# sample_size: the number of members of the population to inspect
	# n_trials: the number of times to repeat the sampling
	my_count <- 0
	for(k in 1:n_trials){  # Repeat sampling n_trials times
		my_results <- sample(population, sample_size, replace=TRUE)  # Get the samples
		if(FALSE %in% my_results){  # Look for whether it had species B
			my_count <- my_count + 1  # Add one to the count if it did
		}
	}
	return(my_count/n_trials)  # Return the proportion of trials detecting species B
}

create_pop <- function(n,N){  # Make the populations
	return(append(rep(TRUE,N-n),rep(FALSE,n)))  # Populations have N-n repetitions of TRUE (sp. A), n reps of FALSE (sp. B)
}

mypops <- lapply(0:pop_size, create_pop, pop_size)  # Create populations for sampling

# Apply the sampling function to the populations, recording the proportion of trials sampling at least one of species B
my_percentages <- sapply(mypops, sample_from_pop, sample_size=sample_n, n_trials=n_trials)

My simulation results showed that with 22 or more birds of species B (non-fulmar prions), there was a >95% that they would be detected. In other words, from my photographic data, there is a 95% probability that the flock of 40 prions contained no fewer than 19 fulmar prions.

Let’s take a look at it graphically.

library(ggplot2)

mydata <- data.frame(my_percentages, 0:pop_size)  # Make a data.frame with the results and the # of species B
names(mydata) <- c("DetProb", "B")  # Rename the columns to something friendly and vaguely descriptive

p <- ggplot(mydata2, aes(x=B,y=DetProb)) + geom_point() # Create the basic ggplot2 scatterplot
p <- p + geom_hline(yintercept=0.95)  # Add a horizontal line at 95%
p <- p + theme_bw() + labs(x="# of species B (pop. 40)", y="Detection probability of B")  # Tidy up the presentation and labeling
print(p)  # Display it!
Results of the Monte Carlo simulation.  At left is all A, while at right is a population with all B.  The horizontal line is the 95% probability line.  Points above the line have a >95% chance of detecting species B.
Results of the Monte Carlo simulation. At left is all A, while at right is a population with all B. The horizontal line is the 95% probability line. Points above the line have a >95% chance of detecting species B.

With 22 or more non-fulmar prions, there’s a >95% chance one would be photographed. With 19 fulmar prions and 21 non-fulmar prions, there’s a >5% chance the non-fulmar prions would be missed. So our minimum number of fulmar prions is 19. I may have seen a flock of 40 fulmar prions, but there aren’t enough observations to say with statistical confidence that they were all fulmar prions.