Tag Archives: Computers

Science on a Plane

Temperature profile flying in to MSP around 2120 UTC on April 25, 2016.  Image credit: Bill Mitchell (CC-BY).

One of my favorite things to do on an airplane, when I can, is to take a temperature profile during the descent. Until recently, this could generally only be done on long international flights, when they had little screens which showed the altitude and temperature along with other flight data. However, I found on my latest trip that sometimes now even domestic flights have this information in a nice tabular form.

To take a temperature profile, when the captain makes the announcement that the descent is beginning, get out your notebook and set your screen to the flight information, where hopefully it tells you altitude (m) and temperature (°C). Record the altitude and temperature as frequently as they are updated on the way down, though you might set a minimum altitude change (20 m) to avoid lots of identical points if the plane levels off for a while. When you land, be sure to include the time, date, and location of arrival.
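If you ended up writing down every update anyway, the same minimum-altitude-change rule can be applied after the fact. Here is a minimal sketch in R; the function name thin_profile and the sample readings are made up for illustration and are not real flight data.

```r
# Hypothetical helper: keep a reading only if the altitude has changed by
# at least min_step metres since the last reading that was kept.
thin_profile <- function(alt_m, temp_c, min_step = 20) {
  keep <- logical(length(alt_m))   # all FALSE to start with
  last_alt <- NA                   # altitude of the last reading we kept
  for (i in seq_along(alt_m)) {
    if (is.na(last_alt) || abs(alt_m[i] - last_alt) >= min_step) {
      keep[i] <- TRUE
      last_alt <- alt_m[i]
    }
  }
  data.frame(alt_m = alt_m[keep], temp_c = temp_c[keep])
}

# Made-up readings for illustration only
alt  <- c(3000, 2990, 2975, 2975, 2950, 2945, 2900)
temp <- c(-5.0, -5.0, -4.8, -4.8, -4.5, -4.5, -4.1)
thinned <- thin_profile(alt, temp)
print(thinned)
```

The first reading is always kept, and each later reading is compared against the last one kept rather than the previous row, so a slow drift still gets recorded once it adds up to 20 m.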

When you get a chance, transfer the data to a CSV (comma-separated value) file, including the column headers like in the example below.

Alt (m),Temp (C)

You can then use your favorite plotting program (I like R with ggplot) to plot up the data. I’ve included my R script for plotting at the bottom of the page. Just adjust the filename for infile, and it should do the rest for you.

At the top of the page is the profile I took on my way in to Minneapolis on the afternoon of April 25th. There were storms in the area, and we see a clear inversion layer (warmer air above than below) about 1 km up, with a smaller inversion at 1.6 km. From the linear regression, the temperature changed by -6.44 °C/km on average, which corresponds to a lapse rate of 6.44 °C/km, a bit lower than the typical value of about 7 °C/km.
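Inversions can also be picked out numerically rather than by eye. The following is a rough sketch in R, assuming the profile is held as two vectors; the function name find_inversions and the numbers are invented for illustration and are not the MSP data.

```r
# Hypothetical helper: find layers where temperature rises with altitude.
find_inversions <- function(alt_km, temp_c) {
  ord <- order(alt_km)             # sort readings from the ground up
  alt_km <- alt_km[ord]
  temp_c <- temp_c[ord]
  d_temp <- diff(temp_c)           # temperature change between successive levels
  warmer_above <- d_temp > 0       # TRUE where the upper level is warmer
  data.frame(
    base_km = alt_km[-length(alt_km)][warmer_above],
    top_km  = alt_km[-1][warmer_above],
    rise_c  = d_temp[warmer_above]
  )
}

# Made-up profile for illustration only
alt  <- c(0.3, 0.6, 0.9, 1.1, 1.4, 1.8)
temp <- c(14.0, 12.1, 10.5, 11.2, 9.0, 6.3)
inv <- find_inversions(alt, temp)
print(inv)
```

With noisy hand-recorded data, it may be worth requiring a minimum temperature rise before calling something an inversion.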

On the way in to Los Angeles on the morning of April 25th, no strong inversion layer was present, and the temperature increased all the way down to the ground.

Temperature profile descending into Los Angeles on the morning of April 25, 2016.  Image credit: Bill Mitchell (CC-BY).

This is a pretty easy way to do a little bit of science while you’re on the plane, and to practice your plotting skills when you’re on the ground. For comparison, the University of Wyoming has records of weather balloon profiles from around the world. You can plot them yourself from the “Text: List” data, or use the “GIF: to 10mb” option to have it plotted for you.

Here is the code.

# Script for plotting Alt/Temp profile
# File in format Alt (m),Temp (C)

infile <- "20160425_MSP_profile.csv" # Name of CSV file for plotting

library(ggplot2) # Needed for plotting
library(tools) # Needed for removing file extension to automate output filename

mydata <- read.csv(infile) # Import data
names(mydata) <- c("alt_km", "temp_c") # Simple column names (read.csv mangles "Alt (m)")
mydata$alt_km <- mydata$alt_km/1000 # Convert m to km
mystats <- lm(temp_c ~ alt_km, data=mydata) # Regress temperature on altitude; slope is in C/km
myslope <- coef(mystats)[2] # Slope of regression
myint <- coef(mystats)[1] # Intercept of regression

# Create plot. orientation="y" (ggplot2 3.3.0 or later) fits temperature as a
# function of altitude, so the blue line matches the lm() fit above.
# Adjust the annotation position (x, y) to suit your data.
p <- ggplot(mydata, aes(x=temp_c, y=alt_km)) +
  stat_smooth(method="lm", formula=y~x, orientation="y", color="blue") +
  geom_point() +
  labs(x="Temp (C)", y="Altitude (km)") +
  annotate("text", x=-30, y=1, label=sprintf("Temp = %.2f * Alt + %.2f", myslope, myint)) +
  theme_classic()

png(filename=paste(file_path_sans_ext(infile),"png",sep="."), width=800, height=800) # Set output image info
print(p) # Plot it!
dev.off() # Done plotting


Satellite Image Processing Revisited

Heard Island on Nov. 20, 2015, with image processing underway in QGIS.  Image credit: Bill Mitchell (CC-BY) with satellite imagery from USGS (EO-1 satellite, ALI instrument).

Following up on my earlier post about satellite image processing, I am happy to report that I have made progress in being able to process images myself! Through a fortunate combination of search terms, timing, and luck, I managed to come across two key pieces of information that I needed.

First, I found out how to make RGB images from raster data layers, such as different spectral bands on a satellite, fairly easily with QGIS. That was a big step forward from how I had been doing it previously, which was inelegant, inefficient, and only mostly worked. Stacking three layers (one each for red, green, and blue) into a virtual raster catalog was just a few clicks away (Raster | Miscellaneous | Build Virtual Raster (Catalog)).

Encouraged by the success with that project, I continued clicking around and stumbled across some mention of pan-sharpening (also pan sharpening), where a panchromatic (all-color) detector at high resolution is used to enhance the resolution of a colored image (sharpen). Alternately, you can think of it in the complementary way, where lower-resolution color data is added to a high-resolution greyscale image. So thanks to this blog post, I was able to find out what I needed to do to make that happen in QGIS (and Orfeo Toolbox).
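To make the idea concrete, here is a toy sketch in R of one simple pan-sharpening scheme, a Brovey-style ratio transform. It is not necessarily what QGIS or Orfeo Toolbox does internally. The function names upsample and pan_sharpen and the tiny matrices are made up for illustration, and the resolution ratio is assumed to be a whole number.

```r
# Nearest-neighbour upsampling: repeat each pixel ratio x ratio times
upsample <- function(m, ratio) {
  m[rep(seq_len(nrow(m)), each = ratio),
    rep(seq_len(ncol(m)), each = ratio), drop = FALSE]
}

# Brovey-style sharpening: rescale each colour band so that the mean of the
# three bands matches the high-resolution panchromatic brightness.
pan_sharpen <- function(red, green, blue, pan) {
  ratio <- nrow(pan) / nrow(red)          # assumed to be an integer
  r <- upsample(red, ratio)
  g <- upsample(green, ratio)
  b <- upsample(blue, ratio)
  intensity <- (r + g + b) / 3            # brightness implied by the colour bands
  gain <- pan / intensity                 # breaks down where intensity is zero
  list(red = r * gain, green = g * gain, blue = b * gain)
}

# Made-up 2x2 colour bands and a 4x4 panchromatic band
red   <- matrix(c(0.2, 0.4, 0.6, 0.8), nrow = 2)
green <- matrix(c(0.3, 0.3, 0.5, 0.5), nrow = 2)
blue  <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)
pan   <- matrix(seq(0.1, 0.85, length.out = 16), nrow = 4)

sharp <- pan_sharpen(red, green, blue, pan)
str(sharp)
```

The colour ratios come from the low-resolution bands, while the pixel-to-pixel brightness comes from the panchromatic band.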

Of course, it would be too easy for that to work. I didn’t have the Orfeo Toolbox installed, which the pan-sharpening tool needs, and ended up having to compile it from source code.* When the compiler finished and the program was installed, I went to tell QGIS where it was, but a bug in QGIS prevented me from entering the folder location. First, having just compiled and installed one program, I attempted to build the latest version of QGIS and many of the tools on which QGIS relies. After failing to make all of those and some of the compiler configuration software play nicely with each other, I eventually remembered I could get updated packages through apt-get, which fetches pre-compiled binary files put out by the maintainers of Debian Linux. That solution worked, I added the folder location, and now I can have my pan-sharpened images.

Here for your viewing pleasure is my first properly pan-sharpened image: Heard Island on Nov. 20, 2015, seen in “true color” by the Advanced Land Imager (ALI) on the EO-1 satellite.** I’m not convinced it’s right, and I think the contrast needs to be brought down a bit, but I think it’s close.

Heard Island in true color on Nov. 20, 2015. Image processing: Bill Mitchell (CC-BY) using data from USGS (EO-1/ALI).

* Knowing how to compile software from source code is a rather handy skill.
** Emily Lakdawalla has written a great explanation of what “true color” means.

Expedition Software

Preparing a Heard Island image taken with the NASA Aqua/MODIS instrument using GIMP.  Image credit: Bill Mitchell (CC-BY).

Previously, I’ve written a little about the computer issues that may come up on the Heard Island expedition, as well as some of my views on open access (I’m generally for it). Now I’d like to talk a little bit more about the software which will be on the expedition computers.

For the most part, we’re using open-source software wherever it’s practical. My heart broke a little when I realized we would not be able to run Linux for many of the computers because of some of the programs needed to support the ham radio side of the expedition. I prefer open-source software because it encourages experimentation and learning among amateur programmers, has code which is verifiable (not reliant on security through obscurity), can be shared freely, and does a better job of supporting open formats free of restrictive proprietary specifications that force vendor lock-in.

Here are some of the software packages which are coming with us:

  • GIMP (the GNU Image Manipulation Program), for raster graphics
  • Inkscape, for scalable vector graphics
  • Firefox, for standards-compliant HTML browsing
  • FileZilla, an FTP client for sharing files over the (local) network
  • PuTTY, a command-line client for accessing remote computers
  • VLC (VideoLAN Client), for playing sound/video files
  • Audacity, an audio recording/editing program
  • Pidgin, a chat client
  • LAMP (Linux, Apache, MySQL, PHP), a web server and related programs/languages (probably coming, but not confirmed)
  • QGIS, a mapping/geospatial information system

Just as with ecology, having a healthy, diverse software ecosystem is important. It gives new ideas a chance to thrive, and lets users who know what they are doing add features and patch bugs themselves. More and more academic research is turning toward open-source tools, from R (statistics) to Git (of GitHub fame) to WordPress. We are choosing as much open-source software and as many open formats as possible to help preserve the data for the future. Proprietary formats are subject to the corporation changing the format to force upgrades, vendors going out of business, and other issues. The same things can happen with open-source projects, but there are generally compatible programs which can handle your data.

Most of the programs above are ones I have been using for quite a while, and they have significant user bases. Online support is pretty easy to find with a search engine. Try them out at home, and you may find them quite to your liking!

Geoscientist’s Toolkit: QGIS

QGIS screenshot, showing Heard Island.  Brown is land/rock, blue are lagoons, and the dotted white is glacier.

One of a geoscientist’s most useful tools is a geographic information system, or GIS. This is a computer program which allows the creation and analysis of maps and spatial data. Perhaps the most widely used in academia is ArcGIS, from ESRI. However, as a student and hobbyist who likes to support the open-source software ecosystem, I use the free/open-source QGIS.

QGIS can be used to make geologic maps of an area, chart streams, and note where certain geologic features (e.g. volcanic cones) are present. For instance, at the top of this post is a map of Heard Island that I’ve been playing with, from the Australian Antarctic Division. It is composed of three different layers, each published in 2009: an island layer (base, brown), a lagoon layer (middle, blue), and a glacier layer (top, dotted bluish-white).

I believe I have mentioned here previously that one interesting thing about working with Heard Island is that with major surface changes underway (glacial retreat, erosion, minor volcanic activity), the maps become obsolete fairly quickly. This week I have been learning about creating polygons in a layer, so that I can recreate a geologic map from Barling et al. 1994.[1] One issue I’ve come up against, though, is that the 1994 paper has some areas covered in glacier (from 1986/7 field work), whereas my 2009 glacier extent map shows them to be presently uncovered. In fact, even the 2009 map shows a tongue of glacier protruding into Stephenson Lagoon (in the southeast corner), while recent satellite imagery shows no such tongue.

During the Heard Island Expedition in March and April, 2016, I hope that we will have time to go do a little geologic mapping. Creating some datasets showing the extent of glaciation (particularly along the eastern half of the island) and vegetation, as well as updating the geologic map to include portions which were glaciated in 1986/7, would be a worthwhile and seemingly straightforward project.

QGIS itself is much more than a mapping tool (not that I know how to use all of it), and can analyze numeric data which is spatially distributed, like the concentration of chromium in soil or water samples from different places on a study site. QGIS provides a free way to get your hands dirty with spatial data and mapping, and is powerful enough to use professionally. Users around the globe share information on how to use it, and contribute to its development.

For those looking to go into geoscience as a career, I would strongly recommend learning how to use it. I didn’t learn GIS in college (chemists don’t use it much), and somehow avoided it in grad school. But I regret not having put time in to learn it sooner. There’s all kinds of interesting spatial data, and a good job market for people with a GIS skillset (or so I hear). I have only scratched the surface of QGIS’s capabilities with my use of it, but I definitely intend to keep learning. You can probably follow the day-to-day frustrations and victories on my Twitter account (@i_rockhopper).


[1] Barling, J.; Goldstein, S. L.; Nicholls, I. A. 1994 “Geochemistry of Heard Island (Southern Indian Ocean): Characterization of an Enriched Mantle Component and Implications for Enrichment of the Sub-Indian Ocean Mantle” Journal of Petrology 35, p. 1017–1053. doi: 10.1093/petrology/35.4.1017

Heard Island Expedition Update: T-7 Months

Visualization of a proposed Heard Island shelter setup, using two HDT Global airbeam tents.  Each shelter is 20'x21'.  Image credit: Bob Schmieder [?].

It’s only seven months until the Heard Island expedition leaves Cape Town, South Africa, heading for Heard Island. Preparations are really beginning to get going!

This morning (Minnesota time) we had a conference call with the entire on-island team (such as were able to join). Scheduling that can be tricky, because we have team members scattered around the globe, including from Australia, the US, and Ukraine.

From the conference call, it was clear that things are coming along nicely. We are gaining familiarity at least with the voices of other team members, so that when people are speaking they don’t need to identify who they are. Planning for the shelters is mostly done. Camp layouts have been presented, and are up for argument. Logistics are coming along, but there is a lot to discuss: how much testing of equipment is required, where should it take place, and how do we get the materials from that place to Cape Town in an efficient manner?

For the past few weeks, the satellite link has been worrisome. Although there are two satellites which may be “visible” from Heard Island (in the radio sense, not the optical), they were not very high above the horizon. With terrain being significant on the island (camp is in a valley), and potential for local weather—especially low-layer marine weather—to negatively affect the satellite radio link, we were concerned that there would not be reliable data/phone connection from the island. Our expedition relies on that data link for safety, to keep in touch with off-island expedition headquarters, as well as to help the VK0EK ham radio operations with real-time contact reporting.

Fortunately, while discussing the expedition with satellite service providers, our satellite team found that one of the satellites in the constellation has been repositioned over the Indian Ocean. We will now have a satellite quite high in the sky, and communications are likely to be reliable. Bandwidth may not be very high still, but it’s better than from Pluto.

I’ve been doing some things for the expedition recently, too. Our Bay Area team has acquired laptops which will be used for the radio operation, and I have been helping with software configuration specifications for that. I have also been involved in radio team discussions about how to set up these portable stations—as an apartment-dweller, I know some things about setting up and tearing down stations. Simpler is better, as are plans with fewer moving parts (and less to haul on and off the island).

Last month, I tweeted a live Q&A session, discussing some of the science that has been done (or is proposed) on Heard Island. Check out the hashtag #HeardQuestions for that, and keep an eye out for another Q&A sometime (in a few months).

My physical training continues as well. I’ve been running, biking a little, doing core strength exercises, and stretching a lot more. Yesterday I was even convinced to take part in a 5k run. It has been several years since I last ran a 5k race, and while I’m not in the shape I was ten years ago, I definitely achieved my goals.

With seven months to go, I’m feeling really good about this expedition. Here’s hoping it comes off that well!

Addressing Uncertainty Numerically

Casino Monte Carlo, Monaco.  Image credit: Positiv (CC-BY-SA).

Recently I wrote about what scientific uncertainty is, and why it might be important. There are a few things which can be done to understand the “fuzziness” of a number. However, the real world can get complicated in a way which most introductory courses don’t deal with.

For instance, a standard introduction would help you through what to do if you are adding or multiplying two numbers. In the former case, the absolute uncertainties are combined in quadrature (the square root of the sum of their squares; simply adding them gives a more conservative upper bound), and in the latter the relative uncertainties are combined the same way. This can be a little bit of math, but in general it’s fairly straightforward: plug in the numbers and out pops an answer.
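For independent (uncorrelated) uncertainties, the usual textbook propagation rules combine them in quadrature; simply adding them is a more conservative upper bound. A sketch in LaTeX notation, where a, b, and q are placeholder symbols rather than anything from a particular measurement:

```latex
% Sum or difference, q = a + b or q = a - b:
% absolute uncertainties combine in quadrature
\[
  \sigma_q = \sqrt{\sigma_a^2 + \sigma_b^2}
\]

% Product or quotient, q = ab or q = a/b:
% relative uncertainties combine in quadrature
\[
  \frac{\sigma_q}{|q|} = \sqrt{\left(\frac{\sigma_a}{a}\right)^2 + \left(\frac{\sigma_b}{b}\right)^2}
\]
```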

Sometimes, though, figuring out the proper formula can be time-consuming or nearly impossible. In these cases, there is still a way to get a satisfactory answer: a Monte Carlo simulation, so named for the iconic casino in Monaco (pictured above).

The Monte Carlo simulation is effectively an exercise in rolling dice. Rather than handle the uncertainties analytically, random numbers are used (typically a Gaussian distribution of appropriate mean and width) and combined together in the equation in question, with no uncertainties attached. After a large number of repetitions (for my purposes around 10⁴–10⁷), the statistical distribution can be evaluated. In many cases that evaluation means taking a mean and standard deviation, although it is very informative to look at the distribution to be sure it is at least roughly Gaussian. If the distribution is bimodal or highly skewed, a standard deviation may not be an appropriate metric.
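As a minimal sketch of that procedure in R, suppose we want the uncertainty on a quotient q = a/b. The two measurements here (10 ± 0.5 and 4 ± 0.2) are hypothetical values chosen for illustration, and the uncertainties are assumed to be independent and Gaussian.

```r
set.seed(42)                 # make the random draws reproducible
n_trials <- 1e5              # number of Monte Carlo repetitions

# Hypothetical measurements (mean, standard deviation), for illustration only
a <- rnorm(n_trials, mean = 10, sd = 0.5)
b <- rnorm(n_trials, mean = 4,  sd = 0.2)

q <- a / b                   # the equation in question, evaluated for every draw

mc_mean <- mean(q)           # centre of the resulting distribution
mc_sd   <- sd(q)             # spread of the resulting distribution

# First-order analytic estimate for a quotient of independent quantities
analytic_sd <- (10 / 4) * sqrt((0.5 / 10)^2 + (0.2 / 4)^2)

cat(sprintf("Monte Carlo: %.3f +/- %.3f\n", mc_mean, mc_sd))
cat(sprintf("Analytic sd: %.3f\n", analytic_sd))
hist(q, breaks = 50)         # check that the distribution is roughly Gaussian
```

For a case this simple the two approaches should agree closely. The Monte Carlo version earns its keep when the equation is complicated enough that the analytic formula is hard to derive.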

There is another place where Monte Carlo simulations are useful. Suppose you want to know the average (and distribution) produced when you roll four 6-sided dice, and sum the three highest. Is there a way to solve it analytically? Probably, but I can code (and run) the Monte Carlo simulation much more quickly.*

Here’s how to do that in the statistics program R:

my_repetitions <- 10000 # Set the number of trials
# Use a variable for this so when you change N, you only have to do so once.
# Tracking down every place you entered 10000 by hand is no fun.

my_roll <- function() { # Declare a function to roll four dice and sum the highest three
  # returns: sum of highest three of four six-sided dice rolls (integer)

  roll_four <- sample(1:6, size=4, replace=TRUE) # Create a four-element vector representing four 6-sided dice
  return(sum(roll_four) - min(roll_four)) # Sum the four dice, and subtract the minimum roll
}

my_results <- replicate(my_repetitions, my_roll()) # Create a vector of length my_repetitions, with each element the result of my_roll()

summary(my_results) # Show a statistical summary of the results
hist(my_results) # Plot a quick-and-dirty histogram of the results

Monte Carlo results (N=10,000) for rolling four 6-sided dice and summing the highest three.

Monte Carlo simulations are very handy when there are non-linear interactions between parameters with uncertainties. There may be no analytical solution, but creating a Monte Carlo procedure for determining the final distribution is relatively straightforward. In fact, this is what many weather models and climate models do. An ensemble of different numerical predictions will be put together, and the result gives a statistical likelihood of different events. With weather models, the trick is to keep the code running quickly enough that when it’s done the result is still a prediction—predicting what the weather would have been like a week ago is not useful in the way that knowing what tomorrow may have in store is.

This post is not meant to be an introduction to R (which I used), or to good programming practices (which I hope I used). Many of these are available online, or at colleges and universities in facilities like UC Berkeley’s D-Lab. I may also write about these topics at a later point.

* For those keeping score at home, the mean of this dice rolling scheme is 12.25, median is 12.00 with the middle 50% ranging from 10 to 14.
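The dice problem is also small enough to answer exactly by brute force, which makes a handy check on the Monte Carlo result. This sketch in R enumerates all 1,296 possible rolls; the variable names are my own.

```r
# All 6^4 = 1296 equally likely outcomes of four six-sided dice
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6, d4 = 1:6)

# Sum of the highest three = total of all four minus the lowest die
best_three <- rowSums(rolls) - apply(rolls, 1, min)

exact_mean <- mean(best_three)                  # exact expected value
exact_dist <- table(best_three) / nrow(rolls)   # exact probability of each total

print(exact_mean)
print(round(exact_dist, 4))
```

The exact mean works out to 15869/1296, or about 12.24, in line with the simulated value.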

Geoscientist’s Toolkit: LaTeX

A LaTeX file.  Image/text credit: Bill Mitchell.

LaTeX is a typesetting program used for preparing documents—generally articles and books, but sometimes posters and presentation slides. It is available for Linux, Mac, and Windows, and is free, open-source software. The primary output file format these days is PDF, but other options are available.

When you are putting together a document with figures, citations, and sections which get moved around, it is tough to use a common word processing program and maintain sanity. However, because LaTeX is a markup language (like HTML, the HyperText Markup Language), it is explicit which text is grouped where. For instance, suppose you are trying to have both superscripts and subscripts following a letter, such as in CO₃²⁻. If you need to edit the superscripts or subscripts in Word, the formatting can easily get confused. In LaTeX, it is explicit which parts are superscript and which are subscript. A little more work up front saves a lot of frustration later.
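As a small illustration, here is one way the carbonate ion might be written using only plain LaTeX math mode (no chemistry packages assumed):

```latex
% Carbonate ion: the subscript 3 and the superscript 2- are each marked explicitly
$\mathrm{CO_3^{2-}}$
```

The underscore marks the subscript and the caret marks the superscript, and the braces group the "2-" so that both characters are raised together.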

Above is a small excerpt from my dissertation, written in LaTeX. I am not sure I would have survived grad school had I attempted to write my dissertation in a word processor.

Yes, there is a learning curve to using LaTeX, and you don’t see changes in the finished document immediately when you make them in your text editor, but there are tons of advantages.

First, the format is all plain text, so it will be readable for a long time and across platforms (although the OpenDocument formats are attempting to make word processor documents future-compatible). Plain text is also very convenient when combined with things like version control software. Track changes isn’t just for word processors!

LaTeX separates the content from the formatting. Most of the formatting is done automatically. Yes, you manually specify that something is emphasized, or is part of a quote, or is a heading, but LaTeX will make sure that the formatting is consistent throughout (unless you intervene), and the defaults are generally good.

One place where LaTeX really shines is in mathematical equations. Greek letters and many mathematical symbols are input as commands such as \beta or \sqrt{n}, so your hands need never leave the keyboard. Once typeset, the equations are neat and properly sized.
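For example, a straight-line model and the standard error of a mean could be typed like this (the equations are generic illustrations, not taken from any particular document):

```latex
% A straight-line model with Greek-letter coefficients
\[
  y = \alpha + \beta x
\]

% Standard error of the mean of n measurements
\[
  \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\]
```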

Many journals accept submissions in LaTeX, if they do not outright encourage its use, because it is easy to keep the formatting consistent from article to article. The fonts will match, the font sizes will match, and in general things are awesome and look professional.

I have given several presentations made with LaTeX (PDF output). The outline slides are automatically maintained, and slide headers/footers can show where in the presentation you are. Those indicators link to the sections if you click on the section name, and it’s all done automatically. LaTeX is totally worth the effort of learning, so do it soon, while you can take your time and experiment. Writing your dissertation while learning LaTeX is a recipe for unhappiness.

So, now that you’re ready to get started, here are some tutorials and reference materials:

It really bothers me when the justification for doing something slow, inefficient, and expensive is “that’s what most people use, and I can’t be bothered to learn something new.” There comes a time to do things differently, and a good ecosystem is one where there are several options based around open standards. Case in point: USB ports are great! The proprietary charger connection on my (old-school) phone? Awful. Lock-in is expensive. Choose open source.