Plane Crash Data - Part 2: Google Maps Geocoding API Request

16.08.2017 13:19

This is the second part of our series about plane crash data. To run the code below, you first need to run the code from the first part of this series to obtain the prepared plane crash dataset.

In this part, I'd like to get geocoordinates from the Google Maps Geocoding API for three places: the crash location, the point of departure and the intended point of arrival. The crash location is contained in the location variable. The other two are contained in the route variable, so we first need to extract them and store them in separate variables.

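library(dplyr)  # %>%, select(), bind_rows()
library(tidyr)  # separate()

# a dash with a space on at least one side; bare hyphens inside names are kept
separators <- " - |- | -"

data <- data %>%
  # split route variable into "from" and "to"
  separate(route,
           sep = separators,
           into = c("from", "to"),
           extra = "merge") %>% 
  # if there was a pit stop, "to" sometimes still contains two locations.
  # we need only the last one.
  separate(to,
           sep = separators,
           into = c("pitStop", "to"),
           fill = "left") %>%
  # drop the helper column holding the pit stop
  select(-pitStop)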
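
To see what the separator pattern does, here is a small sketch on made-up route strings. It uses base R's strsplit() with perl = TRUE instead of tidyr::separate(), so it runs without extra packages, and simply takes the first and last piece of each route as departure and arrival. For routes with at most one pit stop, this mimics the splitting step above; it is an illustration, not a replacement for it.

```r
# the separator pattern from above: " - ", "- " or " -"
separators <- " - |- | -"

routes <- c("Paris - London",           # spaces on both sides of the dash
            "Paris- London",            # space only after the dash
            "Paris -London",            # space only before the dash
            "Winston-Salem - Atlanta",  # hyphen inside a place name
            "Paris - Rome - Athens")    # route with a pit stop

# cut each route into pieces wherever the pattern matches
pieces <- strsplit(routes, separators, perl = TRUE)

# first piece = departure, last piece = intended arrival
from <- vapply(pieces, function(x) x[1], character(1))
to   <- vapply(pieces, function(x) x[length(x)], character(1))

to  # "London" "London" "London" "Atlanta" "Athens"
```

The "Winston-Salem" route shows why a plain "-" is not part of the pattern: the hyphen inside the name survives, and only dashes with a space on at least one side split the route.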

To avoid requests for missing locations and the odd results they produce, we exclude incomplete cases, i.e. rows with an NA in any column, right from the start.

# exclude observations with NA
data <- data[complete.cases(data), ]
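
Note that complete.cases() marks a row as incomplete if any of its columns is NA, not only the location columns. A minimal illustration on a made-up data frame (all columns, including operator, are hypothetical):

```r
# toy data: one NA in "to", one NA in the unrelated column "operator"
toy <- data.frame(location = c("Paris", "Rome", "Oslo"),
                  to       = c("London", NA, "Bergen"),
                  operator = c(NA, "Alitalia", "SAS"),
                  stringsAsFactors = FALSE)

# TRUE only for rows without any NA in any column
complete.cases(toy)  # FALSE FALSE TRUE

toyComplete <- toy[complete.cases(toy), ]
```

If only rows with missing locations should be dropped, complete.cases() can instead be applied to just those columns, e.g. data[complete.cases(data[, c("location", "from", "to")]), ].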

Now, in order to send requests to the Google Maps Geocoding API, which converts addresses into geocoordinates, you need an API key, which you can get from Google.

Let us store our key in an R object:

apiKey <- "YOUR_API_KEY"  # placeholder: replace with your own key

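Hard-coding the key means it travels with the script wherever the script is shared. A possible alternative, sketched here under the assumption that the key is kept in an environment variable, is to read it with Sys.getenv(); the variable name GOOGLE_MAPS_API_KEY and the helper getApiKey() are hypothetical.

```r
# hypothetical helper; GOOGLE_MAPS_API_KEY is an assumed variable name
getApiKey <- function(var = "GOOGLE_MAPS_API_KEY") {
  key <- Sys.getenv(var)  # returns "" if the variable is not set
  if (!nzchar(key)) {
    stop("environment variable ", var, " is not set")
  }
  key
}
```

After defining the variable (for example in ~/.Renviron), apiKey <- getApiKey() gives the same R object as above, and failing early with an error is easier to debug than a rejected API request.
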
Now we have almost everything ready: we have complete data containing the locations of departure, intended arrival and crash as strings, and we have an API that converts these strings into geocoordinates. However, the API returns the geocoordinates in the form of a JSON string, which we can't use right away. So we need a function that extracts the relevant information from this JSON string and stores it in our dataset. To parse the JSON, we load the jsonlite package.

library(jsonlite)

Look at the following function. It takes two arguments, the location and the API key, and returns a one-row data frame containing the geocoordinates of the location. If the status of the request is "OK", the API returns the geocoordinates (latitude lat and longitude lng), which our function writes directly into a data frame. If the Google API cannot return any coordinates for the requested location, it returns the status "ZERO_RESULTS" instead, and our function returns NAs. This can happen, for example, if the location is unknown ("?") or given as "Sightseeing".

getGeoCoord <- function(loc, apiKey) {
  # build the request URL; URLencode() escapes spaces, commas etc.
  request <- paste0("https://maps.googleapis.com/maps/api/geocode/json?",
                    "address=", URLencode(loc, reserved = TRUE),
                    "&key=", apiKey)
  # send the request and parse the JSON response into an R list
  result <- fromJSON(request)
  if (result$status == "OK") {
    # the API may return several matches; keep only the first one
    result <- result$results$geometry$location[1, ]
  } else {
    # "ZERO_RESULTS" (location unknown) or any other failure status
    result <- data.frame(lat = NA, lng = NA)
  }
  data.frame(result)
}
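
Because every call of getGeoCoord() sends a real request, it can be handy to check the parsing logic offline. The sketch below moves that part into a hypothetical helper, parseGeoResponse(), and feeds it two hand-written, heavily abridged responses in the shape of the API's JSON output (real responses contain many more fields).

```r
library(jsonlite)

# hypothetical helper: the parsing half of getGeoCoord(), fed with a JSON
# string instead of a live API response
parseGeoResponse <- function(json) {
  result <- fromJSON(json)
  if (result$status == "OK") {
    # several matches are possible; keep only the first one
    data.frame(result$results$geometry$location[1, ])
  } else {
    data.frame(lat = NA, lng = NA)
  }
}

# hand-written, abridged responses (illustrative values only)
okJson <- '{"results": [
  {"geometry": {"location": {"lat": 48.85, "lng": 2.35}}},
  {"geometry": {"location": {"lat": 33.66, "lng": -95.56}}}
], "status": "OK"}'
zeroJson <- '{"results": [], "status": "ZERO_RESULTS"}'

parseGeoResponse(okJson)    # one row: lat 48.85, lng 2.35 (first match only)
parseGeoResponse(zeroJson)  # one row of NA
```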

Now let us use the function and extract geocoordinates for the plane crash locations, the departure locations and the locations of intended arrival. We first store them in objects called coordCrash, coordFrom and coordTo. Then we add them to our existing dataframe.

# send requests:

# crash location
coordCrash <- lapply(data$location, getGeoCoord, apiKey = apiKey) %>% 
  bind_rows %>% setNames(paste0(names(.), "CrashLoc"))

# departure location
coordFrom <- lapply(data$from, getGeoCoord, apiKey = apiKey) %>% 
  bind_rows %>% setNames(paste0(names(.), "From"))

# intended arrival location
coordTo <- lapply(data$to, getGeoCoord, apiKey = apiKey) %>% 
  bind_rows %>% setNames(paste0(names(.), "To"))

# add the new columns to data
data <- cbind(data, coordCrash, coordFrom, coordTo)
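
To check the resulting column names without spending any API requests, the same steps can be tried on a toy data frame with a hypothetical stub, fakeGeoCoord(), that looks coordinates up in a small hand-made table. Base R's do.call(rbind, ...) stands in for dplyr::bind_rows() so the sketch needs no extra packages.

```r
# hypothetical stand-in for getGeoCoord(): looks locations up in a small,
# hand-made table instead of calling the API (rounded, illustrative values)
lookup <- data.frame(lat = c(48.86, 51.51), lng = c(2.35, -0.13),
                     row.names = c("Paris", "London"))

fakeGeoCoord <- function(loc, apiKey) {
  if (loc %in% rownames(lookup)) {
    data.frame(lat = lookup[loc, "lat"], lng = lookup[loc, "lng"])
  } else {
    data.frame(lat = NA, lng = NA)  # same fallback as for "ZERO_RESULTS"
  }
}

toy <- data.frame(location = c("Paris", "Sightseeing"),
                  from     = c("London", "Paris"),
                  stringsAsFactors = FALSE)

# one lookup per row, rows stacked, columns renamed with a suffix
coordCrash <- do.call(rbind, lapply(toy$location, fakeGeoCoord, apiKey = "unused"))
coordCrash <- setNames(coordCrash, paste0(names(coordCrash), "CrashLoc"))

coordFrom <- do.call(rbind, lapply(toy$from, fakeGeoCoord, apiKey = "unused"))
coordFrom <- setNames(coordFrom, paste0(names(coordFrom), "From"))

toy <- cbind(toy, coordCrash, coordFrom)
names(toy)  # "location" "from" "latCrashLoc" "lngCrashLoc" "latFrom" "lngFrom"
```

The unknown location "Sightseeing" ends up as NA in latCrashLoc and lngCrashLoc, just as a "ZERO_RESULTS" response would.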

Now the data is ready to be visualised. This happens in the third part of the series.
