The number of earthquakes in Oklahoma has increased sharply over the past decade. In fact, scientists now claim that Oklahoma has surpassed California as the “earthquake capital of the world” (http://www.wfaa.com/news/local/investigates/oklahoma-earthquakes/143828814).

In this case study, we will examine data obtained from the U.S. Geological Survey (USGS) on earthquakes in Oklahoma since 2000 (http://earthquake.usgs.gov/earthquakes/search/).

We will utilize several packages for this lab.

for (package in c("stringr", "lubridate", "dplyr", "ggplot2", "ggformula")){
  library(package, character.only = TRUE)
}

The data are in the file OKEarthquakes.csv.

OKquakes <- read.csv("OKEarthquakes.csv", stringsAsFactors = FALSE)
head(OKquakes)
##                       time latitude longitude depth mag
## 1 2015-12-31T20:31:14.300Z  35.7860  -97.5687 6.479 2.6
## 2 2015-12-31T16:31:32.270Z  37.1308  -97.6583 6.060 2.7
## 3 2015-12-31T11:35:26.400Z  36.6123  -98.8055 6.285 3.2
## 4 2015-12-31T06:26:22.400Z  35.6710  -97.4083 5.969 2.5
## 5 2015-12-31T03:26:38.500Z  36.8328  -97.7676 5.410 3.0
## 6 2015-12-31T00:51:00.300Z  35.6644  -97.3975 6.180 2.8
##                          place
## 1 16km NNW of Edmond, Oklahoma
## 2 11km NNW of Caldwell, Kansas
## 3   24km SSW of Alva, Oklahoma
## 4  6km ENE of Edmond, Oklahoma
## 5  4km NW of Medford, Oklahoma
## 6    7km E of Edmond, Oklahoma
summary(OKquakes)
##      time              latitude       longitude           depth       
##  Length:5587        Min.   :33.70   Min.   :-103.30   Min.   : 0.000  
##  Class :character   1st Qu.:35.82   1st Qu.: -97.84   1st Qu.: 4.694  
##  Mode  :character   Median :36.29   Median : -97.50   Median : 5.000  
##                     Mean   :36.24   Mean   : -97.58   Mean   : 5.254  
##                     3rd Qu.:36.69   3rd Qu.: -97.26   3rd Qu.: 6.000  
##                     Max.   :37.19   Max.   : -94.30   Max.   :56.210  
##       mag          place          
##  Min.   :2.50   Length:5587       
##  1st Qu.:2.60   Class :character  
##  Median :2.80   Mode  :character  
##  Mean   :2.86                     
##  3rd Qu.:3.00                     
##  Max.   :5.60

The place variable describes the location of each earthquake. Notice that several earthquakes in this data set occurred outside of Oklahoma; for example, the second earthquake in the data set took place in Kansas.

We will need to extract just those observations that are in Oklahoma.

out <- str_detect(OKquakes$place, "Oklahoma")
head(out)
## [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
OK <- filter(OKquakes, out) #filter from dplyr package
dim(OK)
## [1] 5204    6
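To see how this filter behaves on a small scale, the sketch below applies str_detect to a made-up vector of place strings (the vector places is illustrative and not taken from the data file). It also shows a stricter variant, offered only as one possible refinement, that requires "Oklahoma" to appear at the end of the string.

```r
library(stringr)

# Hypothetical place strings in the same style as the USGS data
places <- c("16km NNW of Edmond, Oklahoma",
            "11km NNW of Caldwell, Kansas",
            "24km SSW of Alva, Oklahoma")

# TRUE wherever the string contains "Oklahoma" anywhere
in_ok <- str_detect(places, "Oklahoma")
in_ok

# Stricter variant: "Oklahoma" must be the last word of the string
ends_ok <- str_detect(places, "Oklahoma$")
ends_ok

# Base R subsetting keeps only the matching strings
places[in_ok]
```

For these three strings the two patterns agree; they would differ only if "Oklahoma" appeared somewhere other than the end of a place description.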

Notice that in the time variable, the date is given first, followed by a T, then the time (e.g., “2015-12-31T20:31:14.300Z”). The International Organization for Standardization (ISO) specifies this format in ISO 8601, the international standard for representing dates and times. The Z designation after the time indicates that the time is UTC (Coordinated Universal Time, or Zulu time). We will extract the date information using commands from stringr and lubridate.

#str_split from stringr
out <- str_split(OK$time, "T")
head(out) 
## [[1]]
## [1] "2015-12-31"    "20:31:14.300Z"
## 
## [[2]]
## [1] "2015-12-31"    "11:35:26.400Z"
## 
## [[3]]
## [1] "2015-12-31"    "06:26:22.400Z"
## 
## [[4]]
## [1] "2015-12-31"    "03:26:38.500Z"
## 
## [[5]]
## [1] "2015-12-31"    "00:51:00.300Z"
## 
## [[6]]
## [1] "2015-12-30"    "22:24:35.600Z"

The object out is a list and each component of this list is a vector with two elements, the date and the time.

times <- sapply(out, "[[", 1)   
head(times)
## [1] "2015-12-31" "2015-12-31" "2015-12-31" "2015-12-31" "2015-12-31"
## [6] "2015-12-30"

The sapply command takes the list as its first argument. The second argument is the function to apply to each component of the list; here it is the extraction operator [[. The additional argument 1 is passed along to [[, so the first element of each component (the date) is returned.
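The following sketch, using two illustrative timestamps rather than the full data set, shows a few equivalent ways to pull out the first element of each component. The names stamps, parts, and d1 through d4 are placeholders introduced here for illustration.

```r
library(stringr)

# Two timestamps in the same ISO 8601 style as the data
stamps <- c("2015-12-31T20:31:14.300Z", "2015-12-30T22:24:35.600Z")
parts <- str_split(stamps, "T")

# sapply with "[[" and with an explicit anonymous function give the same result
d1 <- sapply(parts, "[[", 1)
d2 <- sapply(parts, function(p) p[[1]])

# vapply also checks that every result is a single character string
d3 <- vapply(parts, function(p) p[[1]], character(1))

# simplify = TRUE returns a character matrix; column 1 holds the dates
d4 <- str_split(stamps, "T", simplify = TRUE)[, 1]

d1
```

All four approaches return the same character vector of dates; which one to use is largely a matter of taste, though vapply fails loudly if a component has an unexpected shape.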

We will now isolate the month and the year of occurrence of each earthquake using commands from the lubridate package.

OK$year <- year(times)
OK$month <- month(times)
table(OK$year)
## 
## 2000 2002 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 
##    1    4    5    3    8    5    9   33  141  149   79  252 1865 2650

Clearly, the number of earthquakes in Oklahoma is increasing: the annual count rose from single digits through 2008 to 2,650 in 2015.
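As an alternative sketch, lubridate can parse the full ISO 8601 string in one step with ymd_hms, and fixing the factor levels before tabulating makes years with no recorded events appear with a count of 0. The timestamps below are illustrative, and the names stamps, when, yr, mo, and counts are placeholders introduced here.

```r
library(lubridate)

# Illustrative timestamps; note that no 2014 event is included
stamps <- c("2015-12-31T20:31:14.300Z",
            "2015-06-01T03:26:38.500Z",
            "2013-02-14T11:35:26.400Z")

# ymd_hms parses the whole string as a UTC date-time
when <- ymd_hms(stamps)

yr <- year(when)
mo <- month(when)

# Fixing the factor levels makes years with no events show up as 0
counts <- table(factor(yr, levels = 2013:2015))
counts
```

In the table of the real data, 2001 and 2003 are missing for the same reason: table only lists values that actually occur unless the levels are specified.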

The USGS classifies earthquakes by magnitude. Quakes with magnitude between 2.0 and 2.9 are considered “very minor,” between 3.0 and 3.9, “minor,” between 4.0 and 4.9, “light,” and between 5.0 and 5.9, “moderate.” Is there a relationship between the magnitude of an earthquake and the year in which it occurred?

sort(unique(OK$mag))
##  [1] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [18] 4.2 4.3 4.4 4.5 4.7 4.8 5.6
cutoff <- c(2.0, 2.9, 3.9, 4.9, 5.9)

OK$type <- cut(OK$mag, breaks = cutoff, labels = c("very minor", "minor", "light", "moderate" ))

table(OK$type)
## 
## very minor      minor      light   moderate 
##       3503       1646         54          1
table(OK$type, OK$year)
##             
##              2000 2002 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
##   very minor    1    1    3    2    5    4    7   13  100   86   45  150
##   minor         0    3    2    1    3    1    2   20   39   59   33   99
##   light         0    0    0    0    0    0    0    0    2    3    1    3
##   moderate      0    0    0    0    0    0    0    0    0    1    0    0
##             
##              2014 2015
##   very minor 1285 1801
##   minor       565  819
##   light        15   30
##   moderate      0    0

Most of the earthquakes have been very minor or minor. No earthquakes were classified as “light” from 2000 to 2009, but several “light” ones have occurred since 2010.
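It may help to check how cut treats values that fall exactly on a break point. By default each interval is open on the left and closed on the right, so the breaks above define (2.0, 2.9], (2.9, 3.9], and so on. The sketch below uses a handful of test magnitudes (the vector mags and the names labs, default_type, and closed_type are illustrative) to show this, along with the include.lowest argument.

```r
cutoff <- c(2.0, 2.9, 3.9, 4.9, 5.9)
labs <- c("very minor", "minor", "light", "moderate")

# Test magnitudes chosen to sit on or next to the break points
mags <- c(2.0, 2.9, 3.0, 3.9, 4.0, 5.6)

# Default: intervals are (2.0, 2.9], (2.9, 3.9], ... so 2.0 itself falls outside
default_type <- cut(mags, breaks = cutoff, labels = labs)

# include.lowest = TRUE closes the first interval on the left: [2.0, 2.9]
closed_type <- cut(mags, breaks = cutoff, labels = labs, include.lowest = TRUE)

data.frame(mags, default_type, closed_type)
```

Because the smallest magnitude in this data set is 2.5, the default behavior does not drop any observations here, but a magnitude of exactly 2.0 would become NA without include.lowest = TRUE.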

We will use the ggformula package, which builds on ggplot2, to visualize the table just created:

gf_bar( ~year, data = OK, fill = ~type)
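For readers more familiar with plain ggplot2 syntax, a roughly equivalent stacked bar chart can be sketched as follows. The data frame toy is made up for illustration and only mimics the year and type columns of OK.

```r
library(ggplot2)

# Small made-up data frame with the same column names as OK
toy <- data.frame(
  year = c(2013, 2014, 2014, 2015, 2015, 2015),
  type = factor(c("minor", "very minor", "minor",
                  "very minor", "very minor", "light"),
                levels = c("very minor", "minor", "light", "moderate"))
)

# geom_bar counts rows per year; fill splits each bar by magnitude class
p <- ggplot(toy, aes(x = year, fill = type)) +
  geom_bar()
p
```

Replacing toy with OK should give a plot comparable to the gf_bar version, since ggformula constructs ggplot2 objects behind its formula interface.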

Has there been a change in the depth of these earthquakes?

gf_point(depth ~ year, data = OK)

The depths are noticeably more spread out starting in 2010, although far more earthquakes were also recorded in those years.
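A scatterplot alone cannot separate a real change in variability from the effect of simply having more observations, so a numeric summary by year can be a useful complement. The sketch below uses dplyr on a small made-up data frame (toy and depth_by_year are illustrative names); the same pipeline could be applied to OK.

```r
library(dplyr)

# Made-up depths: identical in 2009, spread out in 2010
toy <- data.frame(
  year  = c(2009, 2009, 2010, 2010, 2010, 2010),
  depth = c(5, 5, 1, 5, 9, 13)
)

# Count, standard deviation, and interquartile range of depth for each year
depth_by_year <- toy %>%
  group_by(year) %>%
  summarise(n = n(), sd_depth = sd(depth), iqr_depth = IQR(depth))

depth_by_year
```

Comparing sd_depth and iqr_depth across years, alongside n, gives a rough sense of whether the spread itself has changed or whether the wider range mostly reflects the larger number of earthquakes.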

Where are these earthquakes in Oklahoma occurring? We have the exact location of each earthquake, so we will map these data. First, we will use the map_data command from ggplot2 (which requires the maps package to be installed) to extract the boundaries of Oklahoma and then create a map of the state.

OKmap <- map_data("state", region="Oklahoma")

gf_polygon(lat ~ long, data = OKmap, fill = "wheat")

Next, we add the location of the earthquakes and color-code these by the year of occurrence.

gf_polygon(lat ~ long, data = OKmap, fill = "wheat") %>% 
  gf_point(latitude ~ longitude, data = OK, color = ~year)

Another way to incorporate the year data is by using facets:

gf_polygon(lat ~ long, data = OKmap, fill = "wheat") %>% 
  gf_point(latitude ~ longitude | year, data = OK)

On your own

  1. The package RColorBrewer provides several sequential color palettes. Investigate these palettes and recreate the maps using one of them.

  2. The file CalifQuakes.csv contains data on earthquakes in a rectangular region around California from January 1, 2010 through the end of 2015.