The number of earthquakes in Oklahoma has increased in the past decade. In fact, scientists now claim that Oklahoma has surpassed California as the “earthquake capital of the world!” http://www.wfaa.com/news/local/investigates/oklahoma-earthquakes/143828814
In this case study, we will examine data obtained from the U.S. Geological Survey (USGS) on earthquakes in Oklahoma since 2000: http://earthquake.usgs.gov/earthquakes/search/
We will utilize several packages for this lab.
for (package in c("stringr", "lubridate", "dplyr", "ggplot2", "ggformula")) {
  library(package, character.only = TRUE)
}
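If one of these packages is not installed, the loop above stops with an error at the first missing one. A minimal sketch of a pre-check, assuming you only want to see which packages are absent before loading anything: the names `pkgs`, `installed`, and `missing_pkgs` are illustrative, and `maps` is included because `map_data()` from ggplot2, used later in this lab, relies on it.

```r
# Packages loaded in the loop above, plus maps (needed by ggplot2's map_data())
pkgs <- c("stringr", "lubridate", "dplyr", "ggplot2", "ggformula", "maps")

# requireNamespace() returns FALSE, rather than stopping, when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported can then be installed with `install.packages()` before running the rest of the lab.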
The data are in the file OKEarthquakes.csv.
OKquakes <- read.csv("OKEarthquakes.csv", stringsAsFactors = FALSE)
head(OKquakes)
## time latitude longitude depth mag
## 1 2015-12-31T20:31:14.300Z 35.7860 -97.5687 6.479 2.6
## 2 2015-12-31T16:31:32.270Z 37.1308 -97.6583 6.060 2.7
## 3 2015-12-31T11:35:26.400Z 36.6123 -98.8055 6.285 3.2
## 4 2015-12-31T06:26:22.400Z 35.6710 -97.4083 5.969 2.5
## 5 2015-12-31T03:26:38.500Z 36.8328 -97.7676 5.410 3.0
## 6 2015-12-31T00:51:00.300Z 35.6644 -97.3975 6.180 2.8
## place
## 1 16km NNW of Edmond, Oklahoma
## 2 11km NNW of Caldwell, Kansas
## 3 24km SSW of Alva, Oklahoma
## 4 6km ENE of Edmond, Oklahoma
## 5 4km NW of Medford, Oklahoma
## 6 7km E of Edmond, Oklahoma
summary(OKquakes)
## time latitude longitude depth
## Length:5587 Min. :33.70 Min. :-103.30 Min. : 0.000
## Class :character 1st Qu.:35.82 1st Qu.: -97.84 1st Qu.: 4.694
## Mode :character Median :36.29 Median : -97.50 Median : 5.000
## Mean :36.24 Mean : -97.58 Mean : 5.254
## 3rd Qu.:36.69 3rd Qu.: -97.26 3rd Qu.: 6.000
## Max. :37.19 Max. : -94.30 Max. :56.210
## mag place
## Min. :2.50 Length:5587
## 1st Qu.:2.60 Class :character
## Median :2.80 Mode :character
## Mean :2.86
## 3rd Qu.:3.00
## Max. :5.60
The place variable describes the location of each earthquake. Notice that several earthquakes in this data set occurred outside Oklahoma; for example, the second earthquake in the data set took place in Kansas.
We will need to extract just those observations that are in Oklahoma.
out <- str_detect(OKquakes$place, "Oklahoma")
head(out)
## [1] TRUE FALSE TRUE TRUE TRUE TRUE
OK <- filter(OKquakes, out) #filter from dplyr package
dim(OK)
## [1] 5204 6
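The detect-then-filter step above can be sketched on a tiny vector to make the mechanics visible. This version uses base R's `grepl()` in place of `str_detect()` so it runs without any packages; the three strings are copied from the `head(OKquakes)` output, and the names `place` and `in_ok` are illustrative.

```r
# Three place strings copied from the head(OKquakes) output above
place <- c("16km NNW of Edmond, Oklahoma",
           "11km NNW of Caldwell, Kansas",
           "24km SSW of Alva, Oklahoma")

# grepl() is the base R counterpart of stringr::str_detect()
in_ok <- grepl("Oklahoma", place, fixed = TRUE)
in_ok

# Logical subsetting keeps only the elements flagged TRUE
place[in_ok]
```

The pattern is case-sensitive and matches anywhere in the string, so it is worth glancing at a few of the dropped rows to confirm that nothing was excluded or kept by accident.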
Notice that in the time variable, the date is given first, followed by a T, then the time (e.g., “2015-12-31T20:31:14.300Z”). The International Organization for Standardization (ISO) specifies this format in ISO 8601, the international standard for representing dates and times. The Z after the time indicates that the time is in UTC (Coordinated Universal Time, also called Zulu time). We will extract the date information using commands from stringr and lubridate.
#str_split from stringr
out <- str_split(OK$time, "T")
head(out)
## [[1]]
## [1] "2015-12-31" "20:31:14.300Z"
##
## [[2]]
## [1] "2015-12-31" "11:35:26.400Z"
##
## [[3]]
## [1] "2015-12-31" "06:26:22.400Z"
##
## [[4]]
## [1] "2015-12-31" "03:26:38.500Z"
##
## [[5]]
## [1] "2015-12-31" "00:51:00.300Z"
##
## [[6]]
## [1] "2015-12-30" "22:24:35.600Z"
The object out is a list, and each component of this list is a vector with two elements: the date and the time.
times <- sapply(out, "[[", 1)
head(times)
## [1] "2015-12-31" "2015-12-31" "2015-12-31" "2015-12-31" "2015-12-31"
## [6] "2015-12-30"
The sapply command takes the list as its first argument. The second argument is a function to apply to each component of the list; here it is the extraction operator [[. The third argument, 1, is passed along to [[, so we get the first element (the date) of each component.
We will now isolate the month and the year of occurrence of each earthquake using commands from the lubridate package.
OK$year <- year(times)
OK$month <- month(times)
table(OK$year)
##
## 2000 2002 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
## 1 4 5 3 8 5 9 33 141 149 79 252 1865 2650
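Clearly, the number of recorded earthquakes in Oklahoma has increased sharply: from fewer than ten per year through 2008 to 2,650 in 2015. Note that no earthquakes from 2001 or 2003 appear in the data.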
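As a cross-check on the split-and-extract steps above, an ISO 8601 timestamp can also be parsed directly into a date-time object. The sketch below uses base R's `as.POSIXct()` on one timestamp copied from the data; the names `stamp`, `parsed`, `yr`, and `mo` are illustrative. (lubridate's `ymd_hms()` is designed for the same job.)

```r
# One timestamp copied from the data shown above
stamp <- "2015-12-31T20:31:14.300Z"

# %OS reads seconds with a fractional part; T and Z are matched literally.
# tz = "UTC" because the trailing Z marks the time as UTC.
parsed <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Pull out the same pieces the lab gets from year() and month()
yr <- as.integer(format(parsed, "%Y"))
mo <- as.integer(format(parsed, "%m"))
c(year = yr, month = mo)
```

Parsing the full timestamp keeps the time of day available, which the string-splitting approach discards.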
The USGS classifies earthquakes by magnitude. Quakes with magnitude between 2.0 and 2.9 are considered “very minor,” quakes with magnitude between 3.0 and 3.9 are “minor,” between 4.0 and 4.9, “light,” and between 5.0 and 5.9, “moderate.” Is there a relationship between the magnitude of the earthquake and year?
sort(unique(OK$mag))
## [1] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [18] 4.2 4.3 4.4 4.5 4.7 4.8 5.6
cutoff <- c(2.0, 2.9, 3.9, 4.9, 5.9)
# cut() uses right-closed intervals by default: (2.0, 2.9], (2.9, 3.9], and so on
OK$type <- cut(OK$mag, breaks = cutoff, labels = c("very minor", "minor", "light", "moderate"))
table(OK$type)
##
## very minor minor light moderate
## 3503 1646 54 1
table(OK$type, OK$year)
##
## 2000 2002 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
## very minor 1 1 3 2 5 4 7 13 100 86 45 150
## minor 0 3 2 1 3 1 2 20 39 59 33 99
## light 0 0 0 0 0 0 0 0 2 3 1 3
## moderate 0 0 0 0 0 0 0 0 0 1 0 0
##
## 2014 2015
## very minor 1285 1801
## minor 565 819
## light 15 30
## moderate 0 0
Most of the earthquakes have been very minor or minor. No earthquakes were classified as “light” from 2000 to 2009, but they have occurred every year since 2010, and the only “moderate” earthquake occurred in 2011.
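The cut call above works because cut() closes each interval on the right by default and the magnitudes are recorded to one decimal place. A small sketch comparing that choice with left-closed intervals at whole numbers, using a few boundary magnitudes taken from the sort(unique(OK$mag)) output; the names `mag`, `labs`, `type_a`, and `type_b` are illustrative.

```r
# Magnitudes that sit on or next to the class boundaries
mag <- c(2.5, 2.9, 3.0, 3.9, 4.0, 5.6)
labs <- c("very minor", "minor", "light", "moderate")

# The lab's version: right-closed intervals (2.0, 2.9], (2.9, 3.9], ...
type_a <- cut(mag, breaks = c(2.0, 2.9, 3.9, 4.9, 5.9), labels = labs)

# Alternative: left-closed intervals [2, 3), [3, 4), [4, 5), [5, 6)
type_b <- cut(mag, breaks = c(2, 3, 4, 5, 6), labels = labs, right = FALSE)

data.frame(mag, type_a, type_b)
identical(type_a, type_b)
```

The two versions agree on one-decimal data. They would differ for a value such as 2.95, which the first version labels “minor” and the second “very minor,” so the second form is the safer one if magnitudes ever arrive with more decimal places.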
We will use the ggformula package, which is built on ggplot2, to visualize the table just created:
gf_bar(~ year, data = OK, fill = ~ type)
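Because the yearly totals differ so much, a bar chart of counts mostly shows the growth in volume. To compare the mix of earthquake types between years, within-year proportions can help. A minimal sketch using only the 2014 and 2015 counts printed in the table above; the names `counts` and `props` are illustrative.

```r
# Counts copied from the 2014 and 2015 columns of table(OK$type, OK$year) above
counts <- matrix(c(1285, 565, 15, 0,
                   1801, 819, 30, 0),
                 nrow = 4,
                 dimnames = list(type = c("very minor", "minor", "light", "moderate"),
                                 year = c("2014", "2015")))

# margin = 2 divides each column by its total, giving within-year proportions
props <- prop.table(counts, margin = 2)
round(props, 3)
```

On the full data, the corresponding call would be `prop.table(table(OK$type, OK$year), margin = 2)`, assuming OK has been built as in the steps above.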
Has there been a change in the depth of these earthquakes?
gf_point(depth ~ year, data = OK)
The depths are much more variable from 2010 onward, although some of this spread is expected simply because far more earthquakes were recorded in those years.
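One way to separate a real change in spread from the effect of having more observations is to tabulate, per year, the count alongside a robust measure such as the interquartile range. The sketch below shows the mechanics on a made-up data frame; the depths in `toy` are invented for illustration and are not values from OKEarthquakes.csv, and the names `toy` and `by_year` are illustrative.

```r
# Made-up depths (km) for illustration only; NOT values from OKEarthquakes.csv
toy <- data.frame(
  year  = c(2008, 2008, 2008, 2014, 2014, 2014, 2014, 2014, 2014),
  depth = c(5.0, 5.0, 5.2, 0.5, 3.1, 5.0, 6.4, 8.9, 14.2)
)

# Per-year count, median, and interquartile range of depth
by_year <- data.frame(
  n      = as.vector(tapply(toy$depth, toy$year, length)),
  median = as.vector(tapply(toy$depth, toy$year, median)),
  iqr    = as.vector(tapply(toy$depth, toy$year, IQR)),
  row.names = sort(unique(toy$year))
)
by_year
```

Replacing `toy` with OK would give the same summary for the real data. Unlike the range, the interquartile range does not automatically widen as the number of observations grows.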
Where are these earthquakes in Oklahoma occurring? We have the latitude and longitude of each earthquake, so we can map these data. First, we will use the map_data command from ggplot2 (which relies on the maps package) to extract the boundaries of Oklahoma and then create a map of the state.
OKmap <- map_data("state", region="Oklahoma")
gf_polygon(lat ~ long, data = OKmap, fill = "wheat")
Next, we add the location of the earthquakes and color-code these by the year of occurrence.
gf_polygon(lat ~ long, data = OKmap, fill = "wheat") %>%
  gf_point(latitude ~ longitude, data = OK, color = ~ year)
Another way to incorporate the year data is by using facets:
gf_polygon(lat ~ long, data = OKmap, fill = "wheat") %>%
  gf_point(latitude ~ longitude | year, data = OK)
The package RColorBrewer provides several sequential color palettes. Investigate these palettes and recreate the maps using one of them.
The file CalifQuakes.csv contains data on earthquakes in a rectangular region around California from January 1, 2010 through the end of 2015.
Extract those quakes that occurred just in California. Notice that some observations specify a California location by “CA” while others use “California”.
Investigate when earthquakes have been occurring in California. Has the number of earthquakes been uniformly distributed across months? Across years?
Is there a pattern to where earthquakes in California occur?
How severe have the earthquakes in California been? Are most of minor severity?