ohi logo
OHI Science | Citation policy

If you’d like to use spatial data from the Ocean Health Index global assessments for your region, this script walks through how to go about that.

Setup

You will need the rgdal and raster R libraries for this to work.

knitr::opts_chunk$set(message=F,warning=F, fig.width = 8, fig.height = 6,strip.white=TRUE)

options(scipen = 999) #this forces reporting in non scientific notation

library(rgdal)
library(tidyverse)
library(raster)


#Defining the wgs84 coordinate reference system
wgs_crs <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")

Regions

Extracting raster data with shapefiles requires both the raster and shapefile to be in the same projection. Most of the global data provided is in the Mollweide projection. If your shapefile is in a different projection there are a couple options;

  1. you can reproject the global data to the projection you are using for your assessment and then clip the data
  2. you can reproject your shapefile to Mollweide to match the projection of the data before clipping.

We suggest option b as it is less time intensive.

The shapefile provided is of the US Northeast region in a WGS84 long/lat projection.

par(mfrow=c(1,2)) #set plotting window to show 2 plots side by side

#read in the US Northeast shapefile and reproject to Mollweide
rgn  = readOGR('shapefile','ohine_rgn', verbose = F)

plot(rgn,main = "US Northeast Regions \n WGS84 projection")

#define the mollweide projection coordinate reference system (crs)
mollCRS=crs('+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs')

#reproject the shapefile to mollweide using spTransform
rgn_moll = spTransform(rgn, mollCRS)

#plot
plot(rgn_moll, main = "US Northeast Regions \n Mollweide projection")

Pressure Data

For this example we are going to use the Sea Surface Temperature Data from 2012, provided in this repository as sst_anoms_2012.tif.

Here is the global data with our US Northeast regions overlayed on top. The data shown here are the number of weeks in 2012 each cell was anomalous, which is defined as above the climatological mean plus one standard deviation.

sst <- raster('sst_anoms_2012.tif')

plot(sst,title="Sea Suface Temperature Anomalies 2012",axes=F, box=F,main = "Sea Surface Temperatue Anomalies 2012",
     legend.args=list(text='Anomalous Weeks', side=4, font=2, line=2.5, cex=0.8))
plot(rgn_moll,add=T)

Crop global data to your region

Using the raster package crop function, you can crop the data to your region. This isn’t necessary since you can extract values directly from the global data, but it does help to visualize the data in your region if you crop the raster first.

#crop sst to your rgn
sst_crop <- crop(sst,rgn_moll)

plot(sst_crop,axes=F,
     legend.args=list(text='Anomalous Weeks', side=4, font=2, line=2.5, cex=0.8))
plot(rgn_moll,add=T)

Get regional data

There are various ways of getting the data you might for each of your subregions. Here we provide two ways of getting the average number of anomalous weeks per subregion using raster::extract() and raster::zonal().

extract()

Use raster::extract() to get all values of the raster within each of your subregions. This function returns all cell values in a list in length equal to the number of subregions. The list can be turned into a dataframe with two columns;

  • value are the cell values (in this case number of anomalous weeks)
  • rgn_name are the names of the sub regions
# get all values within each region
vals = extract(sst_crop,rgn_moll,method='simple')%>%
              setNames(rgn_moll@data$rgn_nam)

# plot distribution of data per region

df <- data.frame(unlist(vals))%>%
      rename(value = unlist.vals.)%>%
      mutate(rgn_name = gsub("\\d+", "",row.names(.))) #the gsub here removes all numbers from the rgn_name

#now we have each value assigned to rgn_name.
head(df)
##        value     rgn_name
## 1  1.3622589 Rhode Island
## 2  0.7557677 Rhode Island
## 3 34.5055254 Rhode Island
## 4 36.2092952 Rhode Island
## 5 39.2003898 Rhode Island
## 6 40.8668855 Rhode Island

This in itself is not very useful but it can be used to look at a few different things like mean per region

mean_df <- df%>%
          group_by(rgn_name)%>%
          summarise(mean_ssta = mean(value,na.rm=T))

mean_df
## # A tibble: 9 × 2
##                      rgn_name mean_ssta
##                         <chr>     <dbl>
## 1                 Connecticut  20.29954
## 2  Gulf of Maine/Bay of Fundy  35.14518
## 3                       Maine  29.18494
## 4 Massachusetts-Gulf of Maine  29.23637
## 5     Massachusetts-Virginian  28.54399
## 6               New Hampshire  29.26963
## 7                    New York  25.17330
## 8                Rhode Island  25.48335
## 9                   Virginian  26.50533

You can see on average the Gulf of Maine sub-region has a mean of 35 anomalous weeks in 2012.

zonal()

You can also get these values using the zonal function from the raster package. This function requires a base raster of the same extent, resolution and projection as your spatial data with the cell numbers representing the zones you’d like to use for your analysis. In this case, our zones can be identified by the rgn_id in the shapefile dataframe.

To create the zone raster, use the function rasterize() from the raster package.

#turning the rgn_moll shapefile into a raster with the same dimensions as sst_crop but with cell values equal to the rgn_id of each polygon in the shapefile
zones = rasterize(rgn_moll, sst_crop, field = rgn_moll@data$rgn_id)

plot(zones, axes=F,main = "Zonal Raster for US Northeast \n (cell values are unique to sub-regions)")

Now using the zonal() function, get the mean number of anomalous weeks per subregion.

mean_z <- zonal(sst_crop,zones,fun=mean) #we want to get the mean per region, this could also be min/max or another function
mean_z
##       zone    value
##  [1,]    1 18.20821
##  [2,]    2 25.17330
##  [3,]    3 20.29954
##  [4,]    4 29.23637
##  [5,]    5 26.69914
##  [6,]    6 29.26963
##  [7,]    7 29.18494
##  [8,]    8 35.14518
##  [9,]    9 26.54559

The “zone” column indicates the region id associated with each sub-region. The value column is the mean for that region.

This data can be saved as a CSV (using write_csv) to use in the Ocean Health Index toolbox.