OHI Science

1 Summary

Spatial data from IUCN and AquaMaps are combined with extinction risk information from IUCN to generate regional scores for the Species subgoal. A region's status is based on the area-weighted average of species health across all cells within that global reporting region.

From Halpern et al (2012):

The target for the Species sub-goal is to have all species at a risk status of Least Concern. We scaled the lower end of the biodiversity goal to be 0 when 75% species are extinct, a level comparable to the five documented mass extinctions and would constitute a catastrophic loss of biodiversity. The Status of assessed species was calculated as the area- and threat status-weighted average of the number of threatened species within each 0.5 degree grid cell.

Mean risk status per cell:

\[\bar{R}_{cell} = \frac{\displaystyle\sum_{species}(Risk)}{n_{spp}}\]

Mean risk status per region:

\[\bar{R}_{SPP} = \frac{\displaystyle\sum_{cells}(\bar{R}_{cell} * A_{cell} * pA_{cell-rgn})}{A_{rgn}}\]

Species goal model

\[X_{SPP} = \frac{((1 - \bar{R}_{SPP}) - 0.25)}{(1 - 0.25)} * 100\%\]

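where:

  • Risk = extinction risk score of each species present in the cell, from 0 (Least Concern) to 1 (Extinct)
  • n_spp = number of assessed species in the cell
  • A_cell = area of the cell
  • pA_cell-rgn = proportion of the cell's area that falls within the region
  • A_rgn = total area of the region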
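The three equations can be checked with a small worked example. The R sketch below uses hypothetical species risk scores and hypothetical cell-in-region proportions for three cells, with cell areas of the kind listed in hcaf_truncated.csv. Treating A_rgn as the sum of the region-weighted cell areas is an assumption made for this illustration.

```r
# Hypothetical risk scores (0 = Least Concern ... 1 = Extinct) of the species
# present in each of three half-degree cells
cell_risk <- list(c(0, 0, 0.4), c(0.2, 0.6), c(0))

# Mean risk per cell: sum of species risk divided by number of species
r_cell <- vapply(cell_risk, function(r) sum(r) / length(r), numeric(1))

a_cell     <- c(2772.29, 2772.29, 2760.25)  # cell areas (km2)
p_cell_rgn <- c(1, 0.5, 1)                  # hypothetical share of each cell inside the region
a_rgn      <- sum(a_cell * p_cell_rgn)      # assumption: region area = summed cell portions

# Area-weighted mean risk for the region
r_spp <- sum(r_cell * a_cell * p_cell_rgn) / a_rgn

# Rescale so that mean risk 0 -> 100 and mean risk 0.75 (75% extinct) -> 0
spp_status <- function(r) ((1 - r) - 0.25) / (1 - 0.25) * 100
x_spp <- spp_status(r_spp)
round(c(r_spp = r_spp, x_spp = x_spp), 3)
```

With these inputs the region's mean risk is about 0.134, which gives a status of about 82.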

2 Updates from previous assessment

Changes since 2015 SPP subgoal for global OHI:


3 Data Sources

IUCN:

AquaMaps:


4 Methods

4.1 Extract AquaMaps data from .sql files

As in previous years, AquaMaps data for the 2016 assessment were provided as .sql files that can be used to generate an SQL database. Each line in a .sql file is a command that populates the database.

  • Original files extracted from aquamaps_2015_full_dataset_ohi.zip:
    • hcaf_ohi.sql
    • speciesoccursum_ohi.sql
    • hcaf_species_native_ohi.sql
  • Processed files saved to git-annex:
    • hcaf_truncated.csv
    • speciesoccursum.csv
    • hcaf_sp_native_trunc.csv

Rather than building the database, we scan each line for CREATE TABLE and INSERT INTO commands and use them to create and save dataframes. Note that the am_extract_2015.R script discards much of the data in these .sql files that is not used in the OHI Species goal processing (hence “truncated”). This speeds up reading and processing and avoids parsing problems with some of the rows and columns. Note also that the hcaf_species_native_ohi.sql file does not originally contain LOICZID information; this is added in the extract script, since LOICZID is a faster and less memory-intensive cell identifier than CsquareCode (integer vs. character string).

The spp_ico/R/am_extract_2015.R script performs these operations. This can be a time-consuming process, so typically this code chunk is run once and then set to eval = FALSE once the outputs have been generated.

reload <- FALSE

source(file.path(dir_git, 'R/am_extract_2015.R'))
Half-degree cell info from hcaf_truncated.csv (first few rows)
csquarecode loiczid nlimit slimit wlimit elimit centerlat centerlong cellarea oceanarea
5207:363:1 167254 -26.0 -26.5 -73.5 -73.0 -26.25 -73.25 2772.29 2772.29
5207:363:2 167253 -26.0 -26.5 -74.0 -73.5 -26.25 -73.75 2772.29 2772.29
5207:363:3 167974 -26.5 -27.0 -73.5 -73.0 -26.75 -73.25 2760.25 2760.25
5207:363:4 167973 -26.5 -27.0 -74.0 -73.5 -26.75 -73.75 2760.25 2760.25
5207:360:1 167260 -26.0 -26.5 -70.5 -70.0 -26.25 -70.25 2772.29 0.00
Species info from speciesoccursum.csv (first few rows)
speciesid | reviewed | speccode | genus | species | fbname | occurcells | kingdom | phylum | class | order | family | iucn_id | iucn_code | iucn_version
Fis-156671 | null | 62612 | Abalistes | filamentosus | null | 12 | Animalia | Chordata | Actinopterygii | Tetraodontiformes | Balistidae | null | N.E. | 2015-2
Fis-53544 | 1 | 9 | Abalistes | stellaris | Starry triggerfish | 198 | Animalia | Chordata | Actinopterygii | Tetraodontiformes | Balistidae | null | N.E. | 2015-2
Fis-142700 | 1 | 58334 | Abalistes | stellatus | null | 235 | Animalia | Chordata | Actinopterygii | Tetraodontiformes | Balistidae | null | N.E. | 2015-2
Fis-27725 | 1 | 10232 | Ablabys | taenianotus | Cockatoo waspfish | 59 | Animalia | Chordata | Actinopterygii | Scorpaeniformes | Tetrarogidae | null | N.E. | 2015-2
Fis-22975 | 1 | 972 | Ablennes | hians | Flat needlefish | 397 | Animalia | Chordata | Actinopterygii | Beloniformes | Belonidae | null | N.E. | 2015-2
Species-to-cell lookup from hcaf_sp_native_trunc.csv (first few rows)
speciesid probability loiczid
Fis-29358 1.00 129241
Fis-139729 0.97 129241
Fis-23185 0.81 129241
Fis-29263 1.00 129241
Fis-29290 1.00 129241
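The line-scanning approach described in 4.1 can be sketched in base R as below. This is not the am_extract_2015.R code: the helper names parse_sql_columns() and parse_sql_inserts() are hypothetical, the sample lines only imitate hcaf fields, and the sketch assumes MySQL-style dumps with one statement per line, backtick-quoted names, and no commas, quotes, or parentheses inside values. The last step shows how an integer LOICZID can be attached to a table keyed by CsquareCode.

```r
# Hypothetical helper: column names from a CREATE TABLE block
parse_sql_columns <- function(lines, table) {
  start <- grep(paste0("^CREATE TABLE `?", table, "`? \\("), lines)[1]
  rest  <- lines[(start + 1):length(lines)]
  end   <- start + which(grepl("^\\)", rest))[1]   # line that closes the block
  defs  <- lines[(start + 1):(end - 1)]
  defs  <- defs[grepl("^\\s*`", defs)]             # keep column defs, skip keys
  sub("^\\s*`([^`]+)`.*$", "\\1", defs)
}

# Hypothetical helper: rows from INSERT INTO statements for one table
parse_sql_inserts <- function(lines, table, col_names) {
  prefix <- paste0("^INSERT INTO `?", table, "`? VALUES ")
  body   <- sub(prefix, "", lines[grepl(prefix, lines)])
  body   <- sub("^\\(", "", sub("\\);\\s*$", "", body))
  rows   <- unlist(strsplit(body, "\\),\\s*\\("))  # one string per row
  fields <- lapply(strsplit(rows, ",", fixed = TRUE),
                   function(f) gsub("^'|'$", "", trimws(f)))
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- col_names
  df[] <- lapply(df, type.convert, as.is = TRUE)   # numeric text -> numbers
  df
}

sql <- c(
  "CREATE TABLE `hcaf` (",
  "  `CsquareCode` varchar(10) NOT NULL,",
  "  `LOICZID` int(11) DEFAULT NULL,",
  "  `CenterLat` double DEFAULT NULL,",
  "  PRIMARY KEY (`CsquareCode`)",
  ") ENGINE=MyISAM;",
  "INSERT INTO `hcaf` VALUES ('5207:363:1',167254,-26.25),('5207:363:2',167253,-26.25);",
  "INSERT INTO `hcaf` VALUES ('5207:363:3',167974,-26.75);"
)
hcaf <- parse_sql_inserts(sql, "hcaf", parse_sql_columns(sql, "hcaf"))

# Hypothetical species-native rows keyed by CsquareCode; add integer LOICZID
spp_native <- data.frame(SpeciesID   = c("Fis-29358", "Fis-23185"),
                         CsquareCode = c("5207:363:3", "5207:363:1"),
                         probability = c(1.00, 0.81), stringsAsFactors = FALSE)
spp_native$loiczid <- hcaf$LOICZID[match(spp_native$CsquareCode, hcaf$CsquareCode)]
```

Dropping unused columns at this stage (the "truncated" step) would simply mean subsetting df before saving.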

4.2 Ingest IUCN species list

To identify appropriate IUCN species for the analysis, we identified all IUCN Red List species whose habitats include a “marine” designation. The ingest_iucn.R script scrapes these data directly from the IUCN Red List website.

  • Starting point: Access Red List API at http://api.iucnredlist.org/index/all.csv. This provides a list of all Red List species, including unique IUCN code, taxonomic nomenclature, and Red List extinction risk.
  • For all identified species, we access species-specific info at http://api.iucnredlist.org/details/X/0 where X is the unique IUCN species ID; this page is saved to a cache directory on git-annex as a .htm file. At the same time, we scrape the “habitats” field from the saved page using XML tags.
  • All species with “marine” habitat are then scraped to find details on population trend and subpopulation ID numbers.
  • After some cleaning of scientific names (accents, spaces, html tags, etc.), the file is saved to git-annex for later use.

Processed files, saved to git-annex/globalprep/spp_ico/v201x/int:

  • spp_iucn_all.csv - full list of IUCN species pulled from the web, with some cleaning.
  • spp_iucn_habitats.csv - list of IUCN species (by iucn_sid) and corresponding habitat.
  • spp_iucn_marine.csv - prepped list: cleaned marine list with subpops and trends.

The spp_ico/R/ingest_iucn.R script performs these functions. This can be a time-consuming process, so typically this code chunk is run once and then set to eval = FALSE once the outputs have been generated.

reload <- FALSE

source(file.path(dir_git, 'R/ingest_iucn.R'))
IUCN marine species list and info from spp_iucn_marine.csv (first few rows)
sciname | class | order | family | genus | species | authority | iucn_sid | modified_year | category | criteria | habitat | popn_trend | subpop_sid | parent_sid
Abantennarius analis | ACTINOPTERYGII | LOPHIIFORMES | ANTENNARIIDAE | Abantennarius | analis | (Schultz, 1957) | 155277 | 2010 | LC | NA | Marine | Unknown | NA | NA
Ablennes hians | ACTINOPTERYGII | BELONIFORMES | BELONIDAE | Ablennes | hians | (Valenciennes, 1846) | 13486514 | 2015 | LC | NA | Marine | Unknown | NA | NA
Ablennes pacificus | ACTINOPTERYGII | BELONIFORMES | BELONIDAE | Ablennes | pacificus | (Valenciennes, 1846) | 13486514 | 2015 | LC | NA | Marine | Unknown | NA | NA
Aboma etheostoma | ACTINOPTERYGII | PERCIFORMES | GOBIIDAE | Aboma | etheostoma | Jordan & Starks, 1895 | 183435 | 2010 | DD | NA | Marine | Unknown | NA | NA
Aboma snyderi | ACTINOPTERYGII | PERCIFORMES | GOBIIDAE | Aboma | snyderi | (Temminck & Schlegel, 1845) | 181137 | 2012 | LC | NA | Marine | Unknown | NA | NA
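The marine filter and name cleaning in 4.2 can be sketched as follows, assuming the scraped habitats have already been tabulated as one row per iucn_sid and habitat (as in spp_iucn_habitats.csv). The rows and the helper clean_sciname() are hypothetical, and accent handling is omitted because transliteration results vary by platform.

```r
# Hypothetical rows standing in for spp_iucn_habitats.csv
habitats <- data.frame(
  iucn_sid = c(155277, 155277, 183435, 100001),
  habitat  = c("Marine Neritic", "Freshwater", "Marine Intertidal", "Forest"),
  stringsAsFactors = FALSE)

# Keep any species with at least one habitat containing "marine"
is_marine   <- grepl("marine", habitats$habitat, ignore.case = TRUE)
marine_sids <- unique(habitats$iucn_sid[is_marine])

# Light cleaning of scientific names: drop HTML tags, collapse whitespace
clean_sciname <- function(x) {
  x <- gsub("<[^>]+>", "", x)
  gsub("\\s+", " ", trimws(x))
}
clean_sciname(c("<i>Aboma</i>  etheostoma ", "Ablennes hians"))
```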

4.3 Generate full species lookup table

Having processed AquaMaps and IUCN species raw data, we can now prepare a full combined list of all species to be included in the OHI SPP goal. The function create_spp_master_lookup() creates the full lookup table:

  • Load AquaMaps species info from speciesoccursum.csv
    • Verify sciname field; the verify_scinames() function uses taxize::gnr_resolve() to compare AM scinames to accepted names from Encyclopedia of Life and NCBI databases - this helps resolve differences in naming conventions and species aliases.
  • Load IUCN species info from spp_iucn_marine.csv
    • Verify sciname field as above.
  • Join the two species lists by verified scientific names as the best common identifier.
  • Create an extinction risk category field for each row, using the IUCN category if available, otherwise the category listed in AquaMaps, otherwise NA.

At this point, the list contains all AquaMaps species and all marine-identified IUCN species. Next, we identify the source of spatial distribution data for each species, if available.

  • From the downloaded IUCN shapefiles, determine which species have spatial data.
  • Depending on preference and availability of spatial source (AquaMaps or IUCN), tag each row with a specific spatial_source (“am” or “iucn”).
  • Follow a similar process for IUCN species included in the BirdLife International geodatabase (tag as “iucn-bli”).

Now having identified spatial source availability and preference for all species on the list:

  • Clean out duplicated species
  • Assign numeric values to population Red List category and population trend.
  • Save result to git-annex/globalprep/spp_ico/v2016/int/spp_all_raw.csv
  • Filter out subpop species rows (species with a non-NA parent_sid) that do not have a named subpopulation location (i.e. iucn_subpop field is NA, instead of the name of a subpopulation)
  • Save final list to git-annex/globalprep/spp_ico/v2016/int/spp_all_cleaned.csv

The spp_ico/v2016/prep_spp_list.R script performs these functions. This does not take long, but the saved output can be used instead of rerunning the entire process. Note the optional arguments source_pref and fn_tag, which can be used to customize the run (they are passed to create_spp_master_lookup() from spp_fxn.R); 'iucn' and '' are the standard defaults.

reload      <- FALSE
source_pref <- 'iucn'
fn_tag      <- ''

source(file.path(dir_git, scenario, 'prep_spp_list.R'))
Combined species list from spp_all_cleaned.csv (first few rows)
am_sid | am_cat | sciname | iucn_sid | pop_trend | pop_cat | spp_group | id_no | iucn_subpop | spatial_source | cat_score | trend_score
Fis-26169 | LC | Apolemichthys trimaculatus | 165835 | Stable | LC | ANGELFISH | 165835 | NA | iucn | 0 | 0
Fis-28014 | LC | Apolemichthys xanthotis | 165853 | Stable | LC | ANGELFISH | 165853 | NA | iucn | 0 | 0
Fis-28015 | LC | Apolemichthys xanthurus | 165844 | Unknown | LC | ANGELFISH | 165844 | NA | iucn | 0 | NA
Fis-27342 | LC | Centropyge acanthops | 155083 | Stable | LC | ANGELFISH | 155083 | NA | iucn | 0 | 0
Fis-22168 | LC | Centropyge argi | 165837 | Unknown | LC | ANGELFISH | 165837 | NA | iucn | 0 | NA
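A minimal base-R sketch of the join and category logic described for create_spp_master_lookup(), using a few hypothetical rows modeled on the tables above and skipping the name-verification step. The numeric scales are assumptions: the table above only confirms LC = 0, Stable = 0, and Unknown = NA. The remaining values (NT = 0.2 through EX = 1, Decreasing = -0.5, Increasing = 0.5) are assumed here; they are consistent with the 0.4 and -0.5 values that appear in the cell summaries in 4.4.2.

```r
# Hypothetical AquaMaps and IUCN species tables (a few rows each)
am <- data.frame(
  am_sid  = c("Fis-156671", "Fis-22975", "Fis-26169"),
  sciname = c("Abalistes filamentosus", "Ablennes hians", "Apolemichthys trimaculatus"),
  am_cat  = c("N.E.", "N.E.", "LC"), stringsAsFactors = FALSE)
iucn <- data.frame(
  iucn_sid  = c(13486514, 183435, 165835),
  sciname   = c("Ablennes hians", "Aboma etheostoma", "Apolemichthys trimaculatus"),
  iucn_cat  = c("LC", "DD", "LC"),
  pop_trend = c("Unknown", "Unknown", "Stable"), stringsAsFactors = FALSE)

# Full outer join on (already verified) scientific names
spp <- merge(am, iucn, by = "sciname", all = TRUE)

# Risk category: IUCN if present, else AquaMaps, else NA
spp$pop_cat <- ifelse(!is.na(spp$iucn_cat), spp$iucn_cat, spp$am_cat)

# Assumed numeric scales; DD, N.E., Unknown and NA have no entry -> NA
cat_lookup   <- c(LC = 0, NT = 0.2, VU = 0.4, EN = 0.6, CR = 0.8, EX = 1)
trend_lookup <- c(Decreasing = -0.5, Stable = 0, Increasing = 0.5)
spp$cat_score   <- unname(cat_lookup[spp$pop_cat])
spp$trend_score <- unname(trend_lookup[spp$pop_trend])
spp[, c("sciname", "am_sid", "iucn_sid", "pop_cat", "cat_score", "trend_score")]
```

Using a named lookup vector means any category without an entry (such as DD) falls out as NA, which matches the later filter to species with valid risk categories.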

4.4 Spatialize species information using AquaMaps and IUCN spatial data

4.4.1 Extract IUCN polygons to half-degree cells

We extract IUCN polygon presence to the same half-degree cells as AquaMaps to simplify the analysis.

  • The spp_all species list includes a field spp_group that identifies which shapefile contains the spatial information for a given species.
  • For each species group, specific species are identified by comparing iucn_sid from the dataframe to id_no within the shapefile.
  • Extract loiczid cell IDs for each species within each species group, and save a .csv file for that group with fields:
    • sciname | iucn_sid | presence | subpop | LOICZID | prop_area
    • presence codes: 1 Extant; 2 Probably Extant (discontinued); 3 Possibly Extant; 4 Possibly Extinct; 5 Extinct (post 1500); 6 Presence Uncertain
  • NOTE: this takes a long time (multiple hours for some of the shapefiles).
  • By passing a filtered data frame to the function, you can focus the process on new or updated shapefiles only.
  • reload = FALSE allows the function to skip extraction for groups whose files are already present. Set it to TRUE if you need to extract an updated shapefile (or change the shapefile name, or delete the previous extraction…).

The function extract_loiczid_per_spp() performs these functions and is contained in the spp_ico/v2016/spp_fxn.R script. This can be a time-consuming process, so typically this code chunk is run once and then set to eval = FALSE once the outputs have been generated.

spp_all <- read_csv(file.path(dir_anx, scenario, 'int/spp_all_cleaned.csv'))

### set up maps_list for all standard (non-bird) IUCN species...
maps_list_iucn <- spp_all %>%
  filter(str_detect(spatial_source, 'iucn') & !str_detect(spatial_source, 'bli')) %>%
  dplyr::select(sciname, iucn_sid, spp_group) %>%
  unique()

extract_loiczid_per_spp(maps_list_iucn,
                        shp_dir = file.path(dir_data_iucn, 'iucn_shp'), 
                        fn_tag = scenario, 
                        reload = FALSE)

### set up maps_list for all bird IUCN species...
maps_list_bli <- spp_all %>%
  filter(str_detect(spatial_source, 'bli')) %>%
  dplyr::select(sciname, iucn_sid, spp_group) %>%
  mutate(spp_group = 'BOTW') %>%
  unique()

extract_loiczid_per_spp(maps_list_bli,
                        shp_dir = dir_data_bird, 
                        fn_tag = scenario, 
                        reload = FALSE)
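Because IUCN ranges are summarized to the same half-degree grid as AquaMaps, every cell is referenced by its integer LOICZID. The sketch below assumes LOICZIDs number the 720 × 360 half-degree cells row by row, starting at the north-west corner (90°N, 180°W); this assumption reproduces the LOICZIDs of the first rows of hcaf_truncated.csv shown in 4.1. The helper names are hypothetical, and the polygon-to-cell overlay itself (which requires a spatial library) is not shown.

```r
# Hypothetical helpers, assuming LOICZID numbers half-degree cells row by row
# from the north-west corner (90N, 180W): 720 columns x 360 rows.
loiczid_from_latlong <- function(lat, long, res = 0.5) {
  n_col <- 360 / res
  n_row <- 180 / res
  row <- pmin(floor((90 - lat) / res) + 1, n_row)     # 1 = northernmost row
  col <- pmin(floor((long + 180) / res) + 1, n_col)   # 1 = westernmost column
  as.integer((row - 1) * n_col + col)
}

# Inverse: cell-center coordinates for a LOICZID
latlong_from_loiczid <- function(loiczid, res = 0.5) {
  n_col <- 360 / res
  row <- (loiczid - 1) %/% n_col + 1
  col <- (loiczid - 1) %% n_col + 1
  data.frame(centerlat  = 90 - (row - 0.5) * res,
             centerlong = -180 + (col - 0.5) * res)
}

# Cell centers from the first rows of hcaf_truncated.csv
ids <- loiczid_from_latlong(lat  = c(-26.25, -26.25, -26.75, -26.25),
                            long = c(-73.25, -73.75, -73.25, -70.25))
ids  # 167254 167253 167974 167260
```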

4.4.2 Generate cell-by-cell summary of species

For each half-degree cell, tally up the number of species present and determine a mean species risk value and population trend value for the cell.

  • At this point, the species-cell lists are filtered to species with valid extinction risk categories - i.e. not DD and not NA.
  • Since this averaging is done for each data set separately, we also track the species count per cell used to determine both the risk and the trend (separately, since many species with a risk value have no trend information, i.e. NA). These counts are used to weight the values when the two are combined.
  • Data-set-specific idiosyncrasies:
    • For AquaMaps, we apply a threshold that sets the minimum probability of occurrence that determines species “presence.”
    • For IUCN, no threshold is needed, but the shapefiles include a “presence” attribute in which a value of 5 indicates a region where a subpopulation has become extinct. We use this to manually reset local extinction risk and trend to EX and NA, respectively.
    • Note that for IUCN, although we determine the proportional area when extracting polygons, we currently consider any presence to fill the cell (similar to treating even a low AquaMaps probability as presence across the entire cell).

The following code chunk executes the functions that perform these tasks. Note the optional arguments fn_tag and prob_filter, which can be changed to facilitate custom runs (including different spp_all species info lists, different AquaMaps thresholds, and different filename tags that uniquely identify the custom run).

spp_all <- read_csv(file.path(dir_anx, scenario, 'int/spp_all_cleaned.csv'))

am_cells_spp_sum <- process_am_summary_per_cell(spp_all, fn_tag = '', prob_filter = 0, reload = FALSE) %>%
  read_csv(col_types = 'dddddc')
### NOTE: keyed data.table works way faster than the old inner_join or merge.
### loiczid | mean_cat_score | mean_pop_trend_score | n_cat_species | n_trend_species | source
### AM does not include subspecies or subpops: every am_sid corresponds to exactly one sciname.

iucn_cells_spp_sum <- process_iucn_summary_per_cell(spp_all, fn_tag = '', reload = FALSE) %>%
  read_csv(col_types = 'dddddc')
### loiczid | mean_cat_score | mean_pop_trend_score | n_cat_species | n_trend_species | source
### IUCN includes subpops - one sciname corresponds to multiple iucn_sid values.

sum_by_loiczid_file  <- process_means_per_cell(am_cells_spp_sum, iucn_cells_spp_sum, fn_tag = '')
### This returns the location of a saved dataframe with variables:
### loiczid | weighted_mean_cat | weighted_mean_trend | n_cat_spp | n_tr_spp
AquaMaps cell summary (first few rows)
loiczid mean_cat_score mean_pop_trend_score n_cat_species n_trend_species source
8205 0 NaN 1 0 aquamaps
8206 0 NaN 1 0 aquamaps
8207 0 NaN 1 0 aquamaps
8209 0 NaN 1 0 aquamaps
8210 0 NaN 1 0 aquamaps
8211 0 NaN 1 0 aquamaps
IUCN cell summary (first few rows)
loiczid mean_cat_score mean_pop_trend_score n_cat_species n_trend_species source
1 0.4 -0.5 1 1 iucn
2 0.4 -0.5 1 1 iucn
3 0.4 -0.5 1 1 iucn
4 0.4 -0.5 1 1 iucn
5 0.4 -0.5 1 1 iucn
6 0.4 -0.5 1 1 iucn
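The per-cell tally in 4.4.2 can be sketched in base R as below. The helper names summarize_cells() and apply_local_extinction() and the input rows are hypothetical, and EX is assumed to score 1. The sketch applies the AquaMaps probability threshold when a probability column is present, drops species without a numeric category score (DD or NA), and recodes IUCN presence code 5 to EX with an NA trend. A cell whose species all lack trend data gets a NaN mean trend, as in the AquaMaps summary above.

```r
# Hypothetical AquaMaps species-by-cell rows (cat_score NA = DD / not evaluated)
am_cells <- data.frame(
  loiczid     = c(8205, 8205, 8205, 8206),
  probability = c(1.00, 0.30, 0.90, 0.81),
  cat_score   = c(0, 0.4, NA, 0.2),
  trend_score = c(NA, -0.5, NA, 0))

# Hypothetical IUCN rows; presence code 5 = Extinct (post 1500)
iucn_cells <- data.frame(
  loiczid     = c(1, 2),
  presence    = c(1, 5),
  cat_score   = c(0.4, 0.4),
  trend_score = c(-0.5, -0.5))

# Local extinction: reset risk to EX (assumed score 1) and trend to NA
apply_local_extinction <- function(df) {
  extinct <- df$presence == 5
  df$cat_score[extinct]   <- 1
  df$trend_score[extinct] <- NA
  df
}

# Tally species per cell and average their category and trend scores
summarize_cells <- function(df, prob_filter = 0) {
  if ("probability" %in% names(df)) df <- df[df$probability >= prob_filter, ]
  df <- df[!is.na(df$cat_score), ]             # drop DD / NA categories
  by_cell <- split(df, df$loiczid)
  data.frame(
    loiczid              = as.numeric(names(by_cell)),
    mean_cat_score       = vapply(by_cell, function(d) mean(d$cat_score), numeric(1)),
    mean_pop_trend_score = vapply(by_cell, function(d) mean(d$trend_score, na.rm = TRUE), numeric(1)),
    n_cat_species        = vapply(by_cell, nrow, integer(1)),
    n_trend_species      = vapply(by_cell, function(d) sum(!is.na(d$trend_score)), integer(1)),
    row.names = NULL)
}

am_sum   <- summarize_cells(am_cells, prob_filter = 0)
am_sum50 <- summarize_cells(am_cells, prob_filter = 0.5)
iucn_sum <- summarize_cells(apply_local_extinction(iucn_cells))
```

Counting category and trend species separately is what allows the two data sets to be count-weighted when they are combined.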

4.5 Summarize status and trend by region

Finally, we combine the two cell-by-cell summaries, using species-count weighting to determine the mean category and trend per cell. Cells are then aggregated to regions to calculate the area-weighted regional mean category, trend, and status.

These are then saved as status and trend layer outputs for the global regions (shown in table), as well as for the 3 nautical mile, Antarctic, and High Seas regions.

The script spp_ico/v2016/layer_prep_spp_global.R performs these tasks.

source(file.path(dir_git, scenario, 'layer_prep_spp_global.R'))

These analyses are repeated for additional scenarios: 3 nautical mile coastal buffer (for resilience calculations), High Seas, and Antarctic.

source(file.path(dir_git, scenario, 'layer_prep_spp_3nm.R'))
source(file.path(dir_git, scenario, 'layer_prep_spp_hs_aq.R'))
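The species-count weighting used to combine the AquaMaps and IUCN cell summaries can be sketched as below: each data set's cell mean is weighted by the number of species behind it, and a cell missing from one data set contributes zero species from that data set. The helper combine_cells() and the input rows are hypothetical; the output column names follow the code comments in 4.4.2.

```r
# Hypothetical per-cell summaries from each data set
am_sum <- data.frame(loiczid = c(1, 2), mean_cat_score = c(0, 0.2),
                     mean_pop_trend_score = c(NaN, 0),
                     n_cat_species = c(3, 2), n_trend_species = c(0, 1))
iucn_sum <- data.frame(loiczid = c(2, 3), mean_cat_score = c(0.4, 0.4),
                       mean_pop_trend_score = c(-0.5, -0.5),
                       n_cat_species = c(1, 1), n_trend_species = c(1, 1))

# Combine the two summaries, weighting each mean by its species count
combine_cells <- function(am, iucn) {
  both <- merge(am, iucn, by = "loiczid", all = TRUE, suffixes = c("_am", "_iucn"))
  z <- function(x) ifelse(is.na(x), 0, x)      # missing or NaN -> 0
  n_cat <- z(both$n_cat_species_am) + z(both$n_cat_species_iucn)
  n_tr  <- z(both$n_trend_species_am) + z(both$n_trend_species_iucn)
  cat_sum <- z(both$mean_cat_score_am) * z(both$n_cat_species_am) +
             z(both$mean_cat_score_iucn) * z(both$n_cat_species_iucn)
  tr_sum  <- z(both$mean_pop_trend_score_am) * z(both$n_trend_species_am) +
             z(both$mean_pop_trend_score_iucn) * z(both$n_trend_species_iucn)
  data.frame(loiczid             = both$loiczid,
             weighted_mean_cat   = cat_sum / n_cat,   # NaN if no species
             weighted_mean_trend = tr_sum / n_tr,     # NaN if no trend data
             n_cat_spp           = n_cat,
             n_tr_spp            = n_tr)
}

cells <- combine_cells(am_sum, iucn_sum)
cells
```

The regional step would then apply the area weighting from the second equation in section 1 to weighted_mean_cat.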

4.6 Summarize status and trend by region, excluding BirdLife International data

spp_all_nobirds <- read_csv(file.path(dir_anx, scenario, 'int/spp_all_cleaned.csv')) %>%
  filter(!(spp_group == 'BOTW' & is.na(am_sid))) %>% ### remove any BOTW with no AquaMaps map
  mutate(spatial_source = ifelse(spp_group == 'BOTW', 'am', spatial_source))

am_cells_spp_sum_nobirds <- process_am_summary_per_cell(spp_all_nobirds, fn_tag = 'nobirds', prob_filter = 0, reload = FALSE) %>%
  read_csv(col_types = 'dddddc')
### NOTE: keyed data.table works way faster than the old inner_join or merge.
### loiczid | mean_cat_score | mean_pop_trend_score | n_cat_species | n_trend_species | source
### AM does not include subspecies or subpops: every am_sid corresponds to exactly one sciname.

iucn_cells_spp_sum_nobirds <- process_iucn_summary_per_cell(spp_all_nobirds, fn_tag = 'nobirds', reload = FALSE) %>%
  read_csv(col_types = 'dddddc')
### loiczid | mean_cat_score | mean_pop_trend_score | n_cat_species | n_trend_species | source
### IUCN includes subpops - one sciname corresponds to multiple iucn_sid values.

sum_by_loiczid_file_nobirds  <- process_means_per_cell(am_cells_spp_sum_nobirds, iucn_cells_spp_sum_nobirds, fn_tag = 'nobirds')
### This returns the location of a saved dataframe with variables:
### loiczid | weighted_mean_cat | weighted_mean_trend | n_cat_spp | n_tr_spp
source(file.path(dir_git, scenario, 'layer_prep_spp_global_nobirds.R'))

library(ggplot2)
library(dplyr)  ### rename(), left_join()
library(readr)  ### read_csv()
nobirds_df <- read_csv(file.path(dir_git, scenario, 'output/spp_status_global.csv')) %>% 
  rename(status = score) %>%
  left_join(read_csv(file.path(dir_git, scenario, 'output/spp_status_global_nobirds.csv'))  %>%
              rename(status_nobirds = score),
            by = 'rgn_id') %>%
  left_join(read_csv(file.path(dir_git, 'v2015', 'data/spp_status_global.csv'))  %>%
              rename(status_2015 = score),
            by = 'rgn_id')
  
scatter_nobirds <- ggplot(nobirds_df, aes(x = status_nobirds, y = status)) +
  geom_point(alpha = .5) +
  geom_point(aes(x = status_nobirds, y = status_2015), color = 'blue', alpha = .5) +
  geom_abline(color = 'red') +
  scale_x_continuous(limits = c(.5, 1)) +
  scale_y_continuous(limits = c(.5, 1)) +
  labs(x = 'Status: v2016 excluding Bird Life data',
       y = 'Status: v2015(blue), v2016 all (black)',
       title = 'SPP Status: excluding birds')
ggsave(file.path(dir_git, scenario, 'Figs/scatterplot_spp_status_global_excl_bli.png'),
       plot = scatter_nobirds)

4.7 Determine species by region

The calc_rgn_spp() function takes in lookup tables of species by cell (for both IUCN and AM), a cell-to-region lookup, and a species info lookup. From this it generates a list of which species occur in which regions, including basic species information.

Species by region - first few rows
iucn_sid | am_sid | sciname | pop_cat | pop_trend | spatial_source | rgn_id | rgn_name | n_cells | presence | n_spp_rgn
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 210 | Japan | 194 | NA | 11503
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 20 | South Korea | 35 | NA | 5861
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 255 | DISPUTED | 145 | NA | 18267
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 209 | China | 151 | NA | 10906
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 14 | Taiwan | 67 | NA | 10938
NA | Fis-156671 | Abalistes filamentosus | NA | NA | am | 207 | Vietnam | 113 | NA | 10348
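A minimal sketch of the kind of species-by-region table that calc_rgn_spp() produces, assuming n_cells counts a species' cells within a region and n_spp_rgn counts the distinct species recorded in that region. The inputs are hypothetical (the region IDs and names are borrowed from the table above), and this is not the calc_rgn_spp() implementation.

```r
# Hypothetical species-by-cell presence and cell-to-region lookup
spp_cells <- data.frame(
  sciname = c("Abalistes filamentosus", "Abalistes filamentosus",
              "Ablennes hians", "Ablennes hians"),
  loiczid = c(101, 102, 102, 103), stringsAsFactors = FALSE)
cell_rgn <- data.frame(
  loiczid  = c(101, 102, 103),
  rgn_id   = c(210, 210, 20),
  rgn_name = c("Japan", "Japan", "South Korea"), stringsAsFactors = FALSE)

# Attach regions to each species-cell row
rgn_spp <- merge(spp_cells, cell_rgn, by = "loiczid")

# n_cells: number of cells per species within each region
out <- aggregate(loiczid ~ sciname + rgn_id + rgn_name, data = rgn_spp, FUN = length)
names(out)[names(out) == "loiczid"] <- "n_cells"

# n_spp_rgn: number of distinct species recorded in each region
n_spp <- tapply(rgn_spp$sciname, rgn_spp$rgn_id, function(x) length(unique(x)))
out$n_spp_rgn <- as.integer(n_spp[as.character(out$rgn_id)])
out
```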

5 Comparing scenarios and changes in data

The following plots compare the status scores generated for the 2015 assessment to those generated for 2016.

The third plot examines one possible reason for the large shift between the 2015 scores (d2014 data) and the 2016 scores (d2015 data): the addition of birds to the IUCN spatial information. Bird species have a fairly low area-weighted mean risk category but cover a very large number of cells, so these lower-risk species have a very large impact on the final score compared to other species groups.

Finally, this plot compares the 2016 scores as calculated with and without BirdLife International data. AquaMaps bird species were left in. Blue points compare 2015 scores (which did not include BirdLife International data) to the 2016 scores excluding BirdLife International data.
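The bird effect described above can be illustrated with a single hypothetical cell. Because the mean risk per cell is an unweighted average over the species present (the first equation in section 1), adding many Least Concern species to a cell lowers its mean risk; when those species cover a very large number of cells, the same dilution repeats across much of each region. All values below are hypothetical.

```r
# Hypothetical cell: 10 non-bird species, each with risk score 0.2
non_bird <- rep(0.2, 10)
# The same cell after adding 40 hypothetical Least Concern (score 0) bird species
with_birds <- c(non_bird, rep(0, 40))

r_before <- mean(non_bird)    # about 0.2
r_after  <- mean(with_birds)  # about 0.04

# Status rescaling from section 1 (mean risk 0 -> 100, 0.75 -> 0)
spp_status <- function(r) ((1 - r) - 0.25) / (1 - 0.25) * 100
round(spp_status(c(r_before, r_after)), 1)
```

In this toy case the cell's contribution rises from a status of about 73 to about 95, even though no species' risk category changed.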