Each of us is responsible for organizing the data we prepare for OHI assessments. This document describes: 1) how to save data obtained from outside sources; 2) how to organize data and scripts on Github; 3) how to deal with intermediate/working files that are too large for Github; and 4) how to document the gapfilling of missing data.
Data obtained from outside sources will be saved on the NCEAS private server (Mazu).
Every raw data folder should have a README.md (keep the caps so the name is consistent and easy to see). Note: we are using .md rather than .txt, even for READMEs on Mazu.
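The README rule above can be checked automatically. The following is a minimal sketch, not an existing OHI tool: the function name `folders_missing_readme` is hypothetical, and it assumes that every immediate subfolder of a given root directory counts as a raw data folder.

```python
from pathlib import Path


def folders_missing_readme(root):
    """Return sorted names of immediate subfolders of `root` without a README.md.

    Assumption: each immediate subfolder of `root` is one raw data folder.
    """
    missing = []
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue
        # Compare listed file names so the check stays case-sensitive
        # (a file called 'readme.md' is reported as missing).
        names = {p.name for p in child.iterdir()}
        if "README.md" not in names:
            missing.append(child.name)
    return missing
```

Calling `folders_missing_readme` on a raw data directory returns the subfolders that still need a README.md, or an empty list when all of them have one.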
Each README should include the following (template):
All of the R scripts and metadata used to prepare the data, as well as the final data layers, will be saved on Github, in the globalprep folder for OHI global assessments (https://github.com/OHI-Science/ohiprep/tree/master/globalprep).
The only data that will not be saved on Github are files that are too large or incompatible with Github (see below).
Primary goal/component folder
The folder should be named according to the component, not the data source. For example, the folder for the tourism and recreation goal would be called globalprep/tr (see table below). These recommendations should be modified as needed: for example, goals can be combined in a single folder (e.g., spp_ico), or there may be several folders for different components of a single goal (e.g., tr_sustainability and tr_tourists).
target | suggested folder name |
---|---|
Artisanal Fishing Opportunity | ao |
Carbon Storage | cs |
Clean Waters | cw |
Coastal Protection | cp |
Coastal Livelihoods | liv |
Coastal Economies | eco |
Fisheries | fis |
Habitats | hab |
Iconic Species | ico |
Lasting Special Places | lsp |
Mariculture | mar |
Natural Products | np |
Species | spp |
Tourism and Recreation | tr |
Pressure | prs_additional_pressure_id |
Resilience | res_additional_resilience_id |
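The naming table above can also be written as a lookup, which helps keep folder names consistent across scripts. This is a minimal sketch: the names `GOAL_FOLDERS` and `component_folder` are hypothetical, and it assumes the pressure and resilience rows mean "prs_" or "res_" followed by an identifier that is passed in as the suffix.

```python
# Abbreviations copied from the table above.
GOAL_FOLDERS = {
    "Artisanal Fishing Opportunity": "ao",
    "Carbon Storage": "cs",
    "Clean Waters": "cw",
    "Coastal Protection": "cp",
    "Coastal Livelihoods": "liv",
    "Coastal Economies": "eco",
    "Fisheries": "fis",
    "Habitats": "hab",
    "Iconic Species": "ico",
    "Lasting Special Places": "lsp",
    "Mariculture": "mar",
    "Natural Products": "np",
    "Species": "spp",
    "Tourism and Recreation": "tr",
    # Assumption: these two are prefixes; supply the pressure or
    # resilience id as `suffix`.
    "Pressure": "prs",
    "Resilience": "res",
}


def component_folder(target, suffix=None):
    """Return the globalprep folder path for a target, e.g. 'globalprep/tr'."""
    name = GOAL_FOLDERS[target]
    if suffix:
        # Optional component suffix, as in tr_sustainability.
        name = f"{name}_{suffix}"
    return f"globalprep/{name}"
```

For example, `component_folder("Tourism and Recreation", "sustainability")` builds the path for the tr_sustainability folder mentioned earlier.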
This folder will contain:
- a README.md that will link to the appropriate information pages on ohi-science.org. The README.md should follow this template.
- raw: for ‘raw-ish’ type files that would not be on the server. This is more for piecemeal raw data gathered from many places than for a single dataset downloaded or emailed to us. In many cases, this folder will not be used.
- int: for intermediate files (previously we’ve used tmp, working, or other naming conventions).
- output: for the final data layer that is used in the OHI toolbox.

The names of the final datasets (the ones stored in the output folder) will begin with the component abbreviation, followed by an underscore and a brief description of the data (e.g., tr_sustainability.csv).
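The folder layout and the output naming rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than part of the OHI tooling: `scaffold_component` and `misnamed_outputs` are hypothetical helper names, and the README.md written here is only a placeholder, not the project template.

```python
from pathlib import Path


def scaffold_component(base, folder):
    """Create <base>/<folder> containing README.md, raw/, int/ and output/."""
    root = Path(base) / folder
    for sub in ("raw", "int", "output"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder text only; replace it with the project README template.
        readme.write_text(f"# {folder}\n")
    return root


def misnamed_outputs(root, prefix):
    """Return names of files in output/ that do not start with '<prefix>_'."""
    out_dir = Path(root) / "output"
    return sorted(
        p.name
        for p in out_dir.iterdir()
        if p.is_file() and not p.name.startswith(prefix + "_")
    )
```

For the tourism and recreation goal, `scaffold_component("globalprep", "tr")` would create the folder, and `misnamed_outputs(root, "tr")` would then list any output files that do not follow the tr_ prefix convention.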