The R package EchoNet2Fish estimates fish abundance from acoustic echoes and midwater trawl catch. At its core are functions that explore the data, exploreACMT() or exploreACMT2(), and generate the estimates, estimateLake(). The code is tailored to the format and procedures used by the Great Lakes Acoustic Users Group (Parker-Stetter et al. 2009).
Install the EchoNet2Fish package. If you don’t already have the devtools package installed, you can follow the instructions at Readme instead.
Then load the EchoNet2Fish package.
Before any estimates are made, the acoustic and midwater trawl data must be prepared in the following way.
For each set of data for which you would like to generate estimates (e.g., one year of data from one lake), you should have a single subdirectory (subdir) containing all of the relevant files. For example, you might have subdirectories called H13, H14, and M14, containing all of the data for Lake Huron in 2013 and 2014 and Lake Michigan in 2014. Within each subdirectory, you should have two more subdirectories for the acoustic data (one for all the Sv files and one for all the TS files), and you should have all of your midwater trawl files (operations, catches, lengths, and age-length keys). Above these subdirectories should be an overarching directory (refdir) that contains a reference csv file (refcsv).
- refdir
  - refcsv.csv
  - subdir (e.g., H13)
    - svsubdir
    - tssubdir
    - optropf.csv
    - trcatchf.csv
    - trlff.csv
    - keyfile1.csv
    - keyfile2.csv
  - subdir (e.g., H14)
    - svsubdir
    - tssubdir
    - optropf.csv
    - trcatchf.csv
    - trlff.csv
    - keyfile1.csv
    - keyfile2.csv
  - subdir …

The reference csv file contains information, primarily the directory and file names for the acoustic and midwater trawl data, for all of the subdirectories (one row for each subdirectory). The file must contain these 10 columns:
- subdir = a subdirectory of refdir containing all the other subdirectories and files,
- svsubdir = the Sv subdirectory,
- tssubdir = the TS subdirectory,
- optropf = the midwater trawl operations file,
- trcatchf = the midwater trawl catch file,
- trlff = the midwater trawl lengths file,
- keysp1 = the species code for keyfile1,
- keyfile1 = the age-length csv file for species keysp1,
- keysp2 = the species code for keyfile2, and
- keyfile2 = the age-length csv file for species keysp2.

There should also be one or more additional columns for keyvars, the key variable(s) used to define each unique run of the exploration and estimation process. In the example below, LAKE and YEAR are used as the key variables.
LAKE | YEAR | subdir | svsubdir | tssubdir | optropf | trcatchf | trlff | keysp1 | keyfile1 | keysp2 | keyfile2 |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 2013 | H13 | SV | TS | H.mtr.op13 | H.catch13 | Htr_lf13 | NA | NA | NA | NA |
3 | 2014 | H14 | SV | TS | H.mtr.op14 | H.catch14 | H.tr_lf14 | NA | NA | NA | NA |
2 | 2014 | M14 | SV | TS | MtrawlOp | M.catch | M.tr_lf | 106 | aleagelenkey | NA | NA |
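As a rough sketch of the layout described above, the base R code below builds the directory skeleton and a reference csv file that mirrors the example table. It writes to a temporary directory as a stand-in for refdir, and the file name Reference.csv and the object names (ref, required) are chosen here only for illustration.

```r
# Stand-in for refdir: a scratch folder inside the R session's temp directory
refdir <- file.path(tempdir(), "refdir_demo")

# One row per subdirectory, mirroring the example reference table
ref <- data.frame(
  LAKE = c(3, 3, 2),
  YEAR = c(2013, 2014, 2014),
  subdir = c("H13", "H14", "M14"),
  svsubdir = "SV",
  tssubdir = "TS",
  optropf = c("H.mtr.op13", "H.mtr.op14", "MtrawlOp"),
  trcatchf = c("H.catch13", "H.catch14", "M.catch"),
  trlff = c("Htr_lf13", "H.tr_lf14", "M.tr_lf"),
  keysp1 = c(NA, NA, 106),
  keyfile1 = c(NA, NA, "aleagelenkey"),
  keysp2 = NA,
  keyfile2 = NA,
  stringsAsFactors = FALSE
)

# Create the Sv and TS subdirectories for each row
for (i in seq_len(nrow(ref))) {
  for (acdir in c(ref$svsubdir[i], ref$tssubdir[i])) {
    dir.create(file.path(refdir, ref$subdir[i], acdir),
      recursive = TRUE, showWarnings = FALSE)
  }
}

# Write the reference file, then read it back and check the 10 required columns
write.csv(ref, file.path(refdir, "Reference.csv"), row.names = FALSE)
required <- c("subdir", "svsubdir", "tssubdir", "optropf", "trcatchf",
  "trlff", "keysp1", "keyfile1", "keysp2", "keyfile2")
check <- read.csv(file.path(refdir, "Reference.csv"))
missing_cols <- setdiff(required, names(check))
```

If missing_cols is empty, the reference file has every required column; the midwater trawl csv files themselves would still need to be copied into each subdirectory.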
Except where noted, the order of the rows and columns in any of the acoustic or midwater trawl files is not important. Additional columns, i.e., other than those that are required, may be included in the files, but are not necessary.
In both the SV and TS csv files, the first column is automatically assigned the name Dummy_ID. This name write-over is needed to handle occasional problems with byte order marks at the beginning of the csv files. The variable Dummy_ID is not actually used in the exploration or estimation procedures.
The following 8 columns must be included in both the SV and TS csv files:
These columns serve to uniquely identify each row in the files, and are used to combine the information from both types of files.
The following 4 columns must be included in the SV csv files:
Dummy_ID | Region_name | Interval | Layer | Date_M | Sv_mean | Depth_mean | Layer_depth_min | Layer_depth_max | Lat_M | Lon_M | PRC_ABC |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | no6 | 4 | 14 | 20140807 | -102.8 | 135.0 | 130 | 140 | 45.36 | -86.28 | 5.26e-10 |
1 | no6 | 9 | 16 | 20140807 | -78.0 | 154.9 | 150 | 160 | 45.36 | -86.45 | 1.56e-07 |
1 | no2 | 2 | 12 | 20140807 | -96.2 | 115.0 | 110 | 120 | 44.08 | -86.86 | 2.42e-09 |
1 | no4 | 3 | 8 | 20140806 | -95.9 | 75.0 | 70 | 80 | 44.72 | -86.54 | 2.60e-09 |
1 | nn9 | 4 | 3 | 20140819 | -84.3 | 25.0 | 20 | 30 | 45.07 | -86.92 | 3.70e-08 |
Several columns must be included in the TS csv files indicating the number of targets in each target strength bin. Each target count column is named according to the integer target strength, with an X. prefix, e.g.:
Dummy_ID | Region_name | Interval | Layer | Layer_depth_min | Layer_depth_max | Lat_M | Lon_M | X.36 | X.37 | X.38 | X.64 | X.65 | X.66 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | no3 | 2 | 15 | 140 | 150 | 44.39679 | -87.07275 | 0 | 0 | 0 | 10 | 13 | 22 |
1 | no1 | 6 | 3 | 20 | 30 | 43.74271 | -87.32775 | 0 | 0 | 0 | 2 | 0 | 1 |
1 | nn1 | 4 | 6 | 50 | 60 | 43.78454 | -86.58780 | 4 | 7 | 8 | 0 | 0 | 0 |
1 | no4 | 6 | 5 | 40 | 50 | 44.71496 | -86.42347 | 0 | 0 | 0 | 0 | 0 | 1 |
1 | no6 | 6 | 7 | 60 | 70 | 45.36183 | -86.35453 | 0 | 0 | 0 | 0 | 0 | 0 |
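The sketch below shows one way to pick out the target count columns and recover the target strength of each bin, using the first three rows of the example table. It assumes the X. prefix arises because R converts a header such as -36 into a valid name (make.names("-36") gives "X.36"), so that X.36 stands for -36 dB; that reading is an assumption here, not something stated by the package.

```r
# First three rows of the example TS table (key columns and target counts only)
ts <- data.frame(
  Region_name = c("no3", "no1", "nn1"),
  Interval = c(2, 6, 4),
  Layer = c(15, 3, 6),
  X.36 = c(0, 0, 4), X.37 = c(0, 0, 7), X.38 = c(0, 0, 8),
  X.64 = c(10, 2, 0), X.65 = c(13, 0, 0), X.66 = c(22, 1, 0),
  stringsAsFactors = FALSE
)

# How R turns a negative integer header into a syntactically valid name
prefixed <- make.names("-36")

# Identify the target count columns by their X. prefix
tscols <- grep("^X\\.[0-9]+$", names(ts), value = TRUE)

# Assumed mapping from column name to target strength in dB (X.36 taken as -36)
db <- -as.numeric(sub("^X\\.", "", tscols))

# Total number of binned targets in each cell
ntargets <- rowSums(ts[, tscols])
```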
The following column must be included in the midwater trawl operation, catch, and length csv files: Op.Id.
This column serves to uniquely identify each trawl haul in the files.
The following 9 columns must be included in the midwater trawl operation csv files:
Op.Id | Year | Lake | Beg.Depth | End.Depth | Fishing_Depth | Transect | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|
89426 | 2014 | 2 | 23.0 | 27.4 | 1.8 | nn7 | 45.86 | -86.09 |
89421 | 2014 | 2 | 50.0 | 46.0 | 10.0 | sn4 | 42.76 | -86.32 |
89419 | 2014 | 2 | 58.0 | 52.0 | 45.0 | sn2 | 42.11 | -86.69 |
88608 | 2014 | 2 | 29.3 | 24.1 | 6.0 | wn1 | 42.16 | -87.59 |
88617 | 2014 | 2 | 111.9 | 116.1 | 60.0 | so0 | 42.34 | -87.28 |
The following 3 columns must be included in the midwater trawl catch csv files: Species, Weight, and N.
Op.Id | Species | Weight | N |
---|---|---|---|
89414 | 204 | 3544.7 | 65 |
89428 | 204 | 604.0 | 16 |
89425 | 109 | 24.8 | 5 |
89421 | 106 | 342.1 | 15 |
89415 | 106 | 4.2 | 5 |
The following 3 columns must be included in the midwater trawl length csv files: Species, Length, and N.
Op.Id | Species | Length | N |
---|---|---|---|
89429 | 106 | 170 | 1 |
89424 | 106 | 161 | 1 |
88649 | 106 | 162 | 3 |
89426 | 106 | 173 | 1 |
89426 | 106 | 149 | 3 |
Note that the number of fish measured (N in the length csv file) need not be the same as the total number captured (N in the catch csv file). All proportions based on size are calculated by first scaling up the measured fish to the total catch, to account for those instances when only a subset of the catch is measured.
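To make the scaling step concrete, here is a minimal base R sketch with made-up numbers: 30 fish caught in one haul, of which only 6 were measured. It illustrates the idea of scaling measured fish up to the total catch and is not taken from the package's internal code; the column names N.measured, N.catch, and N.scaled are hypothetical.

```r
# Made-up catch and length records for a single haul and species
catch <- data.frame(Op.Id = 1, Species = 106, N = 30)
lens <- data.frame(Op.Id = 1, Species = 106,
  Length = c(80, 120, 150), N = c(2, 3, 1))

# Number of fish measured per haul and species
measured <- aggregate(N ~ Op.Id + Species, data = lens, FUN = sum)
names(measured)[names(measured) == "N"] <- "N.measured"
names(catch)[names(catch) == "N"] <- "N.catch"

# Attach both totals to each length record
scaled <- merge(merge(lens, measured), catch)

# Scale each measured count up to the total catch
scaled$N.scaled <- scaled$N * scaled$N.catch / scaled$N.measured

# Proportion of the catch at or below a 100 mm length cut off
prop_small <- sum(scaled$N.scaled[scaled$Length <= 100]) / sum(scaled$N.scaled)
```

Here each measured fish stands for 5 fish in the catch, so the scaled counts sum to the 30 fish captured.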
No age-length key is required, but you may include age-length keys in separate csv files for up to 2 different species. Each file should be arranged such that rows represent 10-mm length categories and columns represent 1-yr ages. The file must contain one variable called mmgroup giving the midpoint of the length category, e.g., mmgroup 15 represents fish \(\geq\) 10 mm and < 20 mm. For each length category, the proportion of fish in each age is represented by a series of columns which sum to 1 (or 0, if none of the fish used to derive the key were found in that length category). Each proportional column is named according to the integer age, with an Age prefix, e.g.:
mmgroup | Age0 | Age1 | Age2 | Age3 | Age4 | Age5 | Age6 |
---|---|---|---|---|---|---|---|
95 | 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0 |
105 | 0 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0 |
115 | 0 | 0.500 | 0.500 | 0.000 | 0.000 | 0.000 | 0 |
125 | 0 | 0.133 | 0.800 | 0.000 | 0.067 | 0.000 | 0 |
135 | 0 | 0.145 | 0.527 | 0.273 | 0.036 | 0.018 | 0 |
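The following sketch checks the example key and applies it to a couple of made-up length records. It is only an illustration of how such a key can be used, under the assumption that a length is assigned to its category midpoint as 10 * floor(length / 10) + 5; the tolerance of 0.005 allows for proportions rounded to three decimals.

```r
# The example age-length key from the table above
key <- data.frame(
  mmgroup = c(95, 105, 115, 125, 135),
  Age0 = c(1, 0, 0, 0, 0),
  Age1 = c(0, 1, 0.500, 0.133, 0.145),
  Age2 = c(0, 0, 0.500, 0.800, 0.527),
  Age3 = c(0, 0, 0, 0, 0.273),
  Age4 = c(0, 0, 0, 0.067, 0.036),
  Age5 = c(0, 0, 0, 0, 0.018),
  Age6 = 0
)
agecols <- grep("^Age[0-9]+$", names(key), value = TRUE)

# Each row of proportions should sum to 1 (within rounding) or to 0
rs <- rowSums(key[, agecols])
rows_ok <- abs(rs - 1) < 0.005 | rs == 0

# Made-up length records: 2 fish of 98 mm and 4 fish of 117 mm
len <- c(98, 117)
n <- c(2, 4)

# Midpoint of the 10-mm length category for each record
mmgroup <- 10 * floor(len / 10) + 5

# Numbers at age: look up each record's key row and weight it by its count
props <- as.matrix(key[match(mmgroup, key$mmgroup), agecols])
n_at_age <- colSums(props * n)
```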
With the data organized as described and the reference file as a guide to the directory and file names, you can now read in the acoustic and midwater trawl data using the readAll() function.
In the example below, the overarching directory refdir is C:/Temp (use forward slashes for paths), and the reference file refcsv is Reference.csv. Only data from the 3rd row of the reference file will be read, as specified by the keyvals for the keyvars, i.e., LAKE==2 and YEAR==2014. The data are saved to an RData file rdat named ACMT, and the subdirectory path is returned.
mydir <- readAll(refdir="C:/Temp", keyvals=c(2, 2014),
keyvars=c("LAKE", "YEAR"), rdat="ACMT", refcsv="Reference")
## [1] "C:/Temp/M14/"
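To illustrate how the key variables pick out one row of the reference file, the base R sketch below reproduces the selection by hand for the example table. It is not the code used inside readAll(); it only mimics the matching of keyvals against keyvars and the construction of the returned path.

```r
# Key variable columns and subdirectories from the example reference table
ref <- data.frame(
  LAKE = c(3, 3, 2),
  YEAR = c(2013, 2014, 2014),
  subdir = c("H13", "H14", "M14"),
  stringsAsFactors = FALSE
)
keyvars <- c("LAKE", "YEAR")
keyvals <- c(2, 2014)

# A row is selected only if every key variable equals its key value
sel <- Reduce(`&`, Map(function(v, k) ref[[v]] == k, keyvars, keyvals))

# Exactly one row should match; build the corresponding subdirectory path
stopifnot(sum(sel) == 1)
path <- paste0(file.path("C:/Temp", ref$subdir[sel]), "/")
```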
Now you’re ready to explore the data with the exploreACMT() or exploreACMT2() function. The exploreACMT() function was designed specifically for use with data from the US Geological Survey - Great Lakes Science Center. The exploreACMT2() function is a simplified version that does not expect the additional columns that the USGS data contain; use this one if in doubt.
The AC and MT arguments are used to indicate if you want to explore the acoustic (AC) and midwater trawl (MT) data. The ageSp argument gives the species codes for species that you want to apply age-length keys to; in this example ageSp=106 for alewife. The short argument is used to indicate if the surveyed area is wider (in the east-west direction) than tall (in the north-south direction); in this example short=FALSE because Lake Michigan is not wider than tall.
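If you are unsure which value of short applies to your survey, a rough check is to compare the east-west and north-south extents of your sampling locations. The sketch below does this for the five trawl locations in the example operations table, using an approximate conversion of 111.32 km per degree of latitude; the variable names are hypothetical and haul locations are only a proxy for the surveyed area.

```r
# Latitude and longitude of the hauls in the example operations table
lat <- c(45.86, 42.76, 42.11, 42.16, 42.34)
lon <- c(-86.09, -86.32, -86.69, -87.59, -87.28)

# Approximate length of one degree of latitude in km
km_per_deg <- 111.32

# North-south extent in km
ns_km <- diff(range(lat)) * km_per_deg

# East-west extent in km, shrinking longitude degrees by cos(latitude)
mid_lat_rad <- mean(range(lat)) * pi / 180
ew_km <- diff(range(lon)) * km_per_deg * cos(mid_lat_rad)

# TRUE would suggest short=TRUE; FALSE suggests short=FALSE
wider_than_tall <- ew_km > ns_km
```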
When you run this function, a rich text file (rtf) with a *.doc file extension (so that it will be opened with Word by default) is saved to maindir. It includes a long series of tables and figures summarizing the variables in the acoustic and midwater trawl data files. These are designed to help the investigator look for potential problems in the data.
Once you’ve error checked the data and are confident that they are in good shape, you are ready to generate lake-wide estimates.
Define the strata used to design the survey. These are assumed to be non-overlapping two-dimensional regions that correspond to Region_name in the SV and TS files and Transect in the midwater trawl operation file. Also supply the surface areas (in ha) that correspond to each stratum. If you did not use strata in the design of your survey, just use a single region.
Create a data frame of species-specific size information for each species group for which you wish to generate abundance estimates. The data frame has five variables: sp, the species code; spname, the species name; lcut, the length cut off (in mm); and lwa and lwb, the parameters of the length-weight relation, \(Wg = lwa \times Lmm^{lwb}\), where Wg is the weight (in g) and Lmm is the total length (in mm). The length cut off is used to divide the data for a given species into two groups for abundance estimation: those with fish lengths \(\leq\) lcut and those with lengths > lcut. If you don’t wish to divide a species into two length groups, set lcut to 0 for that species. This data frame will be used as input to the estimateLake() function.
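As a worked example of the length-weight relation and the length cut off, the sketch below uses the species information from this vignette and a few made-up fish lengths. The labels "small" and "large" are hypothetical names for the two length groups.

```r
# Species size information used in this vignette
myspInfo <- data.frame(
  sp = c(106, 109, 129),
  spname = c("alewife", "rainbow smelt", "threespine stickleback"),
  lcut = c(100, 90, 0),
  lwa = c(1.41e-05, 4.85e-06, 3.95e-05),
  lwb = c(2.87, 3.03, 2.59),
  stringsAsFactors = FALSE
)

# Made-up fish: species code and total length in mm
fish <- data.frame(sp = c(106, 106, 109, 129), Lmm = c(80, 150, 120, 60))
fish <- merge(fish, myspInfo, by = "sp")

# Weight in g from the length-weight relation Wg = lwa * Lmm^lwb
fish$Wg <- fish$lwa * fish$Lmm^fish$lwb

# Length group: at or below lcut versus above lcut
fish$group <- ifelse(fish$Lmm <= fish$lcut, "small", "large")
```

With lcut set to 0 for threespine stickleback, every fish of that species falls in the same group, so the species is effectively not split.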
The fish density in each acoustic cell (interval-layer) is apportioned to species using the composition from the nearest midwater trawl within a given slice. Slices are not necessarily the same as strata, which are used in the design of the survey and the estimation of the total population.
The slices can be defined by combinations of fishing depths (fdp), bottom depths (bdp), longitudes (lon), latitudes (lat), and regions (reg). The slice definition is described by a list of slices; each slice is a list of defining characteristics; and each characteristic is a named vector giving the range of values. This will be used as input to the sliceCat() function.
This is easier to explain with some examples. Let’s say you want to define two slices, based on a dividing line at a fishing depth of 20 m. You would write your slice definition as
myslice1 <- list(
epi = list(fdp=c(-Inf, 20)),
hypo = list(fdp=c( 20, Inf))
)
This names the upper slice “epi” and the lower slice “hypo” (you can name the slices whatever you want, but they must be named). It also defines these slices by the corresponding range of fishing depths (fdp), from \(\geq\) negative infinity to < 20 for epi and from \(\geq\) 20 to < positive infinity for hypo.
Or, perhaps you want to define four slices, using the same premise as before, but now dividing the epilimnion into three parts according to latitude. You might write your slice definition as
myslice2 <- list(
epi.south = list(fdp=c(-Inf, 20), lat=c(-Inf, 43)),
epi.central = list(fdp=c(-Inf, 20), lat=c( 43, 45)),
epi.north = list(fdp=c(-Inf, 20), lat=c( 45, Inf)),
hypo = list(fdp=c( 20, Inf))
)
Similarly, slices may also be defined by ranges of bottom depths and longitudes. If you wish to define slices by regions that are not simply described by longitudinal or latitudinal breakpoints, for example separating the bays from the main basin, you might write your slice definition as
myslice3 <- list(
epi.bays = list(fdp=c(-Inf, 20), reg=c("Bay A", "Bay B")),
epi.main = list(fdp=c(-Inf, 20), reg="Main"),
hypo = list(fdp=c( 20, Inf))
)
I’ll provide some fake data to demonstrate how these slice definitions compare.
fishingD <- c(13, 10, 17, 15, 18, 22, 21, 25, 24, 26)
latitude <- c(42, 44, 44, 46, 47, 46, 46.1, 66, 43.2, 41)
region <- c("Bay A", "Bay B", "Main")[c(3, 1, 3, 2, 3, 1, 2, 3, 3, 3)]
s1 <- sliceCat(myslice1, fdp=fishingD)
s2 <- sliceCat(myslice2, fdp=fishingD, lat=latitude)
s3 <- sliceCat(myslice3, fdp=fishingD, reg=region)
data.frame(fishingD, latitude, region, s1, s2, s3)
## fishingD latitude region s1 s2 s3
## 1 13 42.0 Main epi epi.south epi.main
## 2 10 44.0 Bay A epi epi.central epi.bays
## 3 17 44.0 Main epi epi.central epi.main
## 4 15 46.0 Bay B epi epi.north epi.bays
## 5 18 47.0 Main epi epi.north epi.main
## 6 22 46.0 Bay A hypo hypo hypo
## 7 21 46.1 Bay B hypo hypo hypo
## 8 25 66.0 Main hypo hypo hypo
## 9 24 43.2 Main hypo hypo hypo
## 10 26 41.0 Main hypo hypo hypo
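To show the logic behind these assignments, here is a minimal base R re-implementation of the matching rule. The function assign_slice() is hypothetical and written only for this illustration; it is not the package's sliceCat() and it assumes numeric ranges are half-open, \(\geq\) the lower value and < the upper value, as described above.

```r
# Hypothetical helper: assign each observation to the slice whose
# characteristics it satisfies (numeric ranges are [lower, upper))
assign_slice <- function(sliceDef, ...) {
  vars <- list(...)
  n <- length(vars[[1]])
  out <- rep(NA_character_, n)
  for (nm in names(sliceDef)) {
    hit <- rep(TRUE, n)
    for (v in names(sliceDef[[nm]])) {
      rng <- sliceDef[[nm]][[v]]
      x <- vars[[v]]
      if (is.character(rng)) {
        # Region-type characteristic: match against the listed names
        hit <- hit & x %in% rng
      } else {
        # Range-type characteristic: lower bound included, upper excluded
        hit <- hit & x >= rng[1] & x < rng[2]
      }
    }
    out[hit] <- nm
  }
  out
}

myslice2 <- list(
  epi.south = list(fdp = c(-Inf, 20), lat = c(-Inf, 43)),
  epi.central = list(fdp = c(-Inf, 20), lat = c(43, 45)),
  epi.north = list(fdp = c(-Inf, 20), lat = c(45, Inf)),
  hypo = list(fdp = c(20, Inf))
)
myslice3 <- list(
  epi.bays = list(fdp = c(-Inf, 20), reg = c("Bay A", "Bay B")),
  epi.main = list(fdp = c(-Inf, 20), reg = "Main"),
  hypo = list(fdp = c(20, Inf))
)

fishingD <- c(13, 10, 17, 15, 18, 22, 21, 25, 24, 26)
latitude <- c(42, 44, 44, 46, 47, 46, 46.1, 66, 43.2, 41)
region <- c("Bay A", "Bay B", "Main")[c(3, 1, 3, 2, 3, 1, 2, 3, 3, 3)]

a2 <- assign_slice(myslice2, fdp = fishingD, lat = latitude)
a3 <- assign_slice(myslice3, fdp = fishingD, reg = region)
```

For these fake data, a2 and a3 agree with the s2 and s3 columns in the table above.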
Now you’re ready to use the estimateLake() function to generate lake-wide estimates in both number (millions) and biomass (t). The region and regArea arguments are used to indicate the regional strata and their corresponding areas (in ha). The TSrange is the target strength range of interest. The TSthresh is the minimum threshold for the number of binned targets in a cell (layer by interval) for calculating target strength. The psi is the transducer-specific two-way equivalent beam angle in steradians. Species of interest are specified by soi, and each of these species should have size information in the data frame specified by spInfo. The slice definitions by which the estimates should be summarized are specified by the sliceDef argument. Finally, the descr argument is used to add a little descriptive text to the names of the output files.
estimateLake(maindir=mydir, rdat="ACMT", ageSp=106, region=Mreg, regArea=MArea,
TSrange=c(-60, -30), TSthresh=1, psi=0.007997566, soi=c(106, 109, 129),
spInfo=myspInfo, sliceDef=myslice1, short=FALSE, descr="vignette")
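Transducer documentation often reports the two-way equivalent beam angle in decibels (dB re 1 steradian) rather than in steradians. The standard conversion is psi = 10^(dB / 10). The sketch below applies it in both directions; the function names are hypothetical, and the dB value is simply what the psi used in this vignette corresponds to.

```r
# Hypothetical helpers for converting the equivalent beam angle
db_to_sr <- function(db) 10^(db / 10)
sr_to_db <- function(sr) 10 * log10(sr)

# The psi used in this vignette, expressed in dB re 1 steradian
psi <- 0.007997566
psi_db <- sr_to_db(psi)

# Converting back recovers the value in steradians
psi_back <- db_to_sr(psi_db)
```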
When you run this function, a rich text file (rtf) with a *.doc file extension (so that it will be opened with Word by default) is saved to maindir. The document includes figures showing the location and apportionment of midwater trawl hauls in the slices and spatial maps of density for each species group, as well as tables of the lake-wide estimates in both number (millions) and biomass (t) for each species group and slice.
In addition, six different data frames of estimates are saved as objects in an RData file and are written to csv files:
- Lakes = lake-wide totals (in millions and t) and means (in numbers and g per ha), with a row for each species group and estimate type and columns for estimates, standard errors, and relative standard errors.
- Regions = region means (in fish per ha and g per ha), with a row for each region, species group, and estimate type and columns for estimates and corresponding (surface) areas.
- intmeans_nph = interval means (in fish per ha), with a row for each region and interval, a column for each species group, and additional columns for region area, and the interval bottom depth, latitude and longitude.
- intmeans_gph = interval means (in g per ha), similar to intmeans_nph.
- intlaymeans_nph = interval and layer means (in fish per ha), with a row for each region, interval, and layer, a column for each species group, and many additional columns.
- intlaymeans_gph = interval and layer means (in g per ha), similar to intlaymeans_nph.

The rtf, RData, and csv files are all named using the lake, the year, and the descr text.
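To show how the units of the region means relate to the lake-wide totals, the sketch below expands made-up region means (fish per ha) by the stratum areas from this vignette. It is a plain area-weighted expansion for illustration only and is not claimed to be the estimator used by estimateLake(); in particular it ignores standard errors.

```r
# Stratum areas (ha) from this vignette
Mreg <- c("nn", "sn", "wn", "no", "so")
MArea <- c(10933, 8716, 6010, 12630, 10487)

# Made-up region means in fish per ha, one per stratum
nph <- c(100, 200, 50, 400, 150)

# Lake-wide total in millions of fish: sum of (mean per ha * area in ha)
total_millions <- sum(nph * MArea) / 1e6

# Lake-wide mean in fish per ha: area-weighted mean of the region means
lake_nph <- sum(nph * MArea) / sum(MArea)
```

The same arithmetic turns means in g per ha into totals in t, because 1 t is 1e6 g.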
Below is a summary of the code used in this vignette, simplified somewhat by relying on default values wherever possible.
library(EchoNet2Fish)
# read in the data
mydir <- readAll(refdir="C:/Temp", keyvals=c(2, 2014))
# explore the data
exploreACMT(maindir=mydir, ageSp=106, short=FALSE)
# define survey strata
Mreg <- c("nn", "sn", "wn", "no", "so")
MArea <- c(10933, 8716, 6010, 12630, 10487)
# define species size info
myspInfo <- data.frame(
sp = c(106, 109, 129),
spname = c("alewife", "rainbow smelt", "threespine stickleback"),
lcut = c(100, 90, 0),
lwa = c(1.41e-05, 4.85e-06, 3.95e-05),
lwb = c(2.87, 3.03, 2.59)
)
# define summary slices
myslice1 <- list(
epi = list(fdp=c(-Inf, 20)),
hypo = list(fdp=c( 20, Inf))
)
# generate estimates
estimateLake(maindir=mydir, ageSp=106, region=Mreg, regArea=MArea,
soi=c(106, 109, 129), spInfo=myspInfo, sliceDef=myslice1, short=FALSE, descr="vignette")
Parker-Stetter, S. L., Rudstam, L. G., Sullivan, P. J., and Warner, D. M. 2009. Standard operating procedures for fisheries acoustic surveys in the Great Lakes. Great Lakes Fish. Comm. Spec. Pub. 09-01. www.glfc.org/pubs/SpecialPubs/Sp09_1.pdf.