The R package EchoNet2Fish estimates fish abundance from acoustic echoes and midwater trawl catch. At its core are functions that explore the data, exploreACMT() or exploreACMT2(), and generate the estimates, estimateLake(). The code is tailored to the format and procedures used by the Great Lakes Acoustic Users Group (Parker-Stetter et al. 2009).

Install

Install the EchoNet2Fish package from GitHub using devtools. If you don’t already have the devtools package installed, you can follow the installation instructions in the package Readme instead.

devtools::install_github("JVAdams/EchoNet2Fish")

Then load the EchoNet2Fish package.

library(EchoNet2Fish)

Prepare the data

Before any estimates are made, the acoustic and midwater trawl data must be prepared in the following way.

File organization

For each set of data for which you would like to generate estimates (e.g., one year of data from one lake), you should have a single subdirectory (subdir) containing all of the relevant files. For example, you might have subdirectories called H13, H14, and M14, containing all of the data for Lake Huron in 2013 and 2014 and Lake Michigan in 2014. Within each subdirectory, you should have two more subdirectories for the acoustic data (one for all the Sv files and one for all the TS files), along with all of your midwater trawl files (operations, catches, lengths, and age-length keys). Above these subdirectories should be an overarching directory (refdir) that contains a reference csv file (refcsv).

refdir
- refcsv.csv
- subdir (e.g., H13)
  - svsubdir
    - x.csv
    - y.csv
    - z.csv
  - tssubdir
    - x.csv
    - y.csv
    - z.csv
  - optropf.csv
  - trcatchf.csv
  - trlff.csv
  - keyfile1.csv
  - keyfile2.csv
- subdir (e.g., H14)
  - svsubdir
    - x.csv
    - y.csv
    - z.csv
  - tssubdir
    - x.csv
    - y.csv
    - z.csv
  - optropf.csv
  - trcatchf.csv
  - trlff.csv
  - keyfile1.csv
  - keyfile2.csv
- subdir …
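As a rough illustration of this layout, the sketch below builds an empty skeleton in a temporary directory using base R only. The folder names (refdir_demo, SV, TS) are placeholders chosen for the example, not names required by the package.

```r
# Build an empty skeleton of the directory layout in a temporary location.
# All names below are illustrative placeholders.
refdir <- file.path(tempdir(), "refdir_demo")
subdirs <- c("H13", "H14", "M14")

for (s in subdirs) {
  # one folder for the Sv files and one for the TS files in each subdirectory
  dir.create(file.path(refdir, s, "SV"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(refdir, s, "TS"), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist, relative to refdir
created <- list.dirs(refdir, full.names = FALSE, recursive = TRUE)
print(created)
```

The reference csv file and the midwater trawl csv files would then be copied into refdir and into each subdirectory, respectively.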

Reference file

The reference csv file contains information, primarily the directory and file names for the acoustic and midwater trawl data, for all of the subdirectories (one row for each subdirectory). The file must contain these 10 columns:

subdir = a subdirectory of refdir containing all the other subdirectories and files,
svsubdir = the Sv subdirectory,
tssubdir = the TS subdirectory,
optropf = the midwater trawl operations file,
trcatchf = the midwater trawl catch file,
trlff = the midwater trawl lengths file,
keysp1 = the species code for keyfile1,
keyfile1 = the age-length csv file for species keysp1,
keysp2 = the species code for keyfile2, and
keyfile2 = the age-length csv file for species keysp2.

There should also be one or more additional columns for keyvars, the key variable(s) used to define each unique run of the exploration and estimation process. In the example below, LAKE and YEAR are used as the key variables.

LAKE	YEAR	subdir	svsubdir	tssubdir	optropf	trcatchf	trlff	keysp1	keyfile1	keysp2	keyfile2
3	2013	H13	SV	TS	H.mtr.op13	H.catch13	Htr_lf13	NA	NA	NA	NA
3	2014	H14	SV	TS	H.mtr.op14	H.catch14	H.tr_lf14	NA	NA	NA	NA
2	2014	M14	SV	TS	MtrawlOp	M.catch	M.tr_lf	106	aleagelenkey	NA	NA
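The sketch below shows one way to assemble this example reference table in base R, check that the 10 required columns are present, and pick out a single row by its key variables. It only mimics the selection by keyvars and keyvals for illustration; it is not the code that the package itself runs, and the temporary file location is an assumption of the example.

```r
# Recreate the example reference table shown above
ref <- data.frame(
  LAKE = c(3, 3, 2),
  YEAR = c(2013, 2014, 2014),
  subdir = c("H13", "H14", "M14"),
  svsubdir = "SV",
  tssubdir = "TS",
  optropf = c("H.mtr.op13", "H.mtr.op14", "MtrawlOp"),
  trcatchf = c("H.catch13", "H.catch14", "M.catch"),
  trlff = c("Htr_lf13", "H.tr_lf14", "M.tr_lf"),
  keysp1 = c(NA, NA, 106),
  keyfile1 = c(NA, NA, "aleagelenkey"),
  keysp2 = NA,
  keyfile2 = NA,
  stringsAsFactors = FALSE
)

# Check that all 10 required columns are present
required <- c("subdir", "svsubdir", "tssubdir", "optropf", "trcatchf",
  "trlff", "keysp1", "keyfile1", "keysp2", "keyfile2")
missing_cols <- setdiff(required, names(ref))

# Write the table to a csv file in a temporary directory and read it back
ref_path <- file.path(tempdir(), "Reference.csv")
write.csv(ref, ref_path, row.names = FALSE)
ref2 <- read.csv(ref_path, stringsAsFactors = FALSE)

# Select the row whose key variables match the requested values
keyvars <- c("LAKE", "YEAR")
keyvals <- c(2, 2014)
tests <- Map(function(v, k) ref2[[v]] == k, keyvars, keyvals)
hit <- Reduce("&", tests)
print(ref2$subdir[hit])
```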

Acoustic files

Except where noted, the order of the rows and columns in any of the acoustic or midwater trawl files is not important. Additional columns, i.e., other than those that are required, may be included in the files, but are not necessary.

In both the SV and TS csv files, the first column is automatically assigned the name Dummy_ID, whatever it was called originally. This renaming is needed to handle occasional problems with byte order marks at the beginning of the csv files. The variable Dummy_ID is not actually used in the exploration or estimation procedures.

The following 8 columns must be included in both the SV and TS csv files:

Dummy_ID = dummy variable not used at all, must be the FIRST COLUMN
Region_name = transect name, composed of region (the first two characters) and transect ID number (the following characters)
Interval = interval ID numbers, identifying acoustic transect segments
Layer = layer ID numbers, identifying 10-m acoustic layers in the water column, numbered from the surface to the bottom
Layer_depth_min = minimum of corresponding depth layer in m
Layer_depth_max = maximum of corresponding depth layer in m
Lat_M = latitude in decimal degrees
Lon_M = longitude in decimal degrees

These columns serve to uniquely identify each row in the files, and are used to combine the information from both types of files.
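To illustrate how shared identifier columns can be used to line up the two kinds of files, here is a small base R sketch that joins made-up Sv and TS rows on Region_name, Interval, and Layer. The choice of these three columns as the join key, and the data values, are assumptions for the example; the package may combine the files differently.

```r
# Made-up Sv rows (only a few columns shown)
sv <- data.frame(
  Region_name = c("no6", "no6", "no2"),
  Interval = c(4, 9, 2),
  Layer = c(14, 16, 12),
  Sv_mean = c(-102.8, -78.0, -96.2),
  stringsAsFactors = FALSE
)

# Made-up TS rows with a single target count column
ts <- data.frame(
  Region_name = c("no6", "no2", "no4"),
  Interval = c(4, 2, 3),
  Layer = c(14, 12, 8),
  X.40 = c(3, 0, 5),
  stringsAsFactors = FALSE
)

# Full outer join, so cells present in only one file are kept and show NA
both <- merge(sv, ts, by = c("Region_name", "Interval", "Layer"), all = TRUE)
print(both)

# Cells found in both files have no missing values
n_matched <- sum(complete.cases(both))
```

Rows with missing values after such a join point to cells that appear in only one of the two file types, which can be a useful consistency check.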

The following 4 columns must be included in the SV csv files:

Date_M = date, YYYYMMDD
Sv_mean = mean volume backscattering strength in dB
Depth_mean = range to target in m
PRC_ABC = area backscattering coefficient (unitless)

Dummy_ID	Region_name	Interval	Layer	Date_M	Sv_mean	Depth_mean	Layer_depth_min	Layer_depth_max	Lat_M	Lon_M	PRC_ABC
1	no6	4	14	20140807	-102.8	135.0	130	140	45.36	-86.28	5.26e-10
1	no6	9	16	20140807	-78.0	154.9	150	160	45.36	-86.45	1.56e-07
1	no2	2	12	20140807	-96.2	115.0	110	120	44.08	-86.86	2.42e-09
1	no4	3	8	20140806	-95.9	75.0	70	80	44.72	-86.54	2.60e-09
1	nn9	4	3	20140819	-84.3	25.0	20	30	45.07	-86.92	3.70e-08

Several columns must be included in the TS csv files indicating the number of targets in each target strength bin. Each target count column is named according to the integer target strength, with an X. prefix, e.g.:

X.16
X.17
…
X.75
X.76

Dummy_ID	Region_name	Interval	Layer	Layer_depth_min	Layer_depth_max	Lat_M	Lon_M	X.36	X.37	X.38	X.64	X.65	X.66
1	no3	2	15	140	150	44.39679	-87.07275	0	0	0	10	13	22
1	no1	6	3	20	30	43.74271	-87.32775	0	0	0	2	0	1
1	nn1	4	6	50	60	43.78454	-86.58780	4	7	8	0	0	0
1	no4	6	5	40	50	44.71496	-86.42347	0	0	0	0	0	1
1	no6	6	7	60	70	45.36183	-86.35453	0	0	0	0	0	0
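The X. prefix is what base R produces when a csv column header is a negative integer, so these names would arise naturally if the exported TS files use headers such as -36 (an assumption about the export format, not something stated above). The sketch below shows the conversion and how to recover the target strength in dB from a column name.

```r
# Hypothetical target strength bins, in dB
ts_bins <- c(-36, -37, -38, -64, -65, -66)

# make.names() turns a header like "-36" into the syntactic name "X.36"
col_names <- make.names(ts_bins)
print(col_names)

# read.csv() applies the same rule to headers by default (check.names = TRUE)
csv_text <- "Dummy_ID,-36,-37\n1,0,4"
ts_demo <- read.csv(text = csv_text)
print(names(ts_demo))

# Recover the target strength (dB) from the column names
ts_db <- -as.numeric(sub("^X\\.", "", col_names))
print(ts_db)
```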

Midwater trawl files

The following column must be included in the midwater trawl operation, catch, and length csv files:

Op.Id = operation ID number

This column serves to uniquely identify each trawl haul in the files.

In addition to Op.Id, the following 8 columns must be included in the midwater trawl operation csv files (9 columns in all):

Year = year
Lake = lake ID code
Beg.Depth = bottom depth at beginning of trawl haul in m
End.Depth = bottom depth at end of trawl haul in m
Fishing_Depth = fishing depth of trawl haul in m
Transect = transect name, composed of region (the first two characters) and transect ID number (the following characters), the same as Region_name in the SV and TS files
Latitude = latitude in decimal degrees
Longitude = longitude in decimal degrees

Op.Id	Year	Lake	Beg.Depth	End.Depth	Fishing_Depth	Transect	Latitude	Longitude
89426	2014	2	23.0	27.4	1.8	nn7	45.86	-86.09
89421	2014	2	50.0	46.0	10.0	sn4	42.76	-86.32
89419	2014	2	58.0	52.0	45.0	sn2	42.11	-86.69
88608	2014	2	29.3	24.1	6.0	wn1	42.16	-87.59
88617	2014	2	111.9	116.1	60.0	so0	42.34	-87.28

The following 3 columns must be included in the midwater trawl catch csv files:

Species = species code
Weight = aggregate weight in g
N = number of fish weighed

Op.Id	Species	Weight	N
89414	204	3544.7	65
89428	204	604.0	16
89425	109	24.8	5
89421	106	342.1	15
89415	106	4.2	5

The following 3 columns must be included in the midwater trawl length csv files:

Species = species code
Length = length of individual fish in mm
N = number of fish measured

Op.Id	Species	Length	N
89429	106	170	1
89424	106	161	1
88649	106	162	3
89426	106	173	1
89426	106	149	3

Note that the number of fish measured (N in the length csv file) need not be the same as the total number captured (N in the catch csv file). All proportions based on size are calculated by first scaling up the measured fish to the total catch, to account for those instances when only a subset of the catch is measured.
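A minimal worked example of this scaling, using made-up numbers for a single haul and species, might look as follows. It assumes a simple proportional scale-up (total catch divided by number measured); the package's exact calculation may differ in detail.

```r
# Made-up catch record: 60 fish of species 106 caught in one haul
catch <- data.frame(Op.Id = 1, Species = 106, N = 60)

# Made-up length record: only 10 of those fish were measured
lens <- data.frame(
  Op.Id = 1,
  Species = 106,
  Length = c(80, 95, 120, 150),
  N = c(2, 4, 3, 1)
)

# Scale the measured counts up to the total catch
scale_up <- catch$N / sum(lens$N)
lens$N_scaled <- lens$N * scale_up

# Split the scaled counts at a hypothetical length cut off of 100 mm
lcut <- 100
n_small <- sum(lens$N_scaled[lens$Length <= lcut])
n_large <- sum(lens$N_scaled[lens$Length > lcut])
print(c(small = n_small, large = n_large))
```

Here 6 of the 10 measured fish are at or below the cut off, so 60% of the 60 fish caught are assigned to the smaller group.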

Age-length key files

No age-length key is required, but you may include age-length keys in separate csv files for up to 2 different species. Each file should be arranged such that rows represent 10-mm length categories and columns represent 1-yr ages. The file must contain one variable called mmgroup giving the midpoint of the length category, e.g., mmgroup 15 represents fish \(\geq\) 10 mm and < 20 mm. For each length category, the proportion of fish in each age is represented by a series of columns which sum to 1 (or 0, if no fish used to derive the key were found in that length category). Each proportional column is named according to the integer age, with an Age prefix, e.g.:

mmgroup = midpoint of 10-mm length category
Age0 = proportion of all fish in corresponding length category that are age 0
Age1 = proportion … that are age 1
…
Age8 = proportion … that are age 8
Age9 = proportion … that are age 9

mmgroup	Age0	Age1	Age2	Age3	Age4	Age5
95	1	0.000	0.000	0.000	0.000	0.000
105	0	1.000	0.000	0.000	0.000	0.000
115	0	0.500	0.500	0.000	0.000	0.000
125	0	0.133	0.800	0.000	0.067	0.000
135	0	0.145	0.527	0.273	0.036	0.018
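The following base R sketch checks that the rows of the example key above sum to 1 (or 0), allowing for rounding in the printed proportions, and then applies the key to a few made-up length counts. The rounding tolerance, the lengths, and the counts are assumptions for illustration only.

```r
# The example age-length key shown above
key <- data.frame(
  mmgroup = c(95, 105, 115, 125, 135),
  Age0 = c(1, 0, 0, 0, 0),
  Age1 = c(0, 1, 0.5, 0.133, 0.145),
  Age2 = c(0, 0, 0.5, 0.8, 0.527),
  Age3 = c(0, 0, 0, 0, 0.273),
  Age4 = c(0, 0, 0, 0.067, 0.036),
  Age5 = c(0, 0, 0, 0, 0.018)
)
age_cols <- grep("^Age", names(key), value = TRUE)

# Each row should sum to 1 (within rounding) or to exactly 0
row_sums <- rowSums(key[, age_cols])
row_ok <- abs(row_sums - 1) < 0.005 | row_sums == 0

# Made-up fish lengths (mm) and counts
lengths <- c(98, 112, 128)
n <- c(10, 20, 30)

# Midpoint of the 10-mm length category, e.g., 98 mm falls in mmgroup 95
grp <- 10 * floor(lengths / 10) + 5
idx <- match(grp, key$mmgroup)

# Apportion each count to ages using the matching row of the key
n_at_age <- colSums(as.matrix(key[idx, age_cols]) * n)
print(round(n_at_age, 2))
```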

Read in the data

With the data organized as described and the reference file as a guide to the directory and file names, you can now read in the acoustic and midwater trawl data using the readAll() function.

In the example below, the overarching directory refdir is C:/Temp (use forward slashes for paths), and the reference file refcsv is Reference.csv. Only data from the 3rd row of the reference file will be read, specified by keyvals for the keyvars, i.e., LAKE==2 and YEAR==2014. The data are saved to an RData file rdat named ACMT, and the subdirectory path is returned.

mydir <- readAll(refdir="C:/Temp", keyvals=c(2, 2014), 
  keyvars=c("LAKE", "YEAR"), rdat="ACMT", refcsv="Reference")

mydir

## [1] "C:/Temp/M14/"

Explore the data

Now you’re ready to explore the data with the exploreACMT() or exploreACMT2() function. The exploreACMT() function was designed specifically for use with data from the US Geological Survey - Great Lakes Science Center. The exploreACMT2() function is a simplified version that does not expect the additional columns the USGS data contains; use this one if in doubt.

The AC and MT arguments are used to indicate whether you want to explore the acoustic (AC) data, the midwater trawl (MT) data, or both. The ageSp argument gives the species codes for species that you want to apply age-length keys to; in this example ageSp=106 for alewife. The short argument is used to indicate if the surveyed area is wider (in the east-west direction) than tall (in the north-south direction); in this example short=FALSE because Lake Michigan is not wider than tall.

exploreACMT(maindir=mydir, rdat="ACMT", AC=TRUE, MT=TRUE, ageSp=106, short=FALSE)

When you run this function, a rich text file (rtf) with a *.doc file extension (so that it will be opened with Word by default) is saved to maindir. It includes a long series of tables and figures summarizing the variables in the acoustic and midwater trawl data files. These are designed to help the investigator look for potential problems in the data.

Generate estimates

Once you’ve error checked the data and are confident that they are in good shape, you are ready to generate lake-wide estimates.

Regions

Define the strata used to design the survey. These are assumed to be non-overlapping two-dimensional regions that correspond to Region_name in the SV and TS files and Transect in the midwater trawl operation file. Also supply the surface areas (in ha) that correspond to each stratum. If you did not use strata in the design of your survey, just use a single region.

Mreg <- c("nn", "sn", "wn", "no", "so")
MArea <- c(10933, 8716, 6010, 12630, 10487)
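To show why the stratum areas matter, here is a simplified sketch of an area-weighted expansion from regional mean densities to a lake-wide total. The density values are made up, and this is only the basic idea of a stratified total; it is not the estimator (or the variance calculation) implemented in the package.

```r
# Strata and surface areas (ha), as defined above
region <- c("nn", "sn", "wn", "no", "so")
area_ha <- c(10933, 8716, 6010, 12630, 10487)

# Made-up mean densities (fish per ha) for each stratum
mean_nph <- c(120, 340, 95, 60, 210)

# Stratum totals, lake-wide total, and area-weighted lake-wide mean density
region_total <- area_ha * mean_nph
total_fish <- sum(region_total)
lake_mean_nph <- total_fish / sum(area_ha)

# Report the lake-wide total in millions of fish
total_millions <- total_fish / 1e6
print(data.frame(region, area_ha, mean_nph, region_total))
print(c(total_millions = total_millions, lake_mean_nph = lake_mean_nph))
```

With a single region, the same calculation reduces to the mean density multiplied by the total surveyed area.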

Size information

Create a data frame of species-specific size information for each species group for which you wish to generate abundance estimates. The data frame has five variables: sp (species code), spname (species name), lcut (length cut off in mm), and lwa and lwb (parameters of the length-weight relation), \(Wg = lwa*Lmm^{lwb}\), where Wg is the weight (in g) and Lmm is the total length (in mm). The length cut off is used to divide the data for a given species into two groups for abundance estimation: fish with lengths \(\leq\) lcut and fish with lengths > lcut. If you don’t wish to divide a species into two length groups, set lcut to 0 for that species. This data frame will be used as input to the estimateLake() function.

myspInfo <- data.frame(
  sp = c(106, 109, 129),
  spname = c("alewife", "rainbow smelt", "threespine stickleback"),
  lcut = c(100, 90, 0),
  lwa = c(1.41e-05, 4.85e-06, 3.95e-05),
  lwb = c(2.87, 3.03, 2.59)
)
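As a quick check of what these parameters imply, the sketch below applies the length-weight relation and the length cut off to a few made-up fish. The group labels ("small", "large", "all") and the fish lengths are hypothetical and used only for this example.

```r
# Size information, repeated here so the example is self-contained
sp_info <- data.frame(
  sp = c(106, 109, 129),
  lcut = c(100, 90, 0),
  lwa = c(1.41e-05, 4.85e-06, 3.95e-05),
  lwb = c(2.87, 3.03, 2.59)
)

# Made-up fish: species code and total length in mm
fish <- data.frame(sp = c(106, 106, 109, 129), Lmm = c(80, 150, 120, 55))

# Look up each fish's species-specific parameters
m <- match(fish$sp, sp_info$sp)

# Length-weight relation: Wg = lwa * Lmm^lwb, weight in g
fish$Wg <- sp_info$lwa[m] * fish$Lmm ^ sp_info$lwb[m]

# Length group: at or below lcut vs. above lcut; lcut of 0 means no split
fish$group <- ifelse(sp_info$lcut[m] == 0, "all",
  ifelse(fish$Lmm <= sp_info$lcut[m], "small", "large"))
print(fish)
```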

Slices

The fish density in each acoustic cell (interval-layer) is apportioned to species using the composition from the nearest midwater trawl within a given slice. Slices are not necessarily the same as strata, which are used in the design of the survey and the estimation of the total population.

The slices can be defined by combinations of fishing depths (fdp), bottom depths (bdp), longitudes (lon), latitudes (lat), and regions (reg). The slice definition is described by a list of slices; each slice is a list of defining characteristics; and each characteristic is a named vector giving the range of values. This will be used as input to the sliceCat() function.

This is easier to explain with some examples. Let’s say you want to define two slices, based on a dividing line at a fishing depth of 20 m. You would write your slice definition as

myslice1 <- list(
  epi  = list(fdp=c(-Inf,  20)),
  hypo = list(fdp=c(  20, Inf))
  )

This names the upper slice “epi” and the lower slice “hypo” (you can name the slices whatever you want, but they must be named). And it defines these slices by the corresponding range of fishing depths (fdp), from \(\geq\) negative infinity to < 20 for epi and from \(\geq\) 20 to < positive infinity for hypo.

Or, perhaps you want to define four slices, using the same premise as before, but now dividing the epilimnion into three parts according to latitude. You might write your slice definition as

myslice2 <- list(
  epi.south   = list(fdp=c(-Inf,  20), lat=c(-Inf,  43)),
  epi.central = list(fdp=c(-Inf,  20), lat=c(  43,  45)),
  epi.north   = list(fdp=c(-Inf,  20), lat=c(  45, Inf)),
  hypo        = list(fdp=c(  20, Inf))
  )

Similarly, slices may also be defined by ranges of bottom depths and longitudes. If you wish to define slices by regions that are not simply described by longitudinal or latitudinal breakpoints, for example separating the bays from the main basin, you might write your slice definition as

myslice3 <- list(
  epi.bays = list(fdp=c(-Inf,  20), reg=c("Bay A", "Bay B")),
  epi.main = list(fdp=c(-Inf,  20), reg="Main"),
  hypo     = list(fdp=c(  20, Inf))
  )

I’ll provide some fake data to demonstrate how these slice definitions compare.

fishingD <- c(13, 10, 17, 15, 18, 22, 21, 25, 24, 26)
latitude <- c(42, 44, 44, 46, 47, 46, 46.1, 66, 43.2, 41)
region <- c("Bay A", "Bay B", "Main")[c(3, 1, 3, 2, 3, 1, 2, 3, 3, 3)]
s1 <- sliceCat(myslice1, fdp=fishingD)
s2 <- sliceCat(myslice2, fdp=fishingD, lat=latitude)
s3 <- sliceCat(myslice3, fdp=fishingD, reg=region)
data.frame(fishingD, latitude, region, s1, s2, s3)

##    fishingD latitude region   s1          s2       s3
## 1        13     42.0   Main  epi   epi.south epi.main
## 2        10     44.0  Bay A  epi epi.central epi.bays
## 3        17     44.0   Main  epi epi.central epi.main
## 4        15     46.0  Bay B  epi   epi.north epi.bays
## 5        18     47.0   Main  epi   epi.north epi.main
## 6        22     46.0  Bay A hypo        hypo     hypo
## 7        21     46.1  Bay B hypo        hypo     hypo
## 8        25     66.0   Main hypo        hypo     hypo
## 9        24     43.2   Main hypo        hypo     hypo
## 10       26     41.0   Main hypo        hypo     hypo
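The half-open ranges (\(\geq\) lower bound and < upper bound) can be mimicked in a few lines of base R. The sketch below is a simplified stand-in, written for illustration with a hypothetical helper in_range(); it is not the implementation of sliceCat() and handles only a single numeric characteristic.

```r
# TRUE where x falls in the half-open range [lower, upper)
in_range <- function(x, rng) x >= rng[1] & x < rng[2]

# Two slices split at a fishing depth of 20 m
slice_def <- list(
  epi  = list(fdp = c(-Inf, 20)),
  hypo = list(fdp = c(20, Inf))
)

fishingD <- c(13, 10, 17, 15, 18, 22, 21, 25, 24, 26)

# Assign each fishing depth to the slice whose range contains it
assigned <- rep(NA_character_, length(fishingD))
for (nm in names(slice_def)) {
  assigned[in_range(fishingD, slice_def[[nm]]$fdp)] <- nm
}
print(data.frame(fishingD, assigned))

# A depth of exactly 20 m belongs to the lower slice, not the upper one
at_boundary <- c(epi = in_range(20, slice_def$epi$fdp),
  hypo = in_range(20, slice_def$hypo$fdp))
print(at_boundary)
```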

Estimates

Now you’re ready to use the estimateLake() function to generate lake-wide estimates in both number (millions) and biomass (t). The region and regArea arguments are used to indicate the regional strata and their corresponding areas (in ha). The TSrange is the target strength range of interest. The TSthresh is the minimum threshold for the number of binned targets in a cell (layer by interval) for calculating target strength. The psi is the transducer-specific two-way equivalent beam angle in steradians. Species of interest are specified by soi, and each of these species should have size information in the data frame specified by spInfo. The slice definitions by which the estimates should be summarized are specified by the sliceDef argument. Finally, the descr argument is used to add a little descriptive text to the names of the output files.

estimateLake(maindir=mydir, rdat="ACMT", ageSp=106, region=Mreg, regArea=MArea,
  TSrange=c(-60, -30), TSthresh=1, psi=0.007997566, soi=c(106, 109, 129),
  spInfo=myspInfo, sliceDef=myslice1, short=FALSE, descr="vignette")

When you run this function, a rich text file (rtf) with a *.doc file extension (so that it will be opened with Word by default) is saved to maindir. The document includes figures showing the location and apportionment of midwater trawl hauls in the slices and spatial maps of density for each species group, as well as tables of the lake-wide estimates in both number (millions) and biomass (t) for each species group and slice.

In addition, six different data frames of estimates are saved as objects in an RData file and are written to csv files:

Lakes = lake-wide totals (in millions and t) and means (in fish per ha and g per ha), with a row for each species group and estimate type and columns for estimates, standard errors, and relative standard errors.
Regions = region means (in fish per ha and g per ha), with a row for each region, species group, and estimate type and columns for estimates and corresponding (surface) areas.
intmeans_nph = interval means (in fish per ha), with a row for each region and interval, a column for each species group, and additional columns for region area, and the interval bottom depth, latitude and longitude.
intmeans_gph = interval means (in g per ha), similar to intmeans_nph.
intlaymeans_nph = interval and layer means (in fish per ha), with a row for each region, interval, and layer, a column for each species group, and many additional columns.
intlaymeans_gph = interval and layer means (in g per ha), similar to intlaymeans_nph.

The rtf, RData, and csv files are all named using the lake, the year, and the descr text.

Introduction to EchoNet2Fish

Jean V. Adams

2021-02-08