This vignette highlights the application of bespoke R functions designed to quantify and visualise model extrapolation in environmental space, with particular relevance to density surface model (DSMs) as implemented in package dsm
(Miller et al. 2015). The underlying theory behind extrapolation detection is covered at length in Mesgaran et al. (2014) and King and Zeng (2007). Illustrative case studies rely on a combined dataset of shipborne and aerial line transect surveys for cetaceans conducted in the Northwestern Atlantic and the Gulf of Mexico, and fully described in Roberts et al. (2016) and Mannocci et al. (2017) (Figure 1). A basic level of familiarity with density surface modelling is assumed - for technical details, see Miller et al. (2013).
Note that this vignette accompanies a technical report on extrapolation in cetacean density surface models. See Bouchet et al. (2019) for more information.
Fig. 1: Visual line transect surveys undertaken in the North Atlantic and Gulf of Mexico. Source: Roberts et al. (2016).
A number of R packages are required in order to run this vignette. The code below both installs any missing packages (if not already present on your system), and loads them (and their dependencies, as required), in addition to setting some general options. Note that the pacman
package must first be installed
#'--------------------------------------------------------------------
# Load (+ install, if needed) all packages
#'--------------------------------------------------------------------
pacman::p_load(tidyverse, # Tidyverse
tools, # Tools in base R
plyr, # The split-apply-combine paradigm for R
raster, # Raster and GIS operations
smoothr, # Smooth and Tidy Spatial Features
utils, # R utility functions
htmltools, # HTML things
leaflet, # Interactive maps
maps, # Basic maps
cowplot, # Streamlined theme + annotations for 'ggplot2'
knitr, # Dynamic report generation
dichromat, # Colour palettes
# http://stat545.com/block018_colors.html
pals, # Colour palettes
WhatIf) # Counterfactuals - Gower's distances
#'--------------------------------------------------------------------
# Set tibble options
#'--------------------------------------------------------------------
options(tibble.width = Inf) # All tibble columns shown
options(pillar.neg = FALSE) # No colouring negative numbers
options(pillar.subtle = TRUE)
options(pillar.sigfig = 4)
#'--------------------------------------------------------------------
# Set knitr options
#'--------------------------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)
All the code needed for the analysis has been pre-packaged in the ExDet-functions.R
R file. It builds upon the functions available in the ecospat
(Broennimann et al. 2016) and WhatIf
(Gandrud et al. 2017) packages to provide user-friendly tools relevant to the analysis of distance sampling data. Specifically, the functions enable an a priori evaluation of environmental extrapolation, when using a DSM model built in a reference system to make predictions in a separate target system (eg. a different geographical area and/or a past/future time period) (Sequeira et al. 2018). A description of each individual function is available in the Appendix.
#'---------------------------------------------
# Import required functions
#'---------------------------------------------
source(file.path("../ExDet_functions.R"))
This section covers all the steps required to undertake an extrapolation assessment, and explains how to:
compute_extrapolation
.summarise_extrapolation
.compare_covariates
.compute_nearby
.map_extrapolation
.analyse_extrapolation
.The data for this example come from two shipboard surveys run by the National Oceanic and Atmospheric Administration (NOAA)’s Northeast and Southeast Fisheries Science Centers. They represent a subset of the survey effort shown in Roberts et al. (2016) and Mannocci et al. (2017), and are stored in an .RData
file called sperm_whale.RData
.
#'---------------------------------------------
# Import the sperm whale data
#'---------------------------------------------
load("/Users/philippebouchet/Dropbox/bouchet-extrapolation/data/Models/AFTT_Sperm_whales/sperm_whale.RData")
We are primarily interested in the following two data.frames
:
segs
holds the segment data, ie. the line transects that have been ‘chopped’ into segments. These are referred to as the reference (or calibration) data/system/conditions (Sequeira et al. 2018).
knitr::kable(head(segs), format = "pandoc")
Sample.Label | CenterTime | SegmentID | Depth | DistToCAS | SST | EKE | NPP | x | y | Effort | Survey |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2004/06/24 07:27:04 | 1 | 118.5027 | 14468.1533 | 15.54390 | 0.0014443 | 1908.129 | 214544.0 | 689074.3 | 10288.91 | en04395 |
2 | 2004/06/24 08:08:04 | 2 | 119.4853 | 10262.9648 | 15.88358 | 0.0014198 | 1889.540 | 222654.3 | 682781.0 | 10288.91 | en04395 |
3 | 2004/06/24 09:03:18 | 3 | 177.2779 | 6900.9829 | 16.21920 | 0.0011705 | 1842.057 | 230279.9 | 675473.3 | 10288.91 | en04395 |
4 | 2004/06/24 09:51:27 | 4 | 527.9562 | 1055.4124 | 16.45468 | 0.0004102 | 1823.942 | 239328.9 | 666646.3 | 10288.91 | en04395 |
5 | 2004/06/24 10:25:39 | 5 | 602.6378 | 1112.6293 | 16.62554 | 0.0002553 | 1721.949 | 246686.5 | 659459.2 | 10288.91 | en04395 |
6 | 2004/06/24 11:00:22 | 6 | 1094.4402 | 707.5795 | 16.83725 | 0.0006556 | 1400.281 | 254307.0 | 652547.2 | 10288.91 | en04395 |
predgrid
holds the coordinates and covariate values for grid cells over which density surface model predictions are ultimately sought (although model fitting is outside the scope of this vignette). These are referred to as the target data/system/conditions (Sequeira et al. 2018).
knitr::kable(head(predgrid), format = "pandoc")
x | y | Depth | SST | NPP | DistToCAS | EKE | off.set | LinkID | |
---|---|---|---|---|---|---|---|---|---|
126 | 547984.6 | 788254 | 153.5983 | 12.0461 | 1462.521 | 11788.974 | 0.0008 | 1e+08 | 1 |
127 | 557984.6 | 788254 | 552.3107 | 12.8138 | 1465.410 | 5697.248 | 0.0010 | 1e+08 | 2 |
258 | 527984.6 | 778254 | 96.8199 | 12.9025 | 1429.432 | 13722.626 | 0.0012 | 1e+08 | 3 |
259 | 537984.6 | 778254 | 138.2376 | 13.2139 | 1424.862 | 9720.671 | 0.0013 | 1e+08 | 4 |
260 | 547984.6 | 778254 | 505.1439 | 13.7566 | 1379.351 | 8018.690 | 0.0027 | 1e+08 | 5 |
261 | 557984.6 | 778254 | 1317.5952 | 14.4252 | 1348.544 | 3775.462 | 0.0046 | 1e+08 | 6 |
Further, we also have a record of individual sperm whale sightings, as well as a spatial representation of survey track lines (Figure 2). These are stored in the data.frame
object obs
and the SpatialLinesDataFrame
object transects
, respectively, and will be useful for plotting.
knitr::kable(head(obs), format = "pandoc")
Survey | SeaState | SightingTime | coords.x1 | coords.x2 | distance | object | Sample.Label | size | detected | observer |
---|---|---|---|---|---|---|---|---|---|---|
en04395 | 3.0 | 2004/06/28 10:22:21 | -65.636 | 39.576 | 246.0173 | 1 | 48 | 2 | 1 | 1 |
en04395 | 2.5 | 2004/06/28 13:18:14 | -65.648 | 39.746 | 1632.3934 | 2 | 50 | 2 | 1 | 1 |
en04395 | 3.0 | 2004/06/28 14:13:34 | -65.692 | 39.843 | 2368.9941 | 3 | 51 | 1 | 1 | 1 |
en04395 | 3.5 | 2004/06/28 15:06:01 | -65.717 | 39.967 | 244.6977 | 4 | 52 | 1 | 1 | 1 |
en04395 | 4.0 | 2004/06/29 10:48:31 | -65.820 | 40.279 | 2081.3468 | 5 | 56 | 1 | 1 | 1 |
en04395 | 2.4 | 2004/06/29 14:35:34 | -65.938 | 40.612 | 1149.2632 | 6 | 59 | 1 | 1 | 1 |
A quick map in base R is useful to get a feel for the distribution of sightings throughout the area.
#'---------------------------------------------
# Create an outline of the study area boundaries
#'---------------------------------------------
study_area <- predgrid[, c("x", "y")]
study_area$value <- 1
study_area <- raster::rasterFromXYZ(study_area, crs = sp::CRS("+proj=aea +lat_1=38 +lat_2=30 +lat_0=34 +lon_0=-73 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"))
study_area <- raster::rasterToPolygons(study_area, dissolve = TRUE)
study_area <- sp::spTransform(study_area, CRSobj = sp::proj4string(transects))
study_area <- smoothr::smooth(study_area, method = "ksmooth", smoothness = 5)
#'---------------------------------------------
# Produce a simple plot
#'---------------------------------------------
plot(study_area, border = "darkorange") # Region boundary
plot(transects, add = TRUE) # Survey tracks
maps::map("world", fill = TRUE, col = "grey",
xlim = range(obs$coords.x1),
ylim = range(obs$coords.x2), add = TRUE)
pts <- obs # Sightings
coordinates(pts) <- ~ coords.x1 + coords.x2
axis(1)
axis(2)
box()
points(pts, pch = 21, col = "#2082C4")
Fig. 2: Distribution of sperm whale (Physeter macrocephalus) sightings (in blue) within the study area (orange outline) off the US East coast. Surveyed transects are shown as solid lines.
In this particular example, we already know what projected coordinate system is in use (ie. Albers Equal Area, +proj=aea
). The explanatory covariates of interest are: seabed depth (Depth
), sea surface temperature (SST
), primary productivity (NPP
), distance to the nearest canyons and seamounts (DistToCAS
), and eddy kinetic energy (EKE
) – see Roberts et al. (2016) for details.
#'---------------------------------------------
# Define projected coordinate system
#'---------------------------------------------
aftt_crs <- sp::CRS("+proj=aea +lat_1=38 +lat_2=30 +lat_0=34 +lon_0=-73 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")
#'---------------------------------------------
# Define environmental covariates of interest
#'---------------------------------------------
covariates.spermw <- c("Depth", "SST", "NPP", "DistToCAS", "EKE")
To assess extrapolation in environmental space, we can run the extrapolation detection (ExDet) tool proposed by Mesgaran et al. (2014) using compute_extrapolation
.
Note that this tool was originally implemented in the ecospat
package, although with more limited capabilities.
The function takes the following arguments:
segments
: reference datacovariate.names
: names of covariates of interestprediction.grid
: target datacoordinate.system
: projected coordinate system relevant to study locationprint.summary
: if TRUE
, prints a summary of the results in the R consoleprint.precision
: number of significant figures printed in the summarysave.summary
: if TRUE
, saves summary to the function’s output objectHere, we store the results in an object called spermw.extrapolation
. Values are expressed as both the number (n
) of grid cells subject to extrapolation, and the proportion (%
) of the target system that this represents.
spermw.extrapolation <- compute_extrapolation(segments = segs,
covariate.names = covariates.spermw,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
print.summary = TRUE,
save.summary = FALSE,
print.precision = 2)
##
##
## Table: Extrapolation
##
## -------------- ------------ ------------
## Analogue n = 5,175 97.92 %
## ----------- ----------- -----------
## Univariate n = 65 1.23 %
## Combinatorial n = 45 0.85 %
## ----------- ----------- -----------
## Sub-total n = 110 2.08 %
## ----------- ----------- -----------
## Total n = 5,285 100 %
## -------------- ------------ ------------
##
##
## Table: Most influential covariates (MIC)
##
## -------------- ------------ ------------ ------------
## Univariate EKE n = 24 0.45 %
## Univariate SST n = 17 0.32 %
## Univariate Depth n = 16 0.3 %
## Univariate NPP n = 6 0.11 %
## Univariate DistToCAS n = 2 0.04 %
## ----------- ----------- ----------- -----------
## Sub-total n = 65 1.23 %
## ----------- ----------- ----------- -----------
## Combinatorial NPP n = 39 0.74 %
## Combinatorial Depth n = 6 0.11 %
## ----------- ----------- ----------- -----------
## Sub-total n = 45 0.85 %
## ----------- ----------- ----------- -----------
## Total n = 110 2.08 %
## -------------- ------------ ------------ ------------
In this example, the vast majority of prediction grid cells (n = 5,175 – ie. 97.92%) are characterised by environmental conditions (defined along the axes of the covariates specified above) similar to those captured in the original sample (segs
).
A quick check reveals that those results align with our expectations from the data in predgrid
.
# Number of cells subject to univariate extrapolation (see below for definition)
predgrid %>%
dplyr::filter(!dplyr::between(Depth, min(segs$Depth), max(segs$Depth)) |
!dplyr::between(SST, min(segs$SST), max(segs$SST)) |
!dplyr::between(NPP, min(segs$NPP), max(segs$NPP)) |
!dplyr::between(DistToCAS, min(segs$DistToCAS), max(segs$DistToCAS)) |
!dplyr::between(EKE, min(segs$EKE), max(segs$EKE))) %>%
nrow()
## [1] 65
In practice, compute_extrapolation
returns a list containing both data.frames
(extrapolation values) and rasters
(spatial representation of these values), with the following structure:
tibble::glimpse(spermw.extrapolation)
## List of 2
## $ data :List of 4
## ..$ all :'data.frame': 5285 obs. of 6 variables:
## .. ..$ x : num [1:5285] 547985 557985 527985 537985 547985 ...
## .. ..$ y : num [1:5285] 788254 788254 778254 778254 778254 ...
## .. ..$ ExDet : num [1:5285] -0.1067 -0.0587 -0.0532 -0.0337 0.3445 ...
## .. ..$ mic_univariate : int [1:5285] 2 2 2 2 NA NA NA 2 2 NA ...
## .. ..$ mic_combinatorial: int [1:5285] NA NA NA NA NA NA NA NA NA NA ...
## .. ..$ mic : num [1:5285] 2 2 2 2 0 0 0 2 2 0 ...
## ..$ univariate :'data.frame': 65 obs. of 6 variables:
## .. ..$ x : num [1:65] 547985 557985 527985 537985 507985 ...
## .. ..$ y : num [1:65] 788254 788254 778254 778254 768254 ...
## .. ..$ ExDet : num [1:65] -0.1067 -0.0587 -0.0532 -0.0337 -0.0394 ...
## .. ..$ mic_univariate : int [1:65] 2 2 2 2 2 2 2 2 2 2 ...
## .. ..$ mic_combinatorial: int [1:65] NA NA NA NA NA NA NA NA NA NA ...
## .. ..$ mic : num [1:65] 2 2 2 2 2 2 2 2 2 2 ...
## ..$ combinatorial:'data.frame': 45 obs. of 6 variables:
## .. ..$ x : num [1:45] 557985 567985 577985 587985 607985 ...
## .. ..$ y : num [1:45] 728254 728254 728254 728254 728254 ...
## .. ..$ ExDet : num [1:45] 1.11 1.09 1.45 1.44 1.19 ...
## .. ..$ mic_univariate : int [1:45] NA NA NA NA NA NA NA NA NA NA ...
## .. ..$ mic_combinatorial: int [1:45] 3 3 3 3 3 3 3 3 3 3 ...
## .. ..$ mic : num [1:45] 3 3 3 3 3 3 3 3 3 3 ...
## ..$ analogue :'data.frame': 5175 obs. of 6 variables:
## .. ..$ x : num [1:5175] 547985 557985 567985 527985 537985 ...
## .. ..$ y : num [1:5175] 778254 778254 778254 768254 768254 ...
## .. ..$ ExDet : num [1:5175] 0.344 0.263 0.212 0.346 0.279 ...
## .. ..$ mic_univariate : int [1:5175] NA NA NA NA NA NA NA NA NA NA ...
## .. ..$ mic_combinatorial: int [1:5175] NA NA NA NA NA NA NA NA NA NA ...
## .. ..$ mic : num [1:5175] 0 0 0 0 0 0 0 0 0 0 ...
## $ rasters:List of 2
## ..$ ExDet:List of 4
## .. ..$ all :Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ univariate :Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ combinatorial:Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ analogue :Formal class 'RasterLayer' [package "raster"] with 12 slots
## ..$ mic :List of 4
## .. ..$ all :Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ univariate :Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ combinatorial:Formal class 'RasterLayer' [package "raster"] with 12 slots
## .. ..$ analogue :Formal class 'RasterLayer' [package "raster"] with 12 slots
Three types of extrapolation can be identified (Figure 3):
ExDet
values < 0. It is also known as mathematical, strict, or Type 1 extrapolation, and represents conditions outside the range of individual covariates in the reference sample.ExDet
values > 1. It is also known as multivariate or Type 2 extrapolation, and describes novel combinations of values encountered within the univariate range of reference covariates. Such combinations are identified based on the Mahalanobis distance metric (D2), a well-known and scale-invariant measure of multivariate outliers (Rousseeuw and Zomeren 1990).Fig. 3: Schematic representation of extrapolation in multivariate environmental space, based on two hypothetical covariates. Reference data points are represented as grey circles. Shaded areas correspond to different types of extrapolation outside the envelope of the reference data. Univariate extrapolation occurs beyond the range of individual covariates. Combinatorial extrapolation occurs within this range, but outside the reference hyperspace/hypervolume. Source: Bouchet et al. (2019), adapted from Mesgaran et al. (2014).
compute_extrapolation
also determines which covariate makes the largest contribution to extrapolation for any given grid cell, ie. the most influential covariate (mic
). In univariate extrapolation, this is the covariate that leads to the highest negative univariate distance from the initial covariate range. In combinatorial extrapolation, this corresponds to the covariate whose omission (while retaining all others) makes the largest reduction in the Mahalanobis distance to the centroid of the reference data. See Mesgaran et al. (2014) for formulae and technical explanations.
ExDet
values can be retrieved from the returned list, as below:
# Example from combinatorial extrapolation
head(spermw.extrapolation$data$combinatorial)
## x y ExDet mic_univariate mic_combinatorial mic
## 1 557984.6 728254 1.106042 NA 3 3
## 2 567984.6 728254 1.090527 NA 3 3
## 3 577984.6 728254 1.447405 NA 3 3
## 4 587984.6 728254 1.437475 NA 3 3
## 5 607984.6 728254 1.186687 NA 3 3
## 6 527984.6 718254 1.240072 NA 3 3
The summary table generated by compute_extrapolation
is the result of an internal call to the function summarise_extrapolation
. This function can also be run separately on the output of compute_extrapolation
, if needed.
The function takes the following arguments:
extrapolation.object
: output from compute_extrapolationcovariate.names
: names of covariates of interestextrapolation
: if TRUE
, returns a summary of univariate/combinatorial extrapolationmic
: if TRUE
, returns a summary of the most influential covariatesprint.precision
: number of significant figures printed in the summaryNote that values of the summary statistics can be saved to an object in list form (here the summary.values
object).
summary.values <- summarise_extrapolation(extrapolation.object = spermw.extrapolation,
covariate.names = covariates.spermw,
extrapolation = TRUE,
mic = TRUE,
print.precision = 2)
##
##
## Table: Extrapolation
##
## -------------- ------------ ------------
## Analogue n = 5,175 97.92 %
## ----------- ----------- -----------
## Univariate n = 65 1.23 %
## Combinatorial n = 45 0.85 %
## ----------- ----------- -----------
## Sub-total n = 110 2.08 %
## ----------- ----------- -----------
## Total n = 5,285 100 %
## -------------- ------------ ------------
##
##
## Table: Most influential covariates (MIC)
##
## -------------- ------------ ------------ ------------
## Univariate EKE n = 24 0.45 %
## Univariate SST n = 17 0.32 %
## Univariate Depth n = 16 0.3 %
## Univariate NPP n = 6 0.11 %
## Univariate DistToCAS n = 2 0.04 %
## ----------- ----------- ----------- -----------
## Sub-total n = 65 1.23 %
## ----------- ----------- ----------- -----------
## Combinatorial NPP n = 39 0.74 %
## Combinatorial Depth n = 6 0.11 %
## ----------- ----------- ----------- -----------
## Sub-total n = 45 0.85 %
## ----------- ----------- ----------- -----------
## Total n = 110 2.08 %
## -------------- ------------ ------------ ------------
summary.values
## $extrapolation
## $extrapolation$univariate.n
## [1] 65
##
## $extrapolation$univariate.p
## [1] 1.229896
##
## $extrapolation$combinatorial.n
## [1] 45
##
## $extrapolation$combinatorial.p
## [1] 0.8514664
##
## $extrapolation$analogue.n
## [1] 5175
##
## $extrapolation$analogue.p
## [1] 97.91864
##
##
## $mic
## $mic$Depth
## $mic$Depth$.n
## [1] 16
##
## $mic$Depth$.p
## [1] 0.3027436
##
##
## $mic$DistToCAS
## $mic$DistToCAS$.n
## [1] 2
##
## $mic$DistToCAS$.p
## [1] 0.03784295
##
##
## $mic$EKE
## $mic$EKE$.n
## [1] 24
##
## $mic$EKE$.p
## [1] 0.4541154
##
##
## $mic$NPP
## $mic$NPP$.n
## [1] 6
##
## $mic$NPP$.p
## [1] 0.1135289
##
##
## $mic$SST
## $mic$SST$.n
## [1] 17
##
## $mic$SST$.p
## [1] 0.3216651
##
##
## $mic$Depth
## $mic$Depth$.n
## [1] 6
##
## $mic$Depth$.p
## [1] 0.1135289
##
##
## $mic$NPP
## $mic$NPP$.n
## [1] 39
##
## $mic$NPP$.p
## [1] 0.7379376
The extent and magnitude of extrapolation naturally varies with the type and number of covariates considered. It may be useful, therefore, to test different combinations of covariates to inform their selection a priori, ie. before model fitting, thereby supporting model parsimony.
The compare_covariates
function is available for this purpose. It is designed to assess extrapolation iteratively for all combinations of n.covariates
, and return an overview of those that minimise/maximise extrapolation (as measured by the number of grid cells subject to extrapolation in the prediction area).
n.covariates
can be supplied as any vector of integers from 1 to p
, where p
is the total number of covariates available. Its default value of NULL
is equivalent to 1:p
, meaning that all possible combinations are tested, unless otherwise specified.
Additional arguments include:
extrapolation.type
: one of 'univariate'
, 'combinatorial'
or 'both'
segments
: reference datacovariate.names
: names of covariates of interestprediction.grid
: target datacoordinate.system
: projected coordinate system relevant to study locationcreate.plots
: if TRUE
, generates boxplots of resultsdisplay.percent
: if TRUE
, the y-axis of the boxplots is expressed as a percentage of the total number of grid cells in prediction.grid
Below is an example for both types of extrapolation, and the full set of five covariates.
compare_covariates(extrapolation.type = "both",
covariate.names = covariates.spermw,
n.covariates = NULL,
segments = segs,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
create.plots = TRUE,
display.percent = TRUE)
## Computing ...
##
## Creating summaries ...
## Done!
##
##
## Extrapolation Minimum n_min Maximum n_max
## -------------- ----------------- ------ -------------------------------- ------
## Univariate DistToCAS 2 Depth, SST, NPP, DistToCAS, EKE 65
## Combinatorial Depth 0 Depth, SST, NPP 213
## SST 0 - -
## NPP 0 - -
## DistToCAS 0 - -
## EKE 0 - -
## Depth, DistToCAS 0 - -
## Depth, EKE 0 - -
## NPP, DistToCAS 0 - -
## NPP, EKE 0 - -
## Depth, SST, EKE 0 - -
## Depth, NPP, EKE 0 - -
## Both DistToCAS 2 Depth, SST, NPP 252
The top graph summarises the extent of univariate (yellow) and combinatorial (blue) extrapolation for all combinations of \(1, 2, ..., n\) covariates. The total number of combinations is given above each boxplot as \(N_c\). As expected, univariate extrapolation increases with the number of covariates considered.
The bottom graph displays the same results, summarised by covariate of interest. This may help identify any covariates that consistently lead to extrapolation, and should be avoided. Here, none of the covariates stand out as driving extrapolation more than any others, although combinatorial extrapolation is marginally more prevalent with NPP
and SST
.
While extrapolation is often seen as a binary concept (ie. it either does or does not take place), it is reasonable to expect that predictions made at target points situated just outside the sampled environmental space may be more reliable than those made at points far outside it. The ExDet tool available through compute_extrapolation
inherently quantifies this notion of ‘distance’ from the envelope (solid line) of the reference data (grey circles) (Figure 4A).
However, the multivariate distribution of reference data points is often far from homegeneous. It is possible, therefore, for target points representing analogue conditions to fall within sparsely sampled regions of the reference space; or conversely, for two target points reflecting an equal degree of extrapolation to have very different amounts of reference data within their vicinity.
An example of this is shown in Figure 4B, where three target points \(x_1\), \(x_2\) and \(x_3\) are located equally close to the envelope of the reference data. In essence, these reflect identical degrees of extrapolation, as defined under the ExDet framework. However, given the shape of the data cloud in multivariate space, it is apparent that predictions made at target point \(x_1\) will be much better ‘informed’ than those made at \(x_2\) or \(x_3\), as a bigger cluster of reference points lies nearby (green circle around \(x_1\)).
Fig. 4: Conceptual representation of two key extrapolation metrics. (A) Distance from the sampled environmental space. A target point far from the envelope of the reference data (eg. falling in the yellow area) is arguably ‘more of an extrapolation’ than one close to it (eg. falling in the purple area). (B) Neighbourhood (syn. percentage of data nearby). Due to the shape of the reference data cloud, the amount of sample information available to ‘inform’ predictions made at target points can vary considerably. Source: Bouchet et al. (2019).
The notion of neighbourhood (or percentage/proportion of data nearby, hereafter %N) captures this idea, and provides an additional measure of the reliability of extrapolations in multivariate environmental space (Virgili et al. 2017; Mannocci et al. 2018). In practice, %N
for any target point can be defined as the proportion of reference data within a radius of one geometric mean Gower’s distance (G2, calculated between all pairs of reference points) (King and Zeng 2007). The Gower’s distance between two points \(i\) and \(j\) defined along the axes of \(K\) covariates is calculated as the average absolute distance between the values of these two points in each dimension, divided by the range of the data, such that:
\[G_{ij}^2=\frac{1}{K}\sum_{k=1}^{K}\frac{\left|x_{ik}-x_{jk}\right|}{\textrm{max}(X_k)-\textrm{min}(X_k)}\]
The compute_nearby
function is adapted from the code given in Mannocci et al. (2018) and allows the calculation of Gower’s distances (G2) as a basis for defining the neighbourhood.
The neighbourhood
argument to this function corresponds to the radius within which neighbouhood calculations will occur. It has a default value of 1
, as per Mannocci et al. (2018) and Virgili et al. (2017). Increasing its value (eg. to say, 2
) would mean doubling the size of the green circles in Figure 4B, and would lead to larger percentages of data nearby per target point.
Important: The sperm whale dataset used here is sufficiently small that compute_nearby
should run in a matter of seconds on most machines. However, applying the function to larger datasets may burn out memory allocation and lead to an R session failure. Additional arguments can be passed to compute_nearby
to accommodate such ‘big’ data. Their use is unnecessary here, but is illustrated in the subsequent case study on beaked whales (see below).
#'---------------------------------------------
# Calculate Gower's distances and %N
#'---------------------------------------------
spermw.nearby <- compute_nearby(segments = segs,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
covariate.names = covariates.spermw,
neighbourhood = 1)
## Preprocessing data ...
## Calculating distances ....
## Calculating the geometric variance...
## Calculating cumulative frequencies ...
## Finishing up ...
## Done!
Lastly, we can use the map_extrapolation
function to create a series of interactive html maps of extrapolation in the prediction area.
Arguments to this function are identical to those of compute_extrapolation
, with the exception of:
map.type
: output to be shown on the map. One of 'extrapolation'
for a map of extrapolation values, 'mic'
for a map of the most influential covariates, or 'nearby'
for a neighbourhood map (proportion of reference data near each target grid cell, a.k.a %N).extrapolation.values
: output from compute_extrapolation
(only valid for map.type = 'extrapolation'
and map.type = 'mic'
)gower.values
: output from compute_nearby
(only valid for map.type = 'nearby'
)sightings
and tracks
are optional arguments, and only plotted if specified. If either is provided as a matrix, then coordinates must be labelled x
and y
, lest a warning be returned. Tracks can also be given as a SpatialLinesDataFrame
object.
#'---------------------------------------------
# Rename coordinates and convert to SpatialPointsdf
#'---------------------------------------------
obs.sp <- obs %>%
dplyr::rename(., x = coords.x1, y = coords.x2) %>%
sp::SpatialPointsDataFrame(coords = cbind(.$x, .$y), data = ., proj4string = sp::CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) %>%
sp::spTransform(., CRSobj = aftt_crs)
The resulting maps can be panned, zoomed, and all layers toggled on and off independently. However, note that maps rely on the leaflet
package (Cheng et al. 2018), and thus require an internet connection - they will not work offline. Facilities to save map tiles locally for offline viewing are currently not implemented but may be rolled into a future version of the code. Finally, note that prediction.grid
is only used here to set the map extent when plotting.
Here’s a map of extrapolation (ExDet
) values:
map_extrapolation(map.type = "extrapolation",
extrapolation.values = spermw.extrapolation,
covariate.names = covariates.spermw,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
sightings = NULL,
tracks = transects)
Most influential covariates (MIC):
map_extrapolation(map.type = "mic",
extrapolation.values = spermw.extrapolation,
covariate.names = covariates.spermw,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
sightings = obs.sp,
tracks = transects)
Proportion of data nearby (%N):
map_extrapolation(map.type = "nearby",
gower.values = spermw.nearby,
covariate.names = covariates.spermw,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
sightings = obs.sp,
tracks = transects)
For simplicity, all the above analyses can be performed in a single step by calling the wrapper function analyse_extrapolation
. Most arguments are derived from the individual functions presented above, and should thus be self-explanatory. Note, however, that specific parts of the analysis can be run/bypassed using the following arguments:
summarise.extrapolation
: if TRUE
, run summarise_extrapolation
compare.covariates
= if TRUE
, run compare_covariates
compute.nearby
= if TRUE
, run compute_nearby
generate.maps
= if TRUE
, run map_extrapolation
#'---------------------------------------------
# One function to rule them all, one funtion to find them,
# One function to bring them all, and in the darkness bind them.
#'---------------------------------------------
spermw.analysis <- analyse_extrapolation(segments = segs,
covariate.names = covariates.spermw,
prediction.grid = predgrid,
coordinate.system = aftt_crs,
summarise.extrapolation = TRUE,
summary.mic = TRUE,
summary.print.precision = 2,
compare.covariates = TRUE,
compare.extrapolation.type = "both",
compare.n.covariates = NULL,
compare.create.plots = TRUE,
compare.display.percent = TRUE,
compute.nearby = TRUE,
nearby.neighbourhood = 1,
generate.maps = TRUE,
sightings = obs.sp,
tracks = transects)
Results are packaged in a list
object with three levels, ie. extrapolation
, nearby
, and maps
, representing the outputs of each step in the analysis.
purrr::map(spermw.analysis, ~class(.))
## $extrapolation
## [1] "list"
##
## $nearby
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $maps
## [1] "list"
This section demonstrates an application of the code to a much larger dataset comprising 100 times as many segments (ie. reference/calibration data points) as in the preceding sperm whale example, and a target system spanning half the of the North Atlantic Ocean.
Specifically, it shows:
compute_nearby
function on ‘big’ dataThe data for this example are fully described in Roberts et al. (2016) and Mannocci et al. (2017), and are very similar in nature to the sperm whale data from the previous example.
#'---------------------------------------------
# Import the beaked whale data
#'---------------------------------------------
load("/Users/philippebouchet/Dropbox/bouchet-extrapolation/data/Models/AFTT_Beaked_whales/segments_ziphiids.RData")
As before, segs.ziphiids
is a data.frame
containing the segment data. Note, however, that coordinates for the segment mid-points are not available.
knitr::kable(head(segs.ziphiids), format = "pandoc")
Depth | DistToCanyonOrSeamount | Chl1 | CurrentSpeed |
---|---|---|---|
3.359937 | 2.449904 | -1.113293 | -0.9454697 |
3.349395 | 3.655052 | -1.075477 | -0.9704471 |
3.263835 | 4.467572 | -1.034892 | -0.9796294 |
3.355105 | 4.710572 | -1.029381 | -0.9759339 |
3.335688 | 4.556134 | -1.049879 | -0.9735497 |
3.246040 | 4.381402 | -1.092257 | -0.9678591 |
The environmental covariates likely to affect beaked whale density and distribution also differ. Two of those covariates are static (ie. Depth
and DistToCanyonOrSeamount
), and two are dynamic (Chl1
and CurrentSpeed
).
#'---------------------------------------------
# Define environmental covariates of interest
#'---------------------------------------------
covariates.ziphiids <- c("Depth", "DistToCanyonOrSeamount", "Chl1", "CurrentSpeed")
Unlike for sperm whales, we do not have a ready-made prediction data.frame
, so we need to build one from individual covariate rasters. For simplicity, we calculate the yearly mean of the twelve monthly rasters available for each dynamic covariate.
#'---------------------------------------------
# Import environmental covariates
#'---------------------------------------------
# Bathymetry
depth <- raster::raster(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "Depth", "NA_Depth_10km_mean_10km.img"))
# Distance to the nearest canyon or seamount
distcanyonseamount <- raster::raster(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "DistToCanyonOrSeamount", "Dist_to_canyons_or_seamounts_10km.img"))
# Chlorophyll-a concentration
chla <- purrr::map(list.files(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "Chl1"),
pattern = "\\.img$"),
~raster::raster(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "Chl1",.))) %>%
raster::stack(.) %>%
mean(.)
# Current speed
currentspeed <- purrr::map(list.files(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "CurrentSpeed"), pattern = "\\.img$"),
~raster::raster(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "covariates", "CurrentSpeed",.))) %>%
raster::stack(.) %>%
mean(.)
The next step is to clip these rasters to the extent of the study region, namely the US Navy’s Atlantic Fleet Training and Testing area (AFTT).
#'---------------------------------------------
# Load shapefile of study area
#'---------------------------------------------
aftt <- raster::shapefile(file.path("/Users/philippebouchet/Dropbox/bouchet-extrapolation", "data", "models", "aftt.shp"))
aftt_crs <- sp::CRS(sp::proj4string(aftt))
#'---------------------------------------------
# Clip covariate rasters
#'---------------------------------------------
rasters.ziphiids <- purrr::map(.x = list(depth, distcanyonseamount, chla, currentspeed),
.f = ~raster::crop(x = ., y = aftt) %>%
raster::mask(x = ., mask = aftt)) %>%
purrr::set_names(., covariates.ziphiids)
Transforming explanatory covariates (eg. to reduce the influence of outliers, or make skewed distributions more symmetric) is common practice in ecological modelling. We can explore the effects of data transformations on extrapolation metrics by constructing two separate datasets, ie. one transformed, the other not.
Here, the data in segs.ziphiids
have already been transformed, whereas those in rasters.ziphiids
are not. Note that in this case, \(log_{10}\) and square root transformations are applied to Depth
/CurrentSpeed
and to DistToCanyonOrSeamount
, respectively. Remote-sensed chlorophyll-a values are frequently only used, and already provided, in \(log_{10}\) form - these will not be touched.
We use this to create the complementary datasets below:
segments.ziphiids <- predgrid.ziphiids <- list(transformed = NULL, untransformed = NULL)
#'---------------------------------------------
# Back-transform the segment-level covariate values
#'---------------------------------------------
segments.ziphiids$transformed <- segs.ziphiids
segments.ziphiids$untransformed <- segs.ziphiids %>%
dplyr::mutate(Depth = 10^Depth) %>%
dplyr::mutate(DistToCanyonOrSeamount = (DistToCanyonOrSeamount^2)*1000) %>%
dplyr::mutate(CurrentSpeed = 10^CurrentSpeed)
#'---------------------------------------------
# Transform covariate rasters
#'---------------------------------------------
predgrid.ziphiids$untransformed <- rasters.ziphiids
predgrid.ziphiids$transformed <- purrr::map_at(.x = rasters.ziphiids,
.at = c("Depth", "CurrentSpeed"),
.f = ~log10(.x)) %>%
purrr::map_at(.x = .,
.at = "DistToCanyonOrSeamount",
.f = ~sqrt(.x/1000))
The final prediction grids (ie. target data) can now be put together.
#'---------------------------------------------
# Untransformed data
#'---------------------------------------------
predgrid.ziphiids$untransformed <- purrr::map_dfc(.x = predgrid.ziphiids$untransformed,
.f = ~raster::as.data.frame(.)) %>%
cbind(., coordinates(predgrid.ziphiids$untransformed[[1]]))
names(predgrid.ziphiids$untransformed) <- c(covariates.ziphiids, "x", "y")
predgrid.ziphiids$untransformed <- tidyr::drop_na(predgrid.ziphiids$untransformed) # Remove NAs
#'---------------------------------------------
# Transformed data
#'---------------------------------------------
predgrid.ziphiids$transformed <- purrr::map_dfc(.x = predgrid.ziphiids$transformed,
.f = ~raster::as.data.frame(.)) %>%
cbind(., coordinates(predgrid.ziphiids$transformed[[1]]))
names(predgrid.ziphiids$transformed) <- c(covariates.ziphiids, "x", "y")
predgrid.ziphiids$transformed <- tidyr::drop_na(predgrid.ziphiids$transformed) # Removes NAs
ExDet values can be calculated on both the transformed and untransformed data.
ziphiid.extrapolation <- list(transformed = NULL, untransformed = NULL)
ziphiid.extrapolation$untransformed <- compute_extrapolation(segments = segments.ziphiids$untransformed,
covariate.names = covariates.ziphiids, prediction.grid = predgrid.ziphiids$untransformed,
coordinate.system = aftt_crs, print.summary = TRUE, save.summary = FALSE,
print.precision = 2)
##
##
## Table: Extrapolation
##
## -------------- ------------ ------------
## Analogue n = 88,185 79.98 %
## ----------- ----------- -----------
## Univariate n = 22,067 20.01 %
## Combinatorial n = 2 0 %
## ----------- ----------- -----------
## Sub-total n = 22,069 20.02 %
## ----------- ----------- -----------
## Total n = 110,254 100 %
## -------------- ------------ ------------
##
##
## Table: Most influential covariates (MIC)
##
## -------------- ----------------------- ------------ ------------
## Univariate Chl1 n = 21,818 19.79 %
## Univariate CurrentSpeed n = 239 0.22 %
## Univariate DistToCanyonOrSeamount n = 7 0.01 %
## Univariate Depth n = 3 0 %
## ----------- ----------- ----------- -----------
## Sub-total n = 22,067 20.01 %
## ----------- ----------- ----------- -----------
## Combinatorial CurrentSpeed n = 2 0 %
## ----------- ----------- ----------- -----------
## Sub-total n = 2 0 %
## ----------- ----------- ----------- -----------
## Total n = 22,069 20.02 %
## -------------- ----------------------- ------------ ------------
ziphiid.extrapolation$transformed <- compute_extrapolation(segments = segments.ziphiids$transformed,
covariate.names = covariates.ziphiids, prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs, print.summary = TRUE, save.summary = FALSE,
print.precision = 2)
##
##
## Table: Extrapolation
##
## ------------ ------------ ------------
## Analogue n = 88,187 79.99 %
## ----------- ----------- -----------
## Univariate n = 22,067 20.01 %
## ----------- ----------- -----------
## Total n = 110,254 100 %
## ------------ ------------ ------------
##
##
## Table: Most influential covariates (MIC)
##
## ------------ ----------------------- ------------ ------------
## Univariate Chl1 n = 21,818 19.79 %
## Univariate CurrentSpeed n = 239 0.22 %
## Univariate DistToCanyonOrSeamount n = 7 0.01 %
## Univariate Depth n = 3 0 %
## ----------- ----------- ----------- -----------
## Total n = 22,067 20.01 %
## ------------ ----------------------- ------------ ------------
Transformations appear to have a minimal effect on extrapolation results, as the extent of univariate extrapolation is identical amongst datasets. Current speed is the only covariate responsible for combinatorial extrapolation in the untransformed case, albeit to a minute degree.
Running the code on the untransformed data:
compare_covariates(extrapolation.type = "both",
covariate.names = covariates.ziphiids,
n.covariates = NULL,
segments = segments.ziphiids$untransformed,
prediction.grid = predgrid.ziphiids$untransformed,
coordinate.system = aftt_crs,
create.plots = TRUE,
display.percent = TRUE)
## Computing ...
##
## Creating summaries ...
## Done!
##
##
## Extrapolation Minimum n_min Maximum n_max
## -------------- ------------------------------------------- ------ -------------------------------------------------- ------
## Univariate Depth 3 Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 22067
## Combinatorial Depth 0 Depth, DistToCanyonOrSeamount 2
## DistToCanyonOrSeamount 0 Depth, DistToCanyonOrSeamount, Chl1 2
## Chl1 0 Depth, DistToCanyonOrSeamount, CurrentSpeed 2
## CurrentSpeed 0 Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 2
## Depth, CurrentSpeed 0 - -
## DistToCanyonOrSeamount, Chl1 0 - -
## DistToCanyonOrSeamount, CurrentSpeed 0 - -
## Chl1, CurrentSpeed 0 - -
## DistToCanyonOrSeamount, Chl1, CurrentSpeed 0 - -
## Both Depth 3 Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 22069
Running the code on the transformed data:
compare_covariates(extrapolation.type = "both",
covariate.names = covariates.ziphiids,
n.covariates = NULL,
segments = segments.ziphiids$transformed,
prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs,
create.plots = TRUE,
display.percent = TRUE)
## Computing ...
##
## Creating summaries ...
## Done!
##
##
## Extrapolation Minimum n_min Maximum n_max
## -------------- -------------------------------------------------- ------ -------------------------------------------------- ------
## Univariate Depth 3 Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 22067
## Combinatorial Depth 0 Depth, DistToCanyonOrSeamount 905
## DistToCanyonOrSeamount 0 - -
## Chl1 0 - -
## CurrentSpeed 0 - -
## DistToCanyonOrSeamount, Chl1 0 - -
## DistToCanyonOrSeamount, CurrentSpeed 0 - -
## Chl1, CurrentSpeed 0 - -
## Depth, Chl1, CurrentSpeed 0 - -
## DistToCanyonOrSeamount, Chl1, CurrentSpeed 0 - -
## Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 0 - -
## Both Depth 3 Depth, DistToCanyonOrSeamount, Chl1, CurrentSpeed 22067
As previously stated, the whatif
function from the Whatif
package (Gandrud et al. 2017), which is called internally by compute_nearby
, may not run on very large datasets. While this was not an issue with the sperm whale data in the previous example, the beaked whale dataset is several orders of magnitude larger.
size.df <- tibble(species = c("Sperm whales", "Beaked whales"), size = c(format(prod(nrow(segs),
nrow(predgrid)), nsmall = 0, big.mark = ","), format(prod(nrow(segs.ziphiids),
nrow(predgrid.ziphiids$transformed)), nsmall = 0, big.mark = ",")))
knitr::kable(size.df, format = "pandoc")
species | size |
---|---|
Sperm whales | 5,015,465 |
Beaked whales | 13,781,198,730 |
To circumvent this problem, the whatif.opt
sets the calculations performed by whatif
to run on partitions of the data instead, for greater efficiency. whatif.opt
can be called internally within compute_nearby
by using two additional arguments, namely:
max.size
: Threshold above which partitioning will be triggered.no.partitions
: Number of required partitions.In practice, a run of compute_nearby
begins with a quick assessment of the dimensions of the input data, ie. the reference and target data.frames
. If the product of their dimensions (ie. number of segments multiplied by number of prediction grid cells) exceeds the value set for max.size
, then no.partitions
subsets of the data will be created and the computations run on each using map
functions from the purrr
package (Henry and Wickham 2019). This means that a smaller max.size
will trigger partitioning on correspondingly smaller datasets. By default, max.size
is set to 1e7. This value was chosen arbitrarily, and is sufficiently large as to obviate the need for partitioning on the sperm whale dataset. That said, note that max.size
makes little difference to computation time in the analysis of the sperm whale surveys.
It is quite crucial, however, to use partitioning on the beaked whale data. Given the large size of the dataset, the default value of max.size
is just fine in this instance, but we could lower it if we wanted. No sensitivity analysis has been conducted on no.partitions
, but manual testing suggests that 10
provides a reasonable all-rounder value.
The below code ran in a couple of hours on a recent machine.
Despite trivial differences in the geographical extent of extrapolation across datasets, the maps below illustrate that the magnitude of (univariate) extrapolation is marginally greater after covariate transformation.
map_extrapolation(map.type = "extrapolation",
extrapolation.values = ziphiid.extrapolation$untransformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$untransformed,
coordinate.system = aftt_crs,
base.layer = "gray")
map_extrapolation(map.type = "extrapolation",
extrapolation.values = ziphiid.extrapolation$transformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs,
base.layer = "gray")
Most influential covariates:
map_extrapolation(map.type = "mic",
extrapolation.values = ziphiid.extrapolation$untransformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$untransformed,
coordinate.system = aftt_crs)
map_extrapolation(map.type = "mic",
extrapolation.values = ziphiid.extrapolation$transformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs)
Proportion of data nearby (%N):
map_extrapolation(map.type = "nearby",
gower.values = ziphiids.nearby$untransformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$untransformed,
coordinate.system = aftt_crs)
map_extrapolation(map.type = "nearby",
gower.values = ziphiids.nearby$transformed,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs)
Difference between the two rasters:
map_extrapolation(map.type = "nearby",
gower.values = ziphiids.nearby$difference,
covariate.names = covariates.ziphiids,
prediction.grid = predgrid.ziphiids$transformed,
coordinate.system = aftt_crs)
breakpoints <- c(-35,-0.5,0.5,10)
plot(ziphiids.nearby$difference, breaks = breakpoints, col = pals::viridis(3))
plot(aftt, add=T)
Patterns in the %N
metric are broadly similar across datasets within much of the Davis Strait (to the northeast) and the Sargasso Sea (to the southeast). However, a dichotomy between extrapolation levels on and the off the continental shelf is apparent, with covariate transformations tempering extrapolation in deeper waters, at the cost of reduced confidence in extrapolations made closer to shore.
This document has outlined an analysis of extrapolation in environmental space using bespoke code adapted from the ecospat
(Broennimann et al. 2016) and WhatIf
(Gandrud et al. 2017) packages. Note that many possible models (and model types) can be fitted to distance sampling or related data, with each potentially behaving differently outside reference conditions for any given level of extrapolation (Yates et al. 2018). Because of this, the tools presented herein are model-agnostic and only designed to provide an a priori assessment. This may, in turn, help inform modelling decisions and model interpretations, but caution should always be exercised when performing model selection, discrimination and criticism.
The code is freely available here.
The ExDet-functions.R
R file contains a number of useful functions for assessing extrapolation in environmental space, namely:
ExDet
: Adaptation of the ecospat.climan
function from ecospat
(formerly ecospat.exdet
in previous releases of the package). Assesses the degree of environmental similarity between a reference and a target system, as per Mesgaran et al. (2014). Adds functionality to identify the most influential covariate(s) - MIC - ie. contributing most to departures from reference conditions.whatif.opt
: Adaptation of the whatif
function from the WhatIf
package, modified to run on large datasets via matrix partitioning. Relies on the purrr
package (Henry and Wickham 2019). See Example 2 (beaked whales) for more details.compute_extrapolation
: Calls ExDet
and return results in data.frame
and raster
formats.summarise_extrapolation
: Summarises extrapolation results in tabular form. Is called internally by compute_extrapolation
by default.compare_covariates
: Wrapper around compute_extrapolation
used to assess extrapolation for different combinations of covariates. Useful for informing covariate selection during model development.compute_nearby
: Wrapper around whatif
and whatif.opt
. Quantifies the proportion of reference points located in the vicinity of each target point in multivariate space, as an additional metric of extrapolation.map_extrapolation
: Creates interactive html maps of the outputs from compute_extrapolation
. Aids in the visual assessment of extrapolation. Relies on the leaflet
package (Cheng et al. 2018).extrapolation_analysis
: Wrapper around compute_extrapolation
, summarise_extrapolation
, and map_extrapolation
. Allows a full assessment of extrapolation (ie. calculations, summary, and visualisation) to be conducted in one function call.Bouchet PJ, Miller DL, Harris CM, Thomas L (2019). From here and now to there and then: Practical recommendations for the extrapolation of cetacean density surface models to novel conditions. CREEM Technical report 2019-01. University of St Andrews, St Andrews. Available at: https://risweb.st-andrews.ac.uk/portal/en/researchoutput/from-here-and-now-to-there-and-then-practical-recommendations-for-extrapolating-cetacean-density-surface-models-to-novel-conditions(d93b3865-dfc9-4b1c-bdf1-193183897df0).html
Broennimann O, Di Cola V, Guisan A (2016). ‘Ecospat: Spatial ecology miscellaneous methods. R package version 2.1.1’. Available at: https://CRAN.R-project.org/package=ecospat
Cheng J, Karambelkar B, Xie Y (2018). ‘Leaflet: Create interactive web maps with the javascript ’leaflet’ library. R package version 2.0.2.’ Available at: https://CRAN.R-project.org/package=leaflet
Gandrud C, King G, Stoll H, Zeng L (2017). ‘WhatIf: Evaluate counterfactuals. R package version 1.5-9.’ Available at: https://CRAN.R-project.org/package=WhatIf
Henry L, Wickham H (2019). ‘Purrr: Functional programming tools. R package version 0.3.0.’ Available at: https://CRAN.R-project.org/package=purrr
King G, Zeng L (2007). When can history be our guide? The pitfalls of counterfactual inference. International Studies Quarterly 51, 183–210. DOI: 10.1111/j.1468-2478.2007.00445.x
Mannocci L, Roberts JJ, Halpin PN, Authier M, Boisseau O, Bradai MN, Canãdas A, Chicote C, David L, Di-Méglio N, Fortuna CM, Frantzis A, Gazo M, Genov T, Hammond PS, Holcer D, Kaschner K, Kerem D, Lauriano G, Lewis T, Notarbartolo Di Sciara G, Panigada S, Raga JA, Scheinin A, Ridoux V, Vella A, Vella J (2018). Assessing cetacean surveys throughout the mediterranean sea: A gap analysis in environmental space. Scientific Reports 8, art3126. DOI: 10.5061/dryad.4pd33
Mannocci L, Roberts JJ, Miller DL, Halpin PN (2017). Extrapolating cetacean densities to quantitatively assess human impacts on populations in the high seas. Conservation Biology 31, 601–614. DOI: 10.1111/cobi.12856
Mesgaran MB, Cousens RD, Webber BL (2014). Here be dragons: A tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Diversity and Distributions 20, 1147–1159. DOI: 10.1111/ddi.12209
Miller DL, Burt ML, Rexstad EA, Thomas L (2013). Spatial models for distance sampling data: Recent developments and future directions. Methods in Ecology and Evolution 4, 1001–1010. DOI: 10.1111/2041-210X.12105
Miller DL, Rexstad E, Burt L, Bravington MV, Hedley. S (2015). ‘Dsm: Density surface modelling of distance sampling data. R package version 2.2.9.’ Available at: https://CRAN.R-project.org/package=dsm
Roberts JJ, Best BD, Mannocci L, Fujioka E, Halpin PN, Palka DL, Garrison LP, Mullin KD, Cole TVN, Khan CB, McLellan WA, Pabst DA, Lockhart GG (2016). Habitat-based cetacean density models for the u.s. Atlantic and gulf of mexico. Scientific Reports 6, art22615. DOI: 10.1038/srep22615
Rousseeuw PJ, Zomeren BC van (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85, 633–639. DOI: 10.1080/01621459.1990.10474920
Sequeira A, Bouchet PJ, Yates K, Mengersen K, Caley J (2018). Transferring biodiversity models for conservation: Opportunities and challenges. Methods in Ecology and Evolution 9, 1250–1264. DOI: 10.1111/2041-210X.12998
Virgili A, Racine M, Authier M, Monestiez P, Ridoux V (2017). Comparison of habitat models for scarcely detected species. Ecological Modelling 346, 88–98. DOI: 10.1016/j.ecolmodel.2016.12.013
Yates KL, Bouchet PJ, Caley MJ, Mengersen K, Randin CF, Parnell S, Fielding AH, Bamford AJ, Ban S, Barbosa AM, Dormann CF, Elith J, Embling CB, Ervin GN, Fisher R, Gould S, Graf RF, Gregr EJ, Halpin PN, Heikkinen RK, Heinänen S, Jones AR, Krishnakumar PK, Lauria V, Lozano-Montes H, Mannocci L, Mellin C, Mesgaran MB, Moreno-Amat E, Mormede S, Novaczek E, Oppel S, Ortuño Crespo G, Peterson AT, Rapacciuolo G, Roberts JJ, Ross RE, Scales KL, Schoeman D, Snelgrove P, Sundblad G, Thuiller W, Torres LG, Verbruggen H, Wang L, Wenger S, Whittingham MJ, Zharikov Y, Zurell D, Sequeira AMM (2018). Outstanding challenges in the transferability of ecological models. Trends in Ecology and Evolution 33, 790–802. DOI: 10.1016/j.tree.2018.08.001