7 Part 7 (On your own)

7.1 Modeling bird stopover distributions

This last section is to quickly model bird stopover distributions using Boosted Regression Trees. You will use a subset of data from three radars along the mid-Atlantic USA. The data have already been processed to produce the mean Vertically-Integrated Reflectivity (meanVIR) and coefficient of variation of VIR (cvVIR) in units of cm2 per hectare for autumn stopover density of land birds pooled across years from 2008-2014. The data were aggregated to a 1km x 1km resolution grid and subsetted to cells separated by 5km to improve spatial independence of the data. There are several explanatory variables also summarized for the grid cells that include:

`UTMX` - Easting coordinate

`UTMY` - Northing coordinate

`elev` - elevation in meters above sea level

`dcoast` - distance from the nearest coastline in km

`pwood` - proportion of evergreen pine forest cover in grid cell

`emmarsh` - proportion of emergent marsh cover in grid cell

`hwood` - proportion of deciduous hardwood forest cover in grid cell

`urban` - proportion of urban cover in grid cell

`ag` - proportion of agricultural land cover in grid cell

`ndvimean` - mean seasonal Normalized Difference Vegetation Index

`ndvistd` - standard deviation of NDVI

`near_radar` - proximity to the nearest radar in m

Let’s start by getting some libraries and functions set up.

reqPackages <- c('mgcv','fields','dismo','spdep',"RColorBrewer","plotrix")
get.packages <- reqPackages[!(reqPackages %in% installed.packages()[,"Package"])]
if(length(get.packages)>0) install.packages(get.packages,dependencies = TRUE)

library(mgcv)
library(fields)
library(dismo)
library(spdep)
library(RColorBrewer)
library(plotrix)

##Function to flatten a correlation matrix into a data frame
flattenCorrMatrix <- function(cormat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor  =(cormat)[ut]
   
  )
}
##Function for resetting plotting parameters
resetPar <- function() {
  dev.new()
  op <- par(no.readonly = TRUE)
  dev.off()
  op
}

OK. Now let’s load up the data and check for multicollinearity among variables.

setwd(OUT)
data<-read.csv("bird_density_1kmgrid_subset.csv")
##Check for multicollinearity
cormatrix<-cor(data)
b<-flattenCorrMatrix(cormatrix)
b = b[order(b$cor),]
b
##         row     column          cor
## 13     UTMX     dcoast -0.660281926
## 9      UTMX       elev -0.568684073
## 55    urban         ag -0.480984604
## 1   meanVIR      cvVIR -0.471253500
## 65    urban   ndvimean -0.456862403
## 45    hwood      urban -0.454107332
## 76    urban    ndvistd -0.422026577
## 63  emmarsh   ndvimean -0.384281051
## 11  meanVIR     dcoast -0.349726905
## 54    hwood         ag -0.342681560
## 73    pwood    ndvistd -0.313515073
## 27   dcoast    emmarsh -0.268047685
## 7   meanVIR       elev -0.259619483
## 70     UTMY    ndvistd -0.239895868
## 49     UTMY         ag -0.228294034
## 68    cvVIR    ndvistd -0.218279726
## 43    pwood      urban -0.217716027
## 85    pwood near_radar -0.211926385
## 48     UTMX         ag -0.207225835
## 36  emmarsh      hwood -0.197652896
## 26     elev    emmarsh -0.185540343
## 52    pwood         ag -0.178216438
## 58     UTMX   ndvimean -0.162907321
## 69     UTMX    ndvistd -0.157243526
## 57    cvVIR   ndvimean -0.155895351
## 53  emmarsh         ag -0.150611389
## 30    cvVIR      hwood -0.147769079
## 44  emmarsh      urban -0.146133101
## 81     UTMX near_radar -0.138298338
## 37  meanVIR      urban -0.128589504
## 16  meanVIR      pwood -0.118121145
## 14     UTMY     dcoast -0.110878523
## 47    cvVIR         ag -0.098076862
## 20     elev      pwood -0.092663404
## 21   dcoast      pwood -0.072153288
## 28    pwood    emmarsh -0.067489177
## 72   dcoast    ndvistd -0.060408936
## 78 ndvimean    ndvistd -0.060178225
## 59     UTMY   ndvimean -0.057024256
## 3     cvVIR       UTMX -0.053009199
## 31     UTMX      hwood -0.048767322
## 32     UTMY      hwood -0.042097558
## 41     elev      urban -0.039051484
## 25     UTMY    emmarsh -0.036744665
## 89       ag near_radar -0.034941343
## 46  meanVIR         ag -0.020625505
## 4   meanVIR       UTMY -0.014910710
## 86  emmarsh near_radar -0.007550344
## 22  meanVIR    emmarsh -0.003898394
## 19     UTMY      pwood -0.001314135
## 90 ndvimean near_radar  0.005538626
## 42   dcoast      urban  0.025559503
## 23    cvVIR    emmarsh  0.032346303
## 71     elev    ndvistd  0.032594312
## 87    hwood near_radar  0.045453310
## 66       ag   ndvimean  0.052631811
## 91  ndvistd near_radar  0.053568245
## 51   dcoast         ag  0.066641052
## 88    urban near_radar  0.073751109
## 35    pwood      hwood  0.074206857
## 10     UTMY       elev  0.081053876
## 34   dcoast      hwood  0.086336336
## 74  emmarsh    ndvistd  0.086750074
## 79  meanVIR near_radar  0.087946351
## 50     elev         ag  0.094285697
## 39     UTMX      urban  0.095125598
## 33     elev      hwood  0.106928100
## 17    cvVIR      pwood  0.114340312
## 8     cvVIR       elev  0.125258116
## 80    cvVIR near_radar  0.125643627
## 18     UTMX      pwood  0.136885520
## 24     UTMX    emmarsh  0.154395008
## 38    cvVIR      urban  0.155108682
## 56  meanVIR   ndvimean  0.155463442
## 82     UTMY near_radar  0.156599689
## 84   dcoast near_radar  0.157354656
## 5     cvVIR       UTMY  0.163427334
## 75    hwood    ndvistd  0.173566982
## 61   dcoast   ndvimean  0.197846183
## 67  meanVIR    ndvistd  0.228904439
## 60     elev   ndvimean  0.236595646
## 29  meanVIR      hwood  0.239859255
## 12    cvVIR     dcoast  0.258564880
## 2   meanVIR       UTMX  0.258823992
## 40     UTMY      urban  0.267292868
## 62    pwood   ndvimean  0.267336954
## 83     elev near_radar  0.350132218
## 77       ag    ndvistd  0.386065559
## 15     elev     dcoast  0.550849402
## 64    hwood   ndvimean  0.598296970
## 6      UTMX       UTMY  0.614191000

Nothing too strongly correlated (|r|>0.7). Let’s move on.

Let’s plot the data. It’s not too fancy but gives us an impression of the bird distribution patterns. Hotter color = higher bird density.

dataproj<-st_as_sf(data, coords = c("UTMX", "UTMY"))
st_crs(dataproj)<-26918 #UTM 18N
gbmap <- rast(gmap(dataproj, type = "satellite",map_key="AIzaSyDZohZKjKe4J14NrZBxbf58HV097oEzgdE"))
## Warning: [rast] unknown extent
dataproj<-st_transform(dataproj, crs(gbmap))# Google Maps are in Mercator projection. This projects the points to that projection to enable mapping
palette(brewer.pal(n = 9, name = "OrRd"))
plot(gbmap) 
points(dataproj, col = plotrix:::rescale(dataproj$meanVIR, c(1, 9)),pch=20)

Let’s model VIR with a Boosted Regression Tree. We will set the tree complexity to 1 to first examine main effects with no interactions among covariates. Note that no data transformations are performed. This is because assumptions of normality of data are not required.

The boosting process iteratively tunes the models to more closely fit the training data. We will set the bag.fraction to 0.5, which means 50% of the data will be used to train the models while the remainder are used to cross-validate the models. The boosting continues until the point of minimal holdout deviance is reached. Once the deviance starts to creep back up, it means we are starting to overfit the model. We want to aim for around 1000 trees before reaching the stopping point. You can adjust the learning rate to tune how many trees are run.

brt_model  <- gbm.step(data=data, 
                       gbm.x = c("elev","near_radar","pwood","hwood","urban","ndvimean","ndvistd","dcoast","ag","emmarsh"),
                       gbm.y = "meanVIR",
                       family="gaussian",
                       tree.complexity = 5, #set to 1 for no interactions - set >1 for interactions
                       learning.rate = 0.1,
                       bag.fraction = 0.5,n.trees=100)
## Loading required namespace: gbm
## 
##  
##  GBM STEP - version 2.9 
##  
## Performing cross-validation optimisation of a boosted regression tree model 
## for NA and using a family of gaussian 
## Using 3472 observations and 10 predictors 
## creating 10 initial models of 100 trees 
## 
##  folds are unstratified 
## total mean deviance =  0.2019 
## tolerance is fixed at  2e-04 
## ntrees resid. dev. 
## 100    0.0969 
## now adding trees... 
## 200   0.0907 
## 300   0.0886 
## 400   0.0877 
## 500   0.0869 
## 600   0.0869 
## 700   0.0864 
## 800   0.0867 
## 900   0.0868 
## 1000   0.0868 
## 1100   0.0869 
## 1200   0.087 
## 1300   0.0866 
## 1400   0.0866 
## 1500   0.0869 
## 1600   0.0868 
## 1700   0.0869 
## 1800   0.087 
## 1900   0.087 
## 2000   0.0873 
## 2100   0.0873 
## 2200   0.0877
## fitting final gbm model with a fixed number of 700 trees for NA

## 
## mean total deviance = 0.202 
## mean residual deviance = 0.035 
##  
## estimated cv deviance = 0.086 ; se = 0.004 
##  
## training data correlation = 0.913 
## cv correlation =  0.757 ; se = 0.008 
##  
## elapsed time -  0.93 minutes
summary(brt_model)

##                   var   rel.inf
## dcoast         dcoast 21.335562
## near_radar near_radar 17.178529
## elev             elev 13.207224
## hwood           hwood 11.989027
## ndvimean     ndvimean  9.106993
## ndvistd       ndvistd  8.568222
## ag                 ag  8.005161
## urban           urban  5.685826
## emmarsh       emmarsh  2.796909
## pwood           pwood  2.126548

The output shows the covariates ranked by their relative influence in explaning variance of the response variable. The relative influence among covariates will always add up to 100%. Look for the “cv correlation”. This is the cross-validation correlation.

7.1.0.1 Does it appear that the BRT models are informative?

Now let’s produce the partial dependency plots to see the functional responses between bird density and the predictor variables.

gbm.plot(brt_model,plot.layout=c(2,2),common.scale = F)

##If you model tree complexity >1 then look for interactions below
inter<-gbm.interactions(brt_model) #look for interactions among variables
## gbm.interactions - version 2.9
## Cross tabulating interactions for gbm model with 10 predictors
## 1 2 3 4 5 6 7 8 9
inter[1] #let's see the top interactions ranked from strongest to weakest
## $rank.list
##   var1.index var1.names var2.index var2.names int.size
## 1          2 near_radar          1       elev     9.60
## 2          8     dcoast          2 near_radar     3.28
## 3          5      urban          2 near_radar     2.42
## 4          7    ndvistd          2 near_radar     2.03
## 5          8     dcoast          4      hwood     1.87
gbm.perspec(brt_model,2,8,z.range=c(-1,4)) #this function creates a perspective plot to visualize any interactions. Just replace the "2" and "7" with the index number of the two covariates you want to display
## maximum value = 1.46

7.1.0.2 Notice that the variable near_radar is rather influential. In theory, the correction for “range” bias should remove any relationship between VIR and range from the radar. So what do you think is going on here? Hint - think about the width of the beam near the radar and beam filling.

7.1.0.3 Look at the other relationships between mean VIR and predictors. Is there evidence that birds are concentrated near the coastline? Which habitat type seems to have the strongest relationship with bird density? What does this tell you about the types of birds that the radar is detecting? How do you interpret the relationship of bird density with NDVI? Are there any surprising relationships here with any covariates?

7.1.0.4 Now try running the models by increasing the tree complexity to 3 or more. Uncomment the code I’ve included to allow you to determine the strongest interactions. Are there any interesting interactions here? Do the BRT models now explain more of the deviance?

7.1.0.5 Next try including the UTMX and UTMY variables into the model. Do the results change at all?