Introduction


BIOCLIM is an envelope-style method that uses only occurrence data to define a multi-dimensional environmental space in which a species can occur. This space is constructed as a bounding box around the minimum and maximum values of each environmental variable across all occurrences, resulting in a multi-dimensional rectilinear envelope (Figure 1). To reduce the over-predictive effect of outliers, the envelope can be shrunk by excluding values beyond specified percentiles or standard deviations.
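
The envelope idea can be illustrated with a minimal base R sketch. The occurrence values, the 10th/90th percentile trim and the helper in_envelope() are all hypothetical and chosen only for illustration; this is not how any particular package implements the method.

```r
# Hypothetical occurrence records: rows are records, columns are variables.
occ <- data.frame(
  temp = c(12, 14, 15, 15, 16, 17, 18, 19, 21, 30),            # 30 is an outlier
  rain = c(400, 520, 560, 600, 610, 650, 700, 720, 800, 1500)  # 1500 is an outlier
)

# Full envelope: per-variable minimum (row 1) and maximum (row 2).
full_env <- sapply(occ, range)

# Trimmed envelope: bounds at the 10th and 90th percentiles of each variable.
trim_env <- sapply(occ, quantile, probs = c(0.10, 0.90), names = FALSE)

# A site lies inside an envelope if every variable is within its bounds.
# 'site' must list the variables in the same order as the envelope columns.
in_envelope <- function(site, env) {
  all(site >= env[1, ] & site <= env[2, ])
}

site <- c(temp = 28, rain = 900)
in_envelope(site, full_env)  # TRUE: the outliers stretch the full envelope
in_envelope(site, trim_env)  # FALSE: the trimmed envelope excludes the site
```

The same site is accepted by the full envelope and rejected by the trimmed one, which is the over-prediction effect that trimming is meant to reduce.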

To predict the probability of species occurrence at any given location, BIOCLIM compares the values of the environmental variables at that location to the percentile distribution of the values from known locations. The 50th percentile is the median, which divides the data in half. The closer the values at an unknown location are to the 50th percentile, the more suitable the location is for the species and the higher the probability of occurrence, up to a maximum of 1 at the median. The tails of the distribution are not distinguished, so the 10th percentile is treated as equal to the 90th percentile and both receive the same probability value. BIOCLIM combines the scores for each environmental variable into an overall probability of occurrence for each location, with equal weights for all environmental variables. Its simplicity has made BIOCLIM widely used, although it is acknowledged not to perform as well as other modelling methods.


On EcoCommons, BIOCLIM is implemented using the 'dismo' R package. There, percentile scores larger than 0.5 are subtracted from 1 to transform upper tail values into the lower tail. Then, the minimum percentile score across all environmental variables is used as the overall score for an unknown location. By using the minimum across all variables, the model predicts that a species' chance of occurring at any grid cell is determined by the environmental variable with the lowest percentile score. This minimum, which lies between 0 and 0.5 once the upper tail has been folded onto the lower tail, is then multiplied by two so that the results are between 0 and 1. The developers of the 'dismo' package have implemented this scaling to make the results more similar to other species distribution modelling methods and easier to interpret. Values of 1 will rarely be observed, as they require a location to have the optimal (median) value for all environmental variables. Values of 0 are very common, as they are assigned to all cells that have at least one environmental variable with a value outside the range observed at known locations.
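
The scoring steps can be sketched in base R as follows. This is an illustrative approximation, not the 'dismo' source code: the helpers perc_rank() and bioclim_score(), the treatment of ties (counted as half) and the training values are assumptions made for this example, and the final step is assumed to be a doubling of the folded minimum so that the median maps to 1.

```r
# Percentile rank of value v within training values x (ties count as half).
perc_rank <- function(x, v) {
  (sum(x < v) + 0.5 * sum(x == v)) / length(x)
}

# BIOCLIM-style score for one site.
# train: matrix of training values (rows = occurrences, columns = variables)
# site:  numeric vector with one value per column of 'train'
bioclim_score <- function(train, site) {
  folded <- vapply(seq_len(ncol(train)), function(j) {
    r <- perc_rank(train[, j], site[j])
    if (r > 0.5) r <- 1 - r   # fold the upper tail onto the lower tail
    r
  }, numeric(1))
  2 * min(folded)             # limiting variable, rescaled to the range 0 to 1
}

# Hypothetical training data: 9 occurrences, 2 variables.
train <- cbind(temp = 11:19, rain = seq(100, 900, by = 100))

bioclim_score(train, c(15, 500))   # both variables at the median: 1
bioclim_score(train, c(15, 200))   # rain in the lower tail
bioclim_score(train, c(15, 800))   # rain equally far into the upper tail: same score
bioclim_score(train, c(15, 1000))  # rain outside the observed range: 0
```

Because only the minimum is used, a single poorly matched variable determines the score even when every other variable sits at its median.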


BIOCLIM was the first species distribution modelling (SDM) package that linked spatially explicit species occurrence data with maps of environmental variables. It was developed in Australia under the leadership of Henry Nix. 

Advantages

  • Simple and intuitive
  • Presence only model, no absence data needed
  • Provides ranking of environmental predictor variables
  • Useful in teaching species distribution modelling


Limitations

  • Susceptible to overprediction
  • Does not account for the interaction between predictors
  • Cannot use categorical variables
  • Does not make quantitative predictions or provide confidence levels


Assumptions

BIOCLIM was developed mainly to model species distributions in relation to climatic variables. It therefore assumes that species occurrence is influenced by climate at the scale of the climate variables, and that these variables are normally distributed.

Requires absence data

No.


Model configuration options

Random seed

Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation. The default is NULL, meaning that no random seed is used.
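
As a small illustration of why a fixed seed gives repeatable results, the base R sketch below draws the same random sample twice after calling set.seed(). How EcoCommons passes the seed to the model is not shown here; the sampling step simply stands in for any stochastic process.

```r
# Fix the random number generator state, then draw a random sample.
set.seed(123)
first_run <- sample(1:100, 5)

# Resetting the same seed reproduces exactly the same draw.
set.seed(123)
second_run <- sample(1:100, 5)

identical(first_run, second_run)  # TRUE
```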

 

 

Bioclim in R


  • Usage
dismo::bioclim(x, p, ...)
  • Arguments
    x
    Raster* object or matrix (including a RasterBrick of environmental variables)

    p
    Two-column matrix or SpatialPoints* object

    ...
    Additional arguments

  • Value

An object of class 'Bioclim' (inherits from DistModel-class)

  • Author(s)

Robert J. Hijmans 

Examples:

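library(raster)
library(dismo)

# environmental layers: the multi-band R logo raster shipped with 'raster'
logo <- stack(system.file("external/rlogo.grd", package="raster"))
# presence data: two-column matrix of x and y coordinates
pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232,
69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408,
25.832176, 3.936132, 18.876962, 17.331359, 7.048974, 13.648543, 26.093446, 28.544714, 39.104026,
44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971,
69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2)
# fit the model from the raster layers and the presence points
bc <- bioclim(logo, pts)

# or: extract the layer values at the points first, then fit from the matrix
v <- extract(logo, pts)
bc <- bioclim(v)
# predict a score for every raster cell
p1 <- predict(logo, bc)
# 'tails' takes one entry per variable and sets which tail(s) are considered
p2 <- predict(logo, bc, tails=c('both', 'low', 'high'))

# or: pass the presence points as a SpatialPoints object
#sp <- SpatialPoints(pts)
#bc <- bioclim(logo, sp)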

Further reading

dismo package documentation

Why understanding the pioneering and continuing contributions of Bioclim to species distribution modelling is important

Species distribution modeling with R

References