Range bagging is a method for estimating where a species could live, based on the environmental conditions at the places where it has already been recorded.


Range bagging for species distribution modeling uses presence-only data to estimate the environmental limits of a species' niche. In other words, like other profile methods, it does not use absence data to compare presence locations with locations where the species is known not to occur, and it does not compare presence locations with random locations. It simply describes the environmental conditions found across all the areas where the species is known to occur.


It employs a technique known as bagging (Bootstrap Aggregating), a popular method in machine learning, where multiple models are generated and their outcomes are averaged.

You can think of it like an environmental compatibility test for species. Here's how it works in simpler terms:


First, take data on where we've found a species before, but recognise this data is not perfect because we may not have observed the species in all places it could live.


From this data, we try to understand what sort of environmental conditions the species can tolerate. At Biosecurity Commons we recommend using variables related to rainfall and temperature to capture the abiotic climatic niche for use in a risk map. However, any environmental predictors could be used in these models, just as in any other species distribution model (SDM).


The combination of all these conditions forms what we call an "environmental niche."


However, figuring out the full environmental niche is not simple. If we imagine the niche as a multi-dimensional shape (each dimension being an environmental variable), it could have a lot of corners and edges. It's like trying to fully understand a complicated 3D puzzle when you can only see one side at a time. That's why we don't look at all the environmental conditions at once. Instead, we look at smaller "subsets" of the conditions, such as temperature and rainfall alone. For each subset, we imagine the simplest shape that contains all the points we have (a convex hull). This simpler shape is called a "marginal niche."
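To make the idea of a marginal niche concrete, here is a minimal sketch in Python for a two-variable subset (temperature and rainfall). It is an illustration only, not the code that Biosecurity Commons runs: the function names `convex_hull` and `in_hull` and the occurrence values are made up for this example, and it assumes at least three occurrence points that do not all lie on one line.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, q):
    """True if q is inside or on the edge of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        raise ValueError("need at least three non-collinear hull vertices")
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


# Hypothetical occurrences: (mean temperature in deg C, annual rainfall in mm)
occurrences = [(12, 400), (18, 900), (25, 600), (20, 300), (16, 550)]
hull = convex_hull(occurrences)

print(in_hull(hull, (18, 600)))   # conditions inside the marginal niche
print(in_hull(hull, (30, 1000)))  # conditions outside the marginal niche
```

With only one variable in the subset, the same idea reduces to the range between the smallest and largest observed value.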


We then do a little trick called "bagging" to make our model more robust and reliable. We repeatedly create new subsets by randomly selecting some of the data points and some of the environmental conditions. We create a marginal niche for each of these subsets.
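The resampling step can be sketched as follows. This is a simplified illustration under stated assumptions: the helper `draw_subset` and the example records are hypothetical, the argument names `n_dim` and `sample_prop` are borrowed from the configuration options listed further down, and rows are drawn without replacement, which may differ from the implementation used on the platform.

```python
import random


def draw_subset(records, variables, n_dim, sample_prop, rng):
    """Pick n_dim random variables and a random share of the occurrence rows."""
    chosen_vars = rng.sample(variables, n_dim)
    n_rows = max(1, round(sample_prop * len(records)))
    chosen_rows = rng.sample(records, n_rows)  # assumption: no replacement
    values = [[row[v] for v in chosen_vars] for row in chosen_rows]
    return chosen_vars, values


# Hypothetical occurrence records with three environmental variables
records = [
    {"temp": 12.0, "rain": 400.0, "season": 0.30},
    {"temp": 18.0, "rain": 900.0, "season": 0.55},
    {"temp": 25.0, "rain": 600.0, "season": 0.40},
    {"temp": 20.0, "rain": 300.0, "season": 0.70},
    {"temp": 16.0, "rain": 550.0, "season": 0.45},
    {"temp": 22.0, "rain": 750.0, "season": 0.60},
]
variables = ["temp", "rain", "season"]

rng = random.Random(123)
subsets = [
    draw_subset(records, variables, n_dim=2, sample_prop=0.5, rng=rng)
    for _ in range(5)
]
for chosen_vars, values in subsets:
    print(chosen_vars, values)
```

Each printed subset would then be turned into its own marginal niche.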


For each location (a combination of environmental conditions), we see how many of our marginal niches include that location. This is like voting: each marginal niche casts a "vote" on whether the location is suitable for the species or not.


We then get a "suitability score" for each location. This score is simply the proportion of marginal niches (votes) that include the location. For example, if a location has a suitability score of 0.7, it means that 70% of our marginal niches think that the species can live there.
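The voting and scoring steps can be sketched in a few lines. To keep the example short, it assumes each marginal niche uses a single variable, so the convex hull is simply the range between the minimum and maximum sampled value. The functions `fit_range_bag` and `suitability` and the data are hypothetical and are not the platform's own code.

```python
import random


def fit_range_bag(records, variables, n_models, sample_prop, rng):
    """Build n_models one-variable marginal niches as (variable, low, high)."""
    n_rows = min(len(records), max(2, round(sample_prop * len(records))))
    models = []
    for _ in range(n_models):
        var = rng.choice(variables)          # random environmental variable
        rows = rng.sample(records, n_rows)   # random share of occurrences
        values = [row[var] for row in rows]
        models.append((var, min(values), max(values)))
    return models


def suitability(models, site):
    """Proportion of marginal niches (votes) that contain the site."""
    votes = sum(1 for var, low, high in models if low <= site[var] <= high)
    return votes / len(models)


# Hypothetical occurrence records
records = [
    {"temp": 12.0, "rain": 400.0},
    {"temp": 18.0, "rain": 900.0},
    {"temp": 25.0, "rain": 600.0},
    {"temp": 20.0, "rain": 300.0},
    {"temp": 16.0, "rain": 550.0},
    {"temp": 22.0, "rain": 750.0},
]

rng = random.Random(123)
models = fit_range_bag(records, ["temp", "rain"], n_models=100,
                       sample_prop=0.5, rng=rng)

print(suitability(models, {"temp": 18.0, "rain": 600.0}))   # central conditions
print(suitability(models, {"temp": 50.0, "rain": 5000.0}))  # extreme conditions
```

Conditions near the middle of the observed data collect most of the votes, while conditions far outside every observed range collect none.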


This approach gives us a more robust picture of the species' potential habitats because it accounts for our imperfect data and uses multiple views (subsets of conditions) to estimate the environmental niche.


While this method assumes that the environmental niches are simple, connected (convex) shapes, which might not always be true, it works well as a general approximation. Plus, it can handle large datasets and can be tuned to be more conservative (excluding less certain habitats) if needed. Overall, range bagging is a powerful tool for understanding species distribution in a complex world.


Advantages 

  • Quick and easy profile method
  • Likely more robust than other profile methods due to 'bagging'
  • Increasing evidence that the algorithm performs fairly well

Limitations 

  • Only uses continuous predictor variables
 

Assumptions 


No assumptions are made about the distributions of the environmental variables. 

  

Requires absence data 

No 

 

Configuration options  

Biosecurity Commons allows the user to set model arguments as specified below. 

random_seed  

Seed used for generating random values. Using the same seed value (e.g. 123) ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation. 
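As a small illustration of why the seed matters, the sketch below draws a random sample of row numbers twice with the same seed and once with a different seed. The helper `sample_rows` is hypothetical and uses Python's standard `random` module; it is not the random number generator used by Biosecurity Commons.

```python
import random


def sample_rows(n_rows, sample_prop, seed):
    """Return the sorted row indices drawn for one model with a given seed."""
    rng = random.Random(seed)
    k = max(1, round(sample_prop * n_rows))
    return sorted(rng.sample(range(n_rows), k))


first = sample_rows(100, 0.5, seed=123)
second = sample_rows(100, 0.5, seed=123)
other = sample_rows(100, 0.5, seed=456)

print(first == second)  # True: same seed, same rows, same model
print(first == other)   # a different seed will usually pick different rows
```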

n_dim

d_max

Number of dimensions (environmental variables) sampled for each convex hull model

n_models

Number of convex hull models (marginal niches) to build from the sampled environment data

sample_prop

Proportion of environment data rows sampled for fitting

limit_occur

Logical value (true or false) indicating whether to limit occurrence data to one record per environment data cell
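Limiting occurrences to one per cell can be pictured with the sketch below. It is only an assumption about how such thinning might work: the function `limit_to_one_per_cell`, the square grid defined by `cell_size` and `origin`, and the coordinates are all hypothetical, and the platform may identify cells differently (for example, directly from the environmental raster).

```python
import math


def limit_to_one_per_cell(occurrences, cell_size, origin=(0.0, 0.0)):
    """Keep the first occurrence that falls in each grid cell."""
    seen = set()
    kept = []
    for lon, lat in occurrences:
        cell = (
            math.floor((lon - origin[0]) / cell_size),
            math.floor((lat - origin[1]) / cell_size),
        )
        if cell not in seen:
            seen.add(cell)
            kept.append((lon, lat))
    return kept


# Hypothetical occurrence coordinates (longitude, latitude)
occurrences = [
    (145.01, -37.83),
    (145.04, -37.86),  # same 0.1-degree cell as the first record
    (145.35, -37.83),
    (146.27, -36.52),
]

thinned = limit_to_one_per_cell(occurrences, cell_size=0.1)
print(len(occurrences), "records reduced to", len(thinned))
```

Thinning in this way stops heavily surveyed cells from contributing many near-identical environmental values to the model.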


References 


Drake, J. M. (2015). Range bagging: a new method for ecological niche modelling from presence-only data. Journal of the Royal Society Interface, 12(107), 20150086.


Dyer, E. E., Franks, V., Cassey, P., Collen, B., Cope, R. C., Jones, K. E., ... & Blackburn, T. M. (2016). A global analysis of the determinants of alien geographical range size in birds. Global Ecology and Biogeography, 25(11), 1346-1355.


Kramer, A. M., Annis, G., Wittmann, M. E., Chadderton, W. L., Rutherford, E. S., Lodge, D. M., ... & Drake, J. M. (2017). Suitability of Laurentian Great Lakes for invasive species based on global species distribution models and local habitat. Ecosphere, 8(7), e01883.