Range-bagging : EcoCommons Support Portal

Range-bagging is a method for estimating where a species might be able to live, based on the environmental conditions at the places where it has been recorded.

Range bagging for species distribution modeling uses presence-only data to estimate the environmental limits of a species' niche. In other words, like other profile methods, it does not use absence data to compare presence locations with locations where the species is known not to occur, nor does it compare presence locations with random locations. It simply describes the environmental conditions found across all the areas where the species is known to occur.

It employs a technique known as bagging (Bootstrap Aggregating), a popular method in machine learning, where multiple models are generated and their outcomes are averaged.

You can think of it like an environmental compatibility test for species. Here's how it works in simpler terms:

First, take data on where we've found a species before, but recognise this data is not perfect because we may not have observed the species in all places it could live.

From this data, we try to understand what sort of environmental conditions the species can tolerate. At Biosecurity Commons we recommend using variables related to rainfall and temperature to capture the abiotic climatic niche for use in a risk map. However, any environmental predictors could be used in these models, just like any SDM.

The combination of all these conditions forms what we call an "environmental niche."

However, figuring out the full environmental niche is not simple. If we imagine the niche as a multi-dimensional shape (each dimension being an environmental variable), it could have a lot of corners and edges - it's like trying to fully understand a complicated 3D puzzle but you can only see one side at a time. That's why we don't look at all the environmental conditions at once. Instead, we look at smaller "subsets" of the conditions, like just temperature and rainfall. For each subset, we imagine the simplest shape that contains all the points we have (a convex hull). This simpler shape is called a "marginal niche."
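As a rough illustration of a single marginal niche, the sketch below builds a two-variable convex hull and tests whether a new combination of conditions falls inside it. It uses only the Python standard library (Andrew's monotone chain algorithm); the function names and the presence records are hypothetical, and this is not the code run by the platform.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, point):
    """True if point is inside or on a counter-clockwise convex polygon.

    Assumes the hull has at least three non-collinear vertices.
    """
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


# Hypothetical presence records: (mean temperature in °C, annual rainfall in mm)
presences = [(12.0, 600.0), (18.0, 400.0), (25.0, 900.0), (15.0, 1200.0), (19.0, 800.0)]
marginal_niche = convex_hull(presences)

print(in_hull(marginal_niche, (18.0, 700.0)))  # True: within the observed conditions
print(in_hull(marginal_niche, (30.0, 300.0)))  # False: hotter and drier than any record
```

Note that the record at (19.0, 800.0) lies inside the hull, so it does not become a vertex: the marginal niche is defined by the most extreme combinations of conditions only.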

We then do a little trick called "bagging" to make our model more robust and reliable. We repeatedly create new subsets by randomly selecting some of the data points and some of the environmental conditions. We create a marginal niche for each of these subsets.
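The random selection step can be sketched as follows. This is a minimal illustration using the Python standard library, not the platform's implementation: the function name, the variable names and the choice to sample rows without replacement are assumptions, and the argument names only loosely echo the configuration options listed further down.

```python
import random


def draw_bagging_subsets(n_rows, variables, n_models, sample_prop, n_vars, seed):
    """Return one (row_indices, variable_names) pair per marginal niche."""
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    n_sampled = max(1, round(sample_prop * n_rows))
    subsets = []
    for _ in range(n_models):
        # Rows are drawn without replacement here (an assumption of this sketch).
        rows = sorted(rng.sample(range(n_rows), n_sampled))
        chosen = sorted(rng.sample(variables, n_vars))
        subsets.append((rows, chosen))
    return subsets


# Hypothetical predictor names and settings
variables = ["annual_rainfall", "mean_temperature", "rainfall_seasonality", "max_temperature"]
subsets = draw_bagging_subsets(
    n_rows=20, variables=variables, n_models=5, sample_prop=0.5, n_vars=2, seed=123
)
for rows, chosen in subsets:
    print(chosen, rows)
```

Each pair would then be used to fit one marginal niche from the selected records and variables. Calling the function again with the same seed returns the same subsets.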

For each location (a combination of environmental conditions), we see how many of our marginal niches include that location. This is like voting: each marginal niche casts a "vote" on whether the location is suitable for the species or not.

We then get a "suitability score" for each location. This score is simply the proportion of marginal niches (votes) that include the location. For example, if a location has a suitability score of 0.7, it means that 70% of our marginal niches think that the species can live there.
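The voting step can be sketched in a few lines of Python. To keep the example short, each marginal niche is simplified to a minimum-maximum range per variable (a box) rather than a true convex hull; that simplification, the variable names and all values are assumptions made for illustration only.

```python
def suitability(location, marginal_niches):
    """Proportion of marginal niches that contain the location."""
    votes = sum(
        all(low <= location[var] <= high for var, (low, high) in niche.items())
        for niche in marginal_niches
    )
    return votes / len(marginal_niches)


# Hypothetical fitted marginal niches: variable -> (minimum, maximum)
marginal_niches = [
    {"temp": (10.0, 24.0)},
    {"rain": (400.0, 1200.0)},
    {"temp": (12.0, 26.0), "rain": (500.0, 1000.0)},
    {"temp": (14.0, 22.0)},
    {"rain": (600.0, 900.0)},
]

location = {"temp": 25.0, "rain": 800.0}
print(suitability(location, marginal_niches))  # 0.6: three of the five niches include it
```

Here the two temperature-only niches vote against the location because 25.0 exceeds their upper limits, while the other three vote for it.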

This approach gives us a more robust picture of the species' potential habitats because it accounts for our imperfect data and uses multiple views (subsets of conditions) to estimate the environmental niche.

While this method assumes that the environmental niches are simple and connected shapes (which might not always be true), it works well as a general approximation. Plus, it can handle large datasets and can be tuned to be more conservative (excluding less certain habitats) if needed. Overall, range-bagging is a powerful tool for understanding species distribution in a complex world.

Advantages

Quick and easy profile method.
Likely more robust than other profile methods due to 'bagging'.
Increasing evidence that the algorithm performs fairly well.

Limitations

Only uses continuous predictor variables

Assumptions

No assumptions are made about the distributions of the environmental variables.

Requires absence data

No. Range-bagging is a presence-only method.

Configuration options

Biosecurity Commons allows the user to set model arguments as specified below.

random_seed	Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation.
n_dim
d_max	Number of dimensions (variables) of sampled convex hull models
n_models	Number of convex hull models to build in sampled environment
sample_prop	Proportion of environment data rows sampled for fitting
limit_occur	Logical to indicate whether to limit occurrence data to one per environment data cell

References

Drake, J. M. (2015). Range bagging: a new method for ecological niche modelling from presence-only data. Journal of the Royal Society Interface, 12(107), 20150086.

Dyer, E. E., Franks, V., Cassey, P., Collen, B., Cope, R. C., Jones, K. E., ... & Blackburn, T. M. (2016). A global analysis of the determinants of alien geographical range size in birds. Global Ecology and Biogeography, 25(11), 1346-1355.

Kramer, A. M., Annis, G., Wittmann, M. E., Chadderton, W. L., Rutherford, E. S., Lodge, D. M., ... & Drake, J. M. (2017). Suitability of Laurentian Great Lakes for invasive species based on global species distribution models and local habitat. Ecosphere, 8(7), e01883.
