1. Data Requirements 

  • Occurrence Records: ideally 20–10,000 records. 

Examples: 

  • Machine learning algorithms like Artificial Neural Networks (ANN) generally require larger datasets.  

  • Maxent is an exception: it can perform well with as few as 7 occurrence records, making it suitable for species with limited data. 

  • Statistical models, for example Generalized Linear Models (GLMs), require a moderate number of occurrence records. 
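The record-count guidance above can be sketched as a quick pre-flight check. The thresholds below are illustrative assumptions drawn from the text (Maxent ≥ 7; a "moderate" number for GLMs, assumed here as ~50; a "larger" dataset for ANNs, assumed here as ~500), not EcoCommons limits.

```python
# Rough per-algorithm minimum record counts (assumed, not platform rules).
MIN_RECORDS = {
    "maxent": 7,    # can perform with very few records
    "glm": 50,      # "moderate" number of records, assumed here as ~50
    "ann": 500,     # machine learning generally needs larger datasets
}

def suitable_algorithms(n_records, minimums=MIN_RECORDS):
    """Return the algorithms whose assumed minimum record count is met."""
    return sorted(a for a, m in minimums.items() if n_records >= m)

print(suitable_algorithms(30))   # -> ['maxent']
print(suitable_algorithms(600))  # -> ['ann', 'glm', 'maxent']
```

A check like this is only a heuristic; the adequacy of a dataset also depends on its spatial coverage and quality, as discussed below.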

 

  • Predictor Variables: environmental, climatic, demographic, or land-use layers. 

 

  • Choose predictor variables based on species knowledge, i.e. what climate or environmental conditions are important for the target species. 

  • Use a maximum of 10 variables in your model, and try a few iterations with different variables to develop the most parsimonious model. Note that the more variables you use, the longer the model will take to run. 

  • Check the variable importance plots, correlation plots, and response curves to determine whether each predictor variable is suitable for your model. If not, remove the variable and run the model again. 
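One common way to act on the correlation check above is to drop one variable from each highly correlated pair before modelling. The sketch below uses a |r| > 0.7 cutoff, a widely used rule of thumb rather than an EcoCommons requirement, and synthetic data in place of real predictor layers.

```python
import numpy as np

def screen_predictors(data, names, cutoff=0.7):
    """data: (n_samples, n_vars) array. Keep each variable only if its
    Pearson correlation with every already-kept variable is below `cutoff`."""
    corr = np.corrcoef(data, rowvar=False)
    keep = []
    for j, name in enumerate(names):
        if all(abs(corr[j, names.index(k)]) < cutoff for k in keep):
            keep.append(name)
    return keep

# Synthetic example: two near-duplicate predictors and one independent one.
rng = np.random.default_rng(0)
temp = rng.normal(size=200)
temp_copy = temp + rng.normal(scale=0.01, size=200)  # near-duplicate of temp
rain = rng.normal(size=200)                          # independent predictor
data = np.column_stack([temp, temp_copy, rain])
print(screen_predictors(data, ["temp", "temp_copy", "rain"]))  # -> ['temp', 'rain']
```

Note that this keeps whichever variable comes first in the list; in practice you would use species knowledge to decide which of a correlated pair to retain.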

 

  • Spatial Resolution: 

  • A finer resolution, e.g. 1 km, will produce a finer-scale model. 

  • It is recommended to scale all predictor variables to the coarsest resolution, but they can be scaled to the finest resolution if you are familiar with the spatial layers. 

  • Use coarser resolutions with fewer variables when testing or scoping environmental predictors. 

  • Once finalized, models can be run at finer resolutions; however, it's advised to run these during off-peak hours to avoid processing delays. 
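The idea of rescaling predictors to a common, coarser resolution can be illustrated with simple block averaging. In practice you would use a GIS tool or raster library; this minimal sketch just shows the aggregation step on a plain array (e.g. pretending each cell is 1 km and each block of 2×2 cells becomes one 2 km cell).

```python
import numpy as np

def aggregate(grid, factor):
    """Average `factor` x `factor` blocks of a 2-D array (block-mean resampling)."""
    h, w = grid.shape
    assert h % factor == 0 and w % factor == 0, "grid must tile evenly"
    return grid.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a 1 km grid
coarse = aggregate(fine, 2)                      # stand-in for a 2 km grid
print(coarse)  # [[ 2.5  4.5] [10.5 12.5]]
```

Block averaging suits continuous predictors such as temperature; categorical layers (e.g. land cover) would instead need a mode or nearest-neighbour rule.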

 

2. Model Settings 

  • Default Settings: 

  • EcoCommons provides peer-reviewed default settings endorsed by experts on our Scientific Advisory Committee. These defaults are suitable for most species. 

  • Users can fine-tune model parameters to meet specific requirements when necessary. 

3. Platform Constraints 

  • Algorithm Allocation: 

  • One algorithm corresponds to one model. 

  • The platform supports up to 80 models per session, ensuring efficient resource allocation. 
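Since one algorithm applied to one dataset counts as one model, a quick arithmetic check can confirm an experiment fits within the 80-model session cap. The algorithms × datasets counting rule below is an assumption for illustration; the 80-model limit is from the platform notes above.

```python
MAX_MODELS_PER_SESSION = 80  # platform cap on models per session

def models_needed(n_algorithms, n_datasets):
    """Assumed counting rule: one model per algorithm-dataset combination."""
    return n_algorithms * n_datasets

n = models_needed(n_algorithms=5, n_datasets=10)
print(n, n <= MAX_MODELS_PER_SESSION)  # 50 True
```

If the total exceeds the cap, split the experiment across sessions or reduce the number of algorithm-dataset combinations.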

4. Step-by-Step Workflow 

  1. Explore the Data: 

  • Assess the quality and distribution of occurrence records. 

  • Identify potential outliers or inconsistencies. 

  • If importing from the Atlas of Living Australia (ALA), use the available filters to clean the data. 

  2. Explore the Species: 

  • Understand the ecological context and environmental requirements of the species. 

  3. Explore Model Functions: 

  • Familiarize yourself with EcoCommons' available algorithms and modeling options. 

  4. Select and Test Algorithms: 

  • Start with one dataset and one algorithm to evaluate performance. 

  • Gradually incorporate additional algorithms or datasets based on results. 

  5. Get Feedback: 

  • Share the experiments with your colleagues or supervisor to get feedback on your models. 

5. Considerations and Limitations 

  • Data Limitations: 

  • Recognize and acknowledge gaps in occurrence data, e.g. do I have enough coverage of the target species, and are there records in places where the species is known NOT to occur? 

  • Be cautious when interpreting models based on sparse datasets, as they may over- or underestimate the likely distribution of suitable habitat. 

  • Compute Limitations: 

  • Computing resources are limited and shared across our research community, so please be mindful when running models. 

  • Plan for and allow extra time to run your models, as they may be queued during peak usage times. 

  • Schedule intensive tasks during off-peak hours. 

  • Use simplified models during testing to reduce resource usage. 


By following these best practices, EcoCommons users can build reliable, efficient, and insightful Species Distribution Models for a variety of ecological applications.