8 Practical steps to estimating Establishment likelihoods of Pest species on Biosecurity Commons
In the workflows available on Biosecurity Commons, when you run a final model or click the run button for any of the steps, you have the option to rerun each step. If you do rerun a step, the previous run will be overwritten. You can save any step by using the “Export Result” button. Exporting a result will make it permanently available under 'My Results' for future use. It will also be retained in this Project view under 'My Exported Results'.
Step 1 – Review the literature
A literature review might reveal:
An existing risk map or establishment likelihood map that could be updated or improved upon.
Pest biology - physiological tolerances indicating climatic conditions where the species would not survive.
Susceptible hosts - understanding the hosts the pest may require, where those hosts are distributed, and the potential for the pest to use uniquely Australian hosts.
Risk pathways - where viable pests are likely to enter Australia.
Existing distribution models of the pest and/or data useful for predicting distributions.
Clean, ready to use occurrence data.
Existing data and worked examples on Biosecurity Commons.
Step 2 – collate and clean data
Biosecurity Commons will have a growing set of data that users will be able to access, but those data will often need to be supplemented with additional data not available on the platform, and most data will likely need to be cleaned. Data will include rasters or grids, which will in some cases need to be generated from other spatial data (rasterised). The transformation, addition and multiplication of those rasters will be discussed in later sections.
At Biosecurity Commons we strongly believe that the best data will be uploaded by you the users. Over time, at Biosecurity Commons we expect the available data that can inform establishment likelihoods will grow and as it is shared it will be improved. Much of the data available on the platform have not been fully validated with independent data or had underlying assumptions fully tested. Over time if you upload some data, other users may improve those data.
The most common point data you are likely to use are from occurrence data. Large databases like GBIF provide excellent sources of occurrence data for developing abiotic distribution models. However, these data often have a wide range of records that would be inappropriate for any analysis. Cleaning such data requires the removal of records with taxonomic errors, temporal errors, irrelevant records, and records from non-established populations. If bringing in coordinate data from multiple sources, you may also need to transform the coordinate system so all latitude and longitude values use the same system (see step 3 for further explanation). These data cleaning steps are critical but sometimes time-consuming.
Currently, you will often need to download the data, clean it and re-upload it.
Further, it may be best if the occurrence data used for the abiotic species distribution model are restricted to the species' native range, an area where the species is more likely to have occupied all the suitable areas. However, if species records outside the native range are found in areas with different climatic conditions, excluding these records could inappropriately remove areas from the establishment likelihood map that are areas at risk.
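The record-level cleaning checks described in this step can be sketched in a few lines of Python. This is only an illustration: the field names (`species`, `lat`, `lon`, `year`, `established`) and the toy records are assumptions, not the column names of a real GBIF download, and a real cleaning workflow would include more checks.

```python
def clean_occurrences(records, species, min_year):
    """Keep only records that pass basic taxonomic, temporal and spatial checks.

    `records` is a list of dicts; the keys used here are hypothetical.
    """
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        # Drop records with missing or impossible coordinates.
        if lat is None or lon is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # Taxonomic check: keep only the target species.
        if rec.get("species") != species:
            continue
        # Temporal check: drop undated records and records that are too old.
        year = rec.get("year")
        if year is None or year < min_year:
            continue
        # Drop records flagged as coming from non-established populations.
        if not rec.get("established", True):
            continue
        cleaned.append(rec)
    return cleaned


# Toy records (invented for illustration only).
records = [
    {"species": "Rhinella marina", "lat": -3.1, "lon": -60.0, "year": 2019},
    {"species": "Rhinella marina", "lat": None, "lon": -60.0, "year": 2019},
    {"species": "Rhinella marina", "lat": 95.0, "lon": -60.0, "year": 2019},
    {"species": "Bufo bufo", "lat": -3.2, "lon": -60.1, "year": 2020},
    {"species": "Rhinella marina", "lat": -3.3, "lon": -60.2, "year": 1975},
    {"species": "Rhinella marina", "lat": -3.4, "lon": -60.3, "year": 2021,
     "established": False},
]

cleaned = clean_occurrences(records, species="Rhinella marina", min_year=2014)
print(len(cleaned), "of", len(records), "records kept")
```

Only the first toy record survives; each of the others fails exactly one of the checks, which makes it easy to see what each rule removes.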
Step 3 - select the extent (Study region)
The next step involves selecting your study region. This region will be represented by a grid that has three characteristics that will be used for all other rasters used in workflows or when displaying the results. These three characteristics include the extent (inclusive of the boundaries of the study area), resolution (single size of all grid cells in the study area), and coordinate reference system (CRS, the coordinate system and spatial projection used to turn the round earth into a flat map).
For the experiment to run, all the other grids used in the experiment will need to match the study region grid you decide to use. To make sure other grids match the study region, the “Conform Layer” function can be used (see below).
The default raster is a 1 km² resolution grid with an extent covering all of Australia, in the GDA94 / Australian Albers (EPSG:3577) projection. This is the most common projected coordinate system used in Australia.
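To make the three grid characteristics concrete, here is a minimal sketch of how a layer could be checked against the study region before it is used. The `GridSpec` class, the `mismatches` helper and the toy extents are hypothetical and are not part of the platform; on Biosecurity Commons the conforming itself is done for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Extent, resolution and CRS of a raster (hypothetical helper class)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    resolution: float  # cell size in CRS units (metres for EPSG:3577)
    crs: str

    def shape(self):
        """Number of (rows, columns) implied by the extent and resolution."""
        rows = round((self.ymax - self.ymin) / self.resolution)
        cols = round((self.xmax - self.xmin) / self.resolution)
        return rows, cols


def mismatches(layer, study_region):
    """List the characteristics on which a layer differs from the study region."""
    problems = []
    if layer.crs != study_region.crs:
        problems.append("crs")
    if layer.resolution != study_region.resolution:
        problems.append("resolution")
    layer_extent = (layer.xmin, layer.ymin, layer.xmax, layer.ymax)
    region_extent = (study_region.xmin, study_region.ymin,
                     study_region.xmax, study_region.ymax)
    if layer_extent != region_extent:
        problems.append("extent")
    return problems


# Toy extents, not the real bounds of Australia.
study_region = GridSpec(0, 0, 5000, 3000, 1000, "EPSG:3577")
climate_layer = GridSpec(0, 0, 5000, 3000, 500, "EPSG:4326")

print(study_region.shape())                     # (3, 5)
print(mismatches(climate_layer, study_region))  # ['crs', 'resolution']
```

Any layer for which the list is not empty would need to be conformed before it can be combined with the other grids.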
There are many other options besides the 1 km² raster of Australia to select your study region:
Choose a raster from your other results
Explore ‘my’ datasets you have uploaded or imported previously
Explore thousands of curated datasets
Import / upload data
First steps when starting a project in the Risk Mapping wizard
First, start with a “Label” and description of the project
Next, “add” the grid which defines your “Study Region”
Step 4: Abiotic Pest Suitability
Results of abiotic suitability should be scaled from zero to one, or as here simply a binary result of zero or one.
Again, if the pest is thought equally likely to survive in any of Australia’s climatic conditions, this step could be skipped.
If climatic tolerances are well understood for the pest species, possibly discovered during the literature review, a simple reclassification of a temperature grid or rainfall grid might be best. For example, the review might indicate that the pest only survives in areas with a minimum temperature of 10 °C and a maximum temperature of 35 °C. Currently, this is not available on the platform, but it could be done in R, Python, or GIS software. Reclassification would simply set all areas with temperatures above or below these thresholds to zero, and any areas with temperatures between those thresholds to 1. This kind of layer would mask out all the zero cells from the final establishment likelihood.
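As a minimal sketch of such a reclassification in Python, the function below turns a small temperature grid into a binary suitability grid. The grid is represented as a plain list of rows purely for illustration (real rasters would normally be read with a GIS library), the function name is hypothetical, and treating the thresholds as inclusive is an assumption.

```python
def reclassify(grid, lower, upper):
    """Return 1 where lower <= value <= upper, 0 elsewhere.

    Cells with no data (None) are left as None.
    """
    return [
        [None if value is None else (1 if lower <= value <= upper else 0)
         for value in row]
        for row in grid
    ]


# Toy temperature grid in degrees Celsius (values invented for illustration).
temperature = [
    [8.0, 12.5, 20.0],
    [30.0, 36.2, None],
]

suitability = reclassify(temperature, lower=10.0, upper=35.0)
print(suitability)  # [[0, 1, 1], [1, 0, None]]
```

Because the output contains only zeros and ones, multiplying it with any later layer removes the unsuitable cells from the final result.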
If climatic tolerances are less binary, and species are expected to be more likely to survive or thrive as ranges of bioclimatic variables change, then one of the environmental profile models might be best. Range-bagging, surface range envelopes or models all tend to result in conservative estimates of where the pest might occur, often resulting in more geographic space included as suitable in the resulting geographic predictions.
If pest species are expected to be more likely to survive or thrive as conditions approach a climatic optimum, then more complex or correlative species distribution models may be best. However, such models tend to be less conservative in that they tend to predict a smaller geographic area as being potentially suitable. These models are often trained in the areas of the globe where the pest species is native. It is important to realise that the climatic conditions where a species is found on one continent, may not correspond to where they occur in Australia because pest occurrences in its native range were correlated to climatic conditions, but not actually limited by those conditions [literature review may help determine if climatic conditions are limiting]. Some species will show more tolerance to broader climatic conditions than others. Understanding the biology, physiology, and ecology of potentially invasive species will be critical to appropriately setting model thresholds, arguments and interpreting model results. “Accurate prediction to the region in which a model was fitted does not guarantee accurate prediction outside this range (Elith, 2017; Fourcade et al., 2017).” – ( et al., 2020)
When developing these models there are a variety of decisions to be made. First, decisions around which occurrence points to include when training the model can have a large impact on results. Second, the selection of background points (Maxent) or pseudo-absence points (most other statistical or machine learning models) can have a massive impact on the results (Phillips et al., 2009; Warton & Shepherd, 2010; Syfert et al., 2013). These background or pseudo-absence points need to be selected in places where the species could have dispersed and in some models, it helps if they exhibit the same survey bias as was present in the occurrence data ( et al., 2020). Selection of points where similar species were observed but the target species was not, or selecting points randomly from a bias layer are two ways to generate such data (see SDMs in R; Step 1 module).
Thresholding resulting probabilities from these kinds of models to generate binary outputs is not recommended as it degrades the resolution of information available to decision-makers.
A variety of species distribution models could be run on the platform, including:
Environmental Envelope Models (most commonly used in risk mapping)
Range Bagging (Coming soon)
Machine Learning Models
Maxent (widely used, with good predictive performance despite not requiring absence data)
Within the platform you can add an abiotic suitability layer in a variety of ways. First, you can calculate an abiotic species distribution model using the “create an abiotic SDM” tool within the project, found if you click on the “Add” button in the input parameters tab of the Abiotic Suitability section.
You can also select previous workflow results by clicking on “Choose from My Results”, or explore datasets that you have uploaded / imported by clicking on “Explore My Datasets”. You can also explore the thousands of curated datasets by clicking on “Explore Curated Datasets”, or click on “Import / upload data” to add your own data.
If needed, you can also perform a variety of functions on the data you input, such as “conform layers” to transform the grid to the same resolution, extent, and CRS as your Study Region grid. You can also combine multiple layers, transform vector files into a grid matching the study region raster (“distribute features”), aggregate categories, or apply a distance weight layer (see below for more details).
Example of steps to generate an Abiotic Suitability Layer
First click on Abiotic Suitability and then click on the Add button from the Input Parameters tab.
Next there are a variety of ways to add data, but we are going to create a SDM model for Cane Toads, using occurrence data from
Then launch the SDM experiment.
Then select “Upload my own data” to add cleaned occurrence data downloaded from GBIF, but which only includes records from South and Central America, and only from the last ten years.
Select “Species Occurrence” as the type of data we are uploading.
Select occurrence data file.
Fill out fields to identify the uploaded file.
Once the data is uploaded, select the “Species Distribution Modelling Experiment” on the upper right, and add a title and description of your experiment.
Select the occurrence data you uploaded for the experiment.
Click on the eye symbol to visualise the records, to double-check the point locations look reasonable.
Skip the absences tab, as we will be developing a Surface Range Envelope model, which requires only presence data. Then select some global climate data.
Then select some layers within the global dataset, here we select average temperature, and minimum monthly temperature as we know toads do not survive in areas that get too cold.
We leave the extent of the model at its default “convex hull” indicating the model will be trained using all the South American data we have.
We then select SRE on the algorithms tab, and leave model arguments at default levels, except we change the quantile to 0.001 so nearly all environmental data are included in the final envelope.
Then click start experiment; you will be taken to another page to view the job status. When the job has completed, return to the risk mapping modelling wizard, click the “My Projects” tab, and click on the Projection to current climate, which now appears under Abiotic Suitability, to view the global predicted distribution of Cane Toad, based on data from South America and inclusive of ocean environments.
Then click on Abiotic Suitability and when you click on the ADD button select “Projection to current climate” from the dropdown menu. Then click on the “Run (Abiotic Suitability)” button on the lower left to generate your final abiotic suitability result, which is simply your unconstrained suitability map, ‘conformed’ to match your Study Region grid.
The screen will shift to the result tab, where the job will continue in progress. When done, the result for your Abiotic Suitability is displayed, in our case a red 1 for suitable areas and a blue 0 for unsuitable areas. Note this map does capture all the areas the Cane Toad has invaded. It does admittedly extend further to the south and west than the current Cane Toad distribution, but the model was trained with South American data, with no real thought given to the modelling approach, and took 10 minutes to run.
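To show what the Surface Range Envelope in this walkthrough is doing conceptually, here is a small Python sketch. It assumes the quantile setting trims that fraction from each tail of every variable before the envelope is drawn, and that a cell is suitable only if all variables fall inside the envelope. The function names and the toy temperatures are hypothetical, and the platform's implementation may differ in detail.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers (0 <= q <= 1)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def fit_envelope(presence_env, quant=0.001):
    """Lower and upper bound per variable from values at presence points."""
    return {
        name: (quantile(values, quant), quantile(values, 1 - quant))
        for name, values in presence_env.items()
    }


def predict(envelope, cell_env):
    """Return 1 if every variable of the cell lies inside the envelope, else 0."""
    return int(all(low <= cell_env[name] <= high
                   for name, (low, high) in envelope.items()))


# Climate values sampled at presence points (invented for illustration).
presence_env = {
    "mean_temp": [22.0, 24.5, 26.0, 27.5, 25.0],
    "min_temp": [14.0, 16.0, 18.5, 17.0, 15.5],
}
envelope = fit_envelope(presence_env, quant=0.001)

warm_cell = {"mean_temp": 25.0, "min_temp": 16.0}
cold_cell = {"mean_temp": 25.0, "min_temp": 5.0}
print(predict(envelope, warm_cell), predict(envelope, cold_cell))  # 1 0
```

With a quantile as small as 0.001, the envelope is almost the full observed range, which is why nearly all environmental data end up inside it.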
Step 5: Biotic Pest Suitability
There are many ways to generate a biotic pest suitability map ( et al. 2020). A simple example would be to consider areas classified as agricultural in the ALUM land use layers as potential areas for a widespread agricultural pest, scoring every cell with agriculture happening within it as 1 and the rest as zero. You may want to weight areas by how much agricultural area is in each cell. If your pest uses orchards as hosts, you may use ALUM to identify where orchards are, then aggregate a finer scale NDVI layer to give a total sum of vegetation greenness within each cell, which could be used to weight each cell with orchards. You may also include urban areas to account for citrus trees growing in the odd backyard. If trying to estimate the potential distribution of a fungus like Myrtle Rust, you might develop a conservative species distribution model using occurrence points from selected species in the family. For a livestock disease, you may aim to get a map of where livestock are and try to estimate abundance within each grid cell. Often the final layer will combine multiple layers.
There is an example of a biotic suitability map for citrus hosts which was generated from citrus orchard mapping, but also used the ALUM, NVIS and NDVI layers.
Biotic suitability results should be scaled from zero to one.
We will be adding some more example biotic suitability layers, but the biggest improvements in available layers will come from you the experts in biosecurity.
Here you can also select previous workflow results or explore datasets that have been uploaded or imported by others. You can also explore the thousands of curated datasets or import / upload your own data.
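The orchard example in this step (aggregate a finer NDVI layer, keep only cells with orchards, scale from zero to one) could be sketched as follows. This is an illustration under assumptions: the grids are toy lists of rows, the function names are hypothetical, and dividing by the maximum is just one simple way to scale to the zero to one range.

```python
def aggregate_sum(fine, factor):
    """Sum blocks of factor x factor fine cells into one coarse cell."""
    rows, cols = len(fine) // factor, len(fine[0]) // factor
    return [
        [sum(fine[r * factor + i][c * factor + j]
             for i in range(factor) for j in range(factor))
         for c in range(cols)]
        for r in range(rows)
    ]


def biotic_suitability(ndvi_fine, orchard_mask, factor):
    """Greenness summed per coarse cell, masked to orchards, scaled to 0-1."""
    greenness = aggregate_sum(ndvi_fine, factor)
    weighted = [
        [g * m for g, m in zip(g_row, m_row)]
        for g_row, m_row in zip(greenness, orchard_mask)
    ]
    peak = max(max(row) for row in weighted)
    if peak == 0:
        return weighted  # no hosts anywhere: every cell stays zero
    return [[value / peak for value in row] for row in weighted]


# 4 x 4 fine-scale NDVI grid and a 2 x 2 orchard presence mask (toy values).
ndvi_fine = [
    [0.5, 0.5, 0.25, 0.25],
    [0.5, 0.5, 0.25, 0.25],
    [0.75, 0.75, 0.5, 0.5],
    [0.75, 0.75, 0.5, 0.5],
]
orchard_mask = [
    [1, 1],
    [0, 1],
]

suitability = biotic_suitability(ndvi_fine, orchard_mask, factor=2)
print(suitability)  # [[1.0, 0.5], [0.0, 1.0]]
```

Note that the greenest coarse cell (bottom left) scores zero because it contains no orchards, so the mask and the weighting both matter.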
Step 6: Pest Suitability
The default option for Pest Suitability is to multiply (intersect) the Abiotic Suitability by the Biotic Suitability. Probabilities should be multiplied if the two probabilities are independent, in other words if the probability of Abiotic Suitability (climate) is not related to the probability of Biotic Suitability (in our case wetlands). However, it is also possible to add layers together instead of multiplying them. You would only add layers if the probability at a location was based on either Abiotic Suitability or Biotic Suitability, but not both, for example if Biotic Suitability was greater than zero only in the areas where Abiotic Suitability was zero, and vice versa. A union of probabilities adds the two probabilities, but then subtracts the locations where the probabilities intersect, which is appropriate if the two probabilities are not independent:
1 - (1 - Abiotic Suitability probability) × (1 - Biotic Suitability probability)
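To see how the multiply (intersect) and union options behave cell by cell, here is a small Python sketch. The grids are toy lists of rows and the function names are hypothetical; the union is computed as 1 - (1 - a) × (1 - b), which is the same as a + b - a × b.

```python
def intersect(a, b):
    """Cell-by-cell product of two probability grids."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def union(a, b):
    """Cell-by-cell union: 1 - (1 - a) * (1 - b)."""
    return [[1 - (1 - x) * (1 - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Toy suitability grids (values invented for illustration).
abiotic = [[1.0, 0.5],
           [0.0, 0.5]]
biotic = [[0.5, 0.5],
          [0.5, 0.0]]

print(intersect(abiotic, biotic))  # [[0.5, 0.25], [0.0, 0.0]]
print(union(abiotic, biotic))      # [[1.0, 0.75], [0.5, 0.5]]
```

The bottom row shows the practical difference: multiplying gives zero wherever either layer is zero, while the union keeps the non-zero layer's value.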
Step 7: Pest Arrivals
A variety of grids have been generated based on the examples given by ( et al. 2020) of the proportion of units (tourists, mail parcels, containers etc.) that might reach any grid cell. For the number of tourists entering the country, the total number of tourists is divided among the airports within Australia based on the percentage of international arrivals at each airport. Data indicate that tourists stay relatively close to the airport or tourist accommodation, so a distance function is applied to indicate that very few tourists go far from the airport or tourist accommodation, with the number dropping off as you go further away. For each pathway and each priority pest, the number of units in any grid cell is used in a function that requires two additional inputs (parameter estimates). The functions that generate the final arrivals probability, which can be made up of multiple pathways, can include up to eleven formulae (see Carmac et al. 2021, pages 11–12).
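The idea of dividing tourists among airports and then applying a distance function can be illustrated with a short Python sketch. The exponential decay, the `scale` parameter, the airport shares and the toy 5 x 5 grid are all assumptions made for illustration; they are not the functions or values used to build the platform's grids.

```python
import math


def distance_weights(rows, cols, source, scale):
    """Exponential distance-decay weights around `source`, summing to 1."""
    raw = [
        [math.exp(-math.hypot(r - source[0], c - source[1]) / scale)
         for c in range(cols)]
        for r in range(rows)
    ]
    total = sum(sum(row) for row in raw)
    return [[weight / total for weight in row] for row in raw]


def distribute_arrivals(rows, cols, total_arrivals, airports, scale=2.0):
    """Split arrivals among airports by share, then spread each by distance.

    `airports` maps an airport's (row, col) cell to its share of arrivals.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for cell, share in airports.items():
        weights = distance_weights(rows, cols, cell, scale)
        for r in range(rows):
            for c in range(cols):
                grid[r][c] += total_arrivals * share * weights[r][c]
    return grid


# Two hypothetical airports receiving 70% and 30% of 1000 tourists.
airports = {(0, 0): 0.7, (4, 4): 0.3}
arrivals = distribute_arrivals(5, 5, 1000.0, airports)

for row in arrivals:
    print(" ".join(f"{value:7.1f}" for value in row))
```

Because each airport's weights sum to one, the grid still adds up to the total number of tourists; the distance function only changes where they end up.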
The first additional input is the leakage rate. The leakage rate corresponds to the upper and lower bounds of the number of consignments expected to arrive in Australia each year that could constitute a leakage event. This estimate is formed using a combination of interception rates and expert opinion (Hemming et al. 2018), and the estimates are usually made by the federal Department of Agriculture, Fisheries and Forestry. The estimate is given as a 95% confidence interval (lower 2.5th and upper 97.5th percentiles). Examples of a leakage event include the arrival of an international passenger with a plant disease on an undeclared plant, or molluscs on the bottom of an arriving boat.
The second additional input required from users is the estimated 95% confidence interval (lower 2.5th and upper 97.5th percentiles) of the expected viability of leakage events. Estimates of viability are often based on a combination of considerations related to how likely it would be for a pest to survive a leakage event. These can include how likely it is that the pest would survive the trip along the pathway, whether there will be enough pests to permit establishment, and whether the end point of the pathway will be a place the pest can survive. As with the leakage rates, these estimates are made based on expert elicitation (Hemming et al. 2018) by the federal Department of Agriculture, Fisheries and Forestry.
Most pests can enter Australia from multiple pathways, so the final pest arrival probability is usually the union of each pathway probability of entry grid. The union of individual pathway probabilities is usually what is used because arrival pathway probabilities are usually not independent. Other methods may also be used, and better estimates of arrivals can be uploaded by you the user.
The first step to generating a pest arrivals pathway is to click on the “Add New Input” button on the Input Parameters tab.
Then we think that there are two components to the pathway. First, residents returning home from overseas, and second international tourist arrivals, but here in this toy problem, we imagine that only international tourists arriving in Cairns are likely to be carrying Cane Toads. Click on the Combine Layers function.
We then choose to “Add New Input” from the Combine Layers line of the text tree on the left.
From the Conform domestic arrivals line on the tree we “Add” a layer by uploading our own environmental data.
If we zoom in on the uploaded data, we can see domestic arrivals are a human population grid, showing most people reside in or near the cities (zoomed into the Sydney area here).
The international arrivals to Cairns layer had a distance weight applied, so the likelihood of tourists travelling from Cairns reduces as they get further from Cairns.
We then select the sum of these two layers and click on the “Run (Combine Layers)” button on the lower left. These grids are the number of tourists arriving, so adding them is reasonable.
We then “run” the pathway input parameters. Note this is indicating that between 3 and 30 Cane Toads are expected to arrive from returning tourists each year, and the viability of toads in luggage ranges from 10% to 90%, very tough toads!
We then added a Mail Pathway, and then calculated the total Pest Arrivals probability as the Union of the Tourists Pathway and the Mail Pathway. The arrival of Cane Toads on different pathways is not independent: if there is an increase in the travelling Cane Toad population, then the number of entries from all pathways is likely to increase. Again, when probabilities are not independent, use of a union is appropriate.
1 - (1 - Tourists Pathway probability) × (1 - Mail Pathway probability)
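When a pest has more than two pathways, the same union can be extended by multiplying the (1 - probability) terms of every pathway. The sketch below is a minimal illustration in Python: the function name, the toy one-row grids and the third "cargo" pathway are hypothetical.

```python
import math


def union_of_pathways(pathway_grids):
    """Cell-by-cell union over any number of pathway probability grids.

    Each cell is 1 minus the product of (1 - p) across the pathways.
    """
    rows, cols = len(pathway_grids[0]), len(pathway_grids[0][0])
    return [
        [1 - math.prod(1 - grid[r][c] for grid in pathway_grids)
         for c in range(cols)]
        for r in range(rows)
    ]


# Toy arrival probabilities for three pathways on a 1 x 2 grid.
tourists = [[0.5, 0.0]]
mail = [[0.5, 0.25]]
cargo = [[0.0, 0.0]]

pest_arrivals = union_of_pathways([tourists, mail, cargo])
print(pest_arrivals)  # [[0.75, 0.25]]
```

A pathway with zero probability everywhere (the cargo grid here) leaves the result unchanged, so pathways can be added or dropped without changing the rest of the calculation.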
The default for this final step is to multiply (intersect) the Pest Suitability probability (remember this was the Abiotic Suitability multiplied by the Biotic Suitability) by the Pest Arrivals probability grids. Probabilities should be multiplied if the two probabilities are independent, in other words the probability of Pest Suitability is not related to the probability of pest arrivals. However, it is also possible to add layers together instead of multiplying them. You would only add layers if the probability at a location was based on either Pest Suitability or Pest Arrivals, but not both. A union of probabilities would be appropriate if the Pest Suitability and Pest Arrivals were not independent. Union of probabilities adds the two probabilities, but then subtracts the locations where the probabilities intersect:
1 - (1 - Pest Suitability probability) × (1 - Pest Arrivals probability)
It is also possible to leave out abiotic suitability, biotic suitability or pest arrivals if there is no helpful information for any of those three estimates. The variety of ways the user can compute an overall Pest Establishment Likelihood is one reason the values of results run on different pests cannot be quantitatively compared. In other words, results are relative, or dependent on the decisions made during the construction of the model. See more on probabilities. Still, the relative probability maps are useful for understanding where pest establishment is most or least likely.