*Disclaimer: Please note that Climatch – lite* implements the Climatch algorithm (Euclidean), but is not the Climatch package, which can be found at Climatch (agriculture.gov.au). Results will therefore differ when implementing Climatch – lite* compared to the Climatch package, due to differing climate input data. Details of the Climatch algorithm can be found in the Climatch manual Climatch (agriculture.gov.au).

## Climatch - lite (available on Biosecurity Commons)

The matching problems available here consist of reporting the 'distance', or difference, between environmental data values at locations where the pest is known to occur and at paired locations where it has not been recorded as present, so that the sum of the differences between the endpoints of each pair is minimised.

Another way to think of a matching problem would involve finding the best way to pair your friends who live in different places so that the total distance someone would have to travel between each friend pair is as short as possible. If your friends are bike riders that shortest distance might include the steepness of the hills between friends, the smoothness of the roads between friends, as well as the distance.

What you want to do is to pair up these points so that for each pair, you have a 'start' point and an 'end' point. The task is to do this in such a way that if you were to measure the straight-line distances (the "Euclidean distances") between all of these pairs and add them up, you'd get the smallest total distance possible.

Two different matching methods are available on Biosecurity Commons to measure how similar the climate is between two sets of locations: one set where the pest is known to occur and another where the pest has not been recorded. These comparisons usually include a variety of climate variables related to temperature and rainfall. One method minimises the average difference across all variables between paired locations (Euclidean), while the other minimises the difference for the worst-matching variable.

The first method is called the "Euclidean" algorithm.

Here's a simple way to explain it. Let's say you and a friend are standing in a park. You are at one point, and your friend is at another. The "Euclidean distance" is the straight-line distance from where you are standing to where your friend is standing. Now, think of each climate variable (like temperature, rainfall, etc.) as a point in this park. This method calculates the straight-line distance between each climate variable at your source location (where the pest was recorded) and the corresponding climate variable at the location you're interested in (a location where the pest has not been recorded but might have a suitable climate). It does this for all climate variables and then finds an average "distance" (or difference). The source location with the smallest average distance to the target location is considered the best match. In other words, if you added up the Euclidean distances between all these pairs, the best match would be the one with the smallest total distance.

In more detail: for each climate variable, the difference between the two locations is squared (to make sure all differences are positive), and the squared differences are averaged over all variables. The square root of that average gives a 'climate distance' score between locations where the pest is known to occur and locations where the pest has not been recorded yet.
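The square, average and square-root steps can be sketched in a few lines of Python. This is only an illustration and not the platform's code: the function names and the example values are hypothetical, and the sketch assumes the climate variables have already been put on a comparable scale.

```python
import math

def euclidean_climate_distance(source, target):
    """Root-mean-square difference between two sites across all climate variables."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of variables")
    # Square each per-variable difference so every term is positive
    squared = [(s - t) ** 2 for s, t in zip(source, target)]
    # Average over the variables, then take the square root
    return math.sqrt(sum(squared) / len(squared))

def best_euclidean_match(sources, target):
    """Smallest climate distance from any source site to the target location."""
    return min(euclidean_climate_distance(s, target) for s in sources)

# Hypothetical, already-standardised values for three climate variables
source_sites = [[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
target_cell = [1.0, 2.0, 0.0]

print(best_euclidean_match(source_sites, target_cell))
```

A value of 0 would mean the target cell has exactly the same climate values as at least one source site.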

The second method is the "Closest standard score" algorithm.

This is a bit different. Instead of calculating an average distance for all climate variables, this method looks at each climate variable individually. It's like going through the park again but this time, instead of meeting at one point, you and your friend are racing to different points one by one. The "distance" here is the difference for each climate variable between your source location and the target location. Among these "distances", it focuses on the longest one, the one that takes you the most time to run to your friend. The source site whose longest distance to the target location is the smallest is considered the best match.
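As a rough sketch of this idea (not the platform's implementation), the code below takes the largest per-variable difference for each source site and then keeps the smallest of those. The function names and values are hypothetical, and the variables are assumed to be already standardised.

```python
def worst_variable_distance(source, target):
    """Largest absolute difference across the climate variables of two sites."""
    return max(abs(s - t) for s, t in zip(source, target))

def best_worst_variable_match(sources, target):
    """Smallest 'worst variable' distance from any source site to the target."""
    return min(worst_variable_distance(s, target) for s in sources)

# Hypothetical, already-standardised values for three climate variables
source_sites = [[0.5, 0.5, 0.5], [0.0, 0.0, 2.0]]
target_cell = [0.0, 0.0, 0.0]

# The first site is moderately different on every variable (worst = 0.5);
# the second matches two variables exactly but is far off on the third (worst = 2.0).
print(best_worst_variable_match(source_sites, target_cell))  # 0.5
```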

How do these two matching methods vary in Biosecurity Commons?

The primary difference between these two methods is what they focus on. The first method looks at the average difference across all climate variables, while the second focuses on the least similar variable, in other words the single worst-matching climate variable, and gives a lower match level if any single climate variable is very different.

These two methods provide different ways to consider what makes two climates 'similar'. One says that the overall average climate matters, while the other says that the worst-matching variable is the most important. Depending on what you care about, you might choose to use one method over the other.
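A small made-up example shows how the two views can disagree. The site names and values below are hypothetical, and the two helper functions are simplified stand-ins for the methods described above, not the platform's code.

```python
import math

def rms_distance(a, b):
    """Average-style distance: root-mean-square difference over all variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def worst_distance(a, b):
    """Worst-variable distance: the single largest per-variable difference."""
    return max(abs(x - y) for x, y in zip(a, b))

target = [0.0, 0.0, 0.0]
sources = {
    "A": [1.0, 1.0, 1.0],  # moderately different on every variable
    "B": [0.0, 0.0, 1.5],  # identical on two variables, far off on one
}

best_by_average = min(sources, key=lambda name: rms_distance(sources[name], target))
best_by_worst = min(sources, key=lambda name: worst_distance(sources[name], target))

print(best_by_average, best_by_worst)  # B A
```

Site B wins on the overall average because two of its variables match exactly, while site A wins on the worst-variable view because none of its variables is very far off.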

These methods initially produce values where 0 indicates the best (perfect) match. To align with other methods, where similar climatic conditions score highly (usually between 0 and 1, with 1 being best, or in Climatch between 1 and 10, with 10 the best match), the results are rescaled. Users have the option to scale results between 0 and 1 (the default) or between 1 and 10.
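The exact rescaling used on the platform is not described here, so the following is only a sketch of one way a distance could be turned into a similarity value. It assumes a simple linear inversion against a chosen maximum distance and rounds down to get integer scores; `to_similarity`, `to_score` and `max_distance` are hypothetical names.

```python
def to_similarity(distance, max_distance):
    """Invert a distance so that 1 is a perfect match and 0 the poorest (assumed linear)."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # Keep the distance inside the range [0, max_distance]
    clipped = min(max(distance, 0.0), max_distance)
    return 1.0 - clipped / max_distance

def to_score(distance, max_distance):
    """Map a distance onto integer scores from 1 (poorest) to 10 (best), rounding down."""
    return 1 + int(9 * to_similarity(distance, max_distance))

print(to_similarity(0.5, 2.0))  # 0.75
print(to_score(0.0, 2.0))       # 10
print(to_score(2.0, 2.0))       # 1
```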

• Quick and easy way to identify environmental space used by selected pests
• Only requires presence occurrence data included within a gridded study area
• Algorithm has been widely used in pest and weed biosecurity risk analyses
• Again, because our platform uses gridded data, the matching differs from the official Climatch algorithm, which compares target and source point locations.

Limitations
• Simpler and less sophisticated than other species distribution modelling (SDM) methods
• Only uses continuous predictor variables

Assumptions

No assumptions are made about the distributions of the environmental variables. However, as with all bioclimatic profile methods, spatial autocorrelation can confound results.

## Requires absence data

No

### Configuration options

Biosecurity Commons allows the user to set model arguments as specified below.

• random_seed: Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation.
• algorithm: Climatch matching algorithm (default = 'euclidean'). Currently Euclidean is the only one available, but "closest standard score" should be coming soon.
• d_max: Maximum range distance (default = 50 km) used when matching occurrence points to the nearest climate data points/cells. Note this is the major difference from the official Climatch site.
• as_score: True or False. True returns a score in each cell with integer values from 1 to 10. False is the default, and grid cell values are scaled between 0 and 1.
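To illustrate what the d_max option does conceptually, the sketch below matches an occurrence point to its nearest climate cell centre and discards the point if no cell lies within the maximum distance. How the platform actually measures distance is not stated, so the great-circle (haversine) formula, the function names and the coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_cell_within(point, cells, d_max_km=50.0):
    """Return the closest cell centre to the point, or None if none is within d_max_km."""
    if not cells:
        return None
    best = min(cells, key=lambda c: haversine_km(point[0], point[1], c[0], c[1]))
    if haversine_km(point[0], point[1], best[0], best[1]) > d_max_km:
        return None
    return best

# Hypothetical cell centres as (latitude, longitude)
cells = [(-27.5, 153.0), (-33.9, 151.2)]

print(nearest_cell_within((-27.6, 153.1), cells))  # (-27.5, 153.0)
print(nearest_cell_within((-20.0, 140.0), cells))  # None
```

In this sketch, an occurrence point with no climate cell within 50 km is simply left unmatched.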

## References - from Climatch manual

Barry, S 2006, CLIMATE (PC version), Bureau of Rural Sciences, Canberra, accessed October 2020.

Bomford, M 2003, Risk Assessment for the Import and Keeping of Exotic Vertebrates in Australia, Bureau of Rural Sciences, Canberra.

Bomford, M 2006, Risk assessment for the establishment of exotic vertebrates in Australia: recalibration and refinement of models, Department of the Environment and Heritage Canberra.

Bomford, M 2008, Risk assessment models for establishment of exotic vertebrates in Australia and New Zealand, Bureau of Rural Sciences, Canberra.

Bomford, M, Kraus, F, Barry, SC & Lawrence, E 2009, ‘Predicting establishment success for alien reptiles and amphibians: a role for climate matching’, Biological Invasions, vol. 11, no. 3, pp.713.

Crombie, J, Brown, L, Lizzio, J & Hood, G 2008, Climatch user manual, Bureau of Rural Sciences, Canberra.

Elith, J, Graham, CH, Anderson, RP, Dudík, M, Ferrier, S, Guisan, A, Hijmans, RJ, Huettmann, F, Leathwick, JR, Lehmann, A, Li, J, Lohmann, LG, Loiselle, BA, Manion, G, Moritz, C, Nakamura, M, Nakazawa, Y, Overton, JMcC, Peterson, AT, Phillips, SJ, Richardson, KS, Scachetti-Pereira, R, Schapire, RE, Soberón, J, Williams, S, Wisz, MS & Zimmermann, NE 2006, ‘Novel methods improve prediction of species’ distributions from occurrence data’, Ecography, vol. 29, no. 2, pp. 129-151.

Elmore, KL & Richman, MB 2001, ‘Euclidean distance as a similarity metric for principal component analysis’, Monthly Weather Review, vol. 129, pp. 540-549.

Fick, SE & Hijmans, RJ 2017, ‘Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas’, International Journal of Climatology, vol. 37, no. 12, pp. 4302-4315.

Hijmans, RJ, Cameron, S, Parra, J, Jones, P, Jarvis, A & Richardson, K 2010, WorldClim: Global weather stations, The Museum of Vertebrate Zoology, University of California, Berkeley, in collaboration with and (CIAT), and with (Rainforest CRC), accessed October 2020.

Hijmans, RJ, Cameron, SE, Parra, JL, Jones, PG & Jarvis, A 2005, ‘Very high resolution interpolated climate surfaces for global land areas’, International Journal of Climatology, vol. 25, no. 15, pp. 1965-1978.

Pheloung, PC 1996a, CLIMATE (Macintosh version), Agriculture Western Australia, Perth.

Pheloung, PC 1996b, CLIMATE: a system to predict the distribution of an organism based on climate preferences, Agriculture Western Australia, Perth.

Pheloung, PC 1995, Determining the weed potential of new plant introductions to Australia: A report on the development of a Weed Risk Assessment System commissioned by the Australian Weeds Committee and the Plant Industries Committee, Agriculture Protection Board, Western Australia.

## Description of the official Climatch model - copied from the manual

Copied directly from the Climatch manual. Full model here: https://climatch.cp1.agriculture.gov.au/

“The ‘Euclidean’ algorithm, also referred to as ‘Closest Euclidean Match’, uses a metric similar to Euclidean distance to calculate the ‘climate distance’ between the source climate station sites (site) and each target location of interest (location) across the climate variables used in the analysis (for more detail see Elmore and Richman (2001)). The source site with the metric that is closest to a given target location determines the level of match class. The mathematical form of the metric is given in formula 1 below.

The Euclidean matching algorithm implemented in Climatch calculates the distance to a target location j in Australia as

[Formula (1): equation not reproduced here; see the Climatch manual.]

where the floor function rounds down, and yjk is the kth climate variable for the jth target location and the ith source site. The variable k is from 1 to K, where K=16 if all 16 climate variables are used.

The level of the match produced by the ‘Closest standard score’ algorithm, also referred to as the ‘highest floor match’, is based on the maximum Euclidian distance of each climate variable considered individually, between the source sites and each target location. The level of the match class for a given output location is determined by the source site that is closest to it with respect to this metric. This is the formula used in the original Macintosh version (v1) of CLIMATE:

[Formula for the ‘Closest standard score’ match: equation not reproduced here; see the Climatch manual.]

where the cut function cut(a,b) returns the interval, as defined by the break points in the ordered set of b, into which a falls (for example, [expression not reproduced] will return 7). The difference between the ‘Closest standard score’ and the ‘Euclidean’ methods is that the former is based on the match of the least close climate variable, and the latter on the ‘average’ of all climate variables.”

While different to the application here, a nice explanation of the Euclidean algorithm is here.