Introduction

Boosted Regression Tree (BRT) models combine two techniques: decision tree algorithms and boosting methods. Like Random Forest models, BRTs repeatedly fit many decision trees to improve the accuracy of the model. One of the differences between the two methods lies in how the data used to build each tree are selected. Both techniques draw a random subset of the data for each new tree; all subsets contain the same number of data points and are drawn from the complete dataset, so data used for one tree are returned to the pool and can be selected again for subsequent trees. Random Forest models use the bagging method, in which each observation has an equal probability of being selected in subsequent samples. BRTs instead use the boosting method, in which the input data are weighted in subsequent trees: the weights are applied so that data poorly modelled by earlier trees have a higher probability of being selected for the next tree. After the first tree is fitted, the model thus takes the prediction error of that tree into account when fitting the next tree, and so on. By accounting for the fit of all previously built trees, the model continuously tries to improve its accuracy. This sequential approach is unique to boosting.
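
To make this concrete, here is a minimal sketch of one common form of boosting (fitting each new tree to the residuals of the current model), written in R with the ‘rpart’ package supplying the individual trees. This is an illustration of the general technique only, not the BCCVL implementation, and the function name ‘boost_sketch’ and its parameters are hypothetical.

    # Minimal sketch of boosting for a continuous response with squared-error
    # loss: each new tree is fitted to the residuals of the trees before it.
    library(rpart)

    boost_sketch <- function(train, n_trees = 1000, lr = 0.01, depth = 2) {
      pred <- rep(mean(train$y), nrow(train))    # start from the mean response
      trees <- vector("list", n_trees)
      for (i in seq_len(n_trees)) {
        train$resid <- train$y - pred            # what earlier trees got wrong
        trees[[i]] <- rpart(resid ~ . - y, data = train,
                            control = rpart.control(maxdepth = depth))
        # each tree contributes only a small step, scaled by the learning rate
        pred <- pred + lr * predict(trees[[i]], train)
      }
      list(trees = trees, fitted = pred)
    }

In ‘gbm’ (and hence ‘gbm.step’) the same idea is applied to the gradient of the loss for the chosen family, for example the Bernoulli deviance for presence/absence data.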

Boosted Regression Trees have two important parameters that need to be specified by the user.

  • Tree complexity (tc): this controls the number of splits in each tree. A tc value of 1 results in trees with only 1 split, which means that the model does not take interactions between environmental variables into account. A tc value of 2 results in trees with two splits, and so on.
  • Learning rate (lr): this determines the contribution of each tree to the growing model. A small value of lr results in more trees being built.


These two parameters together determine the number of trees that is required for optimal prediction. The aim is to find the combination of parameters that results in the minimum error for predictions. As a rule of thumb, it is advised to use a combination of tree complexity and learning rate values that results in a model with at least 1000 trees (Elith et al. 2008). The optimal ‘tc’ and ‘lr’ values depend on the size of your dataset. For datasets with <500 occurrence points, it is best to model simple trees (‘tc’ = 2 or 3) with learning rates small enough to allow the model to grow at least 1000 trees.
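
For example, the sketch below shows how ‘tc’ and ‘lr’ are passed to ‘gbm.step’, using the Anguilla_train example dataset that is distributed with the ‘dismo’ package; the column indices and parameter values are illustrative only.

    # Fit a BRT with simple trees and a small learning rate, aiming for a
    # model of at least 1000 trees; gbm.x/gbm.y follow the Anguilla_train
    # example data that ships with dismo.
    library(dismo)
    data(Anguilla_train)

    brt <- gbm.step(data = Anguilla_train,
                    gbm.x = 3:13,            # predictor columns
                    gbm.y = 2,               # response column (presence/absence)
                    family = "bernoulli",
                    tree.complexity = 3,     # tc
                    learning.rate = 0.005)   # lr

    brt$gbm.call$best.trees                  # number of trees selected by CV

If the fitted model contains fewer than 1000 trees, lower the learning rate and refit.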


Boosted Regression Trees are a powerful algorithm that works very well with large datasets or when you have a large number of environmental variables relative to the number of observations, and they are very robust to missing values and outliers.


Advantages

  • Can be used with a variety of response types (binomial, gaussian, poisson)
  • Stochastic, which improves predictive performance
  • The best fit is automatically detected by the algorithm
  • Model represents the effect of each predictor after accounting for the effects of other predictors
  • Robust to missing values and outliers


Limitations

  • Needs at least 2 predictor variables to run


Assumptions

There are no formal distributional assumptions; boosted regression trees are non-parametric and can thus handle skewed and multi-modal data, as well as categorical data that are ordinal or non-ordinal.


Requires absence data

Yes.


Configuration

The BCCVL uses the ‘gbm.step’ function in the ‘dismo’ package. The user can set the following configuration options (an illustrative call showing how these options map onto ‘gbm.step’ arguments follows the list):


number of cross validations (nfolds)
The number of cross-validation folds. The dataset is divided into this many subsets, of which all but one (nfolds - 1) are used as training data to calibrate the model and one is used as test data. The default number of cross-validations is 10, which means that the dataset is divided into 10 subsets, using 9 subsets as training data and 1 subset as test data.
prevalence stratify
Whether subsets should be stratified by prevalence. This means that the subsets are selected in such a way that the mean response value is approximately equal in all subsets. In the case of binomial data, each subset will thus contain roughly the same proportion of each data class, for example presence/absence.
family
Distribution of the response variable. For binary data such as presence/absence of species, 'bernoulli' should be used, which is the default for this option.
number of trees added each cycle (n.trees)
Number of initial trees to fit, which is also the number of trees added to the model at each cycle. For example, if the default of 50 is selected, the model starts by fitting 50 trees using recursive binary partitioning of the data. The residuals from this initial fit are then fitted with another set of 50 trees, the residuals from that fit with another set, and so forth, whereby the process focuses more and more on the occurrences that were poorly modelled by previous sets of trees.
maximum number of trees
Maximum number of trees to fit before stopping.
tolerance method
Method used to decide when to stop adding trees. If this is set to 'fixed', the value given in 'tolerance value' is used directly. If this is set to 'auto', which is the default, the stopping threshold is 'tolerance value' multiplied by the total mean deviance.
tolerance value
Value to use in 'tolerance method'.
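
The sketch below shows how the options listed above map onto ‘gbm.step’ arguments. The mapping of BCCVL option names to function arguments is an assumption, as are the values for ‘max.trees’ and ‘tolerance’, which are not stated in the list above.

    # Illustrative gbm.step call with the configuration options listed above;
    # data and predictor/response columns as in the earlier Anguilla_train
    # example.
    library(dismo)
    data(Anguilla_train)

    brt <- gbm.step(data = Anguilla_train,
                    gbm.x = 3:13, gbm.y = 2,
                    tree.complexity = 3,
                    learning.rate = 0.005,
                    n.folds = 10,                # number of cross validations
                    prev.stratify = TRUE,        # prevalence stratify
                    family = "bernoulli",        # family
                    n.trees = 50,                # trees added each cycle
                    max.trees = 10000,           # maximum number of trees (assumed)
                    tolerance.method = "auto",   # tolerance method
                    tolerance = 0.001)           # tolerance value (assumed)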


References

  • De'Ath G (2007) Boosted trees for ecological modeling and prediction. Ecology, 88(1): 243-251.
  • Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. Journal of Animal Ecology, 77(4): 802-813.
  • Franklin J (2010) Mapping species distributions: spatial inference and prediction. Cambridge University Press.