Machine Learning Models: Trees
Content for Thursday, October 30, 2025
In the previous set of lectures, we thought about the explanation of spatial patterns through a fairly simple lens: the best explanation for variation in values is a function of distance. As such, the best prediction of new values takes into account the measurements and some function of distance. In deterministic methods, we assume that the measurements we have should simply be smoothed based on distance. In probabilistic methods, we allow for the idea that the underlying mean of the process is unknown (and can vary) and then exploit spatial covariance to more meaningfully account for the relationship between distance and the mean of the process (allowing for potential second-order effects). It is often the case, however, that we want to know more about this unknown mean (which factors are most important, how do they affect the process, etc.). This might be because we are more interested in inference about those factors than in complete spatial predictions. In this case, we might use statistical-learning models to take advantage of the data we have in a computationally efficient way. Today we’ll talk about some of the simpler methods for doing that: tree-based methods.
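To make that shift concrete, here is a minimal sketch in R of relating measured values to candidate covariates (rather than to distance alone) with a single regression tree and a random forest. The data frame, column names, and package choices (rpart and randomForest) are illustrative assumptions, not a prescribed workflow.

```r
library(rpart)          # single decision trees
library(randomForest)   # ensembles of trees

# Entirely hypothetical data: one row per observed location, with the
# measured value and a few candidate environmental covariates.
set.seed(42)
dat <- data.frame(
  elevation = runif(200, 0, 3000),
  precip    = runif(200, 100, 2000),
  temp      = rnorm(200, 10, 5)
)
dat$value <- 0.001 * dat$elevation + 0.002 * dat$precip + rnorm(200)

# A single regression tree: recursive binary splits on the covariates.
tree_fit <- rpart(value ~ elevation + precip + temp, data = dat)

# A random forest: many trees grown on bootstrap samples, each split
# considering a random subset of the covariates.
rf_fit <- randomForest(value ~ elevation + precip + temp, data = dat,
                       importance = TRUE)

# Variable importance suggests which factors matter most for the mean.
importance(rf_fit)
```

The variable-importance output is one reason tree-based ensembles are attractive when inference about the driving factors matters as much as prediction itself.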
Resources
Random Forests for Classification in Ecology (Cutler et al. 2007) provides an introduction to the utility of Random Forests for ecologists.
Statistical modeling of spatial data in (Pebesma and Bivand 2023) gives a nice overview of the different goals for spatial statistical modeling and considerations one should be aware of when trying to draw inference from these types of analyses.
An Introduction to Statistical Learning (James et al. 2021) is a comprehensive introduction to a number of statistical learning techniques with examples in R. Although these examples are not necessarily spatial, the chapters provide a lot of the background necessary for understanding what the models are doing.
Spatial machine learning with R: caret, tidymodels, and mlr3 by Jakub Nowosad provides a useful introduction to fitting Random Forests under three different modeling paradigms. The caret package has been around for a long time and is a “go to” for lots of machine learning methods. The other two are newer and offer quite a few more methods, but have been less integrated into the existing spatial modeling paradigms (a minimal caret example is sketched below).
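As a point of reference for the Nowosad tutorial, a minimal (non-spatial) sketch of the caret interface might look like the following. The data frame dat and its columns are the same hypothetical example used earlier, and the cross-validation settings are arbitrary.

```r
library(caret)

# Fit a random forest through caret's unified train() interface,
# with 5-fold cross-validation to estimate predictive performance.
# method = "rf" uses the randomForest package under the hood.
ctrl <- trainControl(method = "cv", number = 5)
rf_caret <- train(value ~ elevation + precip + temp, data = dat,
                  method = "rf", trControl = ctrl)

rf_caret          # cross-validated results across tuned values of mtry
varImp(rf_caret)  # caret's wrapper around variable importance
```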
Objectives
By the end of today you should be able to:
Differentiate supervised from unsupervised classification
Recognize the linkage between statistical learning models and interpolation
Define the key elements of tree-based classifiers
Articulate the differences in statistical learning for spatial data