Machine Learning I: Trees

HES 505 Fall 2025: Session 21

Matt Williamson

Updates

  • Grading complete by end of week

  • NO CLASS WEDS 5 Nov

  • Final project descriptions!!

Statistical Learning

Objectives

  • Differentiate supervised from unsupervised classification

  • Recognize the linkage between statistical learning models and interpolation

  • Define the key elements of tree-based classifiers

  • Articulate the differences in statistical learning for spatial data

Explaining Spatial Patterns

  • Cognitive Description: collection, ordering, and classification of data

  • Cause and Effect: design-based or model-based testing of the factors that give rise to geographic distributions

  • Systems Analysis: description of the entire complex set of interactions that structure an activity

Explanation So Far

  • Deterministic Interpolation: best explanation for variation is distance

  • Probabilistic Interpolation: best explanation is covariance (plus spatial trends)

First Order Properties

\[ \begin{equation} z(\mathbf{x}) = \mu(\mathbf{x}) + \epsilon(\mathbf{x}) \end{equation} \]

  • Often we actually care about the “drivers” of \(\mu(\mathbf{x})\)

  • Inference, not prediction
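
A minimal sketch of this decomposition (synthetic data; all names are illustrative), using a regression tree to estimate \(\mu(\mathbf{x})\) from a covariate rather than from distance:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulate z(x) = mu(x) + eps(x): a smooth trend plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(200, 1))      # a single covariate
mu = np.sin(x).ravel()                     # the "true" first-order trend
z = mu + rng.normal(scale=0.2, size=200)   # observed values

# A shallow tree gives a piecewise-constant estimate of mu(x)
tree = DecisionTreeRegressor(max_depth=3).fit(x, z)
mu_hat = tree.predict(x)
residuals = z - mu_hat                     # what remains for epsilon(x)
```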

Key Challenges

  • Data volume

  • Sampling designs

  • Nonlinearity

  • Autocorrelation

Statistical Learning

  • “Machine Learning”

  • Broad class of methods that rely on efficient, algorithmic learning

  • Probabilities derived from “large” data (rather than theoretical distributions)

  • Relax traditional statistical assumptions

Supervised Learning/Classification

  • Goal is inference/prediction, not grouping

  • Training data: observations with known values used to generate model parameters

  • Validation data: observations with known values used to tune hyperparameters prior to full model implementation

  • Testing data: observations held out of initial model-fitting to determine accuracy
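
A minimal sklearn sketch of the three-way split (the data and proportions here are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 500 observations, 4 predictors, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

# Hold out the test set first, then split the rest into training/validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% testing
```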

Classification Trees

  • Use decision rules to segment the predictor space

  • Series of consecutive decision rules form a ‘tree’

  • Terminal nodes (leaves) carry the outcome; internal nodes mark the splits; branches connect them (see the sketch below)
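
As a quick illustration (using the built-in iris data as a stand-in), the printed rules below are the internal nodes; each `class:` line is a leaf:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree and print its decision rules
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```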

Classification Trees

  • Divide the predictor space into \(J\) non-overlapping regions (\(R_1, \dots, R_J\))

  • Every observation in \(R_j\) gets the same prediction

  • Recursive binary splitting

  • Maximize the information gained at each split

  • Pruning guards against over-fitting (sketched below)
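
For classification, the information gained at a split is often measured as the drop in node impurity, e.g., the Gini index for region \(m\) with \(K\) classes:

\[ \begin{equation} G_m = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}) \end{equation} \]

And a minimal sklearn sketch of cost-complexity pruning (the iris data is just a placeholder; larger `ccp_alpha` values prune more aggressively):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Candidate pruning strengths along the cost-complexity path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(iris.data, iris.target)

# One pruned tree per alpha; choose among them with validation data
pruned = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(iris.data, iris.target)
          for a in path.ccp_alphas]
```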

Benefits and drawbacks

Benefits

  • Easy to explain

  • Links to human decision-making

  • Graphical displays

  • Easy handling of qualitative predictors

Costs

  • Lower predictive accuracy than many other classification approaches

  • Not necessarily robust: small changes in the data can produce very different trees

Random Forests

  • Grow 100s (or 1000s) of trees on bootstrapped samples of the data

  • Random sample of predictors considered at each split

  • Decorrelates the trees (and thus their predictions)

  • Averaging across trees usually improves the overall outcome

  • Lots of extensions
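
A minimal sketch of these pieces in sklearn (synthetic data; parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; 500 bootstrapped trees, sqrt(p) predictors tried at each split
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)  # out-of-bag accuracy, computed from the bootstrap samples
```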

Sooooo Many Trees

  • How to build ensembles?

  • The number and tuning of hyperparameters

  • Boosting: grow trees sequentially, each one fitting the errors of the current ensemble (sketched below)
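
A minimal sketch of tuning a boosted ensemble with a grid search (synthetic data; the grid values are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Boosting grows shallow trees sequentially, each correcting its predecessors
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 500], "learning_rate": [0.01, 0.1]},
    cv=5,
).fit(X, y)
print(grid.best_params_)
```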

Key Challenges with Tree-Based Methods

  • No direct way to incorporate spatial autocorrelation

  • Training-set conditions vs. the conditions where inference is made

  • Cross-validation considerations (random folds can leak autocorrelated information; see the sketch below)
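
One common workaround is spatially blocked cross-validation, where folds never split a spatial block. A minimal sketch (the synthetic coordinates and 25-unit blocks are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic points with coordinates; assign each to a 4 x 4 grid of blocks
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))
X = rng.normal(size=(500, 5))
y = rng.integers(0, 2, size=500)
blocks = (coords[:, 0] // 25).astype(int) * 4 + (coords[:, 1] // 25).astype(int)

# GroupKFold keeps whole blocks together, so autocorrelated neighbors
# cannot leak between training and testing folds
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=GroupKFold(n_splits=5), groups=blocks)
```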