Machine Learning I: Trees

HES 505 Fall 2025: Session 21

Matt Williamson

Updates

  • Grading complete by end of week

  • NO CLASS WEDS 5 Nov

  • Final project descriptions!!

Statistical Learning

Objectives

  • Differentiate supervised from unsupervised classification

  • Recognize the linkage between statistical learning models and interpolation

  • Define the key elements of tree-based classifiers

  • Articulate the differences in statistical learning for spatial data

Explaining Spatial Patterns

  • Cognitive Description: collection, ordering, and classification of data

  • Cause and Effect: design-based or model-based testing of the factors that give rise to geographic distributions

  • Systems Analysis: description of the entire complex set of interactions that structure an activity

Explanation So Far

  • Deterministic Interpolation: best explanation for variation is distance

  • Probabilistic Interpolation: best explanation is covariance (plus spatial trends)

First Order Properties

\[ \begin{equation} z(\mathbf{x}) = \mu(\mathbf{x}) + \epsilon(\mathbf{x}) \end{equation} \]

  • Often we actually care about the “drivers” of \(\mu(\mathbf{x})\)

  • Inference, not prediction
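
A minimal sketch of this decomposition (synthetic data; all names are illustrative), using a regression tree to estimate \(\mu(\mathbf{x})\) from a covariate rather than from distance:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulate z(x) = mu(x) + eps(x): a smooth trend plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(200, 1))      # a single covariate
mu = np.sin(x).ravel()                     # the "true" first-order trend
z = mu + rng.normal(scale=0.2, size=200)   # observed values

# A shallow tree gives a piecewise-constant estimate of mu(x)
tree = DecisionTreeRegressor(max_depth=3).fit(x, z)
mu_hat = tree.predict(x)
residuals = z - mu_hat                     # what remains for epsilon(x)
```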

Key Challenges

  • Data volume

  • Sampling designs

  • Nonlinearity

  • Autocorrelation

Statistical Learning

  • “Machine Learning”

  • Broad class of methods that rely on efficient, algorithmic learning

  • Probabilities derived from “large” data (rather than theoretical distributions)

  • Relax traditional statistical assumptions

Supervised Learning/Classification

  • Goal is inference/prediction, not grouping

  • Training data: observations with known values used to generate model parameters

  • Validation data: observations with known values used to tune hyperparameters prior to full model implementation

  • Testing data: observations held out of initial model-fitting to determine accuracy
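
A minimal sklearn sketch of the three-way split (the data and proportions here are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 500 observations, 4 predictors, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

# Hold out the test set first, then split the rest into training/validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% testing
```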

Classification Trees

  • Use decision rules to segment the predictor space

  • Series of consecutive decision rules form a ‘tree’

  • Terminal nodes (leaves) carry the outcome; internal nodes mark the splits; branches connect them (see the sketch below)
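
As a quick illustration (using the built-in iris data as a stand-in), the printed rules below are the internal nodes; each `class:` line is a leaf:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree and print its decision rules
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```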

Classification Trees

  • Divide the predictor space into \(J\) non-overlapping regions (\(R_1, \dots, R_J\))

  • Every observation in \(R_j\) gets the same prediction

  • Recursive binary splitting

  • Maximize the information gained at each split

  • Pruning guards against over-fitting (sketched below)
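
For classification, the information gained at a split is often measured as the drop in node impurity, e.g., the Gini index for region \(m\) with \(K\) classes:

\[ \begin{equation} G_m = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}) \end{equation} \]

And a minimal sklearn sketch of cost-complexity pruning (the iris data is just a placeholder; larger `ccp_alpha` values prune more aggressively):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Candidate pruning strengths along the cost-complexity path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(iris.data, iris.target)

# One pruned tree per alpha; choose among them with validation data
pruned = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(iris.data, iris.target)
          for a in path.ccp_alphas]
```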

Benefits and drawbacks

Benefits

  • Easy to explain

  • Links to human decision-making

  • Graphical displays

  • Easy handling of qualitative predictors

Costs

  • Lower predictive accuracy than many other classification approaches

  • Not necessarily robust: small changes in the data can produce very different trees

Random Forests

  • Grow 100s (or 1000s) of trees on bootstrapped samples of the data

  • Random sample of predictors considered at each split

  • Decorrelates the trees (and thus their predictions)

  • Averaging across trees usually improves the overall outcome

  • Lots of extensions
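
A minimal sketch of these pieces in sklearn (synthetic data; parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; 500 bootstrapped trees, sqrt(p) predictors tried at each split
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)  # out-of-bag accuracy, computed from the bootstrap samples
```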

Sooooo Many Trees

  • How to build ensembles?

  • The number and tuning of hyperparameters

  • Boosting: grow trees sequentially, each one fitting the errors of the current ensemble (sketched below)
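
A minimal sketch of tuning a boosted ensemble with a grid search (synthetic data; the grid values are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Boosting grows shallow trees sequentially, each correcting its predecessors
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 500], "learning_rate": [0.01, 0.1]},
    cv=5,
).fit(X, y)
print(grid.best_params_)
```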

Key Challenges with Tree-Based Methods

  • No direct way to incorporate spatial autocorrelation

  • Training-set conditions vs. the conditions where inference is made

  • Cross-validation considerations (random folds can leak autocorrelated information; see the sketch below)
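
One common workaround is spatially blocked cross-validation, where folds never split a spatial block. A minimal sketch (the synthetic coordinates and 25-unit blocks are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic points with coordinates; assign each to a 4 x 4 grid of blocks
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))
X = rng.normal(size=(500, 5))
y = rng.integers(0, 2, size=500)
blocks = (coords[:, 0] // 25).astype(int) * 4 + (coords[:, 1] // 25).astype(int)

# GroupKFold keeps whole blocks together, so autocorrelated neighbors
# cannot leak between training and testing folds
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=GroupKFold(n_splits=5), groups=blocks)
```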