HES 505 Fall 2025: Session 23
By the end of today you should be able to:
Define the likelihood function and its relationship to statistical inference.
Recognize key assumptions of statistical models and how spatial data may challenge those assumptions.
Simulate fake data with known relationships
Fit simple linear and generalized linear models to spatial data
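One objective above is simulating fake data with a known relationship and then fitting a simple model to it. A minimal sketch in Python (the predictor name, parameter values, and sample size are all invented for illustration):

```python
import numpy as np

# Simulate data with a known linear relationship z(s) = mu(s) + eps,
# then recover the weights with ordinary least squares.
rng = np.random.default_rng(42)

n = 200
elevation = rng.uniform(0, 1000, n)            # hypothetical predictor X(s)
w0, w1, sigma = 2.0, 0.01, 1.0                 # "true" parameters we chose
y = w0 + w1 * elevation + rng.normal(0, sigma, n)

# OLS via least squares on the design matrix [1, X(s)]
X = np.column_stack([np.ones(n), elevation])
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # estimates should land near the true (2.0, 0.01)
```

Because we set the "true" weights ourselves, we can check whether the fitting procedure recovers them; this is the main payoff of simulating fake data.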
\[ \begin{equation} z(\mathbf{x}) = \mu(\mathbf{x}) + \epsilon(\mathbf{x}) \end{equation} \]
Often we actually care about the “drivers” of \(\mu(\mathbf{x})\)
Inference not prediction
“Which spatial attributes drive \(z(\mathbf{x})\) ?”
“Does \(\mu(\mathbf{x})\) increase or decrease with changes in a particular variable?”
\[ \begin{equation} z(\mathbf{s}) \sim Distr(\mu, \sigma)\\ \mu = w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s}) \end{equation} \]
When \(z(s)\) is binary → logistic regression
When \(z(s)\) is continuous → linear (or Gamma) regression
When \(z(s)\) is a count → Poisson regression
Assumptions about \(\epsilon\) matter!!
\[ \begin{aligned} y &\sim N(\mu, \sigma) &&\quad\text{(linear regression)}\\ \mu &= w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s})\\ \\ y &\sim \text{Bern}(p) &&\quad\text{(logistic regression)}\\ \text{logit}(p) &= w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s})\\ \\ y &\sim \text{Poisson}(\lambda) &&\quad\text{(Poisson regression)}\\ \log(\lambda) &= w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s}) \end{aligned} \]
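The role of the link function is to map the unbounded linear predictor onto the support of each distribution. A minimal numerical sketch (the values of \(\eta\) are arbitrary):

```python
import numpy as np

# The linear predictor eta = w0 + sum(w_i * X_i) can take any real value.
eta = np.linspace(-5, 5, 11)

identity = eta                     # linear regression: support = all reals
p = 1 / (1 + np.exp(-eta))         # inverse logit: maps reals to (0, 1)
lam = np.exp(eta)                  # inverse log: maps reals to (0, inf)

print(p.min(), p.max())            # probabilities stay strictly inside (0, 1)
print(lam.min())                   # Poisson means stay strictly positive
```

Whatever values the weights take, the inverse link guarantees a valid probability or a valid (positive) Poisson mean.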
Distributional assumptions - the likelihood
\(w_i\) is ‘spatial weight’, equivalent to \(\beta\) in typical regression
link function scales linear model to appropriate support
You measure \(y\)
Your model expresses your hypothesis about the rules governing \(y\)
We need estimates of \(w_i\) to complete our rule
\[ p(y_i \mid \mathbf{w}, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right), \quad \mu_i = w_0 + \sum_{j=1}^{m}w_jX_j(\mathbf{s}_i)\\ p(\mathbf{y}\mid \mathbf{w}, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right) \]
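In practice we work with the log of the product (a sum over independent observations) and find the weight that maximizes it. A minimal sketch with simulated data and a one-predictor, no-intercept model (all names and values are invented):

```python
import numpy as np

# Simulate y with a known slope, then evaluate the Gaussian log-likelihood
# over a grid of candidate slopes.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.5 * x + rng.normal(0, 2.0, 100)   # true w = 1.5, sigma = 2

def log_lik(w, sigma, x, y):
    # log of the product over independent observations = sum of logs
    resid = y - w * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - resid**2 / (2 * sigma**2))

ws = np.linspace(0.5, 2.5, 201)
lls = [log_lik(w, 2.0, x, y) for w in ws]
w_mle = ws[np.argmax(lls)]
print(w_mle)   # the likelihood peaks near the true slope of 1.5
```

The grid search here stands in for what `optim`-style numerical maximizers do; the point is that the data pull the likelihood surface toward the generating value.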
Maximum Likelihood: treats \(w_i\) as fixed, unknown constants; uncertainty is calculated after estimation.
Bayesian: \(w_i\) is a random quantity that varies within the constraints of our prior; uncertainty is built into the estimation process.
MLE: “What’s the best dart throw?”
Bayesian: “Given where darts tend to land, what is the distribution of possible aiming points?”
Shape of likelihood function gives an indication of residual behavior
\(\prod\) signals independent observations
Moments of the distribution add assumptions (e.g., for a Poisson, mean = variance)
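The mean–variance assumption is easy to check by simulation; a quick sketch (rate and sample size are arbitrary):

```python
import numpy as np

# For a Poisson distribution the mean and variance are both lambda,
# so a large sample should show the two sample moments agreeing.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=4.0, size=100_000)
print(counts.mean(), counts.var())   # both should sit near 4
```

If your real count data show variance well above the mean (overdispersion), this moment assumption of the Poisson likelihood is violated.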
To identify important correlations between predictors and the occurrence of an event
Generate maps of the ‘range’ or ‘niche’ of events
Understand spatial patterns of event co-occurrence
Forecast changes in event distributions
From Long
Spatially referenced locations of events \((\mathbf{y})\) sampled from the study extent
A matrix of predictors \((\mathbf{X})\) that can be assigned to each event based on spatial location
Goal: Estimate the probability of occurrence of events across unsampled regions of the study area based on correlations with predictors
Random or systematic sample of the study region
The presence (or absence) of the event is recorded for each point
Hypothesized predictors of occurrence are measured (or extracted) at each point
We can model favorability as the probability of occurrence using a logistic regression
A link function maps the linear predictor \((\mathbf{x_i}'\beta + \alpha)\) onto the support (0, 1) for probabilities
Estimates of \(\beta\) can then be used to generate ‘wall-to-wall’ spatial predictions
\[ \begin{equation} y_{i} \sim \text{Bern}(p_i)\\ \text{link}(p_i) = \mathbf{x_i}'\beta + \alpha \end{equation} \]
Statistical models describe a generative process: how did the data get here?
Machine learning models ask something different
Underlying modes of prediction