HES 505 Fall 2025: Session 18
Describe the motivations for dimension reduction with spatial data
Distinguish between PCA and cluster analysis
Understand common metrics for assessing the quality of dimension reduction results
Implement simple PCA and cluster analysis using R
Reducing the number of variables without losing important information!!
Generalizable (not just for spatial data)
Dealing with (multiple) correlations
Identifying latent structures in the data
Parsimony in statistical models
Dimension reduction with minimal loss of information
Components are “new” variables, each composed of weighted “parts” of the original variables
Variables “load” onto components
Components are defined to be orthogonal
Creates a new “coordinate system” in which each successive axis captures as much of the remaining variance as possible
“Rotation matrix” maps original variables onto new axes
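A minimal sketch of these ideas in R with base `prcomp()`; the built-in `USArrests` data stand in for a real attribute table here and are an illustrative assumption, not the course dataset.

```r
# PCA with base R's prcomp(); USArrests is a built-in example dataset
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pca$rotation   # rotation matrix: how each original variable "loads" onto each component
head(pca$x)    # observations projected onto the new, orthogonal axes (the component scores)
```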
Variance explained per component
Cumulative variance explained (scree plots)
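A short sketch of how these quantities could be pulled out in R, again assuming the illustrative `USArrests` PCA from above.

```r
# Same illustrative PCA as above, recomputed so this chunk runs on its own
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

summary(pca)                                   # std. deviation and proportion of variance per component

var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # share of total variance per component
cumsum(var_explained)                          # cumulative variance explained

screeplot(pca, type = "lines")                 # scree plot of the component variances
```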
No “outcome” variable
Consistent “patterns” in the data = clusters
Reduces (a lot of) information into categorical classes
Data in multivariate space
Cluster centers assigned to minimize distance to observations and maximize distance between centers
Variation across algorithms in how centroids are assigned and in the stopping rules used
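A minimal k-means sketch with base R's `kmeans()`; the choice of k = 3 and the use of `USArrests` are illustrative assumptions only.

```r
# k-means clustering with base R; k = 3 is an arbitrary choice for illustration
set.seed(505)                                 # centroids start at random positions, so fix the seed

dat <- scale(USArrests)                       # standardize so no variable dominates the distances
km  <- kmeans(dat, centers = 3, nstart = 25)  # 25 random starts; iterates until assignments stabilize

km$centers                                    # cluster centers in multivariate space
table(km$cluster)                             # number of observations assigned to each cluster
```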
Explained inertia - proportion of total variation captured by the clusters rather than left within them (0:1, higher = better)
Silhouette index - how well each observation fits its own cluster versus the nearest alternative (-1:1, higher = better)
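One way these two metrics could be computed in R, assuming the illustrative k-means fit above and the `cluster` package for `silhouette()`.

```r
library(cluster)                              # for silhouette(); install.packages("cluster") if needed

dat <- scale(USArrests)
set.seed(505)
km  <- kmeans(dat, centers = 3, nstart = 25)  # same illustrative clustering as above

# Explained inertia: between-cluster variation as a share of the total (0:1, higher = better)
km$betweenss / km$totss

# Silhouette: how well each observation sits in its cluster vs. the nearest alternative (-1:1)
sil <- silhouette(km$cluster, dist(dat))
mean(sil[, "sil_width"])                      # average silhouette width
plot(sil)                                     # silhouette plot by cluster
```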
Spatial autocorrelation (one way to check this is sketched after this list)
Newer Algorithms
What is your Goal?
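For the spatial-autocorrelation point above, one possible check is Moran's I on the first component scores using `spdep`. The North Carolina counties shipped with `sf`, and the three attributes used, are illustrative assumptions rather than the course data.

```r
library(sf)      # vector spatial data
library(spdep)   # spatial weights and Moran's I

# Illustrative data: North Carolina counties shipped with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# PCA on a few county-level attributes (chosen only for illustration)
pca <- prcomp(st_drop_geometry(nc)[, c("BIR74", "SID74", "NWBIR74")], scale. = TRUE)

nb <- poly2nb(nc)                    # contiguity-based neighbours
lw <- nb2listw(nb, style = "W")      # row-standardized spatial weights

moran.test(pca$x[, 1], lw)           # is the first component spatially autocorrelated?
```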