Multivariate Description

HES 505 Fall 2025: Session 18

Matt Williamson

Objectives

  • Describe the motivations for dimension reduction with spatial data

  • Distinguish the differences between PCA and cluster analysis

  • Understand common metrics for assessing the quality of dimension reduction results

  • Implement simple PCA and cluster analysis using R

What is Dimension Reduction

  • Reducing the number of variables without losing important information!!

  • Generalizable (not just for spatial data)

Why Dimension Reduction

  • Dealing with (multiple) correlations

  • Identifying latent structures in the data

  • Parsimony in statistical models

Identifying Gradients with PCA

What is PCA?

  • Dimension reduction with minimal information loss

  • Components are “new” variables composed of “parts” of the original variables

  • Variables “load” onto components

  • Components are defined to be orthogonal

How does PCA work?

  • Creates a new “coordinate system” whose axes successively maximize the remaining variance

  • “Rotation matrix” maps original variables onto new axes
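The steps above can be sketched with base R's `prcomp()`; the built-in `USArrests` data here is just a convenient example dataset, not one from the session:

```r
# PCA on standardized variables (scale. = TRUE centers and scales first)
pca <- prcomp(USArrests, scale. = TRUE)

# The rotation matrix maps original variables onto the new axes:
# rows = original variables, columns = components ("loadings")
pca$rotation

# The scores: the data re-expressed in the new coordinate system
head(pca$x)

# Components are orthogonal, so their correlations are (numerically) zero
round(cor(pca$x), 10)
```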

How can we tell how well PCA worked?

  • Variance explained per component (visualized with scree plots)

  • Cumulative variance explained
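Both diagnostics come straight out of a fitted `prcomp` object; a minimal sketch (again using the built-in `USArrests` data as a stand-in example):

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Variance explained per component: squared standard deviations,
# normalized to proportions
ve <- pca$sdev^2 / sum(pca$sdev^2)
ve

# Cumulative variance explained
cumsum(ve)

# Scree plot of per-component variances
screeplot(pca, type = "lines")
```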

Unsupervised Classification with Cluster Analysis

Unsupervised Classification

  • No “outcome” variable

  • Consistent “patterns” in the data = clusters

  • Reduces (a lot of) information into categorical classes

How Does Cluster Analysis work?

  • Data in multivariate space

  • Cluster centers assigned to minimize distance to observations and maximize distance between centers

  • Variation in how centroids are assigned and in the stopping rules
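A minimal k-means sketch with base R's `kmeans()`; `USArrests` and `k = 3` are illustrative choices, not values from the session:

```r
set.seed(42)             # centroid starts are random, so fix the seed
dat <- scale(USArrests)  # standardize so no variable dominates the distances

# k-means: place centers, then iterate (reassign observations to the
# nearest center, recompute centroids) until assignments stabilize
km <- kmeans(dat, centers = 3, nstart = 25)

km$cluster  # cluster label for each observation
km$centers  # final centroids in multivariate space
```

`nstart = 25` reruns the algorithm from 25 random starts and keeps the best solution, which guards against a poor initial centroid placement.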

Evaluating the Quality of Our Clusters

  • Explained inertia - share of total variation between clusters, i.e. 1 minus the within-cluster share (0:1, higher = better)

  • Silhouette index - overall structure of the data (-1:1, higher = better)
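Both metrics are easy to compute in R; the silhouette index needs the `cluster` package (a recommended package that ships with most R installations but is not loaded by default), and the data/`k` below are illustrative:

```r
set.seed(42)
dat <- scale(USArrests)
km  <- kmeans(dat, centers = 3, nstart = 25)

# Explained inertia: between-cluster sum of squares as a share of the
# total (equivalently, 1 - within-cluster share)
explained <- km$betweenss / km$totss
explained

# Silhouette index: how well each observation fits its own cluster
# relative to the nearest neighboring cluster
library(cluster)
sil <- silhouette(km$cluster, dist(dat))
mean(sil[, "sil_width"])  # overall average silhouette width
```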

Final Thoughts

Things to Consider

  • Spatial autocorrelation

  • Newer Algorithms

  • What is your Goal?