Interpolation and Kriging

Getting Started

We’ll be using the meuse.all data that ships with the gstat package as it is one that shows up in many explanations of geostatistical interpolation. It has the added bonus of being a reasonably small dataset, which lets us practice a few things without having to wait an extremely long time for the models to run. In addition to the gstat package, we’re adding the fields package, which allows us to fit a variety of functional forms to spatial data. Beyond that, the rest of the packages should look familiar.

Code
library(gstat)
library(sf)
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
Code
library(terra)
terra 1.8.70
Code
library(fields)
Loading required package: spam
Spam version 2.11-1 (2025-01-20) is loaded.
Type 'help( Spam)' or 'demo( spam)' for a short introduction 
and overview of this package.
Help for individual functions is also obtained by adding the
suffix '.spam' to the function name, e.g. 'help( chol.spam)'.

Attaching package: 'spam'
The following objects are masked from 'package:base':

    backsolve, forwardsolve
Loading required package: viridisLite
Loading required package: RColorBrewer

Try help(fields) to get started.

Attaching package: 'fields'
The following object is masked from 'package:terra':

    describe
Code
library(ggplot2)
library(patchwork)

Attaching package: 'patchwork'
The following object is masked from 'package:terra':

    area
Code
library(tidyterra)

Attaching package: 'tidyterra'
The following object is masked from 'package:stats':

    filter
Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.1.0     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::extract() masks terra::extract()
✖ dplyr::filter()  masks tidyterra::filter(), stats::filter()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(stringr)

Loading the Data

Because the data ships with a package, we can use the data function to load it directly into our environment. Then we need to get it into an sf object. We’ll take a quick look at the CRS (with st_crs) just to get a sense of what the units are.

Code
data(meuse.all)
meuse_sf <- st_as_sf(meuse.all, 
                     coords = c("x", "y"), 
                     crs = 28992, 
                     agr = "constant")

st_crs(meuse_sf)
Coordinate Reference System:
  User input: EPSG:28992 
  wkt:
PROJCRS["Amersfoort / RD New",
    BASEGEOGCRS["Amersfoort",
        DATUM["Amersfoort",
            ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4289]],
    CONVERSION["RD New",
        METHOD["Oblique Stereographic",
            ID["EPSG",9809]],
        PARAMETER["Latitude of natural origin",52.1561605555556,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",5.38763888888889,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9999079,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",155000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",463000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone."],
        BBOX[50.75,3.2,53.7,7.22]],
    ID["EPSG",28992]]
Code
glimpse(meuse_sf)
Rows: 164
Columns: 16
$ sample      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ cadmium     <dbl> 11.7, 8.6, 6.5, 2.6, 2.8, 3.0, 3.2, 2.8, 2.4, 1.6, 1.4, 1.…
$ copper      <dbl> 85, 81, 68, 81, 48, 61, 31, 29, 37, 24, 25, 25, 93, 31, 27…
$ lead        <dbl> 299, 277, 199, 116, 117, 137, 132, 150, 133, 80, 86, 97, 2…
$ zinc        <dbl> 1022, 1141, 640, 257, 269, 281, 346, 406, 347, 183, 189, 2…
$ elev        <dbl> 7.909, 6.983, 7.800, 7.655, 7.480, 7.791, 8.217, 8.490, 8.…
$ dist.m      <dbl> 50, 30, 150, 270, 380, 470, 240, 120, 240, 420, 400, 300, …
$ om          <dbl> 13.6, 14.0, 13.0, 8.0, 8.7, 7.8, 9.2, 9.5, 10.6, 6.3, 6.4,…
$ ffreq       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ soil        <dbl> 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ lime        <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1…
$ landuse     <fct> Ah, Ah, Ah, Ga, Ah, Ga, Ah, Ab, Ab, W, Fh, Ag, W, Ah, Ah, …
$ in.pit      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ in.meuse155 <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ in.BMcD     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ geometry    <POINT [m]> POINT (181072 333611), POINT (181025 333558), POINT …

Now that you’ve gotten a chance to look at the data a bit, we’ll set up our interpolation grid. Unfortunately, the various interpolation functions have different expectations for the data types you’re using to generate the interpolation model and grid, so we’ve got to make a few different versions of these objects. To do this, I use st_bbox to get the bounding box for the data and convert that bbox into an sf object. Then, I create the interpolation grid by creating a blank raster within the bbox; the resolution is set implicitly by specifying the number of rows and columns in the raster. Once we have the raster, we convert it into both a data.frame and an sf object. Lastly, we bring in a raster of the meuse dataset to provide a template for cropping.

Code
box <- st_as_sfc(st_bbox(meuse_sf))
grd <- rast(vect(box), ncol= 1000, nrow=1000)
grid_pts <- crds(grd, df = TRUE)
grid_sf <- st_as_sf(grid_pts, coords = c("x", "y"), crs = st_crs(meuse_sf))

r <- rast(system.file("ex/meuse.tif", package="terra")) |> 
  project(x=_, y = crs("EPSG:28992")) |>
  resample(x=_, y = grd)

Deterministic Approaches

Inverse Distance Weighting

The gstat function is similar to the formula function in base R; it’s just a way of taking the various arguments and constructing the appropriate spatial call. We pass an id parameter, which gstat will use to identify the values produced by anything that uses the formula we construct. The formula argument allows us to specify what we want to model and how. In this case, we are using IDW, so we are saying that zinc values are a function of (~) some unknown average. We pass a locations argument to let gstat know where the data and spatial locations are. The nmax argument sets the number of points that will be used to calculate our unknown average. Finally, the set argument accepts a variety of parameters that are specific to whichever form of model you are hoping to implement (you can find more information in the original version of the gstat manual).
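
For reference, the prediction IDW produces at an unmeasured location \(s_0\) is just a weighted average of the observed values, with weights that decay with distance according to the power we pass as idp (this is the standard IDW formulation rather than anything specific to gstat): \[ \hat{z}(s_0) = \frac{\sum_{i=1}^{n} w_i \, z(s_i)}{\sum_{i=1}^{n} w_i}, \qquad w_i = d(s_0, s_i)^{-p} \] Here \(n\) is capped by nmax and \(p\) is the idp value, so larger powers concentrate the weight on the nearest points while smaller powers spread it out.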

Code
idw_formula_lin <- gstat(id = "zinc", formula = zinc~1, locations = meuse_sf, 
            nmax=10, set=list(idp = 1))
idw_formula_lin
data:
zinc : formula = zinc`~`1 ; data dim = 164 x 15 nmax = 10
set idp = 1; 

Once we have the formula set up, we can use the predict function to call the formula and predict values for all of the points in our grid. This outputs an sf object with predictions (and variances, though those are NA here because we’re using IDW) for all of the points. We convert that object to a raster with rasterize, mask it with our template raster (using crop with mask=TRUE), and combine the two rasters to plot them side by side.

Code
idw_interp <- predict(idw_formula_lin, grid_sf)
[inverse distance weighted interpolation]
Code
str(idw_interp)
Classes 'sf' and 'data.frame':  1000000 obs. of  3 variables:
 $ zinc.pred: num  993 993 993 993 993 ...
 $ zinc.var : num  NA NA NA NA NA NA NA NA NA NA ...
 $ geometry :sfc_POINT of length 1000000; first list element:  'XY' num  178606 333609
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "zinc.pred" "zinc.var"
Code
pred <- terra::rasterize(idw_interp, grd, field = "zinc.pred", fun = "mean")
idw_lin_mask <- crop(pred, r, mask=TRUE)
comb_rst <- c(pred, idw_lin_mask)
plot(comb_rst)

Now that we’ve got the basics of the IDW syntax figured out, we might want to explore how changing the inverse-distance power (idp) affects the resulting predictions. To do that, I build a custom function that accepts the necessary arguments for each of the above steps. I finish with a call to ggplot using tidyterra because I want to use patchwork to create a nice multipanel plot of the surfaces that result from the different idp values. Then I map over a series of idp values to create the individual maps.

Code
idw_idp_explore <- function(variable, nmx, pwr, loc_sf, grid_sf, grid_rst, templ_rst){
  ## Need to convert the character variable name into something recognizable as an object
  dv <- sym(variable)
  ## Set up the idw formula to accept different arguments
  idw_formula <- gstat(id = variable, formula = eval(dv)~1, locations = loc_sf, 
            nmax=nmx, set=list(idp = pwr))
  ## Generate spatial predictions as sf object
  pred_df <- predict(idw_formula, grid_sf)
  ## Identify the column with the prediction
  fld <- str_subset(colnames(pred_df),pattern = "pred")
  ## Make and mask the raster
  pred_rst <- terra::rasterize(pred_df, grid_rst, field = fld, fun = "mean")
  msk_rst <- terra::crop(pred_rst, templ_rst, mask = TRUE)
  ## Make a plot
  ggplot() +
  geom_spatraster(data = msk_rst, 
                  mapping = aes(fill = mean)) +
  scale_fill_viridis_c(na.value = "transparent") +
  ggtitle(paste0("Interp Zinc Concentration (IDW, power = ", pwr, ")")) +
  theme_bw()

}

idp_plots <- map(c(0.5, 2, 4, 8), 
                 \(x) idw_idp_explore(variable = "zinc", nmx = 15, 
                                      pwr = x, loc_sf = meuse_sf, 
                                      grid_sf = grid_sf, grid_rst = grd,
                                      templ_rst = r))
[inverse distance weighted interpolation]
<SpatRaster> resampled to 501264 cells.
[inverse distance weighted interpolation]
<SpatRaster> resampled to 501264 cells.
[inverse distance weighted interpolation]
<SpatRaster> resampled to 501264 cells.
[inverse distance weighted interpolation]
<SpatRaster> resampled to 501264 cells.

Then, I use patchwork to compose the resulting plots by accessing each of the plots in our idp_plots list.

Code
(idp_plots[[1]] + idp_plots[[2]])/(idp_plots[[3]] + idp_plots[[4]])

Thin Plate Splines

Thin plate splines (TPS) fit a smooth surface through scattered data in 2 (or more) dimensions by minimizing a penalized residual sum of squares: the goal is to fit the data well while keeping the total “bending energy” (i.e., surface roughness) of the surface low. The \(\lambda\) parameter controls the smoothness such that \(\lambda=0\) produces an exact interpolation where the surface passes through each data point. Higher values of \(\lambda\) increase the smoothness of the surface, but potentially provide a poor fit to the existing data.
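
One common way to write what TPS is doing in our two-dimensional case is as the minimization of that penalized residual sum of squares: \[ \min_{f} \; \sum_{i=1}^{n} \left(z_i - f(x_i, y_i)\right)^2 + \lambda \int\!\!\int \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx\,dy \] The first term measures fit to the data, the integral is the “bending energy” of the surface, and \(\lambda\) sets the trade-off between the two.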

The fields package implements TPS for spatial data and works directly with terra. Before fitting, though, we have to convert our data into the format that fields expects: the spatial coordinates as a matrix (using st_coordinates) and our variable of interest as a vector. Once we’ve done that, we can call fields::Tps.

Code
coords <- st_coordinates(meuse_sf)
z <- meuse_sf$zinc


tps_model_cv <- fields::Tps(coords, z)
tps_model_cv
Call:
fields::Tps(x = coords, Y = z)
                                                 
 Number of Observations:                164      
 Number of parameters in the null space 3        
 Parameters for fixed spatial drift     3        
 Model degrees of freedom:              68.8     
 Residual degrees of freedom:           95.2     
 GCV estimate for tau:                  186.8    
 MLE for tau:                           178.5    
 MLE for sigma:                         174700000
 lambda                                 0.00018  
 User supplied sigma                    NA       
 User supplied tau^2                    NA       
Summary of estimates: 
                 lambda      trA      GCV   tauHat -lnLike Prof converge
GCV        0.0001825072 68.84128 60141.03 186.8047     1137.245       10
GCV.model            NA       NA       NA       NA           NA       NA
GCV.one    0.0001825072 68.84128 60141.03 186.8047           NA       10
RMSE                 NA       NA       NA       NA           NA       NA
pure error           NA       NA       NA       NA           NA       NA
REML       0.0003770842 52.49955 60875.86 203.4412     1135.935        4

By default, fields::Tps estimates \(\lambda\) using a generalized cross-validation (GCV) approach that balances the fit to the data against the effective degrees of freedom used by the spline. You can alter this approach by changing the method argument (see ?Tps for the details). The nice part about using fields::Tps is that once the model has been fit, it can be used to make predictions very quickly, even over a large grid. We do that using the terra::interpolate function and then mask our predictions again.

Code
tps_cv_rast <- interpolate(grd, tps_model_cv)
tps_cv_crop <- crop(tps_cv_rast, r, mask=TRUE)

We’ll explore a couple of different \(\lambda\) values and an alternative way of selecting \(\lambda\) to demonstrate the effect of these different options on the resulting surface. We could write another function here, but because I’m changing different things and passing different arguments, I’ll write out each call so that you can see what I’m changing.

Code
tps_model_03 <- fields::Tps(coords, z, lambda = tps_model_cv$lambda*2)
tps_03_rast <- interpolate(grd, tps_model_03)
tps_03_crop <- crop(tps_03_rast, r, mask=TRUE)


tps_model_00001 <- fields::Tps(coords, z, lambda = tps_model_cv$lambda/2)
tps_00001_rast <- interpolate(grd, tps_model_00001)
tps_00001_crop <- crop(tps_00001_rast, r, mask=TRUE)

tps_model_cv_reml <- fields::Tps(coords, z, method = "REML")
tps_cvreml_rast <- interpolate(grd, tps_model_cv_reml)
tps_cvreml_crop <- crop(tps_cvreml_rast, r, mask=TRUE)
Code
tps_def <- ggplot() +
  geom_spatraster(data = tps_cv_crop, 
                  mapping = aes(fill = lyr.1)) +
  scale_fill_viridis_c(na.value = "transparent") +
  ggtitle("Interp Zinc Concentration (TPS, lambda = default)") +
  theme_bw()
<SpatRaster> resampled to 501264 cells.
Code
tps_double <- ggplot() +
  geom_spatraster(data = tps_03_crop, 
                  mapping = aes(fill = lyr.1)) +
  scale_fill_viridis_c(na.value = "transparent") +
  ggtitle("Interp Zinc Concentration (TPS, lambda = 0.0004)") +
  theme_bw()
<SpatRaster> resampled to 501264 cells.
Code
tps_half <- ggplot() +
  geom_spatraster(data = tps_00001_crop, 
                  mapping = aes(fill = lyr.1)) +
  scale_fill_viridis_c(na.value = "transparent") +
  ggtitle("Interp Zinc Concentration (TPS, lambda = 0.000009)") +
  theme_bw()
<SpatRaster> resampled to 501264 cells.
Code
tps_reml <- ggplot() +
  geom_spatraster(data = tps_cvreml_crop, 
                  mapping = aes(fill = lyr.1)) +
  scale_fill_viridis_c(na.value = "transparent") +
  ggtitle("Interp Zinc Concentration (TPS, lambda = REML)") +
  theme_bw()
<SpatRaster> resampled to 501264 cells.
Code
(tps_def + tps_double)/(tps_half + tps_reml)

You can see that different values of \(\lambda\) smooth the surface differently: lower values allow the surface to “bend” more tightly around the measured points, while higher values smooth those local peaks and valleys out over a wider area. You can also see (in the bottom-right plot) that using a different method for automating the selection of \(\lambda\) results in a different choice. While GCV is the default choice, its focus is to minimize the error in a leave-one-out cross-validation sense. The REML (Restricted Maximum Likelihood) approach tends to penalize overfitting of the data to find a parsimonious solution that explains most of the variation without adding unnecessary parameters. In practice, the REML approach tends to be more stable (the GCV criterion can have flat or multiple minima); however, it can be computationally costly and does not always converge on a solution.
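
If you want to compare the two selections directly, each fitted object stores the \(\lambda\) it actually used (we already accessed tps_model_cv$lambda above when doubling and halving it), so a quick side-by-side check looks something like this:

Code
## Compare the smoothing parameter chosen by GCV (the default) with the REML choice
c(GCV = tps_model_cv$lambda, REML = tps_model_cv_reml$lambda)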

Probabilistic Approaches: Kriging

In the previous deterministic approaches, we’ve basically assumed that the process producing our measurements is stable throughout the landscape (i.e., homogeneous) and relied on that assumption to predict that unmeasured locations should have values consistent with their neighbors, scaled by the distance between points (and potentially a smoothing parameter). These approaches are deterministic in that the measurements are fixed, the distances are known, and the smoothing parameter is set by you. They will generate the same prediction every time you run them so long as the smoothing parameter and the data you use remain the same.

We might imagine that we are uncertain whether the process is homogeneous. Moreover, we might not be sure that the best prediction is simply a function of the distance between the points. In such cases, we might want to generate a probabilistic output (i.e., one with both predictions and variances). Kriging allows us to do just that. It enables us to say that the mean value is unknown (and our measurements tell us something about what it should be), rather than assuming that the measurements are perfectly reflective of the underlying first-order process. It also allows us to use empirical estimates of the spatial covariance (i.e., how the variance between points changes as a function of distance) to estimate the role of distance in driving the process. We combine some “statistical machinery” to estimate these different elements and generate predictions along with the variance around those predictions. The key piece is to figure out the spatial covariance. We’ll do this using variograms.
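
One way to write the model that kriging assumes is \[ Z(s) = \mu + \varepsilon(s) \] where \(\mu\) is the unknown (for now, constant) mean and \(\varepsilon(s)\) is a zero-mean, spatially correlated error. The variogram we build next is our estimate of how that correlation decays with distance.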

Variograms

The gstat package has a variogram function that allows you to estimate the (semi-)variance between points as a function of distance. The semivariance is half the average squared difference between pairs of values; the factor of one-half puts it on the same scale as the variance of the process, so at large distances the semivariance levels off near that variance. (By default, the calculation also assumes the process is isotropic, i.e., the same in all directions, so only distance matters.) We can look at the raw data first by setting the argument cloud = TRUE.
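
Concretely, for the set of point pairs \(N(h)\) separated by (approximately) distance \(h\), the empirical semivariance is \[ \hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \left( z(s_i) - z(s_j) \right)^2 \] With cloud = TRUE, each pair contributes its own point rather than being averaged within a distance bin.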

Code
zinc_vc <- variogram(log(zinc) ~ 1, meuse_sf, cloud = TRUE)
plot(zinc_vc)

Holy smokes!! That’s a lot of points!! We only have 164 measurements, so how do we get so many points? This calculation involves comparing the differences between all pairs of points. We can calculate this as: \[ Pairs = \frac{N \times (N-1)}{2} \] which in our case gives 13,366 pairs. These cloud plots can be helpful for identifying outliers or strange trends in the data, but are usually too overwhelming to identify things like the nugget, sill, and range. To look at the binned version of the plot (where the variances are combined into a series of lag-distance bins), we can just run the variogram function with its default argument of cloud = FALSE. In both cases, you can alter the number of points considered in the variogram by specifying a cutoff value, which will eliminate pairs beyond a certain distance.
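
Before moving on to the binned version, you can verify that pair count (and see how many pairs actually make it into the cloud once the default distance cutoff is applied) with a quick check:

Code
## All possible pairs among the 164 samples
choose(nrow(meuse_sf), 2)
## Pairs included in the cloud object; the default cutoff drops the longest distances
nrow(zinc_vc)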

Code
sample_vgm <- variogram(log(zinc) ~ 1, data = meuse_sf)
plot(sample_vgm)

Inspection of this plot is much easier. We can see that the nugget (the semivariance at very small distances) is probably somewhere around 0.1–0.2, the semivariance levels off (the sill) somewhere between 0.5 and 0.7 (implying a partial sill of roughly 0.5 once the nugget is subtracted), and the range (the distance at which the sill is reached) is probably around 1000m. Now we’ve got to figure out which of our variogram models makes sense to approximate this relationship.

Code
show.vgms(par.strip.text = list(cex = 0.75))

This part is a bit more “art” than “science”, but that’s okay! The goal is to choose a functional form that does a reasonable job of approximating the relationship between distance and semivariance so that we can estimate the parameters necessary for our kriging model. Based on the shapes, we might choose a spherical (Sph) or Matern (Mat) form. We can use the vgm function to generate the corresponding variogram models based on our guesses above. These don’t have to be perfect; we will estimate the parameters in a subsequent step. For now, we are just trying to see whether we like either of the shapes we’ve chosen once they are imposed on the data we actually have.

Code
sph_vgm_r900 <- vgm(psill = 0.5, model = "Sph",
                range = 900, nugget = 0.1)
plot(sample_vgm, sph_vgm_r900, cutoff = 1000, cex = 1.5)

Code
mat_vgm_r1100 <- vgm(psill = 0.6, model = "Mat",
                range = 1100, nugget = 0.1)
plot(sample_vgm, mat_vgm_r1100, cutoff = 1000, cex = 1.5)

Based on these two plots, it doesn’t look like the Matern functional form generates enough “take off” in the values; that is, its semivariance increases too slowly over the shorter distance lags. We’ll move forward with the spherical function and actually fit a model to estimate the parameters. When we use fit.variogram, we provide the sample variogram as the object and then specify the model like we did in the previous step. We need to provide reasonable starting values for psill, nugget, and range so that the fitting routine converges.

Code
fit_vgm <- fit.variogram(object = sample_vgm,
                    model = vgm(psill = 0.5, model = "Sph",
                                range = 900, nugget = 0.1))
fit_vgm
  model     psill    range
1   Nug 0.1229519   0.0000
2   Sph 0.5232435 880.9165
Code
plot(sample_vgm, fit_vgm, cex = 1.5)

The results show you the estimate for the nugget component (the constant part of the semivariance), 0.12, which was pretty close to our guess, and then the partial sill and range under the spherical component (about 0.52 for the partial sill and ~881m for the range). You can see that while our initial guesses were not exact, we didn’t do too badly!!
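
Because the fitted model is stored as a small data.frame (one row per component, as printed above), you can pull the estimated parameters out directly if you need them later; for example, the total sill is just the sum of the psill column:

Code
## Nugget + partial sill = total sill of the fitted model
sum(fit_vgm$psill)
## Range of the spherical component (the nugget row has range 0)
fit_vgm$range[fit_vgm$model == "Sph"]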

Kriging

Now that we’ve got our spatial covariance described and modelable, we can turn to generating predictions using kriging.

Ordinary Kriging

Ordinary kriging assumes that the (unknown) mean of the spatial process is constant and needs to be estimated. Once that estimate is obtained, we can adjust the predictions using the variogram model we fit in the previous step. We use the gstat function to set up the necessary prediction formula, then the predict function to interpolate values for all of the points in our prediction grid. Finally, we convert the result to a raster for plotting.
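
Written out, the ordinary kriging prediction at an unmeasured location \(s_0\) is a weighted sum of the observations, \[ \hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i), \qquad \sum_{i=1}^{n} \lambda_i = 1 \] where the weights \(\lambda_i\) are chosen to minimize the prediction variance implied by the fitted variogram; that minimized variance is what gets reported alongside each prediction.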

Code
ok_krig_formula <- gstat(id = "log.zinc",
                      formula = log(zinc) ~ 1,
                      data = meuse_sf,
                      model = fit_vgm)

krig_pred <- predict(ok_krig_formula, grid_sf)
[using ordinary kriging]
Code
pred_rast <- c(rasterize(vect(krig_pred), grd, field = "log.zinc.pred"),
               rasterize(vect(krig_pred), grd, field = "log.zinc.var"))
names(pred_rast) <- c("pred", "var")
pred_rast_mask <- crop(pred_rast, r, mask=TRUE)
plot(pred_rast_mask)

You can see from these plots that the variance tends to be highest where our sample points are less dense, and that the character of the smoothing (more high values where we have clusters of high measurements and more low values where our measurements were low) differs from the surfaces produced by the deterministic models.

Universal Kriging

Sometimes we might imagine that our unknown mean varies in space, either as a function of location or of some other covariate. In those cases, we are estimating an inhomogeneous process because the first-order mean can vary spatially. Universal kriging allows us to estimate the spatial trend in \(\mu\) and then apply our fitted spatial covariance to generate interpolations that account for both. Here, we use another preloaded dataset (the meuse.grid soil data) to add a predictor describing what we think might be driving the inhomogeneity in the process. The only other difference is that we modify our formula in the gstat call to reflect that we think soil type is driving the spatial trend.
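
In other words, instead of a single unknown constant, the mean is now modeled as a function of covariates, \[ Z(s) = \sum_{k} \beta_k x_k(s) + \varepsilon(s) \] where the \(x_k(s)\) are the covariates in the formula (here, soil type) and \(\varepsilon(s)\) retains the spatial covariance described by our fitted variogram.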

Code
data(meuse.grid, package = "sp")
meuse_grid_sf <- st_as_sf(meuse.grid, coords = c("x", "y"), crs = 28992)


univ_krig <- gstat(id = "log.zinc",
  formula = log(zinc) ~ soil,      
  locations = meuse_sf,              
  model = fit_vgm
)

uk_pred <- predict(univ_krig, meuse_grid_sf)
[using universal kriging]
Code
ggplot(uk_pred) +
  geom_sf(aes(color = log.zinc.pred)) +
  scale_color_viridis_c() +
  theme_minimal() +
  labs(title = "Universal Kriging Prediction of Zinc",
       color = "Predicted Zn")

Code
ggplot(uk_pred) +
  geom_sf(aes(color = log.zinc.var)) +
  scale_color_viridis_c() +
  theme_minimal() +
  labs(title = "Universal Kriging Variance of Zinc",
       color = "Predicted Zn")

You can see in these plots that the “chunky” nature of the soil type data creates some new patterns in the prediction, but the variance patterns remain roughly the same.