Data Manipulation with the tidyverse

Content for Monday, September 8, 2025

We’ve established some of the reasons why data reproducibility and transparency are important, and you can now bring data into R from a variety of sources. The next step is wrangling those data into a form we can use for analysis. For a variety of reasons, this course relies on the tidyverse family of packages and its extensions into the spatial realm. Today, we’ll talk about why and begin to develop some intuition for building an analysis with tidyverse functions.

Readings

Setting the Stage

Good enough practices in scientific computing by Wilson et al. (2017). Provides some helpful guidance on organizing projects for people who aren’t necessarily computer scientists.

Pseudocode: what it is and how to write it - A nice blog post by Sara Metwalli that sketches out the logic of pseudocode and why it can be helpful.

Technical Details

Data Tidying from R for Data Science by Wickham and Grolemund (2016) provides the logic behind tidy data and how it works in the tidyverse.

Data Transformation from the same book introduces dplyr and the various functions for altering, summarizing, and reshaping data.

Joins introduces relational databases and a tidy approach for merging datasets. This one will prove to be important when we start working with spatial data.
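To make the idea of a tidy join a little more concrete before you read, here is a minimal sketch using dplyr. The two tables (`sites` and `counts`) and their columns are invented for illustration and do not come from the readings; the join functions themselves (`left_join()`, `inner_join()`, `anti_join()`) are part of dplyr.

```r
library(dplyr)

# Hypothetical tables, invented for illustration
sites <- tibble(
  site_id = c(1, 2, 3),
  county  = c("Ada", "Boise", "Canyon")
)
counts <- tibble(
  site_id = c(1, 3, 4),
  n_birds = c(12, 30, 5)
)

# Keep every row of sites; sites without a count get NA for n_birds
site_counts <- left_join(sites, counts, by = "site_id")

# Keep only the site_id values present in both tables
matched <- inner_join(sites, counts, by = "site_id")

# Diagnostic: rows of counts whose site_id has no match in sites
orphans <- anti_join(counts, sites, by = "site_id")
```

Checking `orphans` before relying on a join is a cheap way to catch mismatched keys, which becomes more important once one of the tables carries geometry.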

Objectives

By the end of today you should be able to:

  • Explain the importance of readable code.

  • Articulate the structure of readable scripts.

  • Use the verb, object, helper syntax of the tidyverse to modify your data in a reproducible way.
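As a rough illustration of the verb, object, helper pattern named in the last objective, the sketch below pipes one data frame (the object) through three dplyr verbs, with `starts_with()` acting as a selection helper. The `plots` table and its column names are hypothetical, and the native pipe `|>` assumes R 4.1 or later; the `%>%` pipe would work the same way here.

```r
library(dplyr)

# Hypothetical plot-level survey data, invented for illustration
plots <- tibble(
  plot_id     = c("a", "b", "c", "d"),
  elev_m      = c(1200, 1850, 2100, 950),
  cover_shrub = c(0.10, 0.35, 0.50, 0.05),
  cover_grass = c(0.60, 0.40, 0.20, 0.80)
)

high_plots <- plots |>                               # object: the data frame
  filter(elev_m > 1000) |>                           # verb: keep rows above 1000 m
  select(plot_id, starts_with("cover_")) |>          # verb + helper: pick columns by prefix
  mutate(cover_total = cover_shrub + cover_grass)    # verb: add a derived column
```

Because each line starts with a verb and the original `plots` object is never overwritten, the script reads top to bottom as a record of what was done to the data.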

Slides

The slides for today’s lesson are available online as an HTML file. Use the buttons below to open the slides either as an interactive website or as a static PDF (for printing or storing for later). You can also click in the slides below and navigate through them with your left and right arrow keys.

View all slides in new window
Download PDF of all slides

References

Wickham, H., and G. Grolemund. 2016. R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
Wilson, G., J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, and T. K. Teal. 2017. Good enough practices in scientific computing. PLOS Computational Biology 13:1–20.