tidyverse
HES 505 Fall 2025: Session 5
Documentation containing code (not vice versa!)
Direct connection between code and explanation
Convey meaning to humans rather than telling the computer what to do!
Clarity
Automation
Minimal Documentation
If it’s for a computer, it’s great for a script. If it’s for a human, we need more.
Pseudocode: an informal way of writing the ‘logic’ of your program
Balance between readability and precision
Avoid syntactic drift
Easier for you, future you, and others to read
Easier to get help
tidyverse
A group of task-specific packages built on shared grammar
Reduced dependencies on other packages
Consistent logic and shared style
Allows us to code “out loud”
dplyr: a grammar for data transformation
functions as verbs
functions work on and return data frames
first argument is always a data frame
subsequent arguments say what to do with it
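A minimal sketch of this grammar, assuming dplyr is installed and using the mtcars data frame that ships with R. Each call takes a data frame as its first argument and returns a data frame. The last call previews the selection helpers starts_with() and contains() that appear alongside the verb list. Variable names such as heavy, slim, and by_cyl are illustrative only.

```r
library(dplyr)

car_data <- mtcars  # built-in data frame: 32 cars, 11 numeric columns

# Each verb: data frame in, data frame out
heavy  <- filter(car_data, wt > 3.5)             # rows matching a criterion
slim   <- select(heavy, mpg, cyl, wt)            # columns by name
sorted <- arrange(slim, desc(mpg))               # reorder rows, highest mpg first
ratio  <- mutate(sorted, mpg_per_wt = mpg / wt)  # add a new variable

# group_by() + summarise(): one summary row per cylinder count
by_cyl <- summarise(group_by(car_data, cyl), mean_mpg = mean(mpg), n = n())

# Selection helpers inside select(): disp and drat start with "d",
# gear and carb contain "ar"
helpers <- select(car_data, starts_with("d"), contains("ar"))

print(by_cyl)
print(names(helpers))
```

Because every verb returns a data frame, the output of one step can be handed straight to the next.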
select: pick columns by name
arrange: reorder rows
slice: pick rows using index(es)
filter: pick rows matching criteria
distinct: filter for unique rows
mutate: add new variables
summarise: reduce variables to summary values
group_by: for grouped operations
Selection helpers can be used inside select (and other verbs that choose columns) to pick multiple columns at once:
starts_with(): starts with a prefix
contains(): contains a literal string
one_of(): matches variable names in a character vector (superseded by all_of() and any_of() in current tidyselect)
everything(): matches all variables
last_col(): selects the last variable, possibly with an offset
In programming, a pipe is a technique for passing information from one process to another.
You can think about the following sequence of actions: find keys, unlock car, start car, drive to work, park.
Expressed as a set of nested functions in R pseudocode, this would look like:
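The nested form can be sketched as runnable R by defining hypothetical stand-in functions: find_keys(), unlock_car(), start_car(), drive(), and park() are made up for this illustration and do not come from any package. Each one takes the record of steps so far and appends its own step, so the result shows the order in which the actions actually run.

```r
# Hypothetical stand-ins (illustration only): each appends its step
find_keys  <- function() "find keys"
unlock_car <- function(steps) c(steps, "unlock car")
start_car  <- function(steps) c(steps, "start car")
drive      <- function(steps, to) c(steps, paste("drive to", to))
park       <- function(steps) c(steps, "park")

# Nested calls are read inside out: the first action sits deepest
trip <- park(drive(start_car(unlock_car(find_keys())), to = "work"))
print(trip)
```

The last action, park(), is the first name you read, which is what makes deeply nested code hard to follow.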
Writing it out using pipes gives it a more natural (and easier to read) structure:
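A sketch of the piped version, again using hypothetical stand-in functions made up for this illustration. It uses R's native pipe |> (R 4.1 or later); the magrittr pipe %>%, which dplyr re-exports, behaves the same way for simple calls like these. Each left-hand result becomes the first argument of the next call, so the code reads top to bottom in the order the actions happen.

```r
# Hypothetical stand-ins (illustration only, not from any package)
find_keys  <- function() "find keys"
unlock_car <- function(steps) c(steps, "unlock car")
start_car  <- function(steps) c(steps, "start car")
drive      <- function(steps, to) c(steps, paste("drive to", to))
park       <- function(steps) c(steps, "park")

# Each result is passed as the first argument of the next function
trip_piped <- find_keys() |>
  unlock_car() |>
  start_car() |>
  drive(to = "work") |>
  park()

# The pipe is just another way of writing the same nested call
trip_nested <- park(drive(start_car(unlock_car(find_keys())), to = "work"))
identical(trip_piped, trip_nested)
```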
tidyr: for cleaning and reshaping data
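A small reshaping sketch, assuming tidyr is installed; the site and count values are made up for illustration. pivot_longer() turns one-column-per-year data into one row per site and year, and pivot_wider() reverses the operation.

```r
library(tidyr)

# Hypothetical wide table: one column per year
counts_wide <- data.frame(
  site  = c("A", "B"),
  y2023 = c(10, 4),
  y2024 = c(12, 7)
)

# Wide to long: every column except site becomes a year/count pair
counts_long <- pivot_longer(counts_wide, cols = -site,
                            names_to = "year", values_to = "count")

# Long back to wide: one column per year again
counts_back <- pivot_wider(counts_long, names_from = year, values_from = count)

print(counts_long)
```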
dplyr::xxxx_join functions (e.g., left_join(), inner_join()) for combining data
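A sketch of two of the join functions, assuming dplyr is installed; the site and observation tables are hypothetical. left_join() keeps every row of the first table and fills NA where there is no match, while inner_join() keeps only rows whose key appears in both tables.

```r
library(dplyr)

# Hypothetical tables sharing a "site" key
sites <- data.frame(site    = c("A", "B", "C"),
                    habitat = c("forest", "grassland", "forest"))
obs <- data.frame(site  = c("A", "A", "C", "D"),
                  count = c(3, 5, 2, 1))

# All 4 observations kept; site D has no match, so habitat is NA
all_obs <- left_join(obs, sites, by = "site")

# Only the 3 observations whose site appears in both tables
matched <- inner_join(obs, sites, by = "site")

print(all_obs)
```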
Analysis = many functions, many scripts
Literate document (Quarto) focuses on communication
Integrates inputs and outputs alongside code
Focus is on research questions, decisions, and interpretation