Meeting Your Data

Content for Wednesday, September 3, 2025

Now that you know a bit about the “why” and “how” of this course, it’s time to start working with data. We’ll spend some time thinking about the “nature” of data, the qualities of “good” data, and the responsibilities of data creators and users. We’ll also practice some general approaches for getting your own data into R and for accessing data through APIs and functions.

Readings

Setting the Stage

Geographies of conservation II: Technology, surveillance and conservation by algorithm by Adams (2019) provides a critical perspective on the role of new data-sensing technologies in conservation.

The ethics of big data as a public good: Which public? Whose good? by Taylor (2016) highlights some of the difficulties that arise when “big data” is largely owned and created by private companies.

A Survey of Data Quality Requirements That Matter in ML Development Pipelines by Priestley et al. (2023) provides a practical discussion of the attributes of “good” data, particularly in the context of applied machine learning. Although the paper is broader in focus, much of its discussion of the four dimensions of data quality is directly relevant to the other readings.

Data justice and biodiversity conservation by Pritchard et al. (2022) provides an accessible introduction to the concept of data justice and to the frameworks available for achieving it.

Technical Details

The “Wrangle” section of R for Data Science by Wickham and Grolemund (2016) explains the logic behind the tidyverse approach to data import and manipulation.

The “Data in R” section of An Introduction to R by Douglas et al. (2022) gives an important overview of the different data types in R and how they are represented as objects within the R environment.
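As a quick illustration of the kinds of objects that reading covers, here is a minimal sketch using only base R functions (`typeof()`, `class()`, `factor()`, `data.frame()`). The variable names and values are hypothetical, made up for demonstration, and not drawn from any course dataset.

```r
# Common R data types and the objects that hold them.
# All values below are hypothetical and for demonstration only.

count   <- c(3L, 7L, 2L)                         # integer vector
mass_kg <- c(1.5, 2.25, 0.8)                     # double (numeric) vector
species <- c("oak", "pine", "oak")               # character vector
native  <- c(TRUE, FALSE, TRUE)                  # logical vector
habitat <- factor(c("forest", "edge", "forest")) # factor (categorical) vector

# typeof() reports the underlying storage type; class() reports the object class
typeof(count)    # "integer"
typeof(mass_kg)  # "double"
typeof(species)  # "character"
typeof(native)   # "logical"
typeof(habitat)  # "integer": factors are stored as integer codes
class(habitat)   # "factor"
levels(habitat)  # "edge" "forest" (levels are sorted alphabetically by default)

# A data frame is a list of equal-length vectors, one vector per column
plots <- data.frame(count, mass_kg, species, native, habitat)
class(plots)     # "data.frame"
typeof(plots)    # "list"
str(plots)       # compact overview of each column's type and first values
```

The factor example is worth noticing: it looks like text when printed, but R stores it as integer codes with a set of labels. This is why `typeof()` and `class()` can disagree.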

Objectives

By the end of today, you should be able to:

  • Recognize the role that data selection and documentation play in reproducible workflows

  • Summarize key debates surrounding the role of (spatial) data in solving environmental problems

  • Describe the FAIR and CARE principles for data and their relationship to existing debates

  • Read data into your R environment

  • Inspect the data and summarize it using tables and simple plots
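The last two objectives can be sketched in a few lines of base R: read a delimited file, inspect it, and summarize it with a table and a simple plot. This is a minimal sketch, not the workflow from the slides. The file contents and column names (`site`, `habitat`, `trees`) are hypothetical, and the script writes them to a temporary file so it runs on its own. If you follow the tidyverse approach from R for Data Science, `readr::read_csv()` plays the same role as `read.csv()` here and returns a tibble.

```r
# Minimal sketch: read a CSV into R, inspect it, and summarize it.
# The data are hypothetical; with your own data, pass your file path to read.csv().

csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,habitat,trees",
  "A,forest,12",
  "B,edge,5",
  "C,forest,9",
  "D,grassland,1"
), csv_path)

# Read the file; the header row becomes column names in a data frame
sites <- read.csv(csv_path)

# Inspect: column types, dimensions, and the first rows
str(sites)
dim(sites)
head(sites)

# Summarize: a numeric summary of one column and a frequency table of another
summary(sites$trees)
table(sites$habitat)

# A simple bar plot of tree counts by site, written to a temporary PDF
pdf(tempfile(fileext = ".pdf"))
barplot(sites$trees, names.arg = sites$site,
        xlab = "Site", ylab = "Number of trees")
invisible(dev.off())  # close the PDF device
```

Running `str()` and `head()` immediately after reading a file is a cheap check that columns were parsed with the types you expect before you start summarizing.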

Slides

The slides for today’s lesson are available online as an HTML file. You can open them as an interactive website or download them as a static PDF (for printing or storing for later). In the HTML version, you can navigate through the slides with your left and right arrow keys.

References

Adams, W. M. 2019. Geographies of conservation II: Technology, surveillance and conservation by algorithm. Progress in Human Geography 43:337–350.
Douglas, A., D. Roos, A. Couto, F. Mancini, and D. Lusseau. 2022. An introduction to R. Bookdown.
Priestley, M., F. O’Donnell, and E. Simperl. 2023. A survey of data quality requirements that matter in ML development pipelines. ACM Journal of Data and Information Quality 15:1–39.
Pritchard, R., L. A. Sauls, J. A. Oldekop, W. A. Kiwango, and D. Brockington. 2022. Data justice and biodiversity conservation. Conservation Biology 36:e13919.
Taylor, L. 2016. The ethics of big data as a public good: Which public? Whose good? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374:20160126.
Wickham, H., and G. Grolemund. 2016. R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.