Gee whiz, R!

Author

Brian O’Meara

This page shows what can be done with R in Quarto. The rendered page is at this link, and the source is here. Look at the functions.R file for the underlying functions, and at run.R for a version that does the same work as a plain R script. I also include bad.R as an example of what not to do. A slightly more advanced version of the good approach would convert run.R into a set of targets for use with the targets package (Landau 2021), which supports more complex workflows. To run the plain R version, use Rscript run.R from the command line or source('run.R') in R; to render the Quarto version, use quarto render index.qmd from the command line or the “Render” button in RStudio.
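As a rough idea of what a targets version might look like, here is a sketch of a _targets.R file. It is not the author's file: it assumes functions.R defines summarize_salamanders() and get_datelife_tree() (both are called later on this page), and get_salamander_data() is a hypothetical helper name I made up for the download step.

```r
# _targets.R: hypothetical sketch, not part of the original repository
library(targets)

# Assumption: functions.R defines the helpers used in the targets below
source("functions.R")

# Packages each target should have loaded when it runs
tar_option_set(packages = c("rgbif", "datelife"))

# Each tar_target() is one step; targets reruns a step only when its
# code or its upstream targets have changed
list(
  tar_target(salamander_data, get_salamander_data()),  # hypothetical helper
  tar_target(salamander_summary, summarize_salamanders(salamander_data)),
  tar_target(salamander_tree, get_datelife_tree(salamander_summary))
)
```

With a file like this in the project directory, targets::tar_make() runs the pipeline and targets::tar_read(salamander_tree) retrieves a stored result.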

First, we are going to use the datasauRus package (Davies, Locke, and D’Agostino McGowan 2022) to show the “Datasaurus Dozen” dataset. We first investigate it with some basic summary statistics computed with dplyr (Wickham et al. 2023) and a plot made with ggplot2 (Wickham 2016).

Showing summary information for the datasaurus_dozen dataset, using knitr (Xie 2023) to make it look nice:

| dataset | count | mean_x | mean_y | sd_x | sd_y | correlation |
|---|---|---|---|---|---|---|
| away | 142 | 54.26610 | 47.83472 | 16.76983 | 26.93974 | -0.0641284 |
| bullseye | 142 | 54.26873 | 47.83082 | 16.76924 | 26.93573 | -0.0685864 |
| circle | 142 | 54.26732 | 47.83772 | 16.76001 | 26.93004 | -0.0683434 |
| dino | 142 | 54.26327 | 47.83225 | 16.76514 | 26.93540 | -0.0644719 |
| dots | 142 | 54.26030 | 47.83983 | 16.76774 | 26.93019 | -0.0603414 |
| h_lines | 142 | 54.26144 | 47.83025 | 16.76590 | 26.93988 | -0.0617148 |
| high_lines | 142 | 54.26881 | 47.83545 | 16.76670 | 26.94000 | -0.0685042 |
| slant_down | 142 | 54.26785 | 47.83590 | 16.76676 | 26.93610 | -0.0689797 |
| slant_up | 142 | 54.26588 | 47.83150 | 16.76885 | 26.93861 | -0.0686092 |
| star | 142 | 54.26734 | 47.83955 | 16.76896 | 26.93027 | -0.0629611 |
| v_lines | 142 | 54.26993 | 47.83699 | 16.76996 | 26.93768 | -0.0694456 |
| wide_lines | 142 | 54.26692 | 47.83160 | 16.77000 | 26.93790 | -0.0665752 |
| x_shape | 142 | 54.26015 | 47.83972 | 16.76996 | 26.93000 | -0.0655833 |
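A summary table like the one above can be built with a grouped summarize in dplyr. This is a sketch, not necessarily the code in functions.R: summarize_dozen() is a name I made up, and the toy data frame is invented so the example runs without datasauRus installed.

```r
library(dplyr)

# Per-dataset count, means, standard deviations, and x-y correlation
summarize_dozen <- function(df) {
  df %>%
    group_by(dataset) %>%
    summarize(
      count = n(),
      mean_x = mean(x),
      mean_y = mean(y),
      sd_x = sd(x),
      sd_y = sd(y),
      correlation = cor(x, y),
      .groups = "drop"
    )
}

# Made-up data: two small "datasets" with the same x values
toy <- data.frame(
  dataset = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 1, 2, 3, 4),
  y = c(2, 4, 6, 8, 8, 6, 4, 2)
)
toy_summary <- summarize_dozen(toy)
print(toy_summary)

# With the real data (assuming datasauRus and knitr are installed):
# knitr::kable(summarize_dozen(datasauRus::datasaurus_dozen))
```

In the toy data both groups share the same mean and standard deviation of x, yet their correlations have opposite signs, which is a small version of the point the Datasaurus Dozen makes.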

Plot of data all together:

ggplot(datasaurus_dozen, aes(x=x, y=y)) + geom_point(alpha=0.2)

They all look the same, right? Let's see whether that holds up when we plot each dataset separately, using facet_wrap in ggplot2.
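A minimal sketch of the faceted plot, assuming the ggplot2 and datasauRus packages are installed. It reuses the plotting call shown above and adds one panel per dataset; the object name faceted_plot is mine.

```r
library(ggplot2)
library(datasauRus)

# Same scatterplot as before, split into one panel per dataset
faceted_plot <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~dataset)

print(faceted_plot)
```

Each panel shares the same axes by default, so the shapes are directly comparable even though their summary statistics nearly match.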

Moral of the story: LOOK AT YOUR DATA. This finds all sorts of problems: having -99 instead of NA for missing data, having a single outlier that was measured in millimeters not meters, and so on.
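Here is a small base R sketch of the two problems just mentioned. The measurements are made up: -99 stands in for a missing-value code, and 152 is a value entered in millimeters among values in meters. The five-MAD cutoff is an arbitrary choice for illustration, not a rule.

```r
# Made-up body lengths, supposedly all in meters
lengths_m <- c(0.12, 0.15, -99, 0.11, 152, 0.14)

# A quick look already shows an impossible minimum and a huge maximum
print(summary(lengths_m))

# Recode the missing-data sentinel as a real NA
cleaned <- lengths_m
cleaned[cleaned == -99] <- NA

# Flag values far from the median, measured in median absolute deviations
center <- median(cleaned, na.rm = TRUE)
spread <- mad(cleaned, na.rm = TRUE)
suspicious <- which(abs(cleaned - center) > 5 * spread)

print(cleaned[suspicious])
```

A plot or histogram of lengths_m would reveal the same two problems at a glance, which is the real moral here.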

We pull information from the Global Biodiversity Information Facility (GBIF) to get the locations of salamanders using rgbif (Chamberlain et al. 2024), then leaflet (Cheng, Karambelkar, and Xie 2023) to plot them. For simplicity, we only have special colors for the seven most commonly recorded species, but you can click on the “Other” points to see what they are, too.
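The sketch below shows one way the map could be put together; it is not the code from functions.R. lump_top_species() and plot_salamanders() are names I made up, and I am assuming that "salamanders" means the order Caudata in the GBIF backbone and that a few hundred records are enough for a demo. Only the lumping step runs when the block is executed; the download and map are wrapped in a function because they need rgbif, leaflet, and a network connection.

```r
# Keep the n most frequently recorded species; label the rest "Other"
lump_top_species <- function(species, n = 7) {
  species <- as.character(species)
  counts <- sort(table(species), decreasing = TRUE)
  top <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(species %in% top, species, "Other")
}

# Made-up labels to show the lumping (n = 2 to keep it short)
toy_species <- c("A", "A", "A", "B", "B", "C", "D")
toy_groups <- lump_top_species(toy_species, n = 2)
print(toy_groups)

# Download georeferenced records and map them (needs network access)
plot_salamanders <- function(limit = 500) {
  # Assumption: the order Caudata covers the salamanders of interest
  key <- rgbif::name_backbone(name = "Caudata")$usageKey
  occ <- rgbif::occ_search(taxonKey = key, hasCoordinate = TRUE,
                           limit = limit)$data
  occ <- occ[!is.na(occ$species), ]
  occ$group <- lump_top_species(occ$species, n = 7)

  # One color per common species, plus one for "Other"
  pal <- leaflet::colorFactor("Set1", domain = unique(occ$group))
  map <- leaflet::leaflet(occ)
  map <- leaflet::addTiles(map)
  leaflet::addCircleMarkers(
    map,
    lng = ~decimalLongitude, lat = ~decimalLatitude,
    color = ~pal(group), popup = ~species, radius = 4
  )
}
```

Calling plot_salamanders() in an interactive session returns the leaflet widget; the popup shows the species name, which is how the “Other” points can still be identified.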

Great scientists steal (with attribution; this is a modification of a quote attributed to Pablo Picasso). We can use datelife (O’Meara et al. 2024) to get a phylogeny for the salamanders from published studies, using data from Open Tree of Life (McTavish et al. 2015).

salamander_tree <- get_datelife_tree(summarize_salamanders(salamander_data))
strap::geoscalePhylo(salamander_tree, units="Period", cex.tip=1, cex.age=1, cex.ts=1)   

iNaturalist stores lots of information from community scientists, including photos (this information later flows into GBIF), and we can include those photos here. We use rinat (Barve and Hart 2022) to get research-grade images of the Eastern Newt, Notophthalmus viridescens.

Here’s one image. It’s from the United States on 2024-02-10 by Jorge Aguilera.

Image of an Eastern Newt from iNaturalist

Here’s another image. It’s from Polk County, TN, USA on 2024-02-06 by Jared Gorrell.

Image of an Eastern Newt from iNaturalist

And a third image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.

Image of an Eastern Newt from iNaturalist

And a fourth image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.

Image of an Eastern Newt from iNaturalist
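Captions like the ones above can be assembled from the fields iNaturalist returns for each observation. This is a sketch under assumptions, not the code from functions.R: make_caption() and get_newt_photos() are names I made up, and I am assuming the data frame from rinat::get_inat_obs() has image_url, place_guess, observed_on, and user_login columns. Only the caption helper runs when the block is executed; the download needs rinat and a network connection.

```r
# Build a caption sentence from place, date, and observer
make_caption <- function(place, observed_on, user) {
  paste0("It's from ", place, " on ", observed_on, " by ", user, ".")
}

# Values taken from one of the captions on this page
example_caption <- make_caption("Polk County, TN, USA", "2024-02-06",
                                "Jared Gorrell")
print(example_caption)

# Fetch research-grade observations and pair each photo with a caption
get_newt_photos <- function(n = 4) {
  obs <- rinat::get_inat_obs(
    taxon_name = "Notophthalmus viridescens",
    quality = "research",
    maxresults = n
  )
  # Keep only observations that actually have a photo URL
  obs <- obs[!is.na(obs$image_url) & nzchar(obs$image_url), ]
  data.frame(
    image_url = obs$image_url,
    caption = make_caption(obs$place_guess, obs$observed_on, obs$user_login)
  )
}
```

Passing the place through unchanged, rather than pasting “the” in front of it, avoids awkward output for places such as “Tennessee, US”.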

Citations

Barve, Vijay, and Edmund Hart. 2022. “rinat: Access ‘iNaturalist’ Data Through APIs.” https://CRAN.R-project.org/package=rinat.
Chamberlain, Scott, Vijay Barve, Dan McGlinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2024. “rgbif: Interface to the Global Biodiversity Information Facility API.” https://CRAN.R-project.org/package=rgbif.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2023. “leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library.” https://CRAN.R-project.org/package=leaflet.
Davies, Rhian, Steph Locke, and Lucy D’Agostino McGowan. 2022. “datasauRus: Datasets from the Datasaurus Dozen.” https://CRAN.R-project.org/package=datasauRus.
Landau, William. 2021. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.
McTavish, Emily Jane, Cody E. Hinchliff, James F. Allman, Joseph W. Brown, Karen A. Cranston, Mark T. Holder, Jonathan A. Rees, and Stephen A. Smith. 2015. “Phylesystem: A Git-Based Data Store for Community-Curated Phylogenetic Estimates.” Bioinformatics 31 (17): 2794–2800. https://doi.org/10.1093/bioinformatics/btv276.
O’Meara, Brian, Luna L. Sanchez-Reyes, Jonathan Eastman, Tracy Heath, April Wright, Klaus Schliep, Scott Chamberlain, et al. 2024. “datelife: Scientific Data on Time of Lineage Divergence for Your Taxa.” https://doi.org/10.5281/zenodo.593938.
Wickham, Hadley. 2016. “ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.
Xie, Yihui. 2023. “knitr: A General-Purpose Package for Dynamic Report Generation in R.” https://yihui.org/knitr/.