Gee whiz, R!

Author

Brian O’Meara

This page shows what can be done with R in Quarto. This page is at this link, but see the source here. Look at the functions.R file for the underlying functions, and at run.R for doing this with base R. I also include bad.R as an example of what not to do. A slightly more advanced version of the good approach would convert run.R into a set of targets for use with the targets package (Landau 2021), which supports more complex workflows. To run the R version, use Rscript run.R from the command line or source('run.R') in R; to render the Quarto version, use quarto render index.qmd from the command line or the “Render” button in RStudio.
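As a rough sketch of what a targets conversion might look like, a _targets.R file could declare each step of the workflow as a target. Only summarize_salamanders() and get_datelife_tree() are named on this page; get_salamander_data() is a hypothetical stand-in for whatever function in functions.R downloads the records, and the real run.R may be organized differently.

```r
# _targets.R: a hypothetical pipeline definition, not the author's actual file
library(targets)

# Load the project's functions so the targets below can call them
source("functions.R")

list(
  # get_salamander_data() is an assumed name for the GBIF download step
  tar_target(salamander_data, get_salamander_data()),
  # Each later target depends on the one it names as an argument
  tar_target(salamander_summary, summarize_salamanders(salamander_data)),
  tar_target(salamander_tree, get_datelife_tree(salamander_summary))
)
```

With a file like this in place, targets::tar_make() runs the pipeline and skips any step whose inputs have not changed since the last run.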

First, we are going to use the datasauRus package (Davies, Locke, and D’Agostino McGowan 2022) to show the “Datasaurus Dozen” dataset. We can first investigate it with some basic summary stats with dplyr (Wickham et al. 2023) and a plot using ggplot2 (Wickham 2016).

Showing summary information for the datasaurus_dozen dataset, using knitr (Xie 2023) to make it look nice:

dataset	count	mean_x	mean_y	sd_x	sd_y	correlation
away	142	54.26610	47.83472	16.76983	26.93974	-0.0641284
bullseye	142	54.26873	47.83082	16.76924	26.93573	-0.0685864
circle	142	54.26732	47.83772	16.76001	26.93004	-0.0683434
dino	142	54.26327	47.83225	16.76514	26.93540	-0.0644719
dots	142	54.26030	47.83983	16.76774	26.93019	-0.0603414
h_lines	142	54.26144	47.83025	16.76590	26.93988	-0.0617148
high_lines	142	54.26881	47.83545	16.76670	26.94000	-0.0685042
slant_down	142	54.26785	47.83590	16.76676	26.93610	-0.0689797
slant_up	142	54.26588	47.83150	16.76885	26.93861	-0.0686092
star	142	54.26734	47.83955	16.76896	26.93027	-0.0629611
v_lines	142	54.26993	47.83699	16.76996	26.93768	-0.0694456
wide_lines	142	54.26692	47.83160	16.77000	26.93790	-0.0665752
x_shape	142	54.26015	47.83972	16.76996	26.93000	-0.0655833
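A table like the one above can be produced with a grouped summary. This is a minimal sketch rather than the code in functions.R; the column names are chosen to match the table header.

```r
library(datasauRus)
library(dplyr)

# One row per dataset: sample size, means, standard deviations, and correlation
summary_stats <- datasaurus_dozen |>
  group_by(dataset) |>
  summarize(
    count = n(),
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    correlation = cor(x, y)
  )

# knitr::kable() turns the data frame into a formatted table
knitr::kable(summary_stats)
```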

Plot of data all together:

ggplot(datasaurus_dozen, aes(x=x, y=y)) + geom_point(alpha=0.2)

They all look the same, right? Let's see what happens if we plot each dataset separately, using facet_wrap in ggplot2.
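The per-dataset plot can be sketched as follows. The alpha value and the object name faceted_plot are choices made for this example, not necessarily what the page's source uses.

```r
library(datasauRus)
library(ggplot2)

# Same scatterplot as before, but with one panel per dataset
faceted_plot <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~dataset)

print(faceted_plot)
```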

Moral of the story: LOOK AT YOUR DATA. This finds all sorts of problems: having -99 instead of NA for missing data, having a single outlier that was measured in millimeters not meters, and so on.
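To make those two problems concrete, here is a small sketch with made-up numbers. The values and the cutoff of 100 are hypothetical, chosen only for illustration. A quick look at the range exposes both the -99 placeholder and the millimeter entry.

```r
# Hypothetical heights in meters: -99 stands in for a missing value,
# and 1710 was recorded in millimeters by mistake
heights_m <- c(1.62, 1.75, -99, 1.68, 1710, 1.80)

# The extremes make both problems obvious
print(range(heights_m))

# Recode the placeholder as a true missing value
heights_clean <- heights_m
heights_clean[heights_clean == -99] <- NA

# Convert implausibly large values (assumed cutoff: 100 m) from mm to m
in_mm <- !is.na(heights_clean) & heights_clean > 100
heights_clean[in_mm] <- heights_clean[in_mm] / 1000

print(summary(heights_clean))
```

In real data the right cutoff depends on what is being measured, so a plot or summary should come before any automatic correction.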

We pull information from the Global Biodiversity Information Facility (GBIF) to get the locations of salamanders using rgbif (Chamberlain et al. 2024), then leaflet (Cheng, Karambelkar, and Xie 2023) to plot them. For simplicity, we only have special colors for the seven most commonly recorded species, but you can click on the “Other” points to see what they are, too.
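One way to sketch the steps described above is shown below. The function names lump_top_species() and map_salamanders(), the record limit, and the color palette are all assumptions made for this example; the page's own functions.R may do this differently. The download step is wrapped in a function so that nothing touches the network until it is called.

```r
# Keep the n most frequent species names and label everything else "Other"
lump_top_species <- function(species, n = 7) {
  species <- as.character(species)
  counts <- sort(table(species), decreasing = TRUE)
  top <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(species %in% top, species, "Other")
}

# Hypothetical helper: download salamander (order Caudata) records and map them.
# Needs the rgbif and leaflet packages and an internet connection.
map_salamanders <- function(limit = 500) {
  # Look up the GBIF backbone key for the order Caudata
  caudata_key <- rgbif::name_backbone(name = "Caudata", rank = "order")$usageKey
  # Fetch georeferenced occurrence records for that taxon
  occ <- rgbif::occ_search(
    taxonKey = caudata_key,
    hasCoordinate = TRUE,
    limit = limit
  )$data
  occ$label <- lump_top_species(occ$species)
  # One color per label: the seven most common species plus "Other"
  pal <- leaflet::colorFactor("Set1", domain = unique(occ$label))
  leaflet::leaflet(occ) |>
    leaflet::addTiles() |>
    leaflet::addCircleMarkers(
      lng = ~decimalLongitude, lat = ~decimalLatitude,
      color = ~pal(label), popup = ~species, radius = 4
    ) |>
    leaflet::addLegend(pal = pal, values = ~label)
}
```

Calling map_salamanders() returns a leaflet widget. The popup uses the full species name, so points colored as “Other” still show what they are when clicked.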

Great scientists steal (with attribution – this is a modification of a quote allegedly said by Pablo Picasso). We can use datelife (O’Meara et al. 2024) to get a phylogeny for the salamanders from published studies, using data from Open Tree of Life (McTavish et al. 2015).

salamander_tree <- get_datelife_tree(summarize_salamanders(salamander_data))
strap::geoscalePhylo(salamander_tree, units="Period", cex.tip=1, cex.age=1, cex.ts=1)

iNaturalist stores lots of information from community scientists, including photos (and this information later flows into GBIF). We can include these photos. Here, we use research grade images of the Eastern Newt, Notophthalmus viridescens, using rinat (Barve and Hart 2022).
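A minimal sketch of this step might look like the following. The helper names get_newt_observations() and make_caption() are hypothetical, and the download is wrapped in a function so that it only runs when called.

```r
# Build a caption from the place, date, and observer of one observation
make_caption <- function(place, date, observer) {
  paste0("It's from ", place, " on ", date, " by ", observer, ".")
}

# Hypothetical helper: fetch research-grade Eastern Newt observations.
# Needs the rinat package and an internet connection.
get_newt_observations <- function(n = 4) {
  rinat::get_inat_obs(
    taxon_name = "Notophthalmus viridescens",
    quality = "research",
    maxresults = n
  )
}
```

Assuming the returned data frame has the columns image_url, place_guess, observed_on, and user_login (check names() on the result, since these are assumptions here), the first photo's caption would be make_caption(obs$place_guess[1], obs$observed_on[1], obs$user_login[1]), with the image itself at obs$image_url[1].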

Here’s one image. It’s from the United States on 2024-02-10 by Jorge Aguilera.

Image of an Eastern Newt from iNaturalist

Here’s another image. It’s from Polk County, TN, USA on 2024-02-06 by Jared Gorrell.

And a third image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.

Fourth image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.

Citations

Barve, Vijay, and Edmund Hart. 2022. “rinat: Access ‘iNaturalist’ Data Through APIs.” https://CRAN.R-project.org/package=rinat.

Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2024. “rgbif: Interface to the Global Biodiversity Information Facility API.” https://CRAN.R-project.org/package=rgbif.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2023. “leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library.” https://CRAN.R-project.org/package=leaflet.

Davies, Rhian, Steph Locke, and Lucy D’Agostino McGowan. 2022. “datasauRus: Datasets from the Datasaurus Dozen.” https://CRAN.R-project.org/package=datasauRus.

Landau, William. 2021. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.

McTavish, Emily Jane, Cody E. Hinchliff, James F. Allman, Joseph W. Brown, Karen A. Cranston, Mark T. Holder, Jonathan A. Rees, and Stephen A. Smith. 2015. “Phylesystem: A Git-Based Data Store for Community-Curated Phylogenetic Estimates.” Bioinformatics 31 (17): 2794–2800. https://doi.org/10.1093/bioinformatics/btv276.

O’Meara, Brian, Luna L. Sanchez-Reyes, Jonathan Eastman, Tracy Heath, April Wright, Klaus Schliep, Scott Chamberlain, et al. 2024. “datelife: Scientific Data on Time of Lineage Divergence for Your Taxa.” https://doi.org/10.5281/zenodo.593938.

Wickham, Hadley. 2016. “ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.

Xie, Yihui. 2023. “knitr: A General-Purpose Package for Dynamic Report Generation in R.” https://yihui.org/knitr/.