dataset | count | mean_x | mean_y | sd_x | sd_y | correlation |
---|---|---|---|---|---|---|
away | 142 | 54.26610 | 47.83472 | 16.76983 | 26.93974 | -0.0641284 |
bullseye | 142 | 54.26873 | 47.83082 | 16.76924 | 26.93573 | -0.0685864 |
circle | 142 | 54.26732 | 47.83772 | 16.76001 | 26.93004 | -0.0683434 |
dino | 142 | 54.26327 | 47.83225 | 16.76514 | 26.93540 | -0.0644719 |
dots | 142 | 54.26030 | 47.83983 | 16.76774 | 26.93019 | -0.0603414 |
h_lines | 142 | 54.26144 | 47.83025 | 16.76590 | 26.93988 | -0.0617148 |
high_lines | 142 | 54.26881 | 47.83545 | 16.76670 | 26.94000 | -0.0685042 |
slant_down | 142 | 54.26785 | 47.83590 | 16.76676 | 26.93610 | -0.0689797 |
slant_up | 142 | 54.26588 | 47.83150 | 16.76885 | 26.93861 | -0.0686092 |
star | 142 | 54.26734 | 47.83955 | 16.76896 | 26.93027 | -0.0629611 |
v_lines | 142 | 54.26993 | 47.83699 | 16.76996 | 26.93768 | -0.0694456 |
wide_lines | 142 | 54.26692 | 47.83160 | 16.77000 | 26.93790 | -0.0665752 |
x_shape | 142 | 54.26015 | 47.83972 | 16.76996 | 26.93000 | -0.0655833 |
Gee whiz, R!
This page shows what can be done with R in Quarto. This page is at this link but see the source here. Look at the functions.R file for the underlying functions, and run.R for doing this with base R. I also include bad.R to provide an example of what not to do. A slightly more advanced version of the good way would convert run.R into a set of targets for use with the targets package (Landau 2021), which lets you do more complex workflows. To run the R version, you can do Rscript run.R from the command line or source('run.R') in R; to render with Quarto, you can do quarto render index.qmd from the command line or use the “Render” button in RStudio.
First, we are going to use the datasauRus package (Davies, Locke, and D’Agostino McGowan 2022) to show the “Datasaurus Dozen” dataset. We can first investigate it with some basic summary stats with dplyr (Wickham et al. 2023) and a plot using ggplot2 (Wickham 2016).
The table at the top of this page shows summary information for the datasaurus_dozen dataset, using knitr (Xie 2023) to make it look nice.
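A minimal sketch of how a per-dataset summary like this could be computed, assuming the datasauRus, dplyr, and knitr packages are installed. The object name summary_table is illustrative; the page's actual code lives in functions.R and may differ.

```r
library(dplyr)
library(datasauRus)

# One row per dataset: point count, means, standard deviations, and correlation
summary_table <- datasaurus_dozen |>
  group_by(dataset) |>
  summarize(
    count = n(),
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    correlation = cor(x, y)
  )

# Render the summary as a formatted table in the output document
knitr::kable(summary_table)
```

Grouping before summarizing is what puts each dataset on its own row, so the near-identical statistics are easy to compare.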
Plot of data all together:
ggplot(datasaurus_dozen, aes(x=x, y=y)) + geom_point(alpha=0.2)
Looks all the same, right? Maybe things change if we plot the data by dataset, using facet_wrap in ggplot2.
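One way the faceted plot could be written, as a sketch that assumes ggplot2 and datasauRus are installed. The object name faceted_plot and the alpha value are illustrative choices, not taken from the page's source.

```r
library(ggplot2)
library(datasauRus)

# Same scatterplot as before, but with one panel per dataset
faceted_plot <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~dataset)

print(faceted_plot)
```

Each panel shares the same axes, so the very different shapes are directly comparable even though the summary statistics nearly match.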
Moral of the story: LOOK AT YOUR DATA. This finds all sorts of problems: having -99 instead of NA for missing data, having a single outlier that was measured in millimeters not meters, and so on.
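As a small illustration of those two problems, the sketch below uses made-up height measurements in meters, with one missing value coded as -99 and one value entered in millimeters. The data, the variable names, and the 3 m plausibility cutoff are all assumptions for this example.

```r
# Hypothetical heights in meters: -99 is a missing-data code, 1520 was entered in mm
heights_m <- c(1.52, 1.61, -99, 1.48, 1520, 1.57)

# A quick look already shows an impossible minimum and maximum
summary(heights_m)
naive_mean <- mean(heights_m)

# Recode the sentinel value as a true missing value
cleaned <- replace(heights_m, heights_m == -99, NA)

# Treat implausibly large values as millimeters and convert to meters
# (the 3 m cutoff is an assumption for this example)
suspect <- !is.na(cleaned) & cleaned > 3
cleaned[suspect] <- cleaned[suspect] / 1000

clean_mean <- mean(cleaned, na.rm = TRUE)
```

Comparing naive_mean with clean_mean shows how much two bad entries can distort a summary that looks perfectly reasonable as a single number.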
We pull information from the Global Biodiversity Information Facility (GBIF) to get the locations of salamanders using rgbif (Chamberlain et al. 2024), then use leaflet (Cheng, Karambelkar, and Xie 2023) to plot them. For simplicity, we only have special colors for the seven most commonly recorded species, but you can click on the “Other” points to see what they are, too.
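The "most common species plus Other" coloring can be sketched as follows. The occurrence records here are a small hypothetical data frame standing in for a GBIF download, lump_species is an illustrative helper rather than a function from the page's source, and the map step runs only if leaflet is installed. The example keeps two species instead of seven because the toy data is so small.

```r
# Hypothetical occurrence records standing in for data retrieved from GBIF
occurrences <- data.frame(
  species = c(rep("Notophthalmus viridescens", 4), rep("Plethodon cinereus", 3),
              "Eurycea bislineata", "Ambystoma maculatum"),
  decimalLongitude = c(-84.1, -83.9, -82.5, -80.2, -77.0, -75.3, -73.9, -81.6, -85.4),
  decimalLatitude = c(35.9, 36.1, 35.5, 37.3, 39.0, 40.1, 41.2, 36.6, 35.0),
  stringsAsFactors = FALSE
)

# Keep the n_keep most frequently recorded species; label everything else "Other"
lump_species <- function(species, n_keep = 7) {
  counts <- sort(table(species), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n_keep, length(counts)))]
  ifelse(species %in% keep, species, "Other")
}

occurrences$species_group <- lump_species(occurrences$species, n_keep = 2)

# Color points by group, but keep the real species name in the popup
if (requireNamespace("leaflet", quietly = TRUE)) {
  pal <- leaflet::colorFactor("Set1", domain = occurrences$species_group)
  salamander_map <- leaflet::leaflet(occurrences) |>
    leaflet::addTiles() |>
    leaflet::addCircleMarkers(
      lng = ~decimalLongitude, lat = ~decimalLatitude,
      color = ~pal(species_group), popup = ~species
    ) |>
    leaflet::addLegend(pal = pal, values = ~species_group)
}
```

Keeping the original species column for the popup is what lets a reader click an "Other" point and still see which species it is.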
Great scientists steal (with attribution – this is a modification of a quote allegedly said by Pablo Picasso). We can use datelife (O’Meara et al. 2024) to get a phylogeny for the salamanders from published studies, using data from Open Tree of Life (McTavish et al. 2015).
salamander_tree <- get_datelife_tree(summarize_salamanders(salamander_data))
strap::geoscalePhylo(salamander_tree, units="Period", cex.tip=1, cex.age=1, cex.ts=1)
iNaturalist stores lots of information from community scientists, including photos (and this information later flows into GBIF). We can include these photos. Here, we show research-grade images of the Eastern Newt, Notophthalmus viridescens, retrieved with rinat (Barve and Hart 2022).
Here’s one image. It’s from the United States on 2024-02-10 by Jorge Aguilera.
Here’s another image. It’s from Polk County, TN, USA on 2024-02-06 by Jared Gorrell.
And a third image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.
Fourth image. It’s from Tennessee, US on 2024-02-03 by jaron sedlock.