# load packages library(readr) library(dplyr) library(ggplot2)
Start by reading in the Porg data.
porg_data <- read_csv("https://itep-r.netlify.com/data/Porg_samples.csv")
Take a couple of minutes to get an overview of the data. Do you remember some of the functions to do that?
view(porg_data) glimpse(porg_data) summary(porg_data)
What columns are there? What kinds of questions do you have about the data after looking at it. (Hint, I see some biomonotoring concentrations and some response times. Maybe Corundum is neurotoxic to porgs? Ask Dorian to find out for sure.)
Now create a scatterplot showing the relationship between conncetration and rxn_time.
ggplot(data = porg_data, aes(x = Concentration, y = rxn_time)) + geom_point()
What type of cleaning does this data set need? Let’s take a look at the lowest and highest concentration levels and see if anything looks odd.
porg_data <- arrange(porg_data, Concentration) porg_data <- arrange(porg_data, desc(Concentration))
Hmmm. Looks like there’s some negative numbers. Is that possible for Corundum measurements? We ask our team of Bot measurement analysts and they say NO! So, let’s filter out the negative concentration values.
porg_data <- filter(porg_data, Concentration > 0)
Let’s look at the data set again. I see a units column. Are all the units the same? Looks like the Concentration units are in both ppb and ppm. Let’s add a column with consistent concentration units, and convert the ppm concentrations into ppb.
porg_data <- mutate(porg_data, conc_ppb = ifelse(Conc_units == "ppm", Concentration * 1000, Concentration))
Remember the rule of plotting data? Hint: Plot the data, Plot the data, Plot the data
So, let’s re-plot the data
ggplot(data = porg_data, aes(x = conc_ppb, y = rxn_time)) + geom_point()
Let’s calculate the mean concentration by island to determine if there might be Corundum concentration differences by island. Do you remeber which function to use here?
porg_island_summary <- group_by(porg_data, Island) %>% summarise(mean_corundum = mean(conc_ppb))
Maybe making a box plot would help answer this question more clearly?
ggplot(data = porg_data, aes(x = Island, y = conc_ppb)) + geom_boxplot()
Let’s fill the boxplot with a different color for each island to make it look better.
ggplot(data = porg_data, aes(x = Island, y = conc_ppb, fill = Island)) + geom_boxplot()
Let’s make an individual scatterplot for each island. Remember which function to use?
ggplot(data = porg_data, aes(x = conc_ppb, y = rxn_time)) + geom_point() + facet_wrap("Island")
Try an x-y scatter plot of conc_ppb and rxn_time and experiment with
geom_smooth(method = "loess")
ggplot(data = porg_data, aes(x = conc_ppb, y = rxn_time)) + geom_point() + geom_smooth(method = "loess") + facet_wrap(~Island)
Nice work Jedi Master! You are ready to tackle your own data set, maps, images, statistics, the universe!