Bio300B Lecture 4
Institutt for biovitenskap, UiB
8 September 2025
“reflect the data, tell a story, and look professional” (Wilke)
A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.
You provide the data and tell ‘ggplot2’ how to map variables to aesthetics and what graphical primitives to use; it takes care of the details.
library(ggplot2)
library(palmerpenguins) # assumed source of the penguins data

plot <- ggplot(data = penguins,                # Data
               mapping = aes(                  # Aesthetics
                 x = body_mass_g,
                 y = bill_length_mm,
                 colour = species)) +
  geom_point() +                               # Geometries
  scale_colour_brewer(palette = "Set2") +      # Scales
  labs(x = "Body mass, g",                     # Labels
       y = "Bill length, mm",
       colour = "Species") +
  theme_bw()                                   # Themes
# Also facets
plot
Tibble or data frame with data to be plotted.
Tidy data
Can process data within ggplot, but usually best to do it first
Can add data to the whole plot or to individual geoms
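library(dplyr) # for group_by() and summarise()

# One row per species: the mean position, used to place the labels
penguin_summary <- penguins |>
  group_by(species) |>
  summarise(
    body_mass_g = mean(body_mass_g, na.rm = TRUE),
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE)
  )

ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  # This geom gets its own data; the others use the data given to ggplot()
  geom_text(aes(label = species), data = penguin_summary, colour = "black")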
mapping specifies which variables in the data should be mapped onto which aesthetics, using aes()
Each geom takes different aesthetics
Common aesthetics
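A minimal sketch of the difference between mapping an aesthetic to a variable and setting it to a constant. It assumes penguins comes from the palmerpenguins package; the colour "steelblue" and the object names are only for illustration.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

# Mapped: colour and shape vary with species, so they go inside aes()
p_mapped <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point(aes(colour = species, shape = species), na.rm = TRUE)

# Set: one constant colour and transparency for all points, outside aes()
p_set <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point(colour = "steelblue", alpha = 0.5, na.rm = TRUE)
```

Mapped aesthetics get a legend; aesthetics set to a constant do not.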
Use different geoms for different plot types
Important geoms
geom_point()
geom_boxplot()
geom_histogram()
geom_smooth()
geom_line()
geom_text()
Many geoms, some in extra packages
Histograms count how many observations fall in each bin
Critical question: how many bins? Set with the bins argument
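A sketch of how the bins argument changes a histogram, assuming penguins from the palmerpenguins package. The values 10 and 50 are arbitrary choices for comparison, not recommendations.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

base_hist <- ggplot(penguins, aes(x = flipper_length_mm))

# Few bins: smooth outline, but detail may be hidden
p_few <- base_hist + geom_histogram(bins = 10, na.rm = TRUE)

# Many bins: more detail, but also more noise
p_many <- base_hist + geom_histogram(bins = 50, na.rm = TRUE)
```

Trying several values is usually worthwhile, since the apparent shape of the distribution can change with the number of bins.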
Smoothed histograms
The adjust argument adjusts the bandwidth to control how smooth the density is
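A sketch of the effect of adjust in geom_density(), assuming penguins from the palmerpenguins package. The values 0.5 and 2 are arbitrary: adjust multiplies the default bandwidth, so values below 1 give a rougher curve and values above 1 a smoother one.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

base_dens <- ggplot(penguins, aes(x = flipper_length_mm, fill = species))

# Half the default bandwidth: wigglier curves
p_rough <- base_dens + geom_density(adjust = 0.5, alpha = 0.5, na.rm = TRUE)

# Twice the default bandwidth: smoother curves
p_smooth <- base_dens + geom_density(adjust = 2, alpha = 0.5, na.rm = TRUE)
```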
base <- ggplot(penguins, aes(x = species, y = flipper_length_mm))
p_prange <- base + stat_summary(fun = "mean", geom = "col")
p_box <- base + geom_boxplot(aes(fill = species))
p_vio <- base + geom_violin(aes(fill = species))
p_jit <- base + geom_jitter(aes(colour = species))
library(ggbeeswarm)
p_quasi <- base + geom_quasirandom(aes(colour = species))
p_quasi2 <- base + geom_violin(aes(fill = species), alpha = 0.3) +
  geom_quasirandom(aes(colour = species))
geom_line() - join observations from left to right
geom_path() - join observations from first to last in data
Control how data are mapped to aesthetics
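A small sketch of the difference between the two geoms. The data frame d is made up for illustration, with x deliberately out of order.

```r
library(ggplot2)

# Made-up data: the x values are not in increasing order
d <- data.frame(x = c(1, 3, 2, 4), y = c(1, 2, 3, 4))

# geom_line() sorts by x before joining, so the line runs left to right
p_line <- ggplot(d, aes(x = x, y = y)) + geom_line()

# geom_path() joins rows in the order they appear, so the line doubles back
p_path <- ggplot(d, aes(x = x, y = y)) + geom_path()
```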
All are named scale_<aesthetic>_<description>()
scale_x_log10()
scale_y_reverse()
scale_colour_viridis_c()
scale_shape_manual()
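A sketch combining two scales on one plot, assuming penguins from the palmerpenguins package. Mapping colour to flipper_length_mm is an arbitrary choice, made so that a continuous colour scale applies.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

p_scaled <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm,
                                 colour = flipper_length_mm)) +
  geom_point(na.rm = TRUE) +
  scale_x_log10() +           # x aesthetic, log10 transformation
  scale_colour_viridis_c()    # colour aesthetic, continuous viridis palette
```

The name of each scale function says which aesthetic it controls (x, colour) and what it does to it (log10, viridis_c).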
Split data into separate panels.
facet_grid() for two-dimensional arrays of subplots
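A sketch of faceting, assuming penguins from the palmerpenguins package. facet_wrap() is not mentioned above; it is included for comparison as the usual choice when panels are defined by a single variable.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

base_facet <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point(na.rm = TRUE)

# Two-dimensional array of panels: islands in rows, species in columns
p_grid <- base_facet + facet_grid(rows = vars(island), cols = vars(species))

# One variable: panels wrapped into as many rows as needed
p_wrap <- base_facet + facet_wrap(vars(species))
```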
Change how non-data elements of the plot look
Entire themes
Can also change individual elements
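A sketch of both approaches: start from an entire theme, then change individual elements with theme(). It assumes penguins from the palmerpenguins package; the particular elements changed here are arbitrary examples.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

# An entire theme, plus tweaks to individual elements
custom_theme <- theme_minimal() +
  theme(
    legend.position = "bottom",              # move the legend
    axis.title = element_text(face = "bold") # bold axis titles
  )

p_theme <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm,
                                colour = species)) +
  geom_point(na.rm = TRUE) +
  custom_theme
```

Storing the theme in an object makes it easy to reuse across several plots.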
Choose an appropriate palette.
Also colour and linetype/linewidth
Problem - points plot on top of each other.
Problem - too much data
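Two possible remedies, sketched under the assumption that penguins comes from the palmerpenguins package. These are common options rather than the only ones: transparency for overlapping points, and binning into a 2D grid when there are too many points to draw individually. The values 0.3 and 20 are arbitrary.

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

base_over <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm))

# Overlapping points: semi-transparent points show where data pile up
p_alpha <- base_over + geom_point(alpha = 0.3, na.rm = TRUE)

# Too much data: count observations in a grid of rectangles instead
p_bin <- base_over + geom_bin_2d(bins = 20, na.rm = TRUE)
```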