Data visualisation
Bio300B Lecture 4
Richard J. Telford (Richard.Telford@uib.no)
Institutt for biovitenskap, UiB
22 September 2023
Data visualisation
A picture is worth a thousand words
Tell a story with figures
Avoid common mistakes
“reflect the data, tell a story, and look professional” Wilke
ggplot2
one of at least three schemes for graphics in R
part of tidyverse
A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.
You provide the data, tell ‘ggplot2’ how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
ggplot in action
library(tidyverse)
library(palmerpenguins) # provides the penguins data

plot <- ggplot(data = penguins, # Data
               mapping = aes( # Aesthetics
                 x = body_mass_g,
                 y = bill_length_mm,
                 colour = species)) +
  geom_point() + # Geometries
  scale_colour_brewer(palette = "Set1") + # Scales
  labs(x = "Body mass, g", # Labels
       y = "Bill length, mm",
       colour = "Species") +
  theme_bw() # Themes
# Also facets
plot
Data
Tibble or data frame with data to be plotted.
Tidy data
Can process data within ggplot
but usually best to do it first
Can add data to the whole plot or to individual geoms
penguin_summary <- penguins |>
  group_by(species) |>
  summarise(body_mass_g = mean(body_mass_g, na.rm = TRUE),
            bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  geom_text(aes(label = species), data = penguin_summary, colour = "black")
Aesthetics
mapping
specifies which variables in the data should be mapped onto which aesthetics with aes()
Each geom takes different aesthetics
Common aesthetics
x, y
fill, colour
shape
linetype
group
Setting vs mapping
Mapping in aes()
ggplot(penguins,
       aes(x = flipper_length_mm,
           fill = "blue")) +
  geom_histogram()
Setting in the geom
ggplot(penguins,
       aes(x = flipper_length_mm)) +
  geom_histogram(fill = "blue")
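One way to see the difference between the two plots above is to inspect the fill colours that ggplot2 actually computes. The following is a minimal sketch, assuming the ggplot2 and palmerpenguins packages are installed; the object names p_mapped and p_set are illustrative, and bins = 30 is set only to silence the default-bin message.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the penguins data

# Mapping: "blue" is treated as a data value with a single level
p_mapped <- ggplot(penguins, aes(x = flipper_length_mm, fill = "blue")) +
  geom_histogram(bins = 30, na.rm = TRUE)

# Setting: "blue" is used directly as the fill colour
p_set <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(fill = "blue", bins = 30, na.rm = TRUE)

# layer_data() returns the aesthetics computed for a layer
fill_mapped <- unique(layer_data(p_mapped)$fill)
fill_set <- unique(layer_data(p_set)$fill)

fill_mapped # a colour chosen by the default discrete fill scale, not "blue"
fill_set    # the colour that was set
```

The mapped version also gains a legend with a single key labelled "blue", which is usually the first sign that a constant has ended up inside aes().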
geoms
Use different geoms for different plot types
Important geoms
geom_point()
geom_boxplot()
geom_histogram()
geom_smooth()
geom_line()
geom_text()
Many geoms, some in extra packages
Geoms to show distributions
base <- ggplot(penguins, aes(x = flipper_length_mm))
hist <- base + geom_histogram()
dens <- base + geom_density()
Geoms to show many distributions
base <- ggplot(penguins, aes(x = species, y = flipper_length_mm))
p_prange <- base + stat_summary(fun = "mean", geom = "col") # bars showing the mean
p_box <- base + geom_boxplot(aes(fill = species))
p_vio <- base + geom_violin(aes(fill = species))
p_jit <- base + geom_jitter(aes(colour = species))
library(ggbeeswarm)
p_quasi <- base + geom_quasirandom(aes(colour = species))
Boxplots can mislead
library(patchwork) # assumed: needed to combine two ggplots with +

p <- datasauRus::box_plots |>
  pivot_longer(everything()) |>
  ggplot(aes(x = name, y = value))

(p + geom_boxplot()) + (p + geom_violin())
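A boxplot can mislead because it draws little more than a five-number summary of each group. As a rough sketch (assuming the dplyr, tidyr and datasauRus packages are installed; box_summary is an illustrative name), the summaries for each column of box_plots can be compared directly:

```r
library(dplyr)
library(tidyr)

# Quartiles and range for each distribution in the data set
box_summary <- datasauRus::box_plots |>
  pivot_longer(everything()) |>
  group_by(name) |>
  summarise(
    min = min(value, na.rm = TRUE),
    q1 = quantile(value, 0.25, na.rm = TRUE),
    median = median(value, na.rm = TRUE),
    q3 = quantile(value, 0.75, na.rm = TRUE),
    max = max(value, na.rm = TRUE)
  )

box_summary
```

If the rows look alike while the violin plots differ, the boxplot is hiding structure such as multiple modes or gaps in the data.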
geoms for scatterplots
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  geom_smooth(method = "lm")
Scales
Control how
variables are mapped onto the aesthetics
axes breaks
All named scale_<aesthetic>_<description>()
scale_x_log10()
scale_y_reverse()
scale_colour_viridis_c()
scale_shape_manual()
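Scales are added to a plot in the same way as geoms. The sketch below (assuming ggplot2 and palmerpenguins are installed; the axis breaks and shape codes are arbitrary choices for illustration) log-transforms the x axis and assigns point shapes by hand.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the penguins data

p_scales <- ggplot(penguins,
                   aes(x = body_mass_g, y = bill_length_mm, shape = species)) +
  geom_point(na.rm = TRUE) +
  # log10-transform the x axis and choose where the breaks fall
  scale_x_log10(breaks = c(3000, 4000, 5000, 6000)) +
  # assign one plotting symbol per species (codes are arbitrary)
  scale_shape_manual(values = c(Adelie = 16, Chinstrap = 17, Gentoo = 15))

p_scales
```

Naming the values in scale_shape_manual() ties each shape to a species explicitly, so the assignment does not depend on the order of the factor levels.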
Labels
plot, axis and legend titles
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  labs(x = "Body mass, g",
       y = "Bill length, mm",
       colour = "Species",
       title = "Bill length against body mass")
Facets
Split data into separate panels.
plot + facet_wrap(facets = vars(species))
facet_grid()
for two-dimensional arrays of subplots
plot + facet_grid(rows = vars(species),
                  cols = vars(island))
Themes
Change how non-data elements of the plot look
Entire themes
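Complete themes are applied by adding them to a plot, or set for the whole session with theme_set(). A minimal sketch, assuming ggplot2 and palmerpenguins are installed; the object names p and old_theme are illustrative:

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the penguins data

p <- ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

p + theme_bw()      # white background, grey grid lines, panel border
p + theme_classic() # axis lines only, no grid
p + theme_minimal() # no background or border

# Make theme_bw() the default for later plots;
# theme_set() returns the theme that was previously active
old_theme <- theme_set(theme_bw())
theme_set(old_theme) # restore the previous default
```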
Themes
Can also change individual elements
plot + theme(legend.position = "top")
Removing elements
plot + theme(panel.grid = element_blank())
Colour & fills
Colour deficient vision
den <- ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.7)
den
colorBlindness::cvdPlot(den)
#endrainbow
Better colour scale
den <- ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.7) +
  scale_fill_brewer(palette = "Set1")
den
colorBlindness::cvdPlot(den)
Using colour effectively
Choose an appropriate palette.
Qualitative palettes
RColorBrewer::display.brewer.all(type = "qual")
Sequential palettes
RColorBrewer::display.brewer.all(type = "seq")
Diverging palettes
RColorBrewer::display.brewer.all(type = "div")
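Which palette type is appropriate depends on the variable being shown. As a sketch (assuming ggplot2, palmerpenguins and RColorBrewer are installed; the palettes "Dark2" and "Blues" are arbitrary picks), a qualitative palette suits a categorical variable and a sequential palette suits a continuous one:

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the penguins data

base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm))

# Categorical variable: qualitative palette
p_qual <- base +
  geom_point(aes(colour = species), na.rm = TRUE) +
  scale_colour_brewer(palette = "Dark2")

# Continuous variable: sequential palette
# (the distiller scale interpolates a brewer palette into a gradient)
p_seq <- base +
  geom_point(aes(colour = bill_length_mm), na.rm = TRUE) +
  scale_colour_distiller(palette = "Blues", direction = 1)

p_qual
p_seq
```

A diverging palette such as "RdBu" would be the natural choice when values spread either side of a meaningful midpoint, for example anomalies around zero.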
Viridis
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(colour = flipper_length_mm)) +
  scale_colour_viridis_c()
Highlight
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(colour = "red") +
  gghighlight::gghighlight(species == "Chinstrap")
Redundant encoding
ggplot(penguins,
       aes(x = body_mass_g,
           y = flipper_length_mm,
           colour = species,
           shape = species)) +
  geom_point()
Avoiding legends
library(directlabels)
direct.label(plot)
Most common mistake in presentations
Summary
You can plot anything you can imagine
Whole ecosystem of packages to help
#tidytuesday for inspiration