Data visualisation

Bio300B Lecture 4

Richard J. Telford (Richard.Telford@uib.no)

Institutt for biovitenskap, UiB

22 September 2023

Data visualisation

  • A picture is worth a thousand words
  • Tell a story with figures
  • Avoid common mistakes

“reflect the data, tell a story, and look professional” Wilke

ggplot2

  • one of at least three graphics systems in R
  • part of tidyverse

A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.

You provide the data and tell ‘ggplot2’ how to map variables to aesthetics and which graphical primitives to use; it takes care of the details.

ggplot in action

plot <- ggplot(data = penguins,     # Data
       mapping = aes(               # Aesthetics
         x = body_mass_g,    
         y = bill_length_mm, 
         colour = species)) +
  geom_point() +                    # Geometries
  scale_colour_brewer(palette = "Set1") + # scales
  labs(x = "Body mass, g",          # labels
       y = "Bill length, mm", 
       colour = "Species") +
  theme_bw()                        # themes
                                    # Also facets
plot

Data

Tibble or data frame with data to be plotted.

Tidy data

Can process data within ggplot but usually best to do it first
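
As a minimal sketch of processing the data before plotting, the number of penguins per species can be computed with dplyr and then drawn with geom_col(). It assumes that penguins comes from the palmerpenguins package, which the slides do not state; the names species_counts, n_penguins and p_counts are made up for illustration.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins) # assumed source of `penguins`

# Summarise first: one row per species with its count
species_counts <- penguins |>
  count(species, name = "n_penguins")

# Then plot the pre-computed values
p_counts <- ggplot(species_counts, aes(x = species, y = n_penguins)) +
  geom_col()
p_counts
```

geom_bar() would do the same counting inside ggplot; computing the summary first keeps the numbers available for checking and reuse.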

Can add data to the whole plot or to individual geoms

penguin_summary <- penguins |> 
  group_by(species) |> 
  summarise(body_mass_g = mean(body_mass_g, na.rm = TRUE), 
            bill_length_mm = mean(bill_length_mm, na.rm = TRUE))
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  geom_text(aes(label = species), data = penguin_summary, colour = "black")

Aesthetics

mapping specifies which variables in the data should be mapped onto which aesthetics with aes()

Each geom takes different aesthetics

Common aesthetics

  • x, y
  • fill, colour
  • shape
  • linetype
  • group

Setting vs mapping

Mapping in aes(): "blue" is treated as a data value, so the bars get a default scale colour and a legend

ggplot(penguins, 
       aes(x = flipper_length_mm, 
           fill = "blue")) +
geom_histogram()

Setting in the geom: every bar is drawn blue

ggplot(penguins, 
       aes(x = flipper_length_mm)) +
geom_histogram(fill = "blue")
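
One way to check the difference between the two plots is to look at the values ggplot2 actually uses for drawing, which layer_data() returns. This is a sketch that assumes penguins comes from the palmerpenguins package; bins = 30 is set explicitly only to silence the default-bins message.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of `penguins`

# Mapped: "blue" becomes a one-level variable and gets a default scale colour
p_mapped <- ggplot(penguins, aes(x = flipper_length_mm, fill = "blue")) +
  geom_histogram(bins = 30)

# Set: the literal colour blue is used for every bar
p_set <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(fill = "blue", bins = 30)

# Fill values as computed for the first (only) layer of each plot
fill_mapped <- unique(layer_data(p_mapped)$fill)
fill_set <- unique(layer_data(p_set)$fill)

fill_mapped # a colour chosen by the fill scale, not "blue"
fill_set    # the colour that was set in the geom
```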

geoms

Use different geoms for different plot types

Important geoms

  • geom_point()
  • geom_boxplot()
  • geom_histogram()
  • geom_smooth()
  • geom_line()
  • geom_text()

Many geoms, some in extra packages

Geoms to show distributions

base <- ggplot(penguins, aes(x = flipper_length_mm))
hist <- base + geom_histogram()
dens <- base + geom_density()

Geoms to show many distributions

base <- ggplot(penguins, aes(x = species, y = flipper_length_mm))

p_prange <- base + stat_summary(fun = "mean", geom = "col")
p_box <- base + geom_boxplot(aes(fill = species))
p_vio <- base + geom_violin(aes(fill = species))
p_jit <- base + geom_jitter(aes(colour = species))
library(ggbeeswarm)
p_quasi <- base + geom_quasirandom(aes(colour = species))

Boxplots can mislead

p <- datasauRus::box_plots |> 
  pivot_longer(everything()) |> 
  ggplot(aes(x = name, y = value))

library(patchwork) # assumed to be loaded: lets `+` place two ggplots side by side
(p + geom_boxplot()) + (p + geom_violin())

Show the raw data

The top-left panel shows mean + SE only; the top-right panel shows mean + SE together with widely spread, jittered raw data. The bottom panels show the same with more data, so the SEs are smaller.
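
A plot of this kind could be drawn roughly as follows. This is a sketch on the penguins data (assumed to come from palmerpenguins), not the code behind the figure described above; the jitter width, transparency and colour are arbitrary choices.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of `penguins`

p_raw <- ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  # Raw observations, jittered horizontally only so the y values stay exact
  geom_jitter(width = 0.2, height = 0, alpha = 0.4,
              colour = "grey40", na.rm = TRUE) +
  # Mean +/- one standard error drawn on top of the raw data
  stat_summary(fun.data = mean_se, geom = "pointrange", na.rm = TRUE)
p_raw
```

Drawing the summary layer last keeps it visible above the points.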

geoms for scatterplots

ggplot(penguins, aes(x = body_mass_g,  y = bill_length_mm, colour = species)) +
  geom_point() +
  geom_smooth(method = "lm")

Scales

Scales control

  • how variables are mapped onto the aesthetics
  • the axis breaks

All are named scale_<aesthetic>_<description>()

  • scale_x_log10()
  • scale_y_reverse()
  • scale_colour_viridis_c()
  • scale_shape_manual()
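
As an illustration of how several scales combine in one plot, the sketch below uses a log10 x axis with hand-picked breaks, manual point shapes and a ColorBrewer palette. It assumes penguins comes from palmerpenguins; the break values, shape codes and the "Dark2" palette are arbitrary choices for the example.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of `penguins`

p_scaled <- ggplot(penguins,
                   aes(x = body_mass_g, y = bill_length_mm,
                       colour = species, shape = species)) +
  geom_point(na.rm = TRUE) +
  # x aesthetic: log10 transformation with chosen axis breaks
  scale_x_log10(breaks = c(3000, 4000, 5000, 6000)) +
  # shape aesthetic: one shape code per species, matched by name
  scale_shape_manual(values = c(Adelie = 16, Chinstrap = 17, Gentoo = 15)) +
  # colour aesthetic: qualitative ColorBrewer palette
  scale_colour_brewer(palette = "Dark2")
p_scaled
```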

Labels

  • plot, axis and legend titles

ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  labs(x = "Body mass, g",
       y = "Bill length, mm", 
       colour = "Species", 
       title = "Bill length against body mass")

Facets

Split data into separate panels.

plot + facet_wrap(facets = vars(species))

facet_grid() for two dimensional arrays of subplots

plot + facet_grid(rows = vars(species),
                  cols = vars(island)
                  )

Themes

Change how non-data elements of the plot look

Entire themes
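
A short sketch of switching between complete themes: the same plot is styled with three of the themes that ship with ggplot2. It assumes penguins comes from palmerpenguins; base_plot and the p_* names are hypothetical.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of `penguins`

base_plot <- ggplot(penguins,
                    aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

# Each complete theme replaces all non-data styling at once
p_bw      <- base_plot + theme_bw()
p_minimal <- base_plot + theme_minimal()
p_classic <- base_plot + theme_classic()

p_bw
p_minimal
p_classic
```

theme_set() can be used instead to change the default theme for every later plot in the session.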

Themes

Can also change individual elements

plot + theme(legend.position = "top")

Removing elements

plot + theme(panel.grid = element_blank())

Colour & fills

Colour deficient vision

den <- ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.7)
den
colorBlindness::cvdPlot(den)

#End rainbow

Better colour scale

den <- ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.7) +
  scale_fill_brewer(palette = "Set1")
den
colorBlindness::cvdPlot(den)

Using colour effectively

Choose an appropriate palette.

Qualitative palettes

RColorBrewer::display.brewer.all(type = "qual")

Sequential palettes

RColorBrewer::display.brewer.all(type = "seq")

Diverging palettes

RColorBrewer::display.brewer.all(type = "div")
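
When choosing among these palettes, the metadata table brewer.pal.info in RColorBrewer can help: it records each palette's type and whether it is flagged as colour-blind friendly. The sketch below lists the flagged qualitative palettes and applies one of them; the choice of "Dark2" is only an example, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(RColorBrewer)
library(palmerpenguins) # assumed source of `penguins`

# Qualitative palettes flagged as colour-blind friendly
pal_info <- brewer.pal.info
cb_qual <- rownames(pal_info)[pal_info$category == "qual" & pal_info$colorblind]
cb_qual

# Use one of them for the species fill
p_cb <- ggplot(penguins, aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.7, na.rm = TRUE) +
  scale_fill_brewer(palette = "Dark2")
p_cb
```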

Viridis

ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(colour = flipper_length_mm)) +
  scale_colour_viridis_c()

Highlight

ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(colour = "red") +
  gghighlight::gghighlight(species == "Chinstrap")

Redundant encoding

ggplot(penguins, 
       aes(x = body_mass_g,
           y = flipper_length_mm,
           colour = species,
           shape = species)) +
  geom_point() 

Avoiding legends

library(directlabels)
direct.label(plot) 

Most common mistake in presentations

plot with very small labels

Solution

theme_bw(base_size = 18)
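
Applied to a full plot, this could look like the sketch below; base_size scales all text in the theme together. It assumes penguins comes from palmerpenguins, and p_big is a hypothetical name.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of `penguins`

p_big <- ggplot(penguins,
                aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Body mass, g", y = "Bill length, mm", colour = "Species") +
  theme_bw(base_size = 18) # larger text for slides
p_big

# Alternatively, set it once for all later plots in the session:
# theme_set(theme_bw(base_size = 18))
```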

Summary

  • You can plot anything you can imagine
  • Whole ecosystem of packages to help
  • #tidytuesday for inspiration