12 Pipes

In this chapter, you will

learn how multiple functions can be used to process data in succession
be introduced to the pipe |>

We often want to run several functions in turn. For example, to find the mean bill length for each penguin species, we need group_by and summarise (you will find out more about these functions in Chapter 14). One option is to nest one function inside the other.

summarise(group_by(penguins, species), mean = mean(bill_length_mm, na.rm = TRUE))

# A tibble: 3 × 2
  species    mean
  <fct>     <dbl>
1 Adelie     38.8
2 Chinstrap  48.8
3 Gentoo     47.5

This sort of works, but for larger problems that use more functions, this solution becomes almost impossible to read, and it is very easy to make mistakes and forget which brackets belong to which function.

Another strategy is to make and use intermediate objects.

penguins_grouped <- group_by(penguins, species)
summarise(penguins_grouped, mean = mean(bill_length_mm, na.rm = TRUE))

# A tibble: 3 × 2
  species    mean
  <fct>     <dbl>
1 Adelie     38.8
2 Chinstrap  48.8
3 Gentoo     47.5

This works better, but it can generate many intermediate objects, and it can be difficult to ensure that the correct one is used.

A popular alternative is to use pipes. The R code for a pipe is |>. Pipes pass the results from one function directly into the next function.

penguins |> 
  group_by(species) |>
  summarise(mean = mean(bill_length_mm, na.rm = TRUE))

# A tibble: 3 × 2
  species    mean
  <fct>     <dbl>
1 Adelie     38.8
2 Chinstrap  48.8
3 Gentoo     47.5

The pipe basically means “and then”, so the above code can be read as “take the penguin data and then group by species and then summarise the mean bill length”.

You never need to use pipes, but they can make code more readable.
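
To check that the three styles really are equivalent, here is a minimal sketch that uses only base R functions, so it runs without loading any packages. The vector x and its values are made up for this illustration and are not part of the penguins data.

```r
# A made-up vector, used only for this illustration
x <- c(5.2, 3.1, 8.4, 1.7, 6.0)

# 1. Nested functions: must be read from the inside out
nested <- round(mean(head(sort(x), n = 3)), digits = 1)

# 2. Intermediate objects: readable, but leaves extra objects behind
x_sorted <- sort(x)
x_smallest <- head(x_sorted, n = 3)
x_mean <- mean(x_smallest)
stepwise <- round(x_mean, digits = 1)

# 3. Pipes: read from top to bottom as "and then"
piped <- x |>
  sort() |>
  head(n = 3) |>
  mean() |>
  round(digits = 1)

# All three give the same answer
identical(nested, stepwise)
identical(nested, piped)
```

All three versions compute the rounded mean of the three smallest values. Only the way the code is laid out differs.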

12.1 A recipe for mashed potato

Here is a recipe for mashed potato using pipes. The functions are imaginary, so this code will not run, but it shows how a pipeline reads.

buy("potatoes", kg = "1") |> 
  peel() |> 
  boil(minutes = "15") |> 
  drain() |> 
  mash(add = list("salt", "milk", "butter")) |> 
  serve(decorate = "parsley")

This recipe for mashed potato can be read as "buy 1 kg of potatoes, and then peel them, and then boil them", and so on.
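
If you would like to run something similar, the sketch below defines toy versions of three of the recipe steps. The functions buy, peel and boil are hypothetical and written only for this illustration. Each one simply returns a text description, which is enough to show how every step receives the result of the previous step as its first argument.

```r
# Hypothetical helper functions, defined only for this illustration.
# Each one takes the food so far as its first argument and returns a description.
buy  <- function(item, kg) paste0(kg, " kg of ", item)
peel <- function(food) paste("peeled", food)
boil <- function(food, minutes) paste0(food, ", boiled for ", minutes, " minutes")

dish <- buy("potatoes", kg = 1) |>
  peel() |>
  boil(minutes = 15)

print(dish)
```

The result of buy() is piped into the food argument of peel(), and the result of peel() into the food argument of boil().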

12.2 The native R pipe |>

The pipe passes the result of the code on its left to the function on its right, and puts it in the first available argument. So

f <- "file.csv"
read_csv(file = f)

can be rewritten as

f |>
 read_csv()
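
One way to see what the pipe does is to capture piped code with quote(), which shows the call that R will actually run. This is a small sketch: the names x, y, a and f are placeholders made up for the illustration, and quote() does not evaluate them, so they do not need to exist.

```r
# quote() captures code without running it, so x, y, a and f need not exist
piped_call <- quote(x |> f(y))
print(piped_call)

# The piped code is the same call as the ordinary nested form
same_call <- identical(piped_call, quote(f(x, y)))
same_call

# The left-hand side is always inserted first in the call, even when
# other arguments are named
named_call <- quote(x |> f(a = y))
print(named_call)

# A runnable example: both lines below give the same numbers
roots_nested <- sqrt(c(1, 4, 9))
roots_piped <- c(1, 4, 9) |> sqrt()
identical(roots_nested, roots_piped)
```

Because the piped object is inserted as an unnamed argument, R's usual argument matching then assigns it to the first argument that has not already been given by name.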

If you want to put the object passed through the pipe into the second argument, you need to name the first, so that it is not available. So if we want to pipe penguins into lm to fit a linear model, we need penguins to be put into the data argument, which is the second argument of lm. We can force this by naming the formula argument, so that data is the first available argument.

# unnamed first argument: fails, because penguins is passed as the formula
penguins |>
  lm(bill_length_mm ~ species)

# named first argument, penguins pipes into second argument
penguins |>
  lm(formula = bill_length_mm ~ species)

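Or we can use the placeholder _ to mark where the piped object should go. The placeholder must be used with a named argument.

# placeholder _ marks the argument that penguins is piped into
penguins |>
  lm(bill_length_mm ~ species, data = _) # R 4.2 and newer only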
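
The two approaches can be compared without any extra packages. The sketch below uses the built-in mtcars data instead of penguins, which is an assumption made only so that the example is self-contained. It needs R 4.2 or newer because of the placeholder.

```r
# Ordinary call, with the data given by name
fit_plain <- lm(mpg ~ wt, data = mtcars)

# Naming the formula argument leaves data as the first available argument
fit_named <- mtcars |>
  lm(formula = mpg ~ wt)

# The placeholder _ sends the piped object to the data argument (R >= 4.2)
fit_placeholder <- mtcars |>
  lm(mpg ~ wt, data = _)

# All three models have the same coefficients
all.equal(coef(fit_plain), coef(fit_named))
all.equal(coef(fit_plain), coef(fit_placeholder))
```

The fitted models are the same. Only the call stored inside each model object differs, because it records how lm was called.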

More complex arrangements, for example using the piped object multiple times, can be done by writing a function, or using the pipebind package. You probably won’t have to do this very often.

# Using an anonymous function
rnorm(10) |>
  (function(x){x - mean(x)})() # Result from rnorm is passed to the anonymous function as x

 [1] -1.97555714 -0.48218398  0.02953355  1.11727215  0.64242338 -0.66064742
 [7]  1.20709071  0.10471908  2.11306503 -2.09571536

# Using the \ shortcut for function
rnorm(10) |>
  (\(x){x - mean(x)})()

 [1]  0.78215898 -0.07098804 -0.06318266  1.20092523  1.34671048 -2.98568019
 [7]  0.81707960 -1.01027789  0.75633615 -0.77308166

# using pipebind
library(pipebind)
rnorm(10) |>
  bind(x, x - mean(x))

 [1] -0.8370634  0.3015608 -0.2920104  1.2031490  0.3592847 -1.1807594
 [7] -0.3339675 -0.1887989  1.9754446 -1.0068395

The magrittr pipe %>%

The |> pipe was introduced in R version 4.1. Previously, the magrittr package pipe %>% was widely used, especially with tidyverse functions. You will see %>% in a lot of code on Stack Overflow and other help sites. In most cases the old and new pipes work in exactly the same way. Advantages of the |> pipe are that it

is slightly faster
doesn’t need any packages to be loaded
is easier to debug

12.3 Making a pipe

You can make a pipe either by typing it directly, or by using the RStudio keyboard shortcut Ctrl+Shift+M (Shift+Cmd+M on a Mac). You may need to set the RStudio options first: go to Tools > Global Options > Code and tick Use native pipe operator, |>. To make your code readable, put a line break after each pipe.

Exercise

Rewrite the following code to use pipes, then check that the result is the same.

adelie <- filter(penguins, species == "Adelie")
adelie_grouped <- group_by(adelie, sex)
adelie_summary <- summarise(adelie_grouped, mean_body_mass = mean(body_mass_g))
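
One possible way to check that two versions of some code give the same result is to store both results and compare them with identical() or all.equal(). The sketch below shows the idea on a different, made-up task that uses the built-in mtcars data and base R functions, so it does not give away the answer to the exercise.

```r
# Version with intermediate objects
cars_heavy <- subset(mtcars, wt > 3, select = c(mpg, wt))
cars_means <- colMeans(cars_heavy)

# Version with pipes
cars_piped <- mtcars |>
  subset(wt > 3, select = c(mpg, wt)) |>
  colMeans()

# TRUE if the two results are exactly the same
identical(cars_means, cars_piped)

# all.equal() allows for tiny numerical differences
all.equal(cars_means, cars_piped)
```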

Contributors

Richard Telford

