5 Installing packages

In this chapter, you will

learn how packages extend what R can do
install packages from CRAN and from GitHub
load packages so their functions/data are available
learn how to cite R and packages

5.1 Functions & Packages

Everything that does something in R is a function.

All functions arranged into packages.

Packages are stored in libraries.

Some R packages, for example, stats and utils, are automatically loaded when you start R. You can see the packages you have installed (and which are loaded) in the tab Packages (see Section 2.3.4.3).

There are also several recommended packages that are installed by default, for example the mgcv package for fitting generalised additive models.

5.2 Loading packages

If you want to use the functions in a package, you need to load the package with the function library().

library("mgcv")

5.3 Installing extra packages

In addition to the packages already installed, there are thousands of extra packages available for download that expand the computing possibilities of R by adding new functions, classes, documentation, data sets, etc.

The vast number of available packages is a little daunting, but many are very specialised, and won’t be useful for you. If you don’t know the name of the package you need for your analyses the task views can help. For example, the Environmetrics task view describes packages for the analysis of ecological and environmental data.

Many packages can be downloaded from CRAN (R’s homepage), and others can be installed from GitHub. The next sections show you how to install packages from these locations.

Every time you install a new package, R imports all necessary files into a local library, but does not activate it. You will have to remember to activate the new package with library() every time your project require items or functions from that package.

Books vs R packages
	Book	R Package
How to get	Order from your favourite bookshop	`install.packages("tidyr")`
When it is kept	Bookshelf/library	A Library
How to use it	Open book	`library(tidyr)`

5.4 Packages published on CRAN

There are lots of extra R packages available from CRAN.

You can install or update a package from CRAN with install.packages(). Simply type the name of the the package you want to install; autocomplete should add quotation marks " ".

install.packages("tidyr")

You only need to do this once, until you need to install a new version of a package, so you should run this directly in the console and not keep it in your script or markdown document (otherwise it will install the package every time you run the code which will be slow).

Once you have installed the package, you can use library() to load it. You need to do this every time use use it.

Rtools

If you are using a Windows computer, when you install a package you may get the warning

WARNING: Rtools is required to build R packages but is not currently installed.

Rtools is required to install packages from their source code when they have code (e.g. C or Fortran) that needs compiling. If you are installing packages with install.packages(), you are, by default, installing binary files from CRAN that just need to be unzipped, and so Rtools is not required.

5.5 Packages published on GitHub

Not all R packages are available on CRAN. Some packages are only available on GitHub.com. If you want to install a package published on GitHub, you may use the function remotes::install_github() (install remotes from CRAN first). You need the repo name from GitHub, this given as "name_of_owner/name_of_repo" Here also, you must add quotation marks " ".

install.packages("remotes")
# ggvegan for plotting ordinations is only on github
remotes::install_github(repo = "gavinsimpson/ggvegan")

5.6 Recommended packages

These packages are ones we recommend you install now:

usethis
remotes
tidyverse
palmerpenguins
biostats.tutorials

These packages will be used throughout the website. There are other packages we will install later.

5.6.1 `usethis` and `remotes`

These are utility packages that we will use to download course material and packages from GitHub.

5.6.2 The tidyverse

The tidyverse is an amazing toolbox which contains a growing, evolving collection of R packages for data science. The packages are developed on the same philosophy and are fully compatible with each other. Some of the tidyverse packages help you import data (readr), some let you make figures (ggplot2), others help you rearrange your data set (dplyr, tidyr), and much more. Have a look at the tidyverse webpages to explore the collection.

Install and activate the tidyverse with the following code:

install.packages("tidyverse")
library(tidyverse)

This will load the core tidyverse packages, which include readr, dplyr, and ggplot2 so you don’t need to load these separately.

5.6.3 palmerpenguins

palmerpenguins is a package that provides you with two real data sets. They contain measurements from three penguin species found in the Palmer archipelago, Antarctica. Several variables such as species, island, and body mass are included for over three hundred penguins. These data sets will be used in the upcoming sections of this website.

Install and activate the package with this code:

install.packages("palmerpenguins")
library(palmerpenguins)

Type this following line in the console if you want to know more about palmerpenguins:

citation("palmerpenguins")

5.6.4 biostats.tutorials

Along with the present website, our team has developed a package with tutorials called biostats.tutorials. This package will help you learn and practice R functions and concepts. Several of the upcoming chapters refer to this package. Install biostats.tutorials with this code:

# install.packages("remotes")
remotes::install_github("biostats-r/biostats.tutorials")

Now load the package with library(biostats.tutorials) and the tutorials should appear in the tab Tutorial (see Section 2.3.5.2). Exercises will tell you when to run a tutorial. Click the “Start Tutorial” button to start the tutorial (this may take a few seconds to start the first time).

Exercise

Install tidyverse, palmerpenguins and remotes from CRAN and biostats.tutorials from GitHub using the code above.

5.7 Debugging failed package installation

Sometimes packages fail to install properly. This can be frustrating and difficult to debug.

Some recommendations

Read the error message. It might tell you exactly what you need to do.
Check exactly which package won’t install. It may be a dependency of the package you want that has problems installing. Try to install the package or dependency again and pay attention to any error message.
Restart R (in Session menu in RStudio) and try again.
Find your user library with .libPaths() and in the file manager delete the package’s directory.
Google any error message. Someone else may have had the same problem.
Try installing the package directly in R (i.e not using RStudio)

5.8 Name conflicts

5.8.1 The problem

Sometimes two packages have functions with the same names. For example, both MASS and dplyr have a select function which does completely different things. If both packages are loaded at the same time there is a conflict and the function that was loaded last takes priority. This can cause big problems, with difficult to interpret error messages.

library(palmerpenguins) # load data
library(dplyr)
library(MASS) # R will report that select is being masked


Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

penguins |> select(species)

Error in select(penguins, species): unused argument (species)

If you have code that worked one day and fails the next with a weird error messages, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from.

There are three solutions.

5.8.2 Loading order

Be very careful about the order in which packages are loaded. If the example above had loaded MASS before dplyr the select function in MASS would have been masked and the code would have worked. This solution is very fragile as it is easy to load packages in the wrong order.

5.8.3 `package::function`

Use the package::function notation to specify which package a function comes from. This is safe and can make code easier to understand by explicitly showing which packages the functions are from. The code above could be written safely as

penguins |> dplyr::select(species)

# A tibble: 344 × 1
  species
  <fct>  
1 Adelie 
2 Adelie 
3 Adelie 
# ℹ 341 more rows

This gets tedious fairly quickly, so is best used with packages that you only need a few functions from once or twice, not many functions you need repeatedly.

5.8.4 `conflicted` package

The safest solution is to use the conflicted package. The conflicted package converts any conflicts between packages into errors. This might seem like a bad idea, but it is much easier to diagnose an error from conflicted than the weird error of a masked function.

library(dplyr)
library(MASS)
library(conflicted)

penguins |> select(Species)

Error:
! [conflicted] select found in 2 packages.
Either pick the one you want with `::`:
• MASS::select
• dplyr::select
Or declare a preference with `conflicts_prefer()`:
• `conflicts_prefer(MASS::select)`
• `conflicts_prefer(dplyr::select)`

As the error message suggests, we can resolve the error either by using the package::function notation, or use the function conflict_prefer to say which function we want to use by default.

library(dplyr)
library(MASS)
library(conflicted)
conflict_prefer("select", "dplyr")

[conflicted] Will prefer dplyr::select over any other package.

penguins |> select(species)

# A tibble: 344 × 1
  species
  <fct>  
1 Adelie 
2 Adelie 
3 Adelie 
# ℹ 341 more rows

If there are many functions from a package that have conflicts, you can use conflict_prefer_all().

conflict_prefer_all("dplyr", quiet = TRUE)

5.9 Packages change over time

R packages often get updated. This is good as functions get added or improved and bugs get fixed. However, it also means that code written last year might give the same result (or even not work at all) next year with all the latest packages. This is a big problem for reproducibility.

The solution is to make sure you re-run your code with the same packages. That is not easy to do by hand. The renv package keeps track of all the packages you are using (and all the packages they depend on). I use renv for my analyses.

The workflow when working with renv is:

Call renv::init() to initialise a private R library for the project
Work in the project as normal, installing R packages as needed in the project
Call renv::snapshot() to save the state of the project library
Continue working on your project, installing and updating R packages as needed. Use renv::install() to install packages from CRAN or GitHub.
If the changes were successful, call renv::snapshot() again. If the updated packages introduced problems, call renv::restore() to revert to the previous state of the library.

5.10 Writing your own function/packages

If you find that you need to run the same code several times, it can be useful to write a function.

To make a function, you need to use the reserved word function followed by brackets with zero or more arguments. After the brackets, braces encompass the body of the function.

Here is a function that multiples two numbers together.

mutliply <- function(x, y = 1) { # The default value of y is 1
  x * y
}

multiply(x = 6, y = 7)

Once you have written a function, it can be useful to make your own package. This makes it easy to use in your own analysis and easy to share with other users. Information on how to make a package using the usethis and devtools packages can be found in the package writing book.

5.11 Citing packages

When you use a package it is important to cite it in a manuscript or thesis both to acknowledge the author’s work in making the package and to increase reproducibility. The correct citation can be seen with the function citation.

citation("nlme")

To cite package 'nlme' in publications use:

  Pinheiro J, Bates D, R Core Team (2024). _nlme: Linear and Nonlinear
  Mixed Effects Models_. R package version 3.1-165,
  <https://CRAN.R-project.org/package=nlme>.

  Pinheiro JC, Bates DM (2000). _Mixed-Effects Models in S and S-PLUS_.
  Springer, New York. doi:10.1007/b98882
  <https://doi.org/10.1007/b98882>.

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

It is also important to cite the version used.

packageVersion("nlme")

[1] '3.1.165'

You should also cite R. Again, the citation function can be used

citation()

To cite R in publications use:

  R Core Team (2024). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2024},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

The R version can be obtained with R.version.string. This is a variable not a function so it does not take brackets.

R.version.string

[1] "R version 4.4.2 (2024-10-31)"

In quarto you can also use the insert citation tool to add a package to your bibliography.

Quiz

Question 1: You have installed the tidyverse package. Which command do you need to use to activate it?

Exercise

Check your computer is ready for the next chapter with the checker package. This will check that your computer has the correct versions of R and RStudio installed, the required packages, and that the recommended RStudio options are set.

First install the checker package.

install.packages("checker")

Now run

checker::chk_requirements(
  "https://raw.githubusercontent.com/biostats-r/biostats/main/checker/basic.yaml"
)

Please address any issues checker finds before continuing to the next chapter.

Contributors

Jonathan Soulé
Richard Telford

# Installing packages {#sec-installing-packages} ```{r setup, include=FALSE} source("R/setup.R") ``` ::: callout-note ## In this chapter, you will - learn how packages extend what R can do - install packages from CRAN and from GitHub - load packages so their functions/data are available - learn how to cite R and packages ::: ## Functions & Packages Everything that does something in R is a **function**. All functions arranged into **packages**. Packages are stored in **libraries**. Some R packages, for example, `stats` and `utils`, are automatically loaded when you start R. You can see the packages you have installed (and which are loaded) in the tab `Packages` (see @sec-packages-tab). There are also several recommended packages that are installed by default, for example the `mgcv` package for fitting generalised additive models. ## Loading packages If you want to use the functions in a package, you need to load the package with the function `library()`. ```{r} #| eval: false library("mgcv") ``` ## Installing extra packages In addition to the packages already installed, there are thousands of extra packages available for download that expand the computing possibilities of R by adding new functions, classes, documentation, data sets, etc. The vast number of available packages is a little daunting, but many are very specialised, and won't be useful for you. If you don't know the name of the package you need for your analyses the [task views](https://cran.r-project.org/web/views/) can help. For example, the [Environmetrics](https://cran.r-project.org/web/views/Environmetrics.html) task view describes packages for the analysis of ecological and environmental data. Many packages can be downloaded from CRAN (R's homepage), and others can be installed from GitHub. The next sections show you how to install packages from these locations. Every time you install a new package, R imports all necessary files into a local *library*, but does not activate it. You will have to remember to activate the new package with `library()` every time your project require items or functions from that package. | | Book | R Package | |-----------------|------------------------------------|-------------------------------| | How to get | Order from your favourite bookshop | `install.packages("tidyr")` | | When it is kept | Bookshelf/library | A Library | | How to use it | Open book | `library(tidyr)` | : Books vs R packages ## Packages published on CRAN There are lots of extra R packages available from [CRAN](https://cran.r-project.org/){target="_blank"}. You can install or update a package from CRAN with `install.packages()`. Simply type the name of the the package you want to install; autocomplete should add quotation marks `"` `"`. ```{r install-CRAN, eval = FALSE} install.packages("tidyr") ``` You only need to do this once, until you need to install a new version of a package, so you should run this directly in the console and not keep it in your script or markdown document (otherwise it will install the package every time you run the code which will be slow). Once you have installed the package, you can use `library()` to load it. You need to do this every time use use it. ::: callout-tip ## Rtools If you are using a Windows computer, when you install a package you may get the warning > WARNING: Rtools is required to build R packages but is not currently installed. Rtools is required to install packages from their source code when they have code (e.g. C or Fortran) that needs compiling. If you are installing packages with `install.packages()`, you are, by default, installing binary files from CRAN that just need to be unzipped, and so Rtools is not required. ::: ## Packages published on GitHub Not all R packages are available on CRAN. Some packages are only available on [GitHub.com](github.com). If you want to install a package published on [GitHub](https://github.com/){target="_blank"}, you may use the function `remotes::install_github()` (install `remotes` from CRAN first). You need the repo name from GitHub, this given as `"name_of_owner/name_of_repo"` Here also, you must add quotation marks `"` `"`. ```{r install-github, eval = FALSE} install.packages("remotes") # ggvegan for plotting ordinations is only on github remotes::install_github(repo = "gavinsimpson/ggvegan") ``` ## Recommended packages These packages are ones we recommend you install now: - `usethis` - `remotes` - `tidyverse` - `palmerpenguins` - `biostats.tutorials` These packages will be used throughout the website. There are other packages we will install later. ### `usethis` and `remotes` These are utility packages that we will use to download course material and packages from GitHub. ### The tidyverse {#sec-tidyverse} The tidyverse is an amazing toolbox which contains a growing, evolving collection of R packages for data science. The packages are developed on the same philosophy and are fully compatible with each other. Some of the tidyverse packages help you import data (`readr`), some let you make figures (`ggplot2`), others help you rearrange your data set (`dplyr`, `tidyr`), and much more. Have a look at [the tidyverse webpages](https://www.tidyverse.org/){target="_blank"} to explore the collection. Install and activate the tidyverse with the following code: ```{r recommended-tidyverse, echo=TRUE, eval=FALSE} install.packages("tidyverse") library(tidyverse) ``` This will load the core tidyverse packages, which include `readr`, `dplyr`, and `ggplot2` so you don't need to load these separately. ### palmerpenguins `palmerpenguins` is a package that provides you with two *real* data sets. They contain measurements from three penguin species found in the Palmer archipelago, Antarctica. Several variables such as species, island, and body mass are included for over three hundred penguins. These data sets will be used in the upcoming sections of this website. Install and activate the package with this code: ```{r install-the-penguins, echo=TRUE, eval=FALSE} install.packages("palmerpenguins") library(palmerpenguins) ``` Type this following line in the console if you want to know more about `palmerpenguins`: ```{r citation-penguins, echo=TRUE, eval=FALSE} citation("palmerpenguins") ``` ### biostats.tutorials {#sec-biostatstutorials} Along with the present website, our team has developed a package with tutorials called `biostats.tutorials`. This package will help you learn and practice R functions and concepts. Several of the upcoming chapters refer to this package. Install `biostats.tutorials` with this code: ```{r install-the-tutorial, echo=TRUE, eval=FALSE} # install.packages("remotes") remotes::install_github("biostats-r/biostats.tutorials") ``` Now load the package with `library(biostats.tutorials)` and the tutorials should appear in the tab `Tutorial` (see @sec-tutorial-tab). Exercises will tell you when to run a tutorial. Click the "Start Tutorial" button to start the tutorial (this may take a few seconds to start the first time). ::: callout-note ## Exercise Install `tidyverse`, `palmerpenguins` and `remotes` from CRAN and `biostats.tutorials` from GitHub using the code above. ::: ## Debugging failed package installation Sometimes packages fail to install properly. This can be frustrating and difficult to debug. Some recommendations - **Read the error message.** It might tell you exactly what you need to do. - Check exactly which package won't install. It may be a dependency of the package you want that has problems installing. Try to install the package or dependency again and pay attention to any error message. - Restart R (in Session menu in RStudio) and try again. - Find your user library with `.libPaths()` and in the file manager delete the package's directory. - Google any error message. Someone else may have had the same problem. - Try installing the package directly in R (i.e not using RStudio) ## Name conflicts ### The problem ```{r, echo = FALSE} options(tibble.print_max = 2) ``` Sometimes two packages have functions with the same names. For example, both `MASS` and `dplyr` have a `select` function which does completely different things. If both packages are loaded at the same time there is a conflict and the function that was loaded last takes priority. This can cause big problems, with difficult to interpret error messages. ```{r, error = TRUE} library(palmerpenguins) # load data library(dplyr) library(MASS) # R will report that select is being masked penguins |> select(species) ``` If you have code that worked one day and fails the next with a weird error messages, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from. There are three solutions. ### Loading order Be very careful about the order in which packages are loaded. If the example above had loaded `MASS` before `dplyr` the `select` function in `MASS` would have been masked and the code would have worked. This solution is very fragile as it is easy to load packages in the wrong order. ### `package::function` Use the `package::function` notation to specify which package a function comes from. This is safe and can make code easier to understand by explicitly showing which packages the functions are from. The code above could be written safely as ```{r, error = TRUE} penguins |> dplyr::select(species) ``` This gets tedious fairly quickly, so is best used with packages that you only need a few functions from once or twice, not many functions you need repeatedly. ### `conflicted` package The safest solution is to use the [`conflicted` package](https://conflicted.r-lib.org/). The `conflicted` package converts any conflicts between packages into errors. This might seem like a bad idea, but it is much easier to diagnose an error from `conflicted` than the weird error of a masked function. ```{r, error = TRUE} library(dplyr) library(MASS) library(conflicted) penguins |> select(Species) ``` As the error message suggests, we can resolve the error either by using the `package::function` notation, or use the function `conflict_prefer` to say which function we want to use by default. ```{r, error = TRUE} library(dplyr) library(MASS) library(conflicted) conflict_prefer("select", "dplyr") penguins |> select(species) ``` If there are many functions from a package that have conflicts, you can use `conflict_prefer_all()`. ```{r tidylog-conflicts, eval = FALSE} conflict_prefer_all("dplyr", quiet = TRUE) ``` ## Packages change over time R packages often get updated. This is good as functions get added or improved and bugs get fixed. However, it also means that code written last year might give the same result (or even not work at all) next year with all the latest packages. This is a big problem for reproducibility. The solution is to make sure you re-run your code with the same packages. That is not easy to do by hand. The [`renv` package](https://rstudio.github.io/renv/articles/renv.html) keeps track of all the packages you are using (and all the packages they depend on). I use `renv` for my analyses. The workflow when working with `renv` is: - Call `renv::init()` to initialise a private R library for the project - Work in the project as normal, installing R packages as needed in the project - Call `renv::snapshot()` to save the state of the project library - Continue working on your project, installing and updating R packages as needed. Use `renv::install()` to install packages from CRAN or GitHub. - If the changes were successful, call `renv::snapshot()` again. If the updated packages introduced problems, call `renv::restore()` to revert to the previous state of the library. ## Writing your own function/packages If you find that you need to run the same code several times, it can be useful to write a function. To make a function, you need to use the reserved word `function` followed by brackets with zero or more arguments. After the brackets, braces encompass the body of the function. Here is a function that multiples two numbers together. ```{r, eval = FALSE} mutliply <- function(x, y = 1) { # The default value of y is 1 x * y } multiply(x = 6, y = 7) ``` Once you have written a function, it can be useful to make your own package. This makes it easy to use in your own analysis and easy to share with other users. Information on how to make a package using the `usethis` and `devtools` packages can be found in the [package writing book](https://biostats-r.github.io/biostats/package/index.html). ## Citing packages When you use a package it is important to cite it in a manuscript or thesis both to acknowledge the author's work in making the package and to increase reproducibility. The correct citation can be seen with the function `citation`. ```{r citation} citation("nlme") ``` It is also important to cite the version used. ```{r version} packageVersion("nlme") ``` You should also cite R. Again, the `citation` function can be used ```{r} citation() ``` The R version can be obtained with `R.version.string`. This is a variable not a function so it does not take brackets. ```{r} R.version.string ``` In quarto you can also use the insert citation tool to add a package to your bibliography. ```{r} #| label: setup-quiz #| file: !expr here::here("Templates/webex.R") #| include: false ``` ::: {.callout-note} ## Quiz ```{r} #| label: quiz #| results: asis #| echo: false questions <- list( list( question = "You have installed the `tidyverse` package. Which command do you need to use to activate it?", choice = c( "`install.packages(\"tidyverse\")`", "`libraries(tidyverse)`", answer = "`library(tidyverse` `)`", answer = "`library(tidyverse)`", "`package(tidyverse)`", "`Library(tidyverse)`" ) ) ) print_multichoice(questions) ``` ::: ::: callout-note ## Exercise Check your computer is ready for the next chapter with the `checker` package. This will check that your computer has the correct versions of R and RStudio installed, the required packages, and that the recommended RStudio options are set. First install the `checker` package. ```{r} #| label: checker-1 #| eval: false install.packages("checker") ``` Now run ```{r} #| label: checker-2 #| eval: false checker::chk_requirements( "https://raw.githubusercontent.com/biostats-r/biostats/main/checker/basic.yaml" ) ``` Please address any issues `checker` finds before continuing to the next chapter. ::: ::: {.column-margin} ### Contributors {.unlisted .unnumbered} - Jonathan Soulé - Richard Telford :::

5.1 Functions & Packages

5.2 Loading packages

5.3 Installing extra packages

5.4 Packages published on CRAN

5.5 Packages published on GitHub

5.6 Recommended packages

5.6.1 usethis and remotes

5.6.2 The tidyverse

5.6.3 palmerpenguins

5.6.4 biostats.tutorials

5.7 Debugging failed package installation

5.8 Name conflicts

5.8.1 The problem

5.8.2 Loading order

5.8.3 package::function

5.8.4 conflicted package

5.9 Packages change over time

5.10 Writing your own function/packages

5.11 Citing packages

Contributors

5.6.1 `usethis` and `remotes`

5.8.3 `package::function`

5.8.4 `conflicted` package