5  Installing packages

In this chapter, you will
  • learn how packages extend what R can do
  • install packages from CRAN and from GitHub
  • load packages so their functions/data are available
  • learn how to cite R and packages

5.1 Functions & Packages

Everything that does something in R is a function.

All functions are arranged into packages.

Packages are stored in libraries.

Some R packages, for example stats and utils, are automatically loaded when you start R. You can see the packages you have installed (and which of them are loaded) in the Packages tab (see Section 2.3.4.3).

There are also several recommended packages that are installed by default, for example the mgcv package for fitting generalised additive models.
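The same information can be checked from the console. The following is a minimal sketch using base R helpers; the variable names are only illustrative, and the recommended packages listed will depend on your installation.

```r
# Packages attached in the current session (their functions can be called directly)
attached <- search()
print(attached)

# Names of all packages installed in your libraries
installed <- rownames(installed.packages())
head(installed)

# Recommended packages present in this installation (may differ between setups)
recommended <- rownames(installed.packages(priority = "recommended"))
print(recommended)
```

A package can appear in `installed` without appearing in `attached`: being installed and being loaded are separate things.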

5.2 Loading packages

If you want to use the functions in a package, you need to load the package with the function library().

library("mgcv")

5.3 Installing extra packages

In addition to the packages already installed, there are thousands of extra packages available for download that expand the computing possibilities of R by adding new functions, classes, documentation, data sets, etc.

The vast number of available packages is a little daunting, but many are very specialised and won’t be useful for you. If you don’t know the name of the package you need for your analyses, the CRAN task views can help. For example, the Environmetrics task view describes packages for the analysis of ecological and environmental data.

Many packages can be downloaded from CRAN (the Comprehensive R Archive Network), and others can be installed from GitHub. The next sections show you how to install packages from these locations.

Every time you install a new package, R imports all necessary files into a local library, but does not activate the package. You will have to remember to activate the new package with library() every time your project requires items or functions from that package.

Books vs R packages

                   Book                                  R package
How to get it      Order from your favourite bookshop    install.packages("tidyr")
Where it is kept   Bookshelf/library                     A library
How to use it      Open the book                         library(tidyr)

5.4 Packages published on CRAN

There are lots of extra R packages available from CRAN.

You can install or update a package from CRAN with install.packages(). Simply type the name of the package you want to install and add quotation marks " ".

You only need to do this once per package (until you want to install a newer version), so run it directly in the console rather than keeping it in your script or markdown document. Otherwise the package will be reinstalled every time you run the code, which is slow.

Once you have installed the package, you can use library() to load it. You need to do this every time you use it.
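The split between installing once and loading every session can be sketched as follows. The helper needs_install() is hypothetical (it is not part of R or of any package) and is only there so that the example runs whether or not tidyr is already installed.

```r
# Hypothetical helper: TRUE if the package is not yet installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

if (needs_install("tidyr")) {
  # Installation is a one-off job for the console, not for the script
  message('tidyr is not installed; run install.packages("tidyr") once in the console.')
} else {
  # Loading is needed in every session that uses tidyr
  library(tidyr)
}
```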

Rtools

If you are using a Windows computer, when you install a package you may get the warning

WARNING: Rtools is required to build R packages but is not currently installed.

Rtools is required to install packages from their source code when they have code (e.g. C or Fortran) that needs compiling. If you are installing packages with install.packages(), you are, by default, installing binary files from CRAN that just need to be unzipped, and so Rtools is not required.

5.5 Packages published on GitHub

Not all R packages are available on CRAN; some are only available on GitHub. To install a package published on GitHub, you can use the function remotes::install_github() (install remotes from CRAN first). You need the repo name from GitHub, which is given as "name_of_owner/name_of_repo". Here too, you must add quotation marks " ".

install.packages("remotes")
# ggvegan, for plotting ordinations, is only on GitHub
remotes::install_github(repo = "gavinsimpson/ggvegan")

5.7 Debugging failed package installation

Sometimes packages fail to install properly. This can be frustrating and difficult to debug.

Some recommendations

  • Check exactly which package won’t install. It may be a dependency of the package you want that has problems installing. Try to install the package or dependency again and pay attention to any error message.
  • Restart R (in Session menu in RStudio) and try again.
  • Find your user library with .libPaths() and, in the file manager, delete the directory of the package that fails to install.
  • Google any error message. Someone else may have had the same problem.
  • Try installing the package directly in R (i.e. not using RStudio).
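Some of these steps can be done from the console. The sketch below uses only base R functions; stats stands in for the problem package because it is always installed, and the commented lines use tidyr purely as an example name.

```r
# Libraries (directories) where R looks for installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Directory of one installed package
pkg_dir <- find.package("stats")
print(pkg_dir)

# To remove a broken installation from within R instead of the file manager,
# and then reinstall it, uncomment these lines and replace the package name:
# remove.packages("tidyr")
# install.packages("tidyr")
```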

5.8 Name conflicts

5.8.1 The problem

Sometimes two packages have functions with the same name. For example, both MASS and dplyr have a select function, and the two do completely different things. If both packages are loaded at the same time there is a conflict, and the function that was loaded last takes priority. This can cause big problems, with error messages that are difficult to interpret.

library(palmerpenguins) # load data
library(dplyr)
library(MASS) # R will report that select is being masked

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select
penguins |> select(species)
Error in select(penguins, species): unused argument (species)

If you have code that worked one day and fails the next with a weird error message, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from.
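You can also ask R directly where a function comes from. This is a minimal sketch using base R only; filter is chosen as the example because stats provides a function of that name (and dplyr, if attached, would provide another).

```r
# Which attached packages provide an object called "filter"?
# If dplyr were attached, it would be listed as well.
providers <- find("filter")
print(providers)

# Which package owns the function that R would actually call right now?
owner <- environmentName(environment(filter))
print(owner)

# All names that are currently masked, grouped by where they are defined
conflicts(detail = TRUE)
```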

There are three solutions.

5.8.2 Loading order

Be very careful about the order in which packages are loaded. If the example above had loaded MASS before dplyr, the select function in MASS would have been masked and the code would have worked. This solution is very fragile, as it is easy to load packages in the wrong order.

5.8.3 package::function

Use the package::function notation to specify which package a function comes from. This is safe and can make code easier to understand by explicitly showing which packages the functions are from. The code above could be written safely as

penguins |> dplyr::select(species)
# A tibble: 344 × 1
  species
  <fct>  
1 Adelie 
2 Adelie 
3 Adelie 
# ℹ 341 more rows

This gets tedious fairly quickly, so it is best used for packages from which you need only a few functions once or twice, not for functions you need repeatedly.

5.8.4 conflicted package

The safest solution is to use the conflicted package. The conflicted package converts any conflicts between packages into errors. This might seem like a bad idea, but it is much easier to diagnose an error from conflicted than the weird error of a masked function.

Error:
! [conflicted] select found in 2 packages.
Either pick the one you want with `::`:
• MASS::select
• dplyr::select
Or declare a preference with `conflicts_prefer()`:
• `conflicts_prefer(MASS::select)`
• `conflicts_prefer(dplyr::select)`

As the error message suggests, we can resolve the error either by using the package::function notation or by using the function conflicts_prefer() to say which function we want to use by default.

[conflicted] Will prefer dplyr::select over any other package.
penguins |> select(species)
# A tibble: 344 × 1
  species
  <fct>  
1 Adelie 
2 Adelie 
3 Adelie 
# ℹ 341 more rows

If there are many functions from a package that conflict, for example if you have loaded tidylog, you can use conflict_prefer_all().
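The calls that produce the messages in this section are not shown above. A sketch of what such a script might look like follows; it assumes the conflicted, dplyr, MASS and palmerpenguins packages are installed, and skips the example if any is missing.

```r
# Only run the example if all four packages are installed
pkgs <- c("conflicted", "dplyr", "MASS", "palmerpenguins")
have_all <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_all) {
  library(conflicted)  # ambiguous function names now give an error
  library(dplyr)
  library(MASS)

  # Declare once, near the top of the script, which select() to use
  conflicts_prefer(dplyr::select)

  # select() now means dplyr::select()
  palmerpenguins::penguins |> select(species) |> print()
}
```

Without the conflicts_prefer() line, the call to select() would stop with an error asking you to choose between the two packages.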

5.9 Packages change over time

R packages often get updated. This is good as functions get added or improved and bugs get fixed. However, it also means that code written last year might give a different result (or even not work at all) next year with all the latest packages. This is a big problem for reproducibility.

The solution is to make sure you re-run your code with the same packages. That is not easy to do by hand. The renv package keeps track of all the packages you are using (and all the packages they depend on). I use renv for my analyses.

The workflow when working with renv is:

  • Call renv::init() to initialise a private R library for the project
  • Work in the project as normal, installing R packages as needed in the project
  • Call renv::snapshot() to save the state of the project library
  • Continue working on your project, installing and updating R packages as needed. Use renv::install() to install packages from CRAN or GitHub.
  • If the changes were successful, call renv::snapshot() again. If the updated packages introduced problems, call renv::restore() to revert to the previous state of the library.

5.10 Writing your own function/packages

If you find that you need to run the same code several times, it can be useful to write a function.

To make a function, you need to use the reserved word function followed by brackets with zero or more arguments. After the brackets, braces encompass the body of the function.

Here is a function that multiplies two numbers together.

multiply <- function(x, y = 1){ # The default value of y is 1
  x * y
}

multiply(x = 6, y = 7)
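To see the default value at work, here is a short sketch that repeats the definition so the block runs on its own and then calls the function in a few different ways.

```r
# Multiply two numbers; y defaults to 1 when it is not supplied
multiply <- function(x, y = 1) {
  x * y
}

multiply(6)              # y falls back to its default: 6
multiply(6, 7)           # arguments matched by position: 42
multiply(y = 7, x = 6)   # named arguments can come in any order: 42
multiply(c(1, 2, 3), 2)  # works element-wise on vectors: 2 4 6
```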

Once you have written a function, it can be useful to make your own package. This makes it easy to use in your own analysis and easy to share with other users. Information on how to make a package using the usethis and devtools packages can be found in the package writing book.

5.11 Citing packages

When you use a package it is important to cite it in a manuscript or thesis both to acknowledge the author’s work in making the package and to increase reproducibility. The correct citation can be seen with the function citation.

citation("lme4")
To cite lme4 in publications use:

  Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
  Fitting Linear Mixed-Effects Models Using lme4. Journal of
  Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Fitting Linear Mixed-Effects Models Using {lme4}},
    author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
    journal = {Journal of Statistical Software},
    year = {2015},
    volume = {67},
    number = {1},
    pages = {1--48},
    doi = {10.18637/jss.v067.i01},
  }

It is also important to cite the version used.

[1] '1.1.33'
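The output above has the form printed by packageVersion(), so a call of that kind is presumably what produced it (an assumption, as the call itself is not shown). A minimal sketch, using stats in place of lme4 because it ships with R:

```r
# Version of an installed package
pkg_version <- packageVersion("stats")
print(pkg_version)

# The same version as text, for example for a methods section
version_text <- paste0("stats (version ", format(pkg_version), ")")
print(version_text)
```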

You should also cite R. Again, the citation() function can be used, this time with no argument.

To cite R in publications use:

  R Core Team (2023). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2023},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
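Calling citation() with no package name returns the citation for R itself. The sketch below also saves the BibTeX entry to a file; the temporary file is only for illustration, and in a real project you would choose your own .bib file.

```r
# Citation for R itself
r_citation <- citation()
print(r_citation)

# Write the BibTeX entry to a file that a reference manager or LaTeX can read
bib_file <- tempfile(fileext = ".bib")
writeLines(toBibtex(r_citation), bib_file)

# Inspect what was written
readLines(bib_file)
```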

The R version can be obtained with R.version.string. This is a variable, not a function, so it does not take brackets.

R.version.string
[1] "R version 4.3.1 (2023-06-16)"

In Quarto you can also use the insert citation tool to add a package to your bibliography.

Quiz

Question 1: You have installed the tidyverse package. Which command do you need to use to activate it?

Exercise

Check your computer is ready for the next chapter with the checker package. This will check that your computer has the correct versions of R and RStudio installed, the required packages, and that the recommended RStudio options are set.

First install the checker package.

install.packages("checker")

Now run

checker::chk_requirements("https://raw.githubusercontent.com/biostats-r/biostats/main/checker/basic.yaml")

Please address any issues checker finds before continuing to the next chapter.

Contributors

  • Jonathan Soulé
  • Richard Telford