library("mgcv")
5 Installing packages
- learn how packages extend what R can do
- install packages from CRAN and from GitHub
- load packages so their functions/data are available
- learn how to cite R and packages
5.1 Functions & Packages
Everything that does something in R is a function.
All functions are arranged into packages.
Packages are stored in libraries.
Some R packages, for example stats and utils, are automatically loaded when you start R. You can see the packages you have installed (and which are loaded) in the tab Packages (see Section 2.3.4.3).
There are also several recommended packages that are installed by default, for example the mgcv package for fitting generalised additive models.
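If you prefer the console to the Packages tab, the following sketch lists the packages attached to the current session and counts the installed packages that ship with R. It uses only base R functions; the variable names are chosen for illustration and the output depends on your installation.

```r
# Packages attached to the current session (their functions can be called
# without the package:: prefix). Entries look like "package:stats".
attached <- grep("^package:", search(), value = TRUE)
print(attached)

# Installed packages that ship with R. "base" packages are part of R itself;
# "recommended" packages (such as mgcv) are normally installed alongside it.
shipped <- installed.packages(priority = "high")[, c("Package", "Priority"), drop = FALSE]
print(table(shipped[, "Priority"]))
```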
5.2 Loading packages
If you want to use the functions in a package, you need to load the package with the function library().
5.3 Installing extra packages
In addition to the packages already installed, there are thousands of extra packages available for download that expand the computing possibilities of R by adding new functions, classes, documentation, data sets, etc.
The vast number of available packages is a little daunting, but many are very specialised and won’t be useful for you. If you don’t know the name of the package you need for your analyses, the CRAN task views can help. For example, the Environmetrics task view describes packages for the analysis of ecological and environmental data.
Many packages can be downloaded from CRAN (the Comprehensive R Archive Network), and others can be installed from GitHub. The next sections show you how to install packages from these locations.
Every time you install a new package, R imports all necessary files into a local library, but does not activate the package. You will have to remember to activate the new package with library() every time your project requires data or functions from that package.
| | Book | R package |
|---|---|---|
| How to get it | Order from your favourite bookshop | install.packages("tidyr") |
| Where it is kept | Bookshelf/library | A library |
| How to use it | Open book | library(tidyr) |
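The difference between a package being installed (on the shelf) and being activated (open on the desk) can be checked from the console. The helper below is a hypothetical illustration written with base R only; package_status is not a function from any package.

```r
# Hypothetical helper: report whether a package is installed and attached.
package_status <- function(pkg) {
  c(
    # TRUE if the package is found in one of the libraries in .libPaths()
    installed = nzchar(system.file(package = pkg)),
    # TRUE if the package is attached, e.g. after library(pkg)
    attached = paste0("package:", pkg) %in% search()
  )
}

package_status("stats")  # shipped with R and attached at start-up
package_status("tidyr")  # installed only if install.packages("tidyr") has been run
```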
5.4 Packages published on CRAN
There are lots of extra R packages available from CRAN.
You can install or update a package from CRAN with install.packages(). Simply give the name of the package you want to install, enclosed in quotation marks (" ").
install.packages("tidyr")
You only need to do this once (until you want to install a new version of the package), so you should run this directly in the console and not keep it in your script or markdown document. Otherwise it will install the package every time you run the code, which will be slow.
Once you have installed the package, you can use library() to load it. You need to do this every time you use it.
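If a script is shared with people who may not have a package yet, one common pattern is to install the package only when it is missing. This is a minimal sketch of that idea, not a recommendation from the text above; install_if_missing is a hypothetical helper name.

```r
# Hypothetical helper: install a CRAN package only if it is not yet installed.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " is already installed")
    return(invisible(FALSE))
  }
  install.packages(pkg)
  invisible(TRUE)
}

install_if_missing("stats")  # a base package, so nothing is installed
```

Because the check is fast, running it repeatedly avoids the slow reinstallation described above.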
If you are using a Windows computer, when you install a package you may get the warning
WARNING: Rtools is required to build R packages but is not currently installed.
Rtools is required to install packages from their source code when they have code (e.g. C or Fortran) that needs compiling. If you are installing packages with install.packages(), you are, by default, installing binary files from CRAN that just need to be unzipped, and so Rtools is not required.
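To see whether your own installation prefers binary or source packages, you can inspect the pkgType option that install.packages() uses by default. The sketch below only reads the option and prints a message; the wording of the messages is illustrative.

```r
# "source" means packages are built from source code; other values
# (for example "both") mean pre-built binaries are used where available.
pkg_type <- getOption("pkgType")
print(pkg_type)

if (identical(pkg_type, "source")) {
  message("Packages are installed from source; build tools may be needed.")
} else {
  message("Binary packages are used where available.")
}
```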
5.5 Packages published on GitHub
Not all R packages are available on CRAN. Some packages are only available on GitHub.com. If you want to install a package published on GitHub, you can use the function remotes::install_github() (install remotes from CRAN first). You need the repo name from GitHub, which is given as "name_of_owner/name_of_repo". Here also, you must add quotation marks (" ").
install.packages("remotes")
#ggvegan for plotting ordinations is only on github
remotes::install_github(repo = "gavinsimpson/ggvegan")
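A frequent mistake is to pass only the repository name without the owner. The hypothetical helper below checks that a string has the basic "owner/repo" shape before you pass it on. It is a simplification: remotes also accepts longer forms (for example with a branch or tag after "@"), which this check would reject.

```r
# Hypothetical helper: does the string look like "owner/repo"?
# Allows letters, digits, underscores, dots and hyphens on each side of "/".
is_owner_repo <- function(repo) {
  grepl("^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$", repo)
}

is_owner_repo("gavinsimpson/ggvegan")  # TRUE
is_owner_repo("ggvegan")               # FALSE: the owner is missing
```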
5.6 Recommended packages
These packages are ones we recommend you install now:
usethis
remotes
tidyverse
palmerpenguins
biostats.tutorials
These packages will be used throughout the website. There are other packages we will install later.
5.6.1 usethis and remotes
These are utility packages that we will use to download course material and packages from GitHub.
5.6.2 The tidyverse
The tidyverse is an amazing toolbox which contains a growing, evolving collection of R packages for data science. The packages are developed on the same philosophy and are fully compatible with each other. Some of the tidyverse packages help you import data (readr), some let you make figures (ggplot2), others help you rearrange your data set (dplyr, tidyr), and much more. Have a look at the tidyverse webpages to explore the collection.
Install and activate the tidyverse with the following code:
install.packages("tidyverse")
library(tidyverse)
This will load the core tidyverse packages, which include readr, dplyr, and ggplot2, so you don’t need to load these separately.
5.6.3 palmerpenguins
palmerpenguins is a package that provides you with two real data sets. They contain measurements from three penguin species found in the Palmer archipelago, Antarctica. Several variables such as species, island, and body mass are included for over three hundred penguins. These data sets will be used in the upcoming sections of this website.
Install and activate the package with this code:
install.packages("palmerpenguins")
library(palmerpenguins)
Type the following line in the console to see the citation information for palmerpenguins:
citation("palmerpenguins")
5.6.4 biostats.tutorials
Along with the present website, our team has developed a package with tutorials called biostats.tutorials. This package will help you learn and practice R functions and concepts. Several of the upcoming chapters refer to this package. Install biostats.tutorials with this code:
#install.packages("remotes")
remotes::install_github("biostats-r/biostats.tutorials")
Now load the package with library(biostats.tutorials) and the tutorials should appear in the tab Tutorial (see Section 2.3.5.2). Exercises will tell you when to run a tutorial. Click the “Start Tutorial” button to start the tutorial (this may take a few seconds to start the first time).
Install tidyverse, palmerpenguins and remotes from CRAN and biostats.tutorials from GitHub using the code above.
5.7 Debugging failed package installation
Sometimes packages fail to install properly. This can be frustrating and difficult to debug.
Some recommendations
- Read the error message. It might tell you exactly what you need to do.
- Check exactly which package won’t install. It may be a dependency of the package you want that has problems installing. Try to install the package or dependency again and pay attention to any error message.
- Restart R (in Session menu in RStudio) and try again.
- Find your user library with .libPaths() and, in the file manager, delete the package’s directory. Then try installing the package again.
- Google any error message. Someone else may have had the same problem.
- Try installing the package directly in R (i.e. not using RStudio).
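Some of these checks can be done from the console. The sketch below prints the library locations and looks up where a package is installed; package_location is a hypothetical helper name, and the remove.packages() call is commented out so that nothing is deleted by accident.

```r
# Libraries R searches, in order; the first is usually your user library.
print(.libPaths())

# Hypothetical helper: where (if anywhere) is a package installed?
# Returns the installation directory, or NA if the package is not found.
package_location <- function(pkg) {
  path <- system.file(package = pkg)
  if (nzchar(path)) path else NA_character_
}

package_location("stats")

# To remove a broken installation from within R rather than the file manager:
# remove.packages("tidyr")
```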
5.8 Name conflicts
5.8.1 The problem
Sometimes two packages have functions with the same names. For example, both MASS and dplyr have a select function, but the two do completely different things. If both packages are loaded at the same time there is a conflict, and the function that was loaded last takes priority. This can cause big problems, with error messages that are difficult to interpret.
library(palmerpenguins) # load data
library(dplyr)
library(MASS) # R will report that select is being masked
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
penguins |> select(species)
Error in select(penguins, species): unused argument (species)
If you have code that worked one day and fails the next with a weird error message, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from.
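You can also ask R directly which attached packages define a given name. The sketch below uses filter, which exists in the stats package (and also in dplyr), so that it runs without any extra packages installed; the same calls work for select once MASS and dplyr are loaded.

```r
# Which attached packages define something called "filter"?
# With only the default packages attached this is stats; after
# library(dplyr) the result would list dplyr as well.
utils::find("filter")

# Which package does a specific function object belong to?
environmentName(environment(stats::filter))
```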
There are three solutions.
5.8.2 Loading order
Be very careful about the order in which packages are loaded. If the example above had loaded MASS before dplyr, the select function in MASS would have been masked and the code would have worked. This solution is very fragile as it is easy to load packages in the wrong order.
5.8.3 package::function
Use the package::function notation to specify which package a function comes from. This is safe and can make code easier to understand by explicitly showing which packages the functions are from. The code above could be written safely as
penguins |> dplyr::select(species)
# A tibble: 344 × 1
species
<fct>
1 Adelie
2 Adelie
3 Adelie
# ℹ 341 more rows
This gets tedious fairly quickly, so it is best suited to packages from which you need only a few functions once or twice, not packages whose functions you use repeatedly.
5.8.4 conflicted package
The safest solution is to use the conflicted package. The conflicted package converts any conflicts between packages into errors. This might seem like a bad idea, but it is much easier to diagnose an error from conflicted than the weird error of a masked function.
Error:
! [conflicted] select found in 2 packages.
Either pick the one you want with `::`:
• MASS::select
• dplyr::select
Or declare a preference with `conflicts_prefer()`:
• `conflicts_prefer(MASS::select)`
• `conflicts_prefer(dplyr::select)`
As the error message suggests, we can resolve the error either by using the package::function notation, or by using the function conflict_prefer() to say which function we want to use by default. (The conflicts_prefer() function named in the message serves the same purpose.)
library(dplyr)
library(MASS)
library(conflicted)
conflict_prefer("select", "dplyr")
[conflicted] Will prefer dplyr::select over any other package.
penguins |> select(species)
# A tibble: 344 × 1
species
<fct>
1 Adelie
2 Adelie
3 Adelie
# ℹ 341 more rows
If there are many functions from a package that have conflicts, you can use conflict_prefer_all().
conflict_prefer_all("dplyr", quiet = TRUE)
5.9 Packages change over time
R packages often get updated. This is good as functions get added or improved and bugs get fixed. However, it also means that code written last year might give a different result (or even not work at all) next year with all the latest packages. This is a big problem for reproducibility.
The solution is to make sure you re-run your code with the same packages. That is not easy to do by hand. The renv package keeps track of all the packages you are using (and all the packages they depend on). I use renv for my analyses.
The workflow when working with renv is:
- Call renv::init() to initialise a private R library for the project.
- Work in the project as normal, installing R packages as needed in the project.
- Call renv::snapshot() to save the state of the project library.
- Continue working on your project, installing and updating R packages as needed. Use renv::install() to install packages from CRAN or GitHub.
- If the changes were successful, call renv::snapshot() again. If the updated packages introduced problems, call renv::restore() to revert to the previous state of the library.
5.10 Writing your own function/packages
If you find that you need to run the same code several times, it can be useful to write a function.
To make a function, you need to use the reserved word function followed by brackets with zero or more arguments. After the brackets, braces encompass the body of the function.
Here is a function that multiplies two numbers together.
multiply <- function(x, y = 1) { # The default value of y is 1
  x * y
}
multiply(x = 6, y = 7)
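The following sketch shows how the default argument and argument matching behave when the function is called in different ways. The input check with stopifnot() is an addition for illustration and is not part of the function above.

```r
# The multiply function from the text, with a basic input check added.
multiply <- function(x, y = 1) {
  stopifnot(is.numeric(x), is.numeric(y))
  x * y
}

multiply(6, 7)            # arguments matched by position: 42
multiply(6)               # y is omitted, so the default y = 1 is used: 6
multiply(y = 2, x = 1:3)  # named arguments in any order; works on vectors: 2 4 6
```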
Once you have written a function, it can be useful to make your own package. This makes it easy to use in your own analysis and easy to share with other users. Information on how to make a package using the usethis and devtools packages can be found in the package writing book.
5.11 Citing packages
When you use a package it is important to cite it in a manuscript or thesis, both to acknowledge the author’s work in making the package and to increase reproducibility. The correct citation can be seen with the function citation().
citation("nlme")
To cite package 'nlme' in publications use:
Pinheiro J, Bates D, R Core Team (2024). _nlme: Linear and Nonlinear
Mixed Effects Models_. R package version 3.1-165,
<https://CRAN.R-project.org/package=nlme>.
Pinheiro JC, Bates DM (2000). _Mixed-Effects Models in S and S-PLUS_.
Springer, New York. doi:10.1007/b98882
<https://doi.org/10.1007/b98882>.
To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.
It is also important to cite the version used.
packageVersion("nlme")
[1] '3.1.165'
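If you keep your references in a BibTeX file, the citation can be written to disk rather than copied by hand. This is a minimal sketch that uses the base package stats so it runs on any installation (substitute the package you actually used); the file name packages.bib and the temporary directory are arbitrary choices for illustration.

```r
# Citation for a package; "stats" is used so the example runs anywhere.
cit <- citation("stats")

# Convert to BibTeX and write it to a file a reference manager can import.
bib_file <- file.path(tempdir(), "packages.bib")
writeLines(toBibtex(cit), bib_file)

# Build a version string to report alongside the citation.
paste("stats version", as.character(packageVersion("stats")))
```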
You should also cite R. Again, the citation() function can be used.
citation()
To cite R in publications use:
R Core Team (2024). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2024},
url = {https://www.R-project.org/},
}
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
The R version can be obtained with R.version.string. This is a variable, not a function, so it does not take brackets.
R.version.string
[1] "R version 4.4.1 (2024-06-14)"
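When a script depends on a minimum version, the version can be compared in code instead of read from a string. The sketch below uses getRversion(), a base R function that returns the running version as a comparable object. The threshold 4.1.0 is the R version that introduced the native pipe |> used in this chapter's examples.

```r
# getRversion() returns the running R version as a comparable object.
r_version <- getRversion()

# The native pipe |> requires R 4.1.0 or later.
if (r_version < "4.1.0") {
  warning("R ", format(r_version), " is older than 4.1.0; the native pipe is not available.")
}

# packageVersion() results can be compared in the same way.
packageVersion("stats") >= "4.1.0"
```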
In Quarto you can also use the insert citation tool to add a package to your bibliography.
Question 1: You have installed the tidyverse package. Which command do you need to use to activate it?
Check that your computer is ready for the next chapter with the checker package. This will check that your computer has the correct versions of R and RStudio installed, the required packages, and that the recommended RStudio options are set.
First install the checker package.
install.packages("checker")
Now run
checker::chk_requirements("https://raw.githubusercontent.com/biostats-r/biostats/main/checker/basic.yaml")
Please address any issues checker finds before continuing to the next chapter.
Contributors
- Jonathan Soulé
- Richard Telford