16 Using functions and packages

16.1 Functions

Everything that does something in R is a function.

For most functions the function name is followed by brackets. Within the brackets are zero or more arguments separated by commas.

Leaving out a comma is a common mistake and will give an error.

rnorm(n = 10, mean = 5 sd = 1)
## Error: <text>:1:24: unexpected symbol
## 1: rnorm(n = 10, mean = 5 sd
##                            ^

To get help on a function, type ? followed by the function name.
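For example, using rnorm() from the code above, either of these lines opens the same help page. In RStudio it appears in the Help pane. The second form passes the name as a string.

```r
# Both lines open the help page for rnorm()
?rnorm
help("rnorm")
```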


The examples at the bottom of the help file can help you understand how to use the function. You can run the examples either by copying and pasting them, or by using example().
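For instance, this runs every example on the help page for mean() and prints each line of example code followed by its output:

```r
# Run all the examples from the help page of mean()
example(mean)
```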


16.1.1 Infix functions

There is a special type of function called an infix function that does not use brackets but is placed between two objects.

For example, in

5 < 3
## [1] FALSE

< is the infix function that tests whether the first number is smaller than the second, returning TRUE or FALSE.

Some other infix functions are

5 > 3 # Greater than
## [1] TRUE
c("a", "z") %in% c("a", "b", "c") # is each value of the first vector in the second?
## [1]  TRUE FALSE
7 %% 4 # modulus (finds the remainder)
## [1] 3
7 %/% 4 # integer division
## [1] 1
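Because infix functions are still functions, you can also call them in the ordinary way by wrapping the name in backticks. The sketch below also defines a new infix function. The name %paste% is purely illustrative (hypothetical, not from any package). The only requirement is that a user-defined infix name starts and ends with %.

```r
# Infix functions called with ordinary function syntax
`<`(5, 3)                               # same as 5 < 3
`%in%`(c("a", "z"), c("a", "b", "c"))   # same as c("a", "z") %in% c("a", "b", "c")

# A hypothetical user-defined infix function that pastes two strings together
`%paste%` <- function(a, b) paste(a, b)
"hello" %paste% "world"
```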

To get help on an infix function, surround it with backticks.
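For example, for %in%:

```r
# Backticks stop ? from treating %in% as an operator
?`%in%`
help("%in%")  # passing the name as a string also works
```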


16.2 Packages

All functions are arranged into packages.

Some R packages, for example, stats and utils, are automatically loaded when you start R.
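You can list the packages that are currently attached with search(). Entries starting with package: are packages, and .GlobalEnv is your own workspace.

```r
# Attached packages (and other environments), in the order R searches them
search()
```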

There are also several recommended packages that are installed by default, for example the mgcv package for fitting generalised additive models.
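To check which recommended packages are installed on your computer, you can filter installed.packages() by priority. On a standard installation mgcv should be in the list, but this depends on how R was installed.

```r
# Names of the installed packages with priority "recommended"
rec <- installed.packages(priority = "recommended")[, "Package"]
unname(rec)
```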

16.3 Loading packages

If you want to use the functions in a package, you need to load the package with the function library(). For example, to load mgcv:

library(mgcv)
## Loading required package: nlme
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##     collapse
## This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.

16.4 Installing extra packages

There are lots of extra R packages available from CRAN (the Comprehensive R Archive Network).

You can install or update a package from CRAN with install.packages(), for example install.packages("palmerpenguins").

You only need to do this once (unless you need to install a new version of a package), so you should run this directly in the console and not keep it in your script (otherwise it will install the package every time you run the code).

Once you have installed a package, you can use library() to load it. You need to do this in every R session where you want to use it.

Often you know which package you want to install. If you don’t know the name of the package you need for your analyses, the CRAN task views can help. For example, the Environmetrics task view describes packages for the analysis of ecological and environmental data.

Not all R packages are available on CRAN. Some packages in development are only available on GitHub. Packages on GitHub can be installed with the remotes package.

# ggvegan for plotting ordinations is only on GitHub
remotes::install_github("gavinsimpson/ggvegan")

16.5 Debugging failed package installation

Sometimes packages fail to install properly. This can be frustrating.

Some recommendations

  • Check exactly which package won’t install. It may be a dependency of the package you want. Try to install it again.
  • Restart R (from the Session menu in RStudio) and try again.
  • Google any error message. Someone else may have had the same problem.

16.6 Name conflicts

16.6.1 The problem

Sometimes two packages have functions with the same name. For example, both MASS and dplyr have a select function, and the two do completely different things. If both packages are loaded at the same time there is a conflict, and the function from the package loaded last takes priority. This can cause big problems, with difficult-to-interpret error messages.

library(dplyr) # provides the select function we want
library(palmerpenguins) # load data
library(MASS) # R will report that select is being masked
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##     select
penguins |> select(species)
## Error in select(penguins, species): unused argument (species)

If code that worked one day fails the next with a weird error message, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from.
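Outside RStudio, find() gives similar information: it lists every attached package containing an object with a given name, in the order R searches them, so the first entry is the one that gets used. With MASS and dplyr both attached, find("select") would list both, with MASS first because it was loaded last. The sketch below uses a base function so that it runs without extra packages.

```r
# Which attached packages contain an object called "mean"?
find("mean")

# Which namespace does the mean function that R will use belong to?
environmentName(environment(mean))
```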

There are three solutions.

16.6.2 Loading order

Be very careful about the order in which packages are loaded. If the example above had loaded MASS before dplyr, the select function in MASS would have been masked by the one in dplyr, and the code would have worked. This solution can work in a script that you source, but it is fragile in interactive sessions, where it is easy to load packages in the wrong order.
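The rule behind this is that R uses the first function with a matching name that it finds, and your workspace (the global environment) is searched before any package. Here is a sketch of the same masking without any extra packages, using a deliberately silly version of mean():

```r
# A function in the workspace masks base::mean
mean <- function(x) "not the mean you were expecting"
mean(1:3)        # uses the workspace version
base::mean(1:3)  # package::function bypasses the mask
rm(mean)         # remove the masking function
mean(1:3)        # base::mean is found again
```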

16.6.3 package::function

Use the package::function notation to specify which package a function comes from. This is safe and can make code easier to understand by showing which packages the functions are from. The code above could be written safely as

penguins |> dplyr::select(species)
## # A tibble: 344 × 1
##   species
##   <fct>  
## 1 Adelie 
## 2 Adelie 
## 3 Adelie 
## # … with 341 more rows

This gets ugly fairly quickly, so it is best kept for packages you only use one or two functions from, not for functions you call many times.

16.6.4 conflicted package

The safest solution is to use the conflicted package, which converts any conflict between packages into an error. This might seem like a bad idea, but an error from conflicted is much easier to diagnose than the weird error from a masked function.

library(conflicted)
penguins |> select(species)
## Error: [conflicted] `select` found in 2 packages.
## Either pick the one you want with `::` 
## * MASS::select
## * dplyr::select
## Or declare a preference with `conflict_prefer()`
## * conflict_prefer("select", "MASS")
## * conflict_prefer("select", "dplyr")

As the error message suggests, we can resolve the error either by using the package::function notation or by using conflict_prefer() to declare which function we want to use by default.

conflict_prefer("select", "dplyr")
## [conflicted] Will prefer dplyr::select over any other package
penguins |> select(species)
## # A tibble: 344 × 1
##   species
##   <fct>  
## 1 Adelie 
## 2 Adelie 
## 3 Adelie 
## # … with 341 more rows

If many functions need preferences recording, for example if you have loaded tidylog (which masks many dplyr and tidyr functions), you can iterate over them with purrr::walk(), the variant of purrr::map() intended for side effects.

getNamespaceExports("tidylog") |> 
  purrr::walk(~ conflict_prefer(.x, winner = "tidylog"))

16.7 Citing packages

When you use a package, it is important to cite it in a manuscript or thesis, both to acknowledge the authors’ work in making the package and to increase reproducibility. The correct citation can be seen with the function citation(), for example for lme4:

citation("lme4")
## To cite lme4 in publications use:
##   Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
##   Fitting Linear Mixed-Effects Models Using lme4. Journal of
##   Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
## A BibTeX entry for LaTeX users is
##   @Article{,
##     title = {Fitting Linear Mixed-Effects Models Using {lme4}},
##     author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
##     journal = {Journal of Statistical Software},
##     year = {2015},
##     volume = {67},
##     number = {1},
##     pages = {1--48},
##     doi = {10.18637/jss.v067.i01},
##   }

It is also important to cite the version of the package used, which can be found with packageVersion().

packageVersion("lme4")
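packageVersion() works for any installed package. The sketch below uses stats so that it runs anywhere (base packages such as stats share R’s own version number); for lme4 you would use packageVersion("lme4").

```r
# packageVersion() returns a package_version object
v <- packageVersion("stats")
v
as.character(v)  # the version as plain text, e.g. for a methods section
```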

You should also cite R. Again, the citation() function can be used, this time without an argument:

citation()
## To cite R in publications use:
##   R Core Team (2021). R: A language and environment for statistical
##   computing. R Foundation for Statistical Computing, Vienna, Austria.
##   URL https://www.R-project.org/.
## A BibTeX entry for LaTeX users is
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2021},
##     url = {https://www.R-project.org/},
##   }
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.

The R version can be obtained with R.version.string. This is a variable, not a function, so it does not take brackets.

R.version.string
## [1] "R version 4.1.2 (2021-11-01)"
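If you want the versions of R and of all the attached packages in one go, sessionInfo() is a convenient option; its printed output can be saved alongside your results or included in a supplement.

```r
si <- sessionInfo()
si$R.version$version.string  # the same string as R.version.string
si                           # R version, platform, attached and loaded packages
```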

16.8 Packages change over time

R packages often get updated. This is good as functions get improved and bugs get fixed. However, it also means that code written last year might not work next year with all the latest packages. This is a big problem for reproducibility.

The solution is to make sure you re-run your code with the same package versions. That is not easy to do by hand. The renv package keeps track of all the packages you are using (and all the packages they depend on), including their versions. I use renv for my analyses.

The workflow when working with renv is:

  • Call renv::init() to initialise a private R library for the project

  • Work in the project as normal, installing R packages as needed in the project

  • Call renv::snapshot() to save the state of the project library

  • Continue working on your project, installing and updating R packages as needed. Use renv::install() to install packages from CRAN or GitHub.

  • If the changes were successful, call renv::snapshot() again. If the updated packages introduced problems, call renv::restore() to revert to the previous state of the library.

16.9 Writing your own functions and packages

If you find that you need to run the same code several times, it can be useful to write a function.

To make a function, use the reserved word function followed by brackets containing zero or more arguments. After the brackets, braces enclose the body of the function.

Here is a function that multiplies two numbers together.

multiply <- function(x, y = 1) { # the default value of y is 1
  x * y
}

multiply(x = 6, y = 7)
## [1] 42
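Arguments can be matched by position or by name, and any argument with a default value can be left out. A short sketch using the multiply() function defined above (repeated here so the block runs on its own):

```r
multiply <- function(x, y = 1) {
  x * y
}

multiply(6)              # y uses its default of 1, giving 6
multiply(6, 7)           # matched by position, giving 42
multiply(y = 7, x = 6)   # matched by name, in any order, giving 42
multiply(1:3, 2)         # arithmetic in R is vectorised, giving 2 4 6
```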

Once you have written a function, it can be useful to put it in your own package. This makes it easy to use in your own analyses and to share with other users. Information on how to make a package with the usethis and devtools packages can be found in the R Packages book, for example the “whole game” chapter at https://r-pkgs.org/whole-game.html.