3 Elements of an R markdown file
3.1 YAML
The YAML is metadata for the document that goes right at the top of the file.
The YAML consists of `key: value` pairs.
The colon and space are required.
It can set the document author and title, the output format and many other things.
YAML format can be difficult to get right as it is sensitive to white space.
You can use an RStudio Addin from the package ymlthis to help write the YAML.
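You can see why the space after the colon matters by parsing a line of YAML in R with the yaml package, which is installed as a dependency of rmarkdown. This is a small sketch:

```r
# The yaml package is installed alongside rmarkdown
library(yaml)

# With a space after the colon, YAML reads a key: value pair
good <- yaml.load('title: "My Manuscript"')
str(good)  # a list with one element called title

# Without the space, the whole line is read as one plain string
bad <- yaml.load("title:My Manuscript")
str(bad)   # a single character string, not a key: value pair
```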
3.1.1 R code in YAML
It is possible to add R code to the YAML, for example to show the current date.
The R code needs to be enclosed in quote marks AND back-ticks, with an `r` before the code.
If the code contains quote marks, they need to be different from the enclosing quote marks (i.e. single vs double quotes).
---
title: "My Manuscript"
output: html_document
date: "`r format(Sys.Date(), '%d %B %Y')`"
---
See the date-time tutorial for more about the codes in "%d %B %Y".
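You can try a date format string in the console before putting it in the YAML. This is a minimal sketch. The month name produced by `%B` depends on your locale, so it may not be in English.

```r
# Today's date, formatted as text
format(Sys.Date(), "%d %B %Y")

# The same codes applied to a fixed date:
# %d = day of the month, %B = full month name, %Y = four-digit year
fixed_date <- format(as.Date("2022-08-24"), "%d %B %Y")
fixed_date
```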
Exercise
Add date and author to the YAML of your svalbard_traits R Markdown document so it shows today’s date and your name when knitted.
3.1.2 Output formats
R Markdown documents can be rendered in many different output formats, including
- presentations (with `xaringan::moon_reader` or ioslides)
- posters (with posterdown)
- books (with bookdown)
- theses (with thesisdown)
This tutorial focuses on document-like reports.
There is a choice of output format for documents. This can be specified when the R markdown file is created in RStudio or by editing the YAML.
Producing an html file to view in a browser is the simplest, as nothing extra needs installing. The YAML should look something like this.
---
title: "My Manuscript"
output: html_document
---
Word documents are also easy; just change the output to `word_document`.
This can be very useful if you have a supervisor or collaborators who cannot cope with R Markdown directly. Consider using the redoc package, which lets you convert an edited Word document back into R Markdown.
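The `output:` field in the YAML sets the default format. You can also render to a different format from the console by passing the `output_format` argument to `rmarkdown::render()`. This is a minimal sketch that assumes pandoc is available (RStudio ships with it). The temporary file stands in for a real document such as your svalbard_traits file.

```r
library(rmarkdown)

# Write a tiny example document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "My Manuscript"',
  "output: html_document",
  "---",
  "",
  "Seven times six is `r 7 * 6`."
), rmd_file)

# Rendering needs pandoc
if (pandoc_available()) {
  # output_format overrides the output: field in the YAML;
  # render() returns the path of the file it created
  docx_file <- render(rmd_file, output_format = "word_document", quiet = TRUE)
  print(docx_file)
}
```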
Rendering the R markdown file as a PDF requires some external tools (LaTeX) to be installed (you don’t need to learn any LaTeX).
The easiest way to install them is with the tinytex package.
# run this only once
install.packages('tinytex')
tinytex::install_tinytex()
Then the output format in the YAML can be changed to `pdf_document`.
With PDF documents, it can be tricky to control exactly where the figures are positioned, so I recommend working with html as long as possible.
3.2 R Markdown, PDF and LaTeX
R Markdown uses LaTeX to make PDFs. You don’t need to know any LaTeX, but you can include some if you want to change the formatting. For example, to force a new page, you could use the command:
\newpage
3.5 Text
Type to make text! In the RStudio visual editor, which you can access by clicking on “Visual” above the document, you can format the text in much the same way as in MS Word or LibreOffice. RStudio has a built-in spell checker that will underline words it doesn’t recognise in red. Go to “Tools” >> “Global Options…” >> “Spelling” to change the language.
3.5.1 Source editor
Sometimes it is useful to be able to write in markdown directly, rather than using the visual editor, for example if you are writing a question for stackoverflow.com or an issue on github.com. You can see this mode by clicking on “Source” above the document.
Paragraphs have a blank line between them. It is good practice to write one sentence per line. The extra line breaks will be removed when the document is knitted. If you want to force a line break, put two spaces at the end of the line.
Formatting is generated with some special characters. For example:
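| Markdown syntax | Output |
|---|---|
| `# Header 1` | Header 1 (level 1 heading) |
| `## Header 2` | Header 2 (level 2 heading) |
| `### Header 3` | Header 3 (level 3 heading) |
| `*italics*` and `**bold**` | *italics* and **bold** |
| superscript `m^2^` | superscript m^2^ |
| subscript `CO~2~` | subscript CO~2~ |
| `` `verbatim code` `` | `verbatim code` |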
3.6 Escaping characters
If you want a literal `*`, `_`, `^` or `~` in the text, you need to escape it by putting a backslash `\` before it, e.g. `\*`.
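Escaping matters most when inline code pastes text into the document, because that text is then read as markdown. For example, unescaped `x*y*z` would render with an italic `y`. The sketch below uses a hypothetical helper, `escape_md()`, written only for this example, that puts a backslash before each of these four characters:

```r
# Hypothetical helper (not from any package):
# add a backslash before every *, _, ^ and ~ in a string
escape_md <- function(x) {
  gsub("([*_^~])", "\\\\\\1", x)
}

escaped <- escape_md("x*y*z")
cat(escaped, "\n")  # the backslashes are part of the string now
```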
A more complete list of formatting is given in the R markdown cheat sheet.
Exercise
The Results section of the svalbard_traits document should be in Header 1 style, and species names should be in italics. Fix this and render the document to check the formatting has worked.
3.7 Code chunks
Code in an R Markdown document is contained in code chunks.
This is a code chunk that loads the penguin data from the palmerpenguins package.
```{r}
data(penguins, package = "palmerpenguins")
```
It starts with three back-ticks, followed by braces.
Inside the braces, the “r” indicates that this is a chunk in the R language.
Next, on a new line, is the body of the chunk.
The chunk ends with three back-ticks on their own line.
In the visual editor, you won’t see the back-ticks, but the code block will start with `{r}` and have a grey background.
3.7.1 Making a chunk
You can type the back-ticks and braces needed to make a block, but when using the visual editor it is easier to get RStudio to insert the block with the insert tool. Type a forward slash / on a blank line and choose “R code chunk”. You can also use the RStudio keyboard shortcut Ctrl+Alt+I (Cmd+Option+I on a Mac).
3.7.2 Chunk language
We will just work with R chunks, but it is possible to run chunks in other languages in RStudio, including Python.
Exercise
Make a new code block (or blocks) to make a plot showing the effect of the treatment on leaf thickness.
Hint
You can copy and modify some of the existing code rather than writing from scratch.
3.7.3 Chunk options
Chunk options control how the code in a chunk is run and how any output is treated. Options can be given in special comments starting with `#|` at the top of the chunk. Older R Markdown documents give them after the `r` in the braces at the start of the chunk, and this style still works.
```{r}
#| echo: false
#| label: penguins-bill-body
#| warning: false
#| fig-cap: "The figure caption"
#| fig-alt: Plot of penguin bill length against body mass by species
library(ggplot2)
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point()
```
3.8 Options format
The white space in the block options is critical.
If you don’t have the space after the `#|`, the option becomes a regular comment and is ignored. If you don’t have a space after the colon, you get “ERROR: Render failed due to invalid YAML.”
`true` and `false` can be written in lower case (in R they must be upper case).
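Options after `#|` are written in YAML, which is why lower-case `true` and `false` work. This sketch uses the yaml package to show them becoming R's `TRUE` and `FALSE`. It assumes the yaml package reads these lines the same way knitr does.

```r
library(yaml)

# Two chunk options written as YAML, without the #| prefix
opts <- yaml.load("echo: false\nwarning: true")
str(opts)  # echo is FALSE and warning is TRUE, both R logicals
```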
There are lots of chunk options, but only a few that you will need to use frequently. Here are some, with their defaults:
- `echo` (`true`): Show the chunk’s code in the output.
- `eval` (`true`): Run the chunk code.
- `include` (`true`): Include the chunk in the document. If `false`, the code is still run but neither the code nor its output is shown.
- `message` (`true`): Include messages from R.
- `warning` (`true`): Include warnings from R.
- `error` (`false`): If `true`, shows any error message. If `false`, stops knitting when there is an error in R code.
I leave `message` and `warning` as `true` while I am writing the document, so I can see any possible problems, and set them to `false` when I knit the final version.
I sometimes find it useful to set `error` to `true`, as it can make it easier to debug any errors in the code.
Chunk options for figures are shown in section 4.1.1.
For more options see http://yihui.name/knitr/options/
Exercise
Importing packages produces lots of output that we don’t need to see in the final report. Use block options to hide the output of this block.
The code block making the table is giving a message about the grouping. Use block options to make this message go away.
3.8.1 Setting default chunk options
Default chunk options can be set for all chunks with `knitr::opts_chunk$set()` in the first chunk.
New R markdown files created by RStudio automatically have this in a chunk called “setup”.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This chunk sets `echo = TRUE` for all chunks.
The `include=FALSE` stops the code and any output of this chunk from appearing in the document.
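To see what `opts_chunk$set()` does, you can run it in the console and read the defaults back with `opts_chunk$get()`. This sketch switches off `echo` and `message`, as you might for the final version of a report. It then puts the defaults back with `opts_chunk$restore()` so the console session is not left changed.

```r
library(knitr)

# knitr's default for echo, before anything is changed
echo_default <- opts_chunk$get("echo")

# Change the defaults, as a "setup" chunk would
opts_chunk$set(echo = FALSE, message = FALSE)
echo_now <- opts_chunk$get("echo")
message_now <- opts_chunk$get("message")

# Restore knitr's original defaults
opts_chunk$restore()
```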
3.8.2 Chunk labels
It is a good idea to label chunks. If you don’t, they will automatically be called “unnamed-chunk-n” where “n” is an incrementing number. This is inconvenient for debugging (you need to work out which chunk is “unnamed-chunk-37”) and for working with any image files generated by the document. In section 6 you will see how to use chunk names to cross-reference figures and tables in your document.
Chunks can be labelled either by putting the label in the braces at the start of the chunk
```{r load-packages}
library(tidyverse)
```
or with the label chunk option
```{r}
#| label: load-packages
library(tidyverse)
```
3.9 Special characters in labels
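Avoid spaces, underscores, periods and other special characters in code block labels; stick to letters, numbers and hyphens. Other characters can cause all sorts of strange problems, for example when labels are used to name figure files or in cross-references.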
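The rule above can be checked with a regular expression. `is_safe_label()` is a hypothetical helper written for this sketch and is not part of knitr:

```r
# Hypothetical helper: TRUE if a label uses only letters, digits and hyphens
is_safe_label <- function(label) {
  grepl("^[A-Za-z0-9-]+$", label)
}

labels <- c("load-packages", "penguins-bill-body", "load_packages", "fig 1", "fig.1")
is_safe_label(labels)
```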
3.9.1 Running a chunk
Code in chunks will be run when the document is knitted (unless `eval: false`), but it is also useful to run the code interactively to check that it works.
You can do this by clicking on the green play buttons at the right of the chunk (Fig. 3.1) or from the Run button above the document.
If the code depends on previous chunks, the grey/green icon will run them all.
3.9.2 Hiding a chunk
If a chunk has a lot of code, it can be useful to hide it to make it easier to navigate the document. The grey arrow next to the line numbers will do this. Sections of text can also be hidden.
3.9.3 Environments and working directory
When you click Knit, RStudio knits the R Markdown document in a new R session. Initially, only the base packages are loaded and the environment is empty: the R Markdown document does not have access to any objects in your current environment (this is a good thing for reproducible analyses). This means that any data or packages you want to use in the document need to be imported by the code in the document.
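The idea of a fresh session can be illustrated with the callr package, whose `callr::r()` runs a function in a new R process. This is an assumption for illustration only: callr is not needed for R Markdown and may need installing. The example is an analogy for what happens when you click Knit, not how RStudio does it internally.

```r
# An object in the current (interactive) session
my_data <- data.frame(x = 1:3)
exists("my_data")  # TRUE in this session

# Run a function in a brand-new R session
in_new_session <- callr::r(function() exists("my_data"))
in_new_session  # FALSE: the new session cannot see my_data
```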
The working directory for the new R session used when knitting the R markdown file is the directory where the file is.
If the file is in the root directory of an RStudio project, relative paths will work the same way in the R markdown document as from the console.
If the file is in a sub-directory, use `here::here()` to form paths relative to the project root.
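Here is a sketch of building a path to a file in a project sub-folder. The folder `data` and the file `svalbard_traits.csv` are hypothetical names, so adjust them to your project.

```r
# here::here() starts from the project root (e.g. the folder with the .Rproj file),
# so this path is the same whether the .Rmd is in the root or a sub-directory
traits_path <- here::here("data", "svalbard_traits.csv")
traits_path

# Then read the file as usual, e.g.
# traits <- read.csv(traits_path)
```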
3.10 Inline code
In addition to the output from chunks of code, you can insert code directly into text.
This lets you avoid copying and pasting numbers from the output.
Inline code is enclosed by back-ticks and starts with an `r`.
Seven times six is `r 7 * 6`
Seven times six is 42
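To see inline code being evaluated without knitting a whole document, you can pass a string to `knitr::knit()` through its `text` argument. This minimal sketch returns the knitted text as a character string:

```r
# knit() evaluates the inline `r ...` code and returns the resulting text
out <- knitr::knit(text = "Seven times six is `r 7 * 6`", quiet = TRUE)
out
```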
3.11 Numbers in words
If you want numbers written as words, for example at the start of a sentence, use the english package.
Seven times six is `r english::words(7 * 6)`
Seven times six is forty-two
It is best to keep inline code short to keep the text readable. One trick is to do all necessary calculations in a previous chunk, so only the name of the object with the result needs to be in the inline code. If there are many results to report, consider storing them in a list as in the following example.
library(stringr) # for str_replace()
cor_adelie <- cor.test(
  ~ bill_length_mm + body_mass_g,
  data = penguins,
  subset = species == "Adelie")
adelie_list <- list(
  # degrees of freedom
  df = cor_adelie$parameter,
  # extract correlation and round
  est = round(cor_adelie$estimate, 2),
  # format the p-value, adding "= " in front if the first character is not "<".
  # See the characters tutorial for more on the stringr package and regular expressions.
  p_val = str_replace(
    string = format.pval(cor_adelie$p.value, eps = 0.001),
    pattern = "^(?!<)",
    replacement = "= ")
)
Bill length and body mass in Adelie penguins are positively correlated,
r = `r adelie_list$est` (df = `r adelie_list$df`, p `r adelie_list$p_val`).
Bill length and body mass in Adelie penguins are positively correlated, r = 0.55 (df = 149, p < 0.001).
3.4 Comments
A comment in an R code block starts with a `#`, just as in an ordinary R script. A comment in the text is enclosed in an HTML comment mark: `<!-- like this -->`.
Type Ctrl+Shift+C (Cmd+Shift+C on a Mac) to get this comment mark.
In the source editor, you can select text you want to hide and use this keyboard short-cut to comment it out.