Data in R may be stored in a multitude of object types, but the most important ones are vector, matrix, list, data frame and tibble.
10.1 Matrices
A matrix is a two-dimensional object that stores data of the same type (numeric, character, etc.) in the form of a table. It is built with the function matrix(), to which the data is supplied either as combined data elements (e.g. c(12, 54, 987, 5, ...)), a sequence of data elements (e.g. 1:25), or a vector (e.g. temperature). In addition, the number of rows and columns is set with nrow = and ncol =.
In the following example, the object neo is a matrix made of 4 rows and 6 columns filled with the numeric values stored in the vector temperature that we have previously created.
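A sketch of how neo can be built. The vector temperature was created earlier in the book; the values used below are assumed to be the same ones, and they do reproduce the outputs shown on this page. Note that matrix() fills the matrix column by column by default.

```r
# Temperature values (24 numbers), assumed to match the vector created earlier
temperature <- c(8.7, 9.2, 9.4, 9.5, 9.7, 10.1, 10.3, 10.6,
                 10.7, 10.8, 11.3, 11.9, 12.2, 12.3, 11.7, 10.2,
                 10.3, 10.3, 10.4, 10.3, 10.1, 9.7, 9.5, 9.4)

# 4 rows x 6 columns; values are filled column by column (byrow = FALSE is the default)
neo <- matrix(temperature, nrow = 4, ncol = 6)
neo
```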
10.1.1 Accessing data elements
In a matrix, each row is numbered [x, ] and each column is numbered [ , y]. Any of the data elements may be retrieved by using its coordinates [x, y] preceded by the name of the matrix:
neo[2, 3]
[1] 10.8
A full row or column may be retrieved with the same expression, leaving empty the coordinate that is not needed:
neo[2, ]
[1] 9.2 10.1 10.8 12.3 10.3 9.7
neo[ , 3]
[1] 10.7 10.8 11.3 11.9
10.1.2 About the use of matrices
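The use of matrices on this website is very limited. However, you may meet matrices in other projects, so it is best to know about their existence. You can read more about matrices in the R documentation for matrix(): https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/matrix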
Exercise
Make the matrix neo as above, then select:
the first two columns,
the third column (what happened?),
the second row,
the element in the third column, second row.
10.2 Arrays
Matrices are two-dimensional objects; arrays generalise this to any number of dimensions. Arrays are an efficient way to store and manipulate high-dimensional data (e.g. latitude * longitude * time), but can be difficult to understand and don’t play well with the tidyverse.
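As a small illustration, array() takes the data and a vector giving the size of each dimension. The dimensions 2 x 4 x 3 and the name arr are arbitrary choices for this sketch.

```r
# A three-dimensional array: 2 rows, 4 columns, 3 "layers"
arr <- array(data = 1:24, dim = c(2, 4, 3))
arr

# Elements are addressed with one index per dimension
arr[1, 2, 3]
```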
10.3 Lists
A list is an object that contains values of one or several data types. It can contain not only single data elements, but also other objects such as vectors, matrices, etc.
Lists are created by the function list(), which combines objects. list() conveniently allows for naming the elements by means of the symbol =.
In the example below, we will store 6 elements and name them string, number, temp, boolean, words and matrix. Among these elements are temperature and neo, two objects that we have created further above on this page.
my_list <- list(string = "one", number = 2, temp = temperature, boolean = TRUE, words = c("dog", "cat", "fish"), matrix = neo)
my_list
10.3.1 Retrieving list elements
You can access list items by position with [[ notation.
my_list[[1]] # get first element
[1] "one"
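Single brackets also work on lists, but they return a shorter list rather than the element itself. A minimal sketch with a small hypothetical list (small_list is not an object from this page):

```r
# Hypothetical two-element list, for illustration only
small_list <- list(string = "one", number = 2)

small_list[[1]]        # the element itself: a character vector
small_list[1]          # a list of length 1 that contains the element

class(small_list[[1]]) # "character"
class(small_list[1])   # "list"
```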
Naming elements is quite convenient as it allows you to retrieve them rapidly by means of the symbol $. The syntax is as follows: list_name$element_name.
Here we retrieve the element matrix in the list my_list:
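The call could look like this. The block rebuilds temperature, neo and my_list so that it runs on its own; the temperature values are assumed to be the ones used earlier on this page.

```r
temperature <- c(8.7, 9.2, 9.4, 9.5, 9.7, 10.1, 10.3, 10.6,
                 10.7, 10.8, 11.3, 11.9, 12.2, 12.3, 11.7, 10.2,
                 10.3, 10.3, 10.4, 10.3, 10.1, 9.7, 9.5, 9.4)
neo <- matrix(temperature, nrow = 4, ncol = 6)
my_list <- list(string = "one", number = 2, temp = temperature,
                boolean = TRUE, words = c("dog", "cat", "fish"), matrix = neo)

# Retrieve the element named "matrix": this returns the whole 4 x 6 matrix
my_list$matrix
```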
10.3.2 Retrieving single data elements
Even better, you can retrieve a single data element contained in a list element. Here you will have to write an expression that makes use of both the symbol $ and the brackets [] in the proper order. The syntax is as follows: list_name$element_name[data].
In this first example, we retrieve the data element located at the third position of the object named words in the list my_list:
my_list$words[3]
[1] "fish"
In the second example, we retrieve the data element located at the second row and third column of the matrix named matrix in the list my_list:
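A sketch of that call. To keep the block short and runnable on its own, it uses a trimmed-down version of my_list that holds only the elements words and matrix; the temperature values are assumed to be the ones used earlier on this page.

```r
temperature <- c(8.7, 9.2, 9.4, 9.5, 9.7, 10.1, 10.3, 10.6,
                 10.7, 10.8, 11.3, 11.9, 12.2, 12.3, 11.7, 10.2,
                 10.3, 10.3, 10.4, 10.3, 10.1, 9.7, 9.5, 9.4)
neo <- matrix(temperature, nrow = 4, ncol = 6)

# Trimmed-down version of my_list (only two of the six elements)
my_list <- list(words = c("dog", "cat", "fish"), matrix = neo)

# Second row, third column of the matrix stored in the list
my_list$matrix[2, 3]
```

The result is the same value as neo[2, 3] above, since the list element is a copy of the matrix neo.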
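You may read more about lists in the R documentation for list(): https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/list
Exercise
Make a list with three named elements of different types.
Use square bracket notation to extract the second element.
Use $ notation to extract an element by name.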
10.4 Data frames and tibbles
A data frame is a two-dimensional object that stores data of various types in the form of a table. Data frames are a popular way to store research data since the columns are usually associated with single variables (and are thus of a specific data type such as numeric or character) and rows are associated with observations.
Until recently, data frames were the main storage objects for research data. Nowadays, tibbles (an evolution of the data frame that appeared in the tidyverse) are replacing data frames as they are more practical for handling data sets (you will understand why further below). Because of this trend, we will focus mainly on tibbles here in this section and further on this website. It is however likely that you will meet data frames in the course of your studies. Do not worry as we will see how to transform data frames into tibbles.
Tibbles were introduced by the tidyverse, so you must make sure that the package is loaded before using these objects. Simply run this command first:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
If you have not installed the package yet, have a look at Section 5.6.2.
10.4.1 Data frames vs. tibbles
The object df printed below is a data frame that stores the average temperature recorded monthly in 2017, 2018 and 2019 at Lygra (Vestland, Norway). It is created with the function data.frame().
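The call that builds df could be written as follows. month.name is a constant built into R that holds the English month names, and rep() repeats the years and months so that each column has 36 values.

```r
df <- data.frame(
  Year = rep(2017:2019, each = 12),   # each year repeated 12 times
  Month = rep(month.name, 3),         # January to December, three times
  Avg_temperature = c(3.4, 2.8, 4.2, 5.8, 11.4, 12.6, 14.6, 13.9, 13.7, 9.2, 4.3, 3.1,
                      2.3, 0.5, 0.8, 6.7, 13.5, 13.6, 16.2, 13.8, 11.6, 8.0, 6.6, 3.9,
                      1.7, 4.6, 4.0, 9.1, 8.8, 13.2, 15.4, 15.8, 11.6, 7.8, 3.6, 4.8)
)
df
```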
Year Month Avg_temperature
1 2017 January 3.4
2 2017 February 2.8
3 2017 March 4.2
4 2017 April 5.8
5 2017 May 11.4
6 2017 June 12.6
7 2017 July 14.6
8 2017 August 13.9
9 2017 September 13.7
10 2017 October 9.2
11 2017 November 4.3
12 2017 December 3.1
13 2018 January 2.3
14 2018 February 0.5
15 2018 March 0.8
16 2018 April 6.7
17 2018 May 13.5
18 2018 June 13.6
19 2018 July 16.2
20 2018 August 13.8
21 2018 September 11.6
22 2018 October 8.0
23 2018 November 6.6
24 2018 December 3.9
25 2019 January 1.7
26 2019 February 4.6
27 2019 March 4.0
28 2019 April 9.1
29 2019 May 8.8
30 2019 June 13.2
31 2019 July 15.4
32 2019 August 15.8
33 2019 September 11.6
34 2019 October 7.8
35 2019 November 3.6
36 2019 December 4.8
As you may see, you get at once the whole data set with all 36 rows, the 3 variables, the header with column names and the first column that gives a number to each row.
The object tbl below is a tibble that contains exactly the same observations and variables as df. It is built up by the function tibble().
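A sketch of the corresponding call. It mirrors the data.frame() call, with tibble() taking the same column definitions. The block loads the tibble package directly so that it runs on its own; library(tidyverse) attaches it as well.

```r
library(tibble)  # also attached by library(tidyverse)

tbl <- tibble(
  Year = rep(2017:2019, each = 12),
  Month = rep(month.name, 3),
  Avg_temperature = c(3.4, 2.8, 4.2, 5.8, 11.4, 12.6, 14.6, 13.9, 13.7, 9.2, 4.3, 3.1,
                      2.3, 0.5, 0.8, 6.7, 13.5, 13.6, 16.2, 13.8, 11.6, 8.0, 6.6, 3.9,
                      1.7, 4.6, 4.0, 9.1, 8.8, 13.2, 15.4, 15.8, 11.6, 7.8, 3.6, 4.8)
)
tbl
```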
# A tibble: 36 × 3
Year Month Avg_temperature
<int> <chr> <dbl>
1 2017 January 3.4
2 2017 February 2.8
3 2017 March 4.2
4 2017 April 5.8
5 2017 May 11.4
6 2017 June 12.6
7 2017 July 14.6
8 2017 August 13.9
9 2017 September 13.7
10 2017 October 9.2
# ℹ 26 more rows
Here, you get a more convenient display of the same data:
only the first 10 rows and the header are displayed,
the number of rows not printed is displayed below the table (# ℹ 26 more rows),
the dimensions of the tibble appear clearly in the header (# A tibble: 36 × 3),
the column names come along with a quick description of the data type (<int> for integer, <chr> for character, <dbl> for double, etc.).
All in all, tibbles print much better and give more information than data frames do! They also have more predictable behaviour when extracting data from them.
10.4.2 Retrieving data elements
Similarly to vectors, matrices and lists, one can extract single elements from data frames and tibbles. Here, we use brackets [] to do so:
df[3, "Avg_temperature"]
[1] 4.2
tbl[3, "Avg_temperature"]
# A tibble: 1 × 1
Avg_temperature
<dbl>
1 4.2
One can also retrieve rows or columns:
df[3, ]
Year Month Avg_temperature
3 2017 March 4.2
tbl[3, ]
# A tibble: 1 × 3
Year Month Avg_temperature
<int> <chr> <dbl>
1 2017 March 4.2
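Whole columns can be retrieved in the same way, and the symbol $ returns the content of a single variable as a vector. The sketch below uses hypothetical three-row versions of df and tbl (named df_small and tbl_small here) to keep the output short. It also shows one way in which tibbles are more predictable: [ always returns a tibble, whereas a data frame simplifies a single column to a vector.

```r
library(tibble)

# Hypothetical three-row tables holding the first rows of the data above
df_small <- data.frame(Year = rep(2017L, 3),
                       Month = month.name[1:3],
                       Avg_temperature = c(3.4, 2.8, 4.2))
tbl_small <- as_tibble(df_small)

df_small[, "Avg_temperature"]   # simplified to a numeric vector
tbl_small[, "Avg_temperature"]  # still a tibble with one column

# $ returns the column as a vector for both object types
df_small$Avg_temperature
tbl_small$Avg_temperature
```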
In Chapter 14 you will see an alternative way of manipulating tibbles with the dplyr package.
10.4.3 Transforming a data frame into a tibble
If you have been previously working with data frames, have been given a data frame to work with, or have imported data using functions that create data frames, you may convert them into tibbles by using as_tibble(). Here we convert the data frame df into a tibble:
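A sketch of the conversion. The block rebuilds df so that it runs on its own, and stores the result under the name df_as_tibble.

```r
library(tibble)  # provides as_tibble(); also attached by library(tidyverse)

df <- data.frame(
  Year = rep(2017:2019, each = 12),
  Month = rep(month.name, 3),
  Avg_temperature = c(3.4, 2.8, 4.2, 5.8, 11.4, 12.6, 14.6, 13.9, 13.7, 9.2, 4.3, 3.1,
                      2.3, 0.5, 0.8, 6.7, 13.5, 13.6, 16.2, 13.8, 11.6, 8.0, 6.6, 3.9,
                      1.7, 4.6, 4.0, 9.1, 8.8, 13.2, 15.4, 15.8, 11.6, 7.8, 3.6, 4.8)
)

# Convert the data frame; df itself is left unchanged
df_as_tibble <- as_tibble(df)
df_as_tibble
```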
# A tibble: 36 × 3
Year Month Avg_temperature
<int> <chr> <dbl>
1 2017 January 3.4
2 2017 February 2.8
3 2017 March 4.2
4 2017 April 5.8
5 2017 May 11.4
6 2017 June 12.6
7 2017 July 14.6
8 2017 August 13.9
9 2017 September 13.7
10 2017 October 9.2
# ℹ 26 more rows
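You may read more about tibbles on the tibble website: https://tibble.tidyverse.org/
You may read more about data frames in the R documentation for data.frame(): https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame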
Exercise
Make a tibble in which the first column contains the months of the year and the second column a sequence of numbers from 1 to 12.
Extract the month column using $ notation.
What's next
Now that you know the basics of R and that you have all the tools to “manually” create R objects, you will learn how to import a data set from an external source. We will see how to read and fetch data from various file types such as .txt, .csv, .xls, .xlsx, and directly store it in tibbles.
Further Reading
R for data science (https://r4ds.hadley.nz/)
The tidyverse (https://www.tidyverse.org/)
Advanced R (https://adv-r.hadley.nz/index.html), chapters 1 to 4
Contributors
Jonathan Soulé
Aud Halbritter
Richard Telford