2 Version control with GitHub and RStudio
2.1 Basic workflow
Let’s start by briefly explaining the basic workflow. On GitHub you can make repositories, which is a kind of project. Your online repository, or short repo, is called remote. To use your repository, you need to clone it locally onto your computer and it is then called local. You can write and edit R code locally on your computer. The new code is then committed and pushed back to the remote. If you follow this workflow consistently, GitHub will keep track of all the changes you make.
- repo - short for repository
- repository - the directory or folder that is under version control
- local - the repository on your computer
- remote - the repository on GitHub
- commit - take a snapshot of one or more files in the repository
- push - send commits from the local repo to the remote repo
- pull - retrieve commits from the remote repo to the local repo
- .gitignore a file that tells git which files or types of files you don’t want to commit
2.1.1 The workflow
For the sake of this tutorial, Kingsley and Angelina will show us the workflow with RStudio and GitHub.
The general workflow looks like Figure 2.2: Kingsley has a repo on GitHub. He clones the repo on his computer (1). He develops code and makes commits (2). Then he pushes the changes back to the remote repo on GitHub (3). In this tutorial, we will explain each of these steps in detail.
2.2 Installation
You can download and install git from https://git-scm.com/
(For UiB users, you can find git in the Software Center; Mac users need the software xcode)
You can get a GitHub account at https://github.com/: sign up and follow the instructions.
2.3 usethis
package
The usethis
package makes working with git and GitHub from R and RStudio much easier.
install.packages("usethis")
- Install git.
- Once you have done this you will need to restart RStudio if it is open.
- Create a GitHub account.
- Install the
usethis
R package.
2.4 Set-up
Now you have to configure your name and email associated with your GitHub account. Go to RStudio and, using your own identification, type:
library(usethis)
use_git_config(
user.name = "kingsleyshaklebolt",
user.email = "kingsleyshaklebolt@ministryofmagic.com"
)
The next step is to connect RStudio and GitHub.
Configure git with your user name and email.
2.5 Connect RStudio and GitHub
GitHub needs to validate who you are before you can connect it and RStudio. We can do this by generating and saving a Personal Access Token (PAT). You need to do this once for every RStudio project.
usethis::create_github_token()
This function will open GitHub. After confirming your password, you will be shown a page to make a new PAT. You don’t need to change any of the options. Just click the green “Generate token” button at the bottom of the page.
The next page will show you your PAT. Copy it by clicking on the clipboard icon and return to R.
Now run
gitcreds::gitcreds_set()
this will ask you for your PAT: paste it at the prompt and press return. This will save the PAT so that it can be used to access GitHub. Treat your PAT as a password - never save it in a script.
Depending on your operating system, you might only have to do this once (at least until the PAT expires) or you might need to do it again for every RStudio project.
git_vaccinate()
will add various files to your global .gitignore file (Section 2.10) to reduce the chance of you leaking passwords, making git safer to use.
If you do accidentally commit a password or other token or key, you should assume that it has been compromised (bots search GitHub for these), and immediately replace the password.
Generate a GitHub PAT and store it with gitcreds::gitcreds_set()
. Then vaccinate git on you computer.
2.6 Making a repo
Now you are ready to start using RStudio and GitHub. There are two main workflows:
- Make a new local repo and push it to GitHub
- Clone an existing repo from GitHub onto your computer
We will cover these in turn: choose the best one for your circumstances.
By default, repositories on GitHub are publicly readable. They can be made private if necessary.
This workflow is useful if you already have an RStudio project on your computer or are starting a new project.
If you are making a new RStudio project, tick the “Create a git repository” box in the New Project Wizard.
Otherwise, in RStudio, create a git repo for your project with
usethis::use_git()
After creating the repo, the function will ask whether you want to commit some files.
- If it is a new project, this is safe to agree to.
- If it is an existing project there might be some files that you don’t want to commit, for example very large files or files with sensitive information such as passwords. In this case, don’t let R commit everything, instead pick which files you want committed (see Section 2.7) and improve the
.gitignore
(see Section 2.10) so they cannot be committed by accident.
Then let RStudio restart so that the git panel is activated.
Once you have committed at least one file, you can create a repo on GitHub.
By default, the repo name on GitHub will match the RStudio project name, and the repo will be public.
Now go to GitHub, find your repo and check the files you committed have been uploaded.
This is how your new repo looks like.
This workflow is useful is you already have a repo on GitHub that you want to use.
Cloning your GitHub repository means that you are making a copy of your remote repository on Github, locally on your computer. You can clone any repository on GitHub, whether it is your own or belongs to somebody else, as long as it is public. This also means that all your repositories that are public can be seen by everybody, but do not worry too much about this, GitHub has millions of repos. But make sure you do not commit sensitive information such as passwords.
Go to the GitHub repository you that you want to clone and find the names of the owner and the repo.
To clone the “kingsleyshacklebolt/dragon_study” repo, use
create_from_github("kingsleyshacklebolt/dragon_study")
If the repo you are trying to clone is not your own, the function will first make a fork (see forking tutorial Chapter 4) on GitHub into your account. The function will clone the repo onto your desktop by default. You can copy the entire directory from there to somewhere more appropriate.
Now it is time to commit and push.
Git repos typically have a default branch (you will learn more about branches in Chapter 3). The traditional name for this branch is “master”, but “main” is now preferred as it is more inclusive. GitHub uses “main” but git still uses “master” (this might change).
On stackoverflow and other help sites, you will see both branch names. The code will work if you use the name you have.
You can change the name of the default branch of your repo to “main” with this code from the usethis
package
You can set the default for all future git repos made on your computer by running.
git_default_branch_configure(name = "main")
- Create a new RStudio project, set up git on it and agree to commit the files. You should now have a git tab in RStudio.
- Rename the default branch to “main”.
- Set up the project to use GitHub, push the committed files and view them on GitHub.
After one commit, your repo will have the following structure.
2.7 Stage, commit and push
If you create or edit a file in your repository and save the changes the file will appear in the Git panel. There will be two yellow question marks ? ? if you add a file; a blue M if you edit a file that has already been committed; a red D if you delete a file. If you move files and they will show up as deleted and added in the new place. Once you have staged (ticking the box by the file name) both the deleted and new file, the icon will become a purple R.
Once you have written a chunk of code, save it and click on the Commit button. A new window will appear.
All the changes in the file are shown in green and red colour. Green is code that you have added to the file. Red is code that has been deleted. This makes it very easy to see all the changes that haven been made.
Stage the changes you made by ticking the box by the file name and add a Commit message (top right). The commit message should contain all the changes you have done. It can be short, but should be complete. It will help you later if you are searching for a specific commit.
Note that you need to tick all files that you want to commit.
Click Commit to save the changes which creates a permanent snapshot of the file in the Git directory along with a message that describe the changes you made in this file.
After several commit, your repo will have the following structure.
2.7.1 Some rules
Commits are cheap. It does not take much time to click on Commit, stage the file(s) and write a few words about the commit (aim for max 50 characters). A good rule is if you want to use the word and then it should probably have been two commits. Therefore, commit often and provide useful messages so you can keep track of what you are doing.
This is an example of useless commit messages:
2.7.2 Push and pull
So far you are still working locally on your computer and you have not changed the remote repository on GitHub. All the new code is still locally on your computer. To upload your commits to your remote GitHub repository you need to Push (green arrow in the Git tab) these changes to your remote repository on GitHub.
RStudio tells you the status of your local repo compared with the remote. For example, in Figure 2.12 the local repo is 2 commits ahead of the remote.
Add a short script, perhaps to make a plot of the palmerpenguins::penguins
data, to your RStudio project. It should appear in the git tab. Commit it. Now edit the file, and see how the git tab changes. Look at the diffs then commit the file again. Now rename the file and again see how the git tab changes. Commit the changes.
2.9 README file
A README file explains what a project is about and how it can be used. It should tell other people the important information about a repository.
Each GitHub repo should have a README file, because it is the first thing that anybody will look at. README files should contain information like:
- What is the project about
- Why is it useful or important
- Where users can get help with the project (i.e. for an R package)
- Who is the maintainer and contributor(s) of the project
To set up a README file use the function usethis::use_readme_rmd()
, which will create a new R Markdown file called README.Rmd. The YAML of this file is output: github_document and should not be changed. The rest of the file can be edited according to what you want to show in the README file.
Each time you change the README file, it has to be built using the knit button on top, which creates a .md file. Then the edits to both the .Rmd and .md file have to be committed and pushed to GitHub. Then the changes should be visible on GitHub.
Add a REDME file to your repo. Commit the changes and push to GitHub. Check if your README file is now on GitHub.
If you make a README file to the root of a public repo with the same name as your username, the README file appears on your profile page. Here you can tell other about yourself.
2.10 .gitignore file
When creating a new GitHub repository you can add a .gitignore
text file, which tells git all the files that should be ignored. In general, data or output files should not be committed, but exceptions can be useful for relatively small and unchanging files.
Every change you are making to a file in your R project and commit to GitHub, will be tracked. Commit files, code and output to GitHub, where you want to track changes. Do not commit all files, for example output files like figures which can easily be recreated with code.
Here is an example of a .gitignore
file:
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
# RStudio files
.Rproj.user/
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
.Rproj.user
#data (excludes everything in the folder data)
data/*
# you can make exceptions for specific files
!data/dragon_taxonomy.csv
#figures & output (excludes all figure files)
*.png
*.pdf
*.jpeg
2.11 Useful terminal commands
RStudio has a range of possibilities to work with Git and GitHub as shown in this tutorial. The Terminal has more commands and options and will be handy for trouble shooting. In this tutorial we only explain a limited selection of things that can go wrong.
2.12 Trouble shooting
Git not found
Sometimes, RStudio will complain that it cannot find git even after you install it.
On the RStudio menu click on
Ensure that the path to the Git executable is correct. This is particularly important in Windows where it may not default correctly (e.g. C:/Program Files (x86)/Git/bin/git.exe).
Git not working
If you use a university Windows computer, it might default to using a path like
\\helix\user_name\..
This is a UNC path. You can check the path with getwd()
. Git does not work with UNC paths. Make sure you start RStudio from, for example, O:\
.
Undo last commit
If you have committed something that you do not want to, you can undo the last commit. This only works if you have not pushed yet. Go to the terminal and type:
git reset --soft HEAD@{1}
For more help with trouble shooting, try the git flight rules.
If you need to do serious git surgery, for example to remove a large file committed accidentally, it is a cunning plan to copy your local repo first and practice on the copy before trying to fix the original repo.
If things get very messy, it can be worth making a new clone of the repo from GitHub, copying edited files to the new repo, and deleting the old repo.
Happy Git provides instructions for how to get started with Git, R and RStudio, explains the workflows and useful tips for when things go wrong.
The Git flight rules are an exhaustive resource for what to do when things go wrong.