5  Digitize and import data

5.1 Digitizing data

The first step after manual data collection is to digitize the data. This step is not necessary for data that were collected digitally.

When digitizing the data, the digital version of the datasheet should mirror the paper version, so that you do not have to jump between different sections of the spreadsheet while typing. You do not need to convert the data to a long table at this stage, even if you will need that format later. It is better to make data entry as smooth as possible to avoid mistakes. The format of the data frame can easily be changed later using code.
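As a rough illustration of changing the format later with code, the sketch below reshapes a small wide table into a long one using base R only. The data frame and its column names (`plot_id`, `sp_a`, `sp_b`) are hypothetical and only stand in for a datasheet entered as it looks on paper.

```r
# Hypothetical wide datasheet: one row per plot, one column per species
wide <- data.frame(
  plot_id = c("A1", "A2"),
  sp_a = c(3, 0),
  sp_b = c(1, 5)
)

# Columns that hold the measurements (assumed names)
species_cols <- c("sp_a", "sp_b")

# stack() puts all values in one column and the column names in another
stacked <- stack(wide[species_cols])

# Long format: one row per plot and species
long <- data.frame(
  plot_id = rep(wide$plot_id, times = length(species_cols)),
  species = as.character(stacked$ind),
  count = stacked$values
)

print(long)
```

If the tidyverse is already in use, `tidyr::pivot_longer()` does the same job with less bookkeeping.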

5.2 Proof reading

After digitizing your data, the next step is to proof read. Proof reading means checking that the digital copy of your data reflects the paper copy. At this stage, the digital spreadsheet can still be edited by hand to correct typos and similar issues.

Note that if you discover a consistent error (e.g. the wrong date entered for multiple days), it is safer to make these changes using code.
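A correction of this kind could look like the minimal sketch below. The data frame, the column names and both dates are made up for illustration; the point is that the fix is written down as code and applied to a copy, so the raw data stay untouched.

```r
# Hypothetical raw data: sheets 1 and 2 were entered with the wrong date
raw <- data.frame(
  sheet_id = c(1, 2, 3),
  date = as.Date(c("2023-07-12", "2023-07-12", "2023-07-14"))
)

# Work on a copy so the raw version is never modified
clean <- raw

# Assumed values: the date that was typed and the date it should have been
wrong_date <- as.Date("2023-07-12")
right_date <- as.Date("2023-07-13")

# Replace every occurrence of the wrong date in one documented step
clean$date[clean$date == wrong_date] <- right_date

print(clean)
```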

After proof reading, we recommend saving your data as raw data, that is, a non-manipulated version of your data. Indicate in the file name that this is the raw version of the data. This will later make the process of data cleaning fully transparent. For example, if you want to share your data later, anybody can see what was done to the data, and if somebody wants to clean your data differently for another purpose, this is still possible.

It is recommended to proof read the whole dataset. If the dataset is very large, you can start by proof reading a random selection of data sheets. Depending on the number of mistakes you find in this subset, you can decide whether to proof read everything or not.
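One way to pick the data sheets at random is sketched below. The number of sheets (200) and the share to check (10%) are arbitrary assumptions, not recommendations; setting a seed simply makes the selection reproducible.

```r
# Assumed setup: 200 data sheets, identified by a running number
sheet_ids <- 1:200

# Fix the seed so the same sheets are drawn if the code is rerun
set.seed(42)

# Check one in ten sheets (arbitrary share chosen for this example)
n_check <- ceiling(length(sheet_ids) / 10)

# Draw the sheets without replacement and sort them for convenience
to_check <- sort(sample(sheet_ids, size = n_check))

print(to_check)
```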

5.3 Import data

The next step is to import the data into R for further processing. See the section on Import data in R in the Working in R book.
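For orientation, a minimal base R import might look like the sketch below. It writes a tiny made-up CSV file to a temporary location so that the example runs on its own; in a real project the path would point to your raw data file instead, and the column names here are hypothetical.

```r
# Create a small example file (stand-in for the digitized raw data)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("plot_id,date,count",
    "A1,2023-07-12,3",
    "A2,2023-07-13,5"),
  csv_path
)

# Read the file; keep text columns as character rather than factor
raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Dates arrive as text, so convert them explicitly
raw$date <- as.Date(raw$date)

# Inspect column types to confirm the import worked as expected
str(raw)
```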