Bio300B Lecture 2
Institutt for biovitenskap, UiB
26 August 2024
Assign object to a name
Forgetting to assign is a very common error
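A minimal sketch of the difference between computing a result and assigning it. The object names and values are hypothetical, chosen only for illustration.

```r
# Assign a vector to the name `bill_lengths`
bill_lengths <- c(39.1, 39.5, 40.3)

# Without assignment the sorted result is printed and then lost
sort(bill_lengths, decreasing = TRUE)

# With assignment the result is kept under a name for later use
bill_sorted <- sort(bill_lengths, decreasing = TRUE)
bill_sorted
```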
Function name followed by brackets
Arguments separated by comma
Don’t include an argument - uses default
Don’t need to name arguments if in correct order
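The points above can be sketched with two base R functions, round() and mean(). The input vector is made up for illustration.

```r
x <- c(2.345, 10.871, NA)

# Function name followed by brackets, arguments separated by commas
round(x, digits = 1)

# Leaving out `digits` uses its default value of 0
round(x)

# Arguments given in the documented order do not need names
round(x, 1)

# `na.rm` defaults to FALSE, so a missing value makes the mean NA
mean_default <- mean(x)
mean_no_na <- mean(x, na.rm = TRUE)
```

Naming arguments is still often worthwhile for anything beyond the first one or two, since it makes the call easier to read.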
All elements must be the same type
Atomic vectors
Automatic coercion
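A short sketch of automatic coercion, assuming nothing beyond base R. When types are mixed in c(), every element is converted to the most flexible type present.

```r
# Mixing numbers, text and logicals: everything becomes character
num_chr <- c(1, "a", TRUE)
typeof(num_chr)

# Mixing logicals and numbers: TRUE becomes 1 and FALSE becomes 0
lgl_num <- c(TRUE, FALSE, 3)
typeof(lgl_num)

# Coercion also happens inside functions: sum() counts the TRUE values
n_true <- sum(c(TRUE, TRUE, FALSE))
```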
Predict the outcome of
[1] 1 0
Extract from
2 dimensional
All elements same type
Arrays can have 3+ dimensions
[row_indices, column_indices]
[1] 4 5
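A sketch of [row_indices, column_indices] subsetting on a small matrix. The matrix here is an illustrative assumption, not the one used on the slide.

```r
# matrix() fills column by column, so the columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

m[1, 2]        # row 1, column 2
m[2, ]         # leave the column index empty to get the whole row
m[, c(2, 3)]   # leave the row index empty to get whole columns

# An array with three dimensions: 2 rows, 3 columns, 4 layers
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
```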
Each element of a list can be a different type
Can make a smaller list with [ ], or extract the contents of one element (a "carriage") with [[ ]] or $
Extract vector “a”
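A sketch of the two kinds of list subsetting. The list `x` and its contents are hypothetical.

```r
# Each element of a list can be a different type
x <- list(a = 1:3, b = "text", c = list(TRUE, 2.5))

# Single brackets give a smaller list
smaller <- x["a"]
class(smaller)

# Double brackets (or $) give the contents of the element
a_vec <- x[["a"]]
a_vec2 <- x$a
class(a_vec)
```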
rectangular data structure - 2-dimensions
columns can have different type of object
special type of list where all vectors have same length
Tibbles are better behaved version of data.frame
Data.frames have row and column names
With square brackets
With column names
Which method is safer?
Can also use the dplyr package.
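A sketch of extracting a column by position and by name, using base R only. The data frame and its values are invented for illustration.

```r
penguins_small <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 48.7)
)

# A data frame is a list of equal-length vectors
is.list(penguins_small)

# With square brackets, by position: breaks silently if column order changes
by_position <- penguins_small[, 2]

# With column names: keeps working if columns are reordered
by_name <- penguins_small$bill_length_mm
by_name2 <- penguins_small[["bill_length_mm"]]
```

With dplyr loaded, `dplyr::pull(penguins_small, bill_length_mm)` would be the analogous call for a single column, and `dplyr::select()` for a smaller data frame.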
if statements for choice
else is optional
use ifelse() or dplyr::case_when() for vectorised if
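A sketch contrasting if/else, which tests one condition, with ifelse(), which works element by element. The variable names and the threshold are assumptions for illustration.

```r
# if / else makes one choice from a single TRUE/FALSE
temperature <- 3
if (temperature < 0) {
  state <- "freezing"
} else {
  state <- "not freezing"
}

# ifelse() makes the choice for every element of a vector
temperatures <- c(-5, 3, 12)
states <- ifelse(temperatures < 0, "freezing", "not freezing")
```

For more than two outcomes, dplyr::case_when() is usually easier to read than nested ifelse() calls.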
logical conditions can be combined
&& AND - TRUE if both TRUE
|| OR - TRUE if either TRUE
! NOT - TRUE if FALSE (or use != for not equal)
&& and || return a single TRUE/FALSE
Useful for if statements
& and | return a vector of TRUE/FALSE
Useful with ifelse() or dplyr::case_when()
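A sketch of the scalar and vectorised operators side by side. The values and the 0 to 10 range are arbitrary choices for illustration.

```r
# && works on single values, so it suits an if statement
x <- 5
if (x > 0 && x < 10) {
  size_class <- "in range"
} else {
  size_class <- "out of range"
}

# & and | compare element by element and return a logical vector
v <- c(-1, 5, 20)
in_range_vec <- v > 0 & v < 10
outside_vec <- v < 0 | v > 10

# The vector result can be passed straight to ifelse()
labels <- ifelse(in_range_vec, "in range", "out of range")
```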
Often don’t need an explicit loop - R is vectorised
for loops
for loops iterate over elements of a vector
for pitfalls
Need to pre-allocate space or slow
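A sketch of a for loop that pre-allocates its output, followed by the vectorised equivalent. The input values are made up.

```r
values <- c(2, 4, 6)

# Pre-allocate the output at its final length instead of growing it
squares <- numeric(length(values))
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}

# The same result without a loop, because ^ is vectorised
squares_vectorised <- values^2
```

Growing a vector inside the loop, for example with `squares <- c(squares, new_value)`, copies it on every iteration, which is what makes un-allocated loops slow.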
Rarely need a loop - purrr::map(), apply() generally cleaner
$a
[1] 2
$b
[1] 5.5
$c
[1] 7
a b c
2.0 5.5 7.0
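Output of the shape printed above (a list, then a named numeric vector) is what applying a function to each element of a list gives. The sketch below uses the base R functions lapply() and vapply(); the input list and the choice of median() are assumptions made so that the results match those values, not the code from the original slide.

```r
# Hypothetical input list
x <- list(a = c(1, 2, 3), b = c(4, 7), c = 7)

# lapply() returns a list, like purrr::map()
medians_list <- lapply(x, median)
medians_list

# vapply() returns a named numeric vector, like purrr::map_dbl()
medians_vec <- vapply(x, median, numeric(1))
medians_vec
```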
apply() for iterating over rows/columns of a matrix
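A sketch of apply() on a small illustrative matrix. The second argument selects the margin: 1 for rows, 2 for columns.

```r
m <- matrix(1:6, nrow = 2)

# Apply sum() to each row
row_sums <- apply(m, 1, sum)

# Apply max() to each column
col_max <- apply(m, 2, max)
```

For simple sums and means, rowSums(), colSums(), rowMeans() and colMeans() do the same job more directly.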
With your computer
With your collaborators
“Your closest collaborator is you six months ago but you don’t reply to email.” — Paul Wilson
Need understandable code
Goodstylemakescodeeasiertoread
A condition of publication in a Nature Portfolio journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications.
it is a condition for publication of accepted manuscripts at CJFAS that authors make publicly available all data and code needed to reproduce those results (including code to reproduce statistical results, simulation results, and figures) via an online data repository.
The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code
— Hadley Wickham (@hadleywickham) 17 April 2015
Make your own style - but be consistent
“There are only two hard things in Computer Science: cache invalidation and naming things.”
— Phil Karlton
camelCase 🐫 | UpperCamelCase | snake_case 🐍 |
---|---|---|
billLengthMM | BillLengthMM | bill_length_mm |
bergenWeather2022 | BergenWeather2022 | bergen_weather_2022 |
dryMassG | DryMassG | dry_mass_g |
makeWeatherPlot | MakeWeatherPlot | make_weather_plot |
Place spaces around infix operators (|>, +, -, <-) and around = in function calls
Use the styler package to edit code to meet a style guide.
Use the lintr package for static code analysis, including style checks
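A sketch of the same two statements with and without spacing. Both versions run identically; only readability differs. The object names and values are hypothetical.

```r
# Hard to read: no spaces around operators or after commas
dry_mass_g<-c(1.2,3.4,2.2)
total<-sum(dry_mass_g,na.rm=TRUE)

# Easier to read: spaces around <- and =, and after commas
dry_mass_g <- c(1.2, 3.4, 2.2)
total_styled <- sum(dry_mass_g, na.rm = TRUE)
```

Assuming the packages are installed, `styler::style_file("my_script.R")` would restyle a file in place and `lintr::lint("my_script.R")` would report style problems without changing it. The file name here is a placeholder.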
Helps you find your way around a script
Long scripts become difficult to navigate
Fix by moving parts of the code into different files
For example:
Import with
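The import call itself is missing from this text. In base R, source() runs the code in another file, so the sketch below assumes that is what was meant. A temporary file stands in for a real helper script, and the function in it is hypothetical.

```r
# Write a small helper script to a temporary file
helper_file <- tempfile(fileext = ".R")
writeLines("celsius_to_kelvin <- function(x) x + 273.15", helper_file)

# source() runs the file, making its functions available in this session
source(helper_file)

kelvin <- celsius_to_kelvin(20)
```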
Repeated code is hard to maintain
Make repeated code into functions.
Single place to maintain
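A sketch of turning a repeated calculation into a function. The function name, the calculation and the data are all hypothetical.

```r
# Repeated code: the same formula written out once per variable
bill <- c(39.1, 47.5, 48.7)
mass <- c(3750, 5200, 3800)
bill_z_manual <- (bill - mean(bill, na.rm = TRUE)) / sd(bill, na.rm = TRUE)

# The formula moved into a function: one place to fix or change it
standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

bill_z <- standardise(bill)
mass_z <- standardise(mass)
```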
Comments
Use # to start comments.
Help you and others to understand what you did
Comments should explain the why, not the what.
Try to make code self-documenting with descriptive object names
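A sketch of a comment that explains the why, paired with descriptive object names. The scenario and data are invented for illustration.

```r
dry_mass_g <- c(0.8, 1.2, NA, 1.5)

# A "what" comment only repeats the code:
# remove NA values

# A "why" comment records the reasoning the code cannot show:
# one sample was lost before weighing, so its mass was never recorded
dry_mass_recorded <- dry_mass_g[!is.na(dry_mass_g)]
```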