Call: survfit(formula = Surv(time, status) ~ sex, data = lung)
n events median 0.95LCL 0.95UCL
sex=male 138 112 270 212 310
sex=female 90 53 426 348 550
plotting a model
Kaplan-Meier curve
‘+’ mark censored data
library(ggfortify)autoplot(mod_surv)
Testing differences in survival time
between two or more groups
mod_diff <-survdiff(Surv(time, status) ~ sex, data = lung)mod_diff
Call:
survdiff(formula = Surv(time, status) ~ sex, data = lung)
N Observed Expected (O-E)^2/E (O-E)^2/V
sex=male 138 112 91.6 4.55 10.3
sex=female 90 53 73.4 5.68 10.3
Chisq= 10.3 on 1 degrees of freedom, p= 0.001
Survivorship curve
Shape of the survival curve
Hazard: The instantaneous risk of death
m_ll <-survreg(Surv(time, status) ~ age + sex, data = lung,dist ="loglogistic")summary(m_ll)
Call:
survreg(formula = Surv(time, status) ~ age + sex, data = lung,
dist = "loglogistic")
Value Std. Error z p
(Intercept) 6.39982 0.49256 12.99 < 2e-16
age -0.01401 0.00771 -1.82 0.06946
sexfemale 0.47751 0.14036 3.40 0.00067
Log(scale) -0.56991 0.06543 -8.71 < 2e-16
Scale= 0.566
Log logistic distribution
Loglik(model)= -1152.9 Loglik(intercept only)= -1160.9
Chisq= 16.07 on 2 degrees of freedom, p= 0.00032
Number of Newton-Raphson Iterations: 4
n= 228
Choice of distribution
Exponentiated coefficients show relative increase or decrease in the expected survival times when a covariate is increased one step while others are fixed:
exp(coef(m_ll))
(Intercept) age sexfemale
601.7394821 0.9860925 1.6120541
Expected survival time decreases by 1.4 % (i.e. multiply by 0.986) for each additional year of age
The expected survival time for females is 61.2 % higher than for males (multiply by 1.612).
scale
m_ll$scale
[1] 0.5655786
The scale parameter < 1, indicates that slope of the hazard decreases with time - Type III
Choice of distribution
Distribution
df
exponential
1
Weibull
2
lognormal
2
log logistic
2
Can use AIC to choose best
m_w <-update(m_ll, dist ="weibull")AIC(m_ll, m_w)
df AIC
m_ll 4 2313.794
m_w 4 2302.109
Summary
Most important things
Designing an experiment
R tips and tricks
Entering data into a spreadsheet
Importing data
Cleaning data
Data visualisation
Using quarto
Choosing a model type
Interpreting model output
Method choice - 2 key questions
Are the data clustered/grouped?
What is the distribution of the residuals?
(also other types of model - survival etc)
Clustered data
Mixed effect models
Normal distribution: linear mixed effect models
Binomial/poisson: generalised linear mixed effect models
Independent data, normal distribution
Linear models - ordinary least squares
t-tests
two sample t-test
one sample t-test
paired t-test
lm
Independent data, non-normal distribution
generalised linear models
glm
Binomial & Poisson distributions
Interpreting model output
data(penguins, package ="palmerpenguins")mod <-lm(bill_length_mm ~ sex * species, data = penguins)summary(mod)
Call:
lm(formula = bill_length_mm ~ sex * species, data = penguins)
Residuals:
Min 1Q Median 3Q Max
-5.7904 -1.3735 -0.0638 1.2096 11.4265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2575 0.2710 137.473 < 2e-16 ***
sexmale 3.1329 0.3833 8.174 6.64e-15 ***
speciesChinstrap 9.3160 0.4808 19.377 < 2e-16 ***
speciesGentoo 8.3063 0.4073 20.393 < 2e-16 ***
sexmale:speciesChinstrap 1.3877 0.6799 2.041 0.0421 *
sexmale:speciesGentoo 0.7771 0.5721 1.358 0.1753
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.316 on 327 degrees of freedom
(11 observations deleted due to missingness)
Multiple R-squared: 0.8234, Adjusted R-squared: 0.8207
F-statistic: 305 on 5 and 327 DF, p-value: < 2.2e-16