# 14 Regression and ggplot

##
14.1 Regression lines with `geom_smooth()`

`geom_smooth()`

adds a regression line to a plot.
By default it uses a loess smooth when there are fewer than 1000 observations, and a GAM when there are more.
The grey band around the regression line is the confidence interval.
It can be turned off with `se = FALSE`

, and changed from the default 95% with the level argument.

```
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm)) +
geom_point() +
geom_smooth()
```

`## `geom_smooth()` using method = 'loess' and formula 'y ~ x'`

You can change they type of regression model used by `geom_smooth()`

with the `method`

argument.
So to show a linear model, use `method = "lm"`

.

```
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm)) +
geom_point() +
geom_smooth(method = "lm")
```

`## `geom_smooth()` using formula 'y ~ x'`

To show a glm, we need to `method = "glm"`

and set the family in the `method.args`

argument.

```
data(SWAP, package= "rioja")
swap_data <- bind_cols(pH = SWAP$pH, SWAP$spec)
ggplot(swap_data, aes(x = pH, y = sign(TA003A))) + # sign converts data to presence absence
geom_jitter(width = 0, height = 0.1) +
geom_smooth(
method = "glm",
method.args = list(family = binomial)) +
scale_y_continuous(breaks = c(0, 1)) +
labs(y = expression(italic(Tabellaria~binalis)))
```

`## `geom_smooth()` using formula 'y ~ x'`

## 14.2 Manual plotting of a linear model

We can also fit a regression model and make predictions. This is most useful for more complex models. For example, if we want to fit a model for the relationship between penguin body mass and bill length with species as a second predictor, we can only do it this way.

First fit the model

```
mod <- lm(bill_length_mm ~ body_mass_g + species,
data = penguins)
```

Now we need to make predictions.
We could do this with `predict()`

, but it is often easier to use `augment()`

from the `broom`

package as it takes care of missing values better.

```
#preds <- predict(mod, interval = "confidence", conf.level = 0.95)
# augment with a lm model
preds <- broom::augment(mod, interval = "confidence", conf.level = 0.95)
```

Now we can plot them, using geom_ribbon() and `geom_line()`

to recreate what `geom_smooth()`

produces.

```
ggplot(penguins, aes(x = body_mass_g, y = bill_length_mm, fill = species)) +
geom_point(aes(colour = species)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), data = preds, alpha = 0.3) +
geom_line(aes(y = .fitted, colour = species), data = preds) +
labs(x = "Body mass g", y = "Bill length mm")
```

## 14.3 Manual plotting of generalised linear models

With a generalised linear model, it is a little more complex as we can get the predictions on the response scale or the transformed link scale. If we want confidence intervals, we need to calculate them on the link scale and then transform them back to the response scale.

With a poisson model, we can transform the predictions from the link scale to the response scale with the exponential function `exp()`

.

With a binomial model, we need to reverse the logit function.
The easiest way to do this is with `plogis()`

.

```
mod_glm <- glm(sign(TA003A) ~ pH, data = swap_data, family = binomial)
preds_glm <- broom::augment(mod_glm, type.predict = "link", se_fit = TRUE) |>
mutate(
fitted = plogis(.fitted),
lower = plogis(.fitted + .se.fit * 1.96),
upper = plogis(.fitted - .se.fit * 1.96),
)
```

Now we can plot the predictions.

```
ggplot(swap_data, aes(x = pH, y = sign(TA003A))) + # sign converts data to presence absence
geom_jitter(width = 0, height = 0.1) +
geom_ribbon(aes(ymax = upper, ymin = lower, y = NULL),
data = preds_glm, alpha = 0.3) +
geom_line(aes(y = fitted),
data = preds_glm) +
scale_y_continuous(breaks = c(0, 1)) +
labs(y = expression(italic(Tabellaria~binalis)))
```