Bayesian Regression modelling with brms

Michael Jones

2025-05-07

Bayesian Primer

Why Bayes?

  • Philosophical: Probability can encode uncertainty
  • Practical: work with full distributions rather than point estimates
  • Update beliefs as data arrive
  • Replace many special-case statistical tests with one modelling framework
  • Coherent system right up to complex non-linear models

Why not Bayes?

  • Slow(er) than Frequentist methods
  • More computationally intensive
  • Models might not converge
  • You have to provide a prior
  • Acceptance: reviewers and colleagues may be less familiar with it

What Bayes?

\[ P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)} \]

What Bayes?

\[ \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} \]

What Bayes?

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]

How Bayes?

  • Pen and paper: conjugate priors
  • Computers: Markov chain Monte Carlo, e.g. BUGS/JAGS (Metropolis-Hastings, Gibbs sampling) or Stan (HMC, NUTS)
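
For instance, a Beta prior on a coin's probability of heads is conjugate to the binomial likelihood, so the posterior is available with pen and paper. A minimal sketch, with made-up counts:

a <- 2; b <- 2            # Beta(a, b) prior: pseudo-counts of heads and tails
flips <- 50; heads <- 10  # observed data

# Conjugacy: Beta prior + binomial likelihood -> Beta posterior
post_a <- a + heads
post_b <- b + flips - heads

post_a / (post_a + post_b)             # posterior mean
qbeta(c(0.025, 0.975), post_a, post_b) # 95% credible interval, no MCMC needed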

Regression Modelling Recap

The humble lm

library(tidyverse)  # tibble, ggplot2 and dplyr are used throughout

set.seed(100)
# True model: y = 4 + 10x + Normal(0, 5) noise
fake_data <- tibble(x = runif(100, 0, 10),
                    y = 10 * x + 4 + rnorm(100, 0, 5))

ggplot(fake_data, aes(x = x, y = y)) + geom_point()

The humble lm

lm(y ~ x,
   data = fake_data)

Call:
lm(formula = y ~ x, data = fake_data)

Coefficients:
(Intercept)            x  
      3.729        9.968  

Which of these are linear models?

\[\begin{align} y &= \beta_0 + \beta_1 x\\ y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2\\ y &= \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\\ y &= \beta_0 + \beta_1 x^2\\ y &= \beta_0 + \beta_1^2 x \end{align}\]

(‘Linear’ means linear in the coefficients \(\beta\), not in \(x\): the first, second, and fourth qualify.)

lm formula syntax

formula       maths
y ~ x         \(y = b_0 + b_1 x\)
y ~ 1 + x     \(y = b_0 + b_1 x\)
y ~ 0 + x     \(y = b_1 x\)
y ~ x + z     \(y = b_0 + b_1 x + b_2 z\)
y ~ x:z       \(y = b_0 + b_1 x z\)
y ~ x*z       \(y = b_0 + b_1 x + b_2 z + b_3 x z\)
y ~ I(x^2)    \(y = b_0 + b_1 x^2\)
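
A handy way to check what a formula expands to is to look at the design matrix it generates. A small sketch (the data frame here is invented purely for illustration):

d <- tibble(x = 1:3, z = c(0.5, 1.0, 1.5))

model.matrix(~ x, d)      # columns: (Intercept), x
model.matrix(~ 0 + x, d)  # column: x only, no intercept
model.matrix(~ x:z, d)    # columns: (Intercept), x:z
model.matrix(~ x * z, d)  # columns: (Intercept), x, z, x:z
model.matrix(~ I(x^2), d) # columns: (Intercept), I(x^2)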

Aside: linear models that aren’t a line

set.seed(100)
quad_data <- tibble(
  x = runif(50, -5, 5),
  y = 2 * x^2 + 4*x + 5 + rnorm(50, sd = 5))

ggplot(quad_data, aes(x, y)) +
  geom_point()

Aside: linear models that aren’t a line

quad_mod <- lm(y ~ I(x^2) + x,
               data = quad_data)

quad_mod

Call:
lm(formula = y ~ I(x^2) + x, data = quad_data)

Coefficients:
(Intercept)       I(x^2)            x  
      4.898        2.006        4.307  

Aside: linear models that aren’t a line

# coefficient order follows the formula: (Intercept), I(x^2), x
b0 <- quad_mod$coefficients[1]  # intercept
b1 <- quad_mod$coefficients[2]  # quadratic coefficient
b2 <- quad_mod$coefficients[3]  # linear coefficient
ggplot(quad_data, aes(x, y)) +
  geom_point() +
  geom_function(fun = \(x) b0 + x^2 * b1 + x * b2, colour = "red")

brms

What is brms?

  • “Bayesian Regression Models using Stan”
  • written by Paul Bürkner
  • documentation and vignettes on CRAN
  • relies on Stan under the hood

The stan family

[Diagram: the Stan ecosystem]

  • Stan itself: stan
  • Callers: rstan, cmdstanr (R); pystan, juliastan (other languages)
  • R packages built on top: rstanarm, brms
  • Helper R packages: tidybayes, bayesplot, posterior, loo, shinystan, …

brms with R

[Diagram: R (brms) generates Stan code, Stan compiles it to C++ and samples, and the result returns to R as a brmsfit object]

Side note: Sampling a Distribution

  • Objective: approximate a distribution
  • Problem: you can evaluate the density at any point you ask about, but each query only tells you about that one point
  • Solution: find a way to sample methodically (ask lots of questions) so that your samples become representative of the whole distribution. Bonus points for doing it efficiently.

Side note: Sampling a Distribution

  • Start at a point, take the sample, record the result
  • Propose some candidates, and accept one with a probability that compares it to your current value
  • Jump to this new point (or stay put if the candidate is rejected)
  • Do this a lot to build up a chain
  • Improvements are available, including HMC; a toy sampler is sketched below
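
As a concrete (toy) illustration of those steps, here is a minimal random-walk Metropolis sampler for a standard normal target. This is a sketch of the general recipe only; Stan instead uses HMC/NUTS:

set.seed(1)
log_target <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

n_draws <- 5000
chain <- numeric(n_draws)
chain[1] <- -3  # deliberately poor starting point

for (i in 2:n_draws) {
  current  <- chain[i - 1]
  proposal <- rnorm(1, mean = current, sd = 1)  # candidate near the current value
  # accept with probability min(1, target(proposal) / target(current))
  if (log(runif(1)) < log_target(proposal) - log_target(current)) {
    chain[i] <- proposal  # jump to the new point
  } else {
    chain[i] <- current   # stay put
  }
}

hist(chain[-(1:500)], breaks = 50)  # drop warmup; roughly N(0, 1)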

Markov Chain Monte Carlo

My First Bayesian Model

lm(y ~ x,
   data = fake_data)
bmod1 <- brm(y ~ x,
             data = fake_data,
             family = gaussian(link = "identity"),  # the default family and link
             backend = "cmdstanr",
             file = "my_first_model")  # caches the fitted model on disk

My First Bayesian Model

y ~ x

\[\begin{align} y &= \beta_0 + \beta_1 x + \epsilon\\ y &\sim \mathcal{N}(\beta_0 + \beta_1 x, \sigma^2) \end{align}\]

My First Bayesian Model


Compiling Stan program...
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 

My First Bayesian Model


 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x 
   Data: fake_data (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.71      1.14     1.51     5.95 1.00     3540     2793
x             9.97      0.20     9.58    10.36 1.00     3567     2746

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     5.13      0.38     4.47     5.92 1.00     3517     2585

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

My First Bayesian Model - Hairy Caterpillars

plot(bmod1)

Healthy trace plots look like hairy caterpillars: stationary, well-mixed chains with no drifts or stuck sections.

My First Bayesian Model - PP Check

pp_check(bmod1)

A posterior predictive check overlays data simulated from the fitted model on the observed data; the two should look alike.

My First Bayesian Model - Draws

posterior::as_draws_df(bmod1)
# A draws_df: 1000 iterations, 4 chains, and 6 variables
   b_Intercept  b_x sigma Intercept lprior lp__
1          2.2 10.4   5.4        57   -8.4 -315
2          2.0 10.5   5.6        57   -8.4 -315
3          2.3 10.3   5.6        56   -8.4 -312
4          2.3 10.3   4.6        56   -8.4 -312
5          3.1 10.1   5.2        56   -8.4 -311
6          4.0  9.8   4.8        55   -8.4 -311
7          3.6 10.1   5.2        56   -8.4 -310
8          2.3 10.3   5.0        56   -8.4 -311
9          3.8 10.0   5.5        56   -8.4 -311
10         5.0  9.7   4.8        56   -8.4 -311
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

My First Bayesian Model - Draws

posterior::as_draws_df(bmod1) |>
  slice_sample(n = 10) |>  # ten random posterior draws
  ggplot() +
  geom_abline(mapping = aes(intercept = b_Intercept,
                            slope = b_x)) +  # one candidate line per draw
  geom_point(data = fake_data, mapping = aes(x = x, y = y))

Other Models

Robust Regression

  • the classical toolbox offers robust estimators, or regularisation such as ridge and lasso
  • Bayesian way: model the errors with a Student-t distribution rather than a Gaussian, so heavy-tailed outliers pull less on the fit; a sketch follows
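
In brms that is just a change of family. A minimal sketch, reusing fake_data from earlier (the file name is arbitrary):

# Same linear model as bmod1, but with Student-t errors: heavier tails
# let occasional outliers sit far from the line without dragging it.
# The degrees of freedom nu are estimated from the data.
bmod_robust <- brm(y ~ x,
                   data = fake_data,
                   family = student(),
                   backend = "cmdstanr",
                   file = "robust_model")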

Binomial Data (coin flips etc)

coin_data <- tibble(flips = 50, heads = 10)
bmod_coin <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  family = binomial(link = "logit"),
  backend = "cmdstanr",
  file = "coins"
)

Coin Flips

bmod_coin
 Family: binomial 
  Links: mu = logit 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.39      0.35    -2.10    -0.75 1.00     1425     1578

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Coin Flips

inv_logit_scaled(fixef(bmod_coin)[1])  # back-transform from log-odds to probability
[1] 0.200066
10/50  # the raw proportion, for comparison
[1] 0.2
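
The same back-transformation can be applied to every posterior draw, which keeps the uncertainty on the probability scale. A sketch:

posterior::as_draws_df(bmod_coin) |>
  mutate(p_heads = inv_logit_scaled(b_Intercept)) |>  # log-odds -> probability
  summarise(mean = mean(p_heads),
            q2.5 = quantile(p_heads, 0.025),
            q97.5 = quantile(p_heads, 0.975))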

Coin Flips

bayesplot::mcmc_areas(
  bmod_coin, 
  pars = "b_Intercept", # we only want to plot the Intercept parameter
  transformations = inv_logit_scaled) # convert to probability space

Side note: Beta Priors

ggplot() +
  geom_function(fun = \(x) dbeta(x, shape1 = 1, shape2 = 1), colour = "darkblue") +   # flat: uniform on [0, 1]
  geom_function(fun = \(x) dbeta(x, shape1 = 10, shape2 = 10), colour = "orange") +   # gently peaked at 0.5
  geom_function(fun = \(x) dbeta(x, shape1 = 100, shape2 = 100), colour = "purple")   # tightly concentrated at 0.5

Coin Flips with priors

With an identity link the intercept is the probability itself, so a Beta prior can be placed on it directly.

bmod_coin_weak <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  prior = prior(beta(2, 2), class = "Intercept"), # <--
  family = binomial(link = "identity"),
  backend = "cmdstanr",
  file = "coins_weak"
)

bmod_coin_strong <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  prior = prior(beta(100, 100), class = "Intercept"), # <--
  family = binomial(link = "identity"),
  backend = "cmdstanr",
  file = "coins_strong"
)
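
Before overriding anything, get_prior() shows which parameters accept priors and what brms would use by default:

get_prior(heads | trials(flips) ~ 1,
          data = coin_data,
          family = binomial(link = "identity"))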

Coin Flips with weak prior

bmod_coin_weak
 Family: binomial 
  Links: mu = identity 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.22      0.05     0.12     0.34 1.00     1299     1416

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Coin Flips with Strong Prior

bmod_coin_strong
 Family: binomial 
  Links: mu = identity 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.44      0.03     0.38     0.50 1.00     1464     1870

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Multilevel Models

Multilevel Models

  • Reflect the hierarchical structure of your problem domain
  • There is some variation at the population level
  • and some variation within sub-groups
  • You want to share information (and uncertainty) between these levels

Multilevel Models

Extension to the formula:

y ~ p + (r | g)

  • p: population-level effects
  • r: ‘random’ effects, i.e. group-level effects
  • g: grouping variable
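
For instance, y ~ x + (1 | g) is a varying-intercept model: each group g gets its own intercept offset, drawn from a shared distribution:

\[\begin{align} y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= (\beta_0 + u_{g[i]}) + \beta_1 x_i\\ u_g &\sim \mathcal{N}(0, \sigma_g^2) \end{align}\]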

Multilevel Models

y ~ x
y ~ x + (1 | g)
y ~ 1 + (0 + x | g)
y ~ 1 + x + (1 + x | g)

Multilevel Models

bmod_iris <- brm(
  Petal.Length ~ 1 + Sepal.Length + (1 + Sepal.Length | Species), 
  data = iris, 
  backend = "cmdstanr",
  file = "multilevel")

Multilevel Models


 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Petal.Length ~ 1 + Sepal.Length + (1 + Sepal.Length | Species) 
   Data: iris (Number of observations: 150) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Species (Number of levels: 3) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.64      0.69     0.02     2.39 1.01      495      771
sd(Sepal.Length)                0.53      0.28     0.18     1.24 1.01      578      785
cor(Intercept,Sepal.Length)    -0.05      0.60    -0.97     0.95 1.00      634     1171

Multilevel Models


Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        0.53      0.54    -0.46     2.02 1.01      376      132
Sepal.Length     0.59      0.24     0.12     1.12 1.01      660      589

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.26      0.02     0.23     0.30 1.01     1138      833

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Multilevel Models

fixef(bmod_iris)
              Estimate Est.Error       Q2.5    Q97.5
Intercept    0.5318709 0.5396609 -0.4638470 2.023480
Sepal.Length 0.5873578 0.2442129  0.1190371 1.115754
ranef(bmod_iris)
$Species
, , Intercept

              Estimate Est.Error      Q2.5     Q97.5
setosa      0.06899166 0.5537010 -1.256369 1.2012115
versicolor -0.15903353 0.5497306 -1.704974 0.8129511
virginica   0.02431809 0.4968133 -1.320186 1.0449123

, , Sepal.Length

              Estimate Est.Error       Q2.5      Q97.5
setosa     -0.41549242 0.2556132 -0.9461325 0.05927435
versicolor  0.06777262 0.2430173 -0.4618805 0.52713793
virginica   0.17090787 0.2443261 -0.3543854 0.64513015
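
fixef() and ranef() can be combined with coef(), which returns one full set of coefficients per group (the population-level estimate plus that group's offset):

coef(bmod_iris)$Species  # per-species intercepts and slopes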

Multilevel Models

Prediction

new_data <- tibble(Species = "setosa", Sepal.Length = 7)
(predictions <- tidybayes::add_epred_draws(bmod_iris,
                                           newdata = new_data))
# A tibble: 4,000 × 7
# Groups:   Species, Sepal.Length, .row [1]
   Species Sepal.Length  .row .chain .iteration .draw .epred
   <chr>          <dbl> <int>  <int>      <int> <int>  <dbl>
 1 setosa             7     1     NA         NA     1   1.77
 2 setosa             7     1     NA         NA     2   1.68
 3 setosa             7     1     NA         NA     3   1.62
 4 setosa             7     1     NA         NA     4   1.59
 5 setosa             7     1     NA         NA     5   1.91
 6 setosa             7     1     NA         NA     6   1.59
 7 setosa             7     1     NA         NA     7   1.61
 8 setosa             7     1     NA         NA     8   1.70
 9 setosa             7     1     NA         NA     9   1.81
10 setosa             7     1     NA         NA    10   1.75
# ℹ 3,990 more rows

Prediction

as_tibble(predictions) |>
  ggplot(aes(x = .epred)) +
  geom_histogram(bins = 50)
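
add_epred_draws() returns draws of the expected value; to simulate whole new observations (including the residual noise sigma) swap in add_predicted_draws(). A sketch:

tidybayes::add_predicted_draws(bmod_iris, newdata = new_data) |>
  ggplot(aes(x = .prediction)) +  # wider than .epred: includes sigma
  geom_histogram(bins = 50)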

Predictions

Other Models

Too Many Zeros

  • More zeros in the data than the model family expects
  • Hurdle models: with some probability \(h\) the value is zero; otherwise it is a (non-zero) draw from another distribution
  • Zero-inflated models: a mixture, e.g. some probability of an ‘extra’ zero plus a draw from a count distribution such as a Poisson, which can itself produce zeros (see the sketch below)
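
As a sketch of the zero-inflated case for counts, assuming a data frame count_data with a count column y (invented here for illustration):

# Mixture: with probability zi the observation is an 'extra' zero,
# otherwise it is a Poisson draw (which can itself be zero)
bmod_zip <- brm(y ~ 1,
                data = count_data,
                family = zero_inflated_poisson(),
                backend = "cmdstanr",
                file = "zip")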

Too Many Zeros

Fake data: 20% chance of generating a value from a log normal distribution:

set.seed(1000)
p_lnorm <- 0.2  # probability of a non-zero (log-normal) value
n <- 100

fake_hurdle_data <- tibble(
  r = runif(n),
  x = if_else(r < p_lnorm, rlnorm(n, meanlog = 4, sdlog = 4), 0))

Too Many Zeros

bmod_hurdle <-
  brm(x ~ 1,
      fake_hurdle_data,
      family = hurdle_lognormal(),
      backend = "cmdstanr",
      file = "hurdle")

Too Many Zeros

bmod_hurdle
 Family: hurdle_lognormal 
  Links: mu = identity; sigma = identity; hu = identity 
Formula: x ~ 1 
   Data: fake_hurdle_data (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     4.32      0.70     3.00     5.69 1.00     3163     2520

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.35      0.51     2.54     4.49 1.00     2954     2224
hu        0.75      0.04     0.67     0.83 1.00     3606     3052

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The fit roughly recovers the simulation: hu ≈ 0.75 against a true zero probability of 0.8, and Intercept ≈ 4.32 against meanlog = 4.

Learn More

Resources

Next Steps

  • install brms
  • on Windows, install RTools for the compiler toolchain
  • install cmdstanr and follow its setup instructions, as sketched below
  • dip into raw Stan?
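
A sketch of those setup steps (the cmdstanr repository URL below is the one the Stan team currently documents; check their installation guide if it has moved):

install.packages("brms")

# cmdstanr is not on CRAN; it is distributed via the Stan R-universe
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev", getOption("repos")))

cmdstanr::install_cmdstan()  # downloads and builds CmdStan (needs a C++ toolchain)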

Bonus: Stan code

bmod1 |> stancode()
// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 57, 33.6);
  lprior += student_t_lpdf(sigma | 3, 0, 33.6)
    - 1 * student_t_lccdf(0 | 3, 0, 33.6);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}