Bayesian Regression modelling with brms

Michael Jones

2025-05-07

Bayesian Primer

Why Bayes?

  • Philosophical: Probability can encode uncertainty
  • Practical: work with full distributions rather than point estimates
  • Update beliefs as data arrive
  • Replace many special-case statistical tests with one modelling framework
  • Coherent system right up to complex non-linear models

Why not Bayes?

  • Slow(er) than Frequentist methods
  • More computationally intensive
  • Models might not converge
  • You have to provide a prior
  • Acceptance: reviewers and colleagues may be less familiar with it

What Bayes?

\[ P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)} \]

What Bayes?

\[ \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} \]

What Bayes?

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]

How Bayes?

  • Pen and paper: conjugate priors
  • Computers: Markov chain Monte Carlo, e.g. BUGS/JAGS (Metropolis-Hastings, Gibbs sampling) or Stan (HMC, NUTS)
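
For instance, a Beta prior on a coin's probability of heads is conjugate to the binomial likelihood, so the posterior is available with pen and paper. A minimal sketch, with made-up counts:

a <- 2; b <- 2            # Beta(a, b) prior: pseudo-counts of heads and tails
flips <- 50; heads <- 10  # observed data

# Conjugacy: Beta prior + binomial likelihood -> Beta posterior
post_a <- a + heads
post_b <- b + flips - heads

post_a / (post_a + post_b)             # posterior mean
qbeta(c(0.025, 0.975), post_a, post_b) # 95% credible interval, no MCMC needed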

Regression Modelling Recap

The humble lm

library(tidyverse)  # tibble, ggplot2 and dplyr are used throughout

set.seed(100)
# True model: y = 4 + 10x + Normal(0, 5) noise
fake_data <- tibble(x = runif(100, 0, 10),
                    y = 10 * x + 4 + rnorm(100, 0, 5))

ggplot(fake_data, aes(x = x, y = y)) + geom_point()

The humble lm

lm(y ~ x,
   data = fake_data)

Call:
lm(formula = y ~ x, data = fake_data)

Coefficients:
(Intercept)            x  
      3.729        9.968  

Which of these are linear models?

\[\begin{align} y &= \beta_0 + \beta_1 x\\ y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2\\ y &= \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\\ y &= \beta_0 + \beta_1 x^2\\ y &= \beta_0 + \beta_1^2 x \end{align}\]

(‘Linear’ means linear in the coefficients \(\beta\), not in \(x\): the first, second, and fourth qualify.)

lm formula syntax

formula       maths
y ~ x         \(y = b_0 + b_1 x\)
y ~ 1 + x     \(y = b_0 + b_1 x\)
y ~ 0 + x     \(y = b_1 x\)
y ~ x + z     \(y = b_0 + b_1 x + b_2 z\)
y ~ x:z       \(y = b_0 + b_1 x z\)
y ~ x*z       \(y = b_0 + b_1 x + b_2 z + b_3 x z\)
y ~ I(x^2)    \(y = b_0 + b_1 x^2\)
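
A handy way to check what a formula expands to is to look at the design matrix it generates. A small sketch (the data frame here is invented purely for illustration):

d <- tibble(x = 1:3, z = c(0.5, 1.0, 1.5))

model.matrix(~ x, d)      # columns: (Intercept), x
model.matrix(~ 0 + x, d)  # column: x only, no intercept
model.matrix(~ x:z, d)    # columns: (Intercept), x:z
model.matrix(~ x * z, d)  # columns: (Intercept), x, z, x:z
model.matrix(~ I(x^2), d) # columns: (Intercept), I(x^2)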

Aside: linear models that aren’t a line

set.seed(100)
quad_data <- tibble(
  x = runif(50, -5, 5),
  y = 2 * x^2 + 4*x + 5 + rnorm(50, sd = 5))

ggplot(quad_data, aes(x, y)) +
  geom_point()

Aside: linear models that aren’t a line

quad_mod <- lm(y ~ I(x^2) + x,
               data = quad_data)

quad_mod

Call:
lm(formula = y ~ I(x^2) + x, data = quad_data)

Coefficients:
(Intercept)       I(x^2)            x  
      4.898        2.006        4.307  

Aside: linear models that aren’t a line

# coefficient order follows the formula: (Intercept), I(x^2), x
b0 <- quad_mod$coefficients[1]  # intercept
b1 <- quad_mod$coefficients[2]  # quadratic coefficient
b2 <- quad_mod$coefficients[3]  # linear coefficient
ggplot(quad_data, aes(x, y)) +
  geom_point() +
  geom_function(fun = \(x) b0 + x^2 * b1 + x * b2, colour = "red")

brms

What is brms?

  • “Bayesian Regression Models using Stan”
  • written by Paul Bürkner
  • documentation and vignettes on CRAN
  • relies on Stan under the hood

The stan family

[Diagram: the Stan ecosystem]

  • Stan itself: stan
  • Callers: rstan, cmdstanr (R); pystan, juliastan (other languages)
  • R packages built on top: rstanarm, brms
  • Helper R packages: tidybayes, bayesplot, posterior, loo, shinystan, …

brms with R

[Diagram: R (brms) generates Stan code, Stan compiles it to C++ and samples, and the result returns to R as a brmsfit object]

Side note: Sampling a Distribution

  • Objective: approximate a distribution
  • Problem: you can evaluate the density at any point you ask about, but each query only tells you about that one point
  • Solution: find a way to sample methodically (ask lots of questions) so that your samples become representative of the whole distribution. Bonus points for doing it efficiently.

Side note: Sampling a Distribution

  • Start at a point, take the sample, record the result
  • Propose some candidates, and accept one with a probability that compares it to your current value
  • Jump to this new point (or stay put if the candidate is rejected)
  • Do this a lot to build up a chain
  • Improvements are available, including HMC; a toy sampler is sketched below
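
As a concrete (toy) illustration of those steps, here is a minimal random-walk Metropolis sampler for a standard normal target. This is a sketch of the general recipe only; Stan instead uses HMC/NUTS:

set.seed(1)
log_target <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

n_draws <- 5000
chain <- numeric(n_draws)
chain[1] <- -3  # deliberately poor starting point

for (i in 2:n_draws) {
  current  <- chain[i - 1]
  proposal <- rnorm(1, mean = current, sd = 1)  # candidate near the current value
  # accept with probability min(1, target(proposal) / target(current))
  if (log(runif(1)) < log_target(proposal) - log_target(current)) {
    chain[i] <- proposal  # jump to the new point
  } else {
    chain[i] <- current   # stay put
  }
}

hist(chain[-(1:500)], breaks = 50)  # drop warmup; roughly N(0, 1)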

Markov Chain Monte Carlo

My First Bayesian Model

lm(y ~ x,
   data = fake_data)
bmod1 <- brm(y ~ x,
             data = fake_data,
             family = gaussian(link = "identity"),  # the default family and link
             backend = "cmdstanr",
             file = "my_first_model")  # caches the fitted model on disk

My First Bayesian Model

y ~ x

\[\begin{align} y &= \beta_0 + \beta_1 x + \epsilon\\ y &\sim \mathcal{N}(\beta_0 + \beta_1 x, \sigma^2) \end{align}\]

My First Bayesian Model


Compiling Stan program...
Start sampling
Running MCMC with 4 sequential chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 

My First Bayesian Model


 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x 
   Data: fake_data (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.71      1.14     1.51     5.95 1.00     3540     2793
x             9.97      0.20     9.58    10.36 1.00     3567     2746

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     5.13      0.38     4.47     5.92 1.00     3517     2585

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

My First Bayesian Model - Hairy Caterpillars

plot(bmod1)

Healthy trace plots look like hairy caterpillars: stationary, well-mixed chains with no drifts or stuck sections.

My First Bayesian Model - PP Check

pp_check(bmod1)

A posterior predictive check overlays data simulated from the fitted model on the observed data; the two should look alike.

My First Bayesian Model - Draws

posterior::as_draws_df(bmod1)
# A draws_df: 1000 iterations, 4 chains, and 6 variables
   b_Intercept  b_x sigma Intercept lprior lp__
1          2.2 10.4   5.4        57   -8.4 -315
2          2.0 10.5   5.6        57   -8.4 -315
3          2.3 10.3   5.6        56   -8.4 -312
4          2.3 10.3   4.6        56   -8.4 -312
5          3.1 10.1   5.2        56   -8.4 -311
6          4.0  9.8   4.8        55   -8.4 -311
7          3.6 10.1   5.2        56   -8.4 -310
8          2.3 10.3   5.0        56   -8.4 -311
9          3.8 10.0   5.5        56   -8.4 -311
10         5.0  9.7   4.8        56   -8.4 -311
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

My First Bayesian Model - Draws

posterior::as_draws_df(bmod1) |>
  slice_sample(n = 10) |>  # ten random posterior draws
  ggplot() +
  geom_abline(mapping = aes(intercept = b_Intercept,
                            slope = b_x)) +  # one candidate line per draw
  geom_point(data = fake_data, mapping = aes(x = x, y = y))

Other Models

Robust Regression

  • the classical toolbox offers robust estimators, or regularisation such as ridge and lasso
  • Bayesian way: model the errors with a Student-t distribution rather than a Gaussian, so heavy-tailed outliers pull less on the fit; a sketch follows
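
In brms that is just a change of family. A minimal sketch, reusing fake_data from earlier (the file name is arbitrary):

# Same linear model as bmod1, but with Student-t errors: heavier tails
# let occasional outliers sit far from the line without dragging it.
# The degrees of freedom nu are estimated from the data.
bmod_robust <- brm(y ~ x,
                   data = fake_data,
                   family = student(),
                   backend = "cmdstanr",
                   file = "robust_model")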

Binomial Data (coin flips etc)

coin_data <- tibble(flips = 50, heads = 10)
bmod_coin <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  family = binomial(link = "logit"),
  backend = "cmdstanr",
  file = "coins"
)

Coin Flips

bmod_coin
 Family: binomial 
  Links: mu = logit 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.39      0.35    -2.10    -0.75 1.00     1425     1578

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Coin Flips

inv_logit_scaled(fixef(bmod_coin)[1])  # back-transform from log-odds to probability
[1] 0.200066
10/50  # the raw proportion, for comparison
[1] 0.2
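
The same back-transformation can be applied to every posterior draw, which keeps the uncertainty on the probability scale. A sketch:

posterior::as_draws_df(bmod_coin) |>
  mutate(p_heads = inv_logit_scaled(b_Intercept)) |>  # log-odds -> probability
  summarise(mean = mean(p_heads),
            q2.5 = quantile(p_heads, 0.025),
            q97.5 = quantile(p_heads, 0.975))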

Coin Flips

bayesplot::mcmc_areas(
  bmod_coin, 
  pars = "b_Intercept", # we only want to plot the Intercept parameter
  transformations = inv_logit_scaled) # convert to probability space

Side note: Beta Priors

ggplot() +
  geom_function(fun = \(x) dbeta(x, shape1 = 1, shape2 = 1), colour = "darkblue") +   # flat: uniform on [0, 1]
  geom_function(fun = \(x) dbeta(x, shape1 = 10, shape2 = 10), colour = "orange") +   # gently peaked at 0.5
  geom_function(fun = \(x) dbeta(x, shape1 = 100, shape2 = 100), colour = "purple")   # tightly concentrated at 0.5

Coin Flips with priors

With an identity link the intercept is the probability itself, so a Beta prior can be placed on it directly.

bmod_coin_weak <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  prior = prior(beta(2, 2), class = "Intercept"), # <--
  family = binomial(link = "identity"),
  backend = "cmdstanr",
  file = "coins_weak"
)

bmod_coin_strong <- brm(
  heads | trials(flips) ~ 1,
  data = coin_data,
  prior = prior(beta(100, 100), class = "Intercept"), # <--
  family = binomial(link = "identity"),
  backend = "cmdstanr",
  file = "coins_strong"
)
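
Before overriding anything, get_prior() shows which parameters accept priors and what brms would use by default:

get_prior(heads | trials(flips) ~ 1,
          data = coin_data,
          family = binomial(link = "identity"))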

Coin Flips with weak prior

bmod_coin_weak
 Family: binomial 
  Links: mu = identity 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.22      0.05     0.12     0.34 1.00     1299     1416

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Coin Flips with Strong Prior

bmod_coin_strong
 Family: binomial 
  Links: mu = identity 
Formula: heads | trials(flips) ~ 1 
   Data: coin_data (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.44      0.03     0.38     0.50 1.00     1464     1870

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison

Multilevel Models

Multilevel Models

  • Reflect the hierarchical structure of your problem domain
  • There is some variation at the population level
  • and some variation within sub-groups
  • You want to share information (and uncertainty) between these levels

Multilevel Models

Extension to the formula:

y ~ p + (r | g)

  • p: population-level effects
  • r: ‘random’ effects, i.e. group-level effects
  • g: grouping variable
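
For instance, y ~ x + (1 | g) is a varying-intercept model: each group g gets its own intercept offset, drawn from a shared distribution:

\[\begin{align} y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= (\beta_0 + u_{g[i]}) + \beta_1 x_i\\ u_g &\sim \mathcal{N}(0, \sigma_g^2) \end{align}\]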

Multilevel Models

y ~ x
y ~ x + (1 | g)
y ~ 1 + (0 + x | g)
y ~ 1 + x + (1 + x | g)

Multilevel Models

bmod_iris <- brm(
  Petal.Length ~ 1 + Sepal.Length + (1 + Sepal.Length | Species), 
  data = iris, 
  backend = "cmdstanr",
  file = "multilevel")

Multilevel Models


 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Petal.Length ~ 1 + Sepal.Length + (1 + Sepal.Length | Species) 
   Data: iris (Number of observations: 150) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Species (Number of levels: 3) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.64      0.69     0.02     2.39 1.01      495      771
sd(Sepal.Length)                0.53      0.28     0.18     1.24 1.01      578      785
cor(Intercept,Sepal.Length)    -0.05      0.60    -0.97     0.95 1.00      634     1171

Multilevel Models


Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        0.53      0.54    -0.46     2.02 1.01      376      132
Sepal.Length     0.59      0.24     0.12     1.12 1.01      660      589

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.26      0.02     0.23     0.30 1.01     1138      833

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Multilevel Models

fixef(bmod_iris)
              Estimate Est.Error       Q2.5    Q97.5
Intercept    0.5318709 0.5396609 -0.4638470 2.023480
Sepal.Length 0.5873578 0.2442129  0.1190371 1.115754
ranef(bmod_iris)
$Species
, , Intercept

              Estimate Est.Error      Q2.5     Q97.5
setosa      0.06899166 0.5537010 -1.256369 1.2012115
versicolor -0.15903353 0.5497306 -1.704974 0.8129511
virginica   0.02431809 0.4968133 -1.320186 1.0449123

, , Sepal.Length

              Estimate Est.Error       Q2.5      Q97.5
setosa     -0.41549242 0.2556132 -0.9461325 0.05927435
versicolor  0.06777262 0.2430173 -0.4618805 0.52713793
virginica   0.17090787 0.2443261 -0.3543854 0.64513015
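
fixef() and ranef() can be combined with coef(), which returns one full set of coefficients per group (the population-level estimate plus that group's offset):

coef(bmod_iris)$Species  # per-species intercepts and slopes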

Multilevel Models

Prediction

new_data <- tibble(Species = "setosa", Sepal.Length = 7)
(predictions <- tidybayes::add_epred_draws(bmod_iris,
                                           newdata = new_data))
# A tibble: 4,000 × 7
# Groups:   Species, Sepal.Length, .row [1]
   Species Sepal.Length  .row .chain .iteration .draw .epred
   <chr>          <dbl> <int>  <int>      <int> <int>  <dbl>
 1 setosa             7     1     NA         NA     1   1.77
 2 setosa             7     1     NA         NA     2   1.68
 3 setosa             7     1     NA         NA     3   1.62
 4 setosa             7     1     NA         NA     4   1.59
 5 setosa             7     1     NA         NA     5   1.91
 6 setosa             7     1     NA         NA     6   1.59
 7 setosa             7     1     NA         NA     7   1.61
 8 setosa             7     1     NA         NA     8   1.70
 9 setosa             7     1     NA         NA     9   1.81
10 setosa             7     1     NA         NA    10   1.75
# ℹ 3,990 more rows

Prediction

as_tibble(predictions) |>
  ggplot(aes(x = .epred)) +
  geom_histogram(bins = 50)
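
add_epred_draws() returns draws of the expected value; to simulate whole new observations (including the residual noise sigma) swap in add_predicted_draws(). A sketch:

tidybayes::add_predicted_draws(bmod_iris, newdata = new_data) |>
  ggplot(aes(x = .prediction)) +  # wider than .epred: includes sigma
  geom_histogram(bins = 50)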

Predictions

Other Models

Too Many Zeros

  • More zeros in the data than the model family expects
  • Hurdle models: with some probability \(h\) the value is zero; otherwise it is a (non-zero) draw from another distribution
  • Zero-inflated models: a mixture, e.g. some probability of an ‘extra’ zero plus a draw from a count distribution such as a Poisson, which can itself produce zeros (see the sketch below)
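
As a sketch of the zero-inflated case for counts, assuming a data frame count_data with a count column y (invented here for illustration):

# Mixture: with probability zi the observation is an 'extra' zero,
# otherwise it is a Poisson draw (which can itself be zero)
bmod_zip <- brm(y ~ 1,
                data = count_data,
                family = zero_inflated_poisson(),
                backend = "cmdstanr",
                file = "zip")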

Too Many Zeros

Fake data: 20% chance of generating a value from a log normal distribution:

set.seed(1000)
p_lnorm <- 0.2  # probability of a non-zero (log-normal) value
n <- 100

fake_hurdle_data <- tibble(
  r = runif(n),
  x = if_else(r < p_lnorm, rlnorm(n, meanlog = 4, sdlog = 4), 0))

Too Many Zeros

bmod_hurdle <-
  brm(x ~ 1,
      fake_hurdle_data,
      family = hurdle_lognormal(),
      backend = "cmdstanr",
      file = "hurdle")

Too Many Zeros

bmod_hurdle
 Family: hurdle_lognormal 
  Links: mu = identity; sigma = identity; hu = identity 
Formula: x ~ 1 
   Data: fake_hurdle_data (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     4.32      0.70     3.00     5.69 1.00     3163     2520

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.35      0.51     2.54     4.49 1.00     2954     2224
hu        0.75      0.04     0.67     0.83 1.00     3606     3052

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The fit roughly recovers the simulation: hu ≈ 0.75 against a true zero probability of 0.8, and Intercept ≈ 4.32 against meanlog = 4.

Learn More

Resources

Next Steps

  • install brms
  • on Windows, install RTools for the compiler toolchain
  • install cmdstanr and follow its setup instructions, as sketched below
  • dip into raw Stan?
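
A sketch of those setup steps (the cmdstanr repository URL below is the one the Stan team currently documents; check their installation guide if it has moved):

install.packages("brms")

# cmdstanr is not on CRAN; it is distributed via the Stan R-universe
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev", getOption("repos")))

cmdstanr::install_cmdstan()  # downloads and builds CmdStan (needs a C++ toolchain)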

Bonus: Stan code

bmod1 |> stancode()
// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 57, 33.6);
  lprior += student_t_lpdf(sigma | 3, 0, 33.6)
    - 1 * student_t_lccdf(0 | 3, 0, 33.6);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}