Introduction

This RMarkdown document outlines the steps to run a Bayesian workflow using the brms package in R. The brms package allows you to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan for full Bayesian inference. Everything here is loosely based off of Visualization in Bayesian workflow by Gabry et al: https://arxiv.org/pdf/1709.01449.pdf.

Motivation

As a segue into model building in Stan, we will use a ‘canned solution’ for Bayesian inference via Stan’s HMC sampler in this lab.

Some of the major ‘canned solutions’ that allow users to estimate and process Stan models by calling pre-implemented R functions include:

Comparison: Pre-implemented model types

Source: Brückner, Paul-Christian (2022). brms: An R Package for Bayesian Multilevel Models using Stan.

A minor note on multiple imputation in brms

Newer versions of brms now handle missing values:

  • You can either generate multiply imputed data sets before model fitting, run multiple chains across each imputation, and then pool the posterior samples (\(\rightarrow\) fast, but heavy on CPU and RAM)
  • Alternatively, you can use Bayesian imputation during model fitting: brms will treat missing values as unknown quantities and jointly estimate your structural model of interest along with an imputation model (\(\rightarrow\) slow)

Installation and Loading of brms

First, you need to install the brms package if it is not already installed and then load it into your R session.

library(tidyverse)
library(modelsummary)
library(marginaleffects)
library(brms)

Application: A linear model in brms

We’ll work with a dataset that measures support for the AfD party (right-wing populist) in Germany.

load('gles.rda')

Choosing priors

brms uses default priors for certain “classes” of parameters. To check these defaults,supply the model formula, data, and generative model (i.e., family and link function) to brms::get_prior() and investigate the resulting object.

# Get default priors
default_priors = brms::get_prior(
  sup_afd ~                              # outcome
    la_self *                            # immigration preferences
    se_self +                            # redistribution preferences
    fem +                                # gender
    east +                               # east/west residence
    age,                                 # age
  data = gles,                           # data
  family = gaussian(link = "identity")   # family and link
)
default_priors
##                  prior     class            coef group resp dpar nlpar lb ub
##                 (flat)         b                                            
##                 (flat)         b             age                            
##                 (flat)         b            east                            
##                 (flat)         b             fem                            
##                 (flat)         b         la_self                            
##                 (flat)         b la_self:se_self                            
##                 (flat)         b         se_self                            
##  student_t(3, -5, 2.5) Intercept                                            
##   student_t(3, 0, 2.5)     sigma                                        0   
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##       default

Note: Missing entries in the prior column denote flat/uniform priors.

Define custom priors

Suppose you don’t like the default priors. You can create a brmsprior object by specifying the desired distributional properties of parameters of various classes:

custom_priors = c(
  brms::prior(normal(0, 5), class = b),          # normal slopes
  brms::prior(normal(0, 5), class = Intercept),  # normal intercept
  brms::prior(cauchy(0, 5), class = sigma)       # half-cauchy SD
)
custom_priors
##         prior     class coef group resp dpar nlpar   lb   ub source
##  normal(0, 5)         b                            <NA> <NA>   user
##  normal(0, 5) Intercept                            <NA> <NA>   user
##  cauchy(0, 5)     sigma                            <NA> <NA>   user

Fitting the model

Lastly, you can fit the model using brms::brm(). Familiarize yourself with the function arguments and relate them to the concepts discussed in the previous two lectures.

Notes:

  • Model compilation and estimation may take a while.
  • If, for whatever reason, brms throws an error when trying to call rstan on your machine, simply proceed to the next exercise. The required brmsfit object will be pre-loaded.
  • Do yourself a favor and save the model after you run it so you dont need to rerun it everytime.
if(!file.exists('lm_brms.rda')){
  
  lm_brms = brms::brm(
    sup_afd ~                              # outcome
      la_self *                            # immigration preferences
      se_self +                            # redistribution preferences
      fem +                                # gender
      east +                               # east/west residence
      age,                                 # age
    data = gles,                           # data
    family = gaussian(link = "identity"),  # family and link
    prior = custom_priors,                 # priors
    chains = 4,                            # number of chains
    cores = 4,                             # number of cpu cores
    iter = 2000,                           # number of iterations per chain
    warmup = 1000,                         # number of warm-up samples per chain
    seed = 6886                            # seed
  )
  
  save(lm_brms, file='lm_brms.rda')
  
} else {
  
  load('lm_brms.rda')  

}

Model summary and generic diagnostics

Print the model summary an familiarize yourself with the output. Check Rhat for any signs of non-convergence.

lm_brms
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: sup_afd ~ la_self * se_self + fem + east + age 
##    Data: gles (Number of observations: 1321) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -3.81      0.43    -4.63    -2.94 1.00     2974     2891
## la_self             0.31      0.05     0.20     0.41 1.00     2467     2589
## se_self            -0.12      0.06    -0.25     0.00 1.00     2374     2736
## fem                -0.59      0.14    -0.87    -0.33 1.00     5393     2439
## east                0.42      0.15     0.13     0.72 1.00     4534     2692
## age                -0.01      0.00    -0.02    -0.01 1.00     5985     3187
## la_self:se_self     0.01      0.01    -0.01     0.03 1.00     2250     2805
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.44      0.05     2.35     2.54 1.00     4482     2623
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Visual diagnostics

Explore the following visualizations of common generic diagnostics:

brms::mcmc_plot(lm_brms, type = "acf")   # Autocorrelation
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
## ℹ Please use the `rows` argument instead.
## ℹ The deprecated feature was likely used in the bayesplot package.
##   Please report the issue at <https://github.com/stan-dev/bayesplot/issues/>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

brms::mcmc_plot(lm_brms, type = "trace") # Trace plots
## No divergences to plot.

Produce trace plots for a paper

modelsummary(lm_brms, gof_map = c('n_obs', 'r.squared', 'rmse'))
## Warning: 
## `modelsummary` uses the `performance` package to extract goodness-of-fit
## statistics from models of this class. You can specify the statistics you wish
## to compute by supplying a `metrics` argument to `modelsummary`, which will then
## push it forward to `performance`. Acceptable values are: "all", "common",
## "none", or a character vector of metrics names. For example: `modelsummary(mod,
## metrics = c("RMSE", "R2")` Note that some metrics are computationally
## expensive. See `?performance::performance` for details.
##  This warning appears once per session.
 (1)
b_Intercept −3.812
b_la_self 0.306
b_se_self −0.120
b_fem −0.594
b_east 0.417
b_age −0.015
b_la_self × se_self 0.012
sigma 2.440
R2 0.176
RMSE 2.43

Interpret interaction using plot_predictions

plot_predictions(
  lm_brms, 
  condition=list('la_self', 'se_self'=c(2, 8))
)

Could also get conditional marginal effects

plot_cme(
  lm_brms, 
  variables='la_self',
  condition='se_self'
)

And vice versa

plot_cme(
  lm_brms, 
  variables='se_self',
  condition='la_self'
)

Canned estimation, manual processing

For more flexibility, you can always extract the posterior samples of the coefficients and process them manually. We will illustrate this through a simple example.

You can extract posterior draws of the model parameters from a brmsfit object using the brms::prepare_predictions() function. The function returns a nested list; the $dpars sublist contains, among other, the posterior draws and the model data. Check it out below:

str(brms::prepare_predictions(lm_brms)$dpars)
## List of 2
##  $ mu   :List of 10
##   ..$ family:List of 11
##   .. ..$ family    : chr "gaussian"
##   .. ..$ link      : chr "identity"
##   .. ..$ linkfun   :function (mu)  
##   .. ..$ linkinv   :function (eta)  
##   .. ..$ dpars     : chr [1:2] "mu" "sigma"
##   .. ..$ type      : chr "real"
##   .. ..$ ybounds   : num [1:2] -Inf Inf
##   .. ..$ closed    : logi [1:2] NA NA
##   .. ..$ ad        : chr [1:7] "weights" "subset" "se" "cens" ...
##   .. ..$ normalized: chr [1:5] "_time_hom" "_time_het" "_lagsar" "_errorsar" ...
##   .. ..$ specials  : chr [1:2] "residuals" "rescor"
##   .. ..- attr(*, "class")= chr [1:2] "brmsfamily" "family"
##   ..$ ndraws: int 4000
##   ..$ nobs  : int 1321
##   ..$ fe    :List of 2
##   .. ..$ X: num [1:1321, 1:7] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:1321] "1" "2" "3" "4" ...
##   .. .. .. ..$ : chr [1:7] "Intercept" "la_self" "se_self" "fem" ...
##   .. .. ..- attr(*, "assign")= int [1:7] 0 1 2 3 4 5 6
##   .. ..$ b: num [1:4000, 1:7] -3.3 -2.65 -2.16 -4.22 -4.46 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ draw    : chr [1:4000] "1" "2" "3" "4" ...
##   .. .. .. ..$ variable: chr [1:7] "b_Intercept" "b_la_self" "b_se_self" "b_fem" ...
##   .. .. ..- attr(*, "nchains")= int 4
##   ..$ sp    : list()
##   ..$ cs    : list()
##   ..$ sm    : list()
##   ..$ gp    : list()
##   ..$ re    : list()
##   ..$ ac    : list()
##   ..- attr(*, "class")= chr "bprepl"
##  $ sigma: num [1:4000] 2.5 2.47 2.42 2.43 2.46 ...

As you can see, the function output returns information pertaining to the systematic component of the model (incl. data and coefficients) in $dpars$mu$fe, and everything pertaining to the auxiliary scale parameter, \(\sigma\), in $dpars$mu$sigma.

Lets say we want to calculate the posterior distribution of the expected value of AfD support for a hypothetical male, 60 year-old resident of West Germany who holds strong anti-immigration and centrist redistribution preferences.

# Get posterior samples of coefficients
beta = brms::prepare_predictions(lm_brms)$dpars$mu$fe$b

# Check dimensions
dim(beta)
## [1] 4000    7
# Define covariate scenario
x_star = c(
  1,     # Leading 1 to multiply the intercept
  10,    # Very strong anti-immigration (la_self = 10)
  5,     # Centrist on redistribution (se_self = 5)
  0,     # male (fem = 0)
  0,     # West-German (east = 0)
  60,    # 60 years of age (age = 60)
  10 * 5 # Multiplicative term (la_self * se_self)
)

# Posterior distribution of the expected value
exp_val = beta %*% x_star

# Quantile summary
quantile(exp_val, c(.5, .025, .975))
##       50%      2.5%     97.5% 
## -1.647983 -1.967386 -1.330740

Model code

brms compiles a Stan model program in C++ in the rstan backend according to the brm() function arguments supplied by the user.

We can also get the underlying Stan model code.

brms::stancode(lm_brms)
## // generated with brms 2.20.4
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int<lower=1> K;  // number of population-level effects
##   matrix[N, K] X;  // population-level design matrix
##   int<lower=1> Kc;  // number of population-level effects after centering
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
##   matrix[N, Kc] Xc;  // centered version of X without an intercept
##   vector[Kc] means_X;  // column means of X before centering
##   for (i in 2:K) {
##     means_X[i - 1] = mean(X[, i]);
##     Xc[, i - 1] = X[, i] - means_X[i - 1];
##   }
## }
## parameters {
##   vector[Kc] b;  // regression coefficients
##   real Intercept;  // temporary intercept for centered predictors
##   real<lower=0> sigma;  // dispersion parameter
## }
## transformed parameters {
##   real lprior = 0;  // prior contributions to the log posterior
##   lprior += normal_lpdf(b | 0, 5);
##   lprior += normal_lpdf(Intercept | 0, 5);
##   lprior += cauchy_lpdf(sigma | 0, 5)
##     - 1 * cauchy_lccdf(0 | 0, 5);
## }
## model {
##   // likelihood including constants
##   if (!prior_only) {
##     target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
##   }
##   // priors including constants
##   target += lprior;
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept - dot_product(means_X, b);
## }

Posterior predictive checks

Posterior predictive checks

Posterior predictive checks involve simulating the data-generating process to obtain replicated data given the estimated model. They can help us determine how well our model fits the data.

This usually involves two questions:

  1. Does the family yield an adequate generative model?
    • Does a Gaussian (normal) data-generating processes produce realistic replications of the observed values of sup_afd (support for the AfD on the -5 to +5 scale)?
    • Does the simulated distribution of the replications match the observed distribution of the outcome in the sample?
  2. Does the systematic component accurately predict outcomes?
    • Do our predictors – i.e., an interaction of redistribution and immigration preferences – accurately predict which individuals are more likely to support the AfD?
    • How large is the observation-level discrepancy between simulated replications and observed data?

Distributional congruence

To check whether the generative model produces distributions of replicated outcomes that match the distribution of the observed outcome, we can compare the density of the observed outcome with those of, say, ndraws = 100 simulations. Each simulation is based on one post-warm-up sample from the posterior distribution.

brms::pp_check(lm_brms, ndraws = 100, type = "dens_overlay")

So, what do you think?

Observation-level prediction error

To check the predictive accuracy of the model, we can investigate the distribution of observation-level prediction errors. A model with perfect fit would produce an error of \(0\) for all \(N\) observations.

Below, you see the distribution of errors for our linear model. What do you think?

brms::pp_check(lm_brms, ndraws = 1, type = "error_hist")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Refit the model as a zero-one-inflated beta (ZOIB) regression

Zero-one-inflated beta (ZOIB) regression models bounded continuous outcomes on the unit (i.e., \([0,1]\)) interval. The ZOIB model is a GLM with a multi-family likelihood, meaning that its likelihood is composed of a mixture of several constitutive likelihoods. Specifically, it supplements a beta pdf for values \(y \in ]0, 1[\) with additional pmfs for the boundary values \(y \in \{0,1\}\).

Some background on the ZOIB model

To fully understand the model, we must quickly return to the beta distribution, which we previously characterized in terms of two strictly positive shape parameters, \(a>0\) and \(b>0\).

These shape parameters can also be expressed in terms of a location parameter \(\mu\) and a scale parameter \(\phi\), where \(a = \mu \cdot \phi\) and \(b = (1 - \mu) \cdot \phi\), with \(\mu \in [0, 1]\) and \(\phi > 0\).

Next to this reparameterized beta pdf, the model accommodates two Bernoulli pmfs: The first models 0/1 observations as a function of a probability parameter \(\text{zoi}\) (short for zero-one-inflation), the second models 1’s (as opposed to 0’s), conditional on observing a 0/1 observation, as a function of a probability parameter \(\text{coi}\) (short for conditional-one-inflation).

Taken together, the zero-one-inflated beta model is given by:

\[ p(y_i| \mu_i, \phi_i, \text{zoi}_i, \text{coi}_i) = \begin{cases} \text{Bernoulli}(1 | \text{zoi}_i) \times \text{Bernoulli}(0| \text{coi}_i)& \text{if } y = 0 \\ \text{Bernoulli}(1 | \text{zoi}_i) \times \text{Bernoulli}(1| \text{coi}_i)& \text{if } y = 1 \\ \text{Bernoulli}(0 | \text{zoi}_i) \times \text{Beta}(y| \mu_i \cdot \phi_i, (1 - \mu_i) \cdot \phi_i)& \text{if } y_i \in ]0,1[ \end{cases} \]

where

\[ \begin{align} \text{zoi}_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\text{zoi}}) \\ \text{coi}_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\text{coi}}) \\ \mu_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\mu}) \\ \phi_i = & \log(\mathbf{x}_i^\prime{\beta}_{\phi}) \end{align} \]

Note: Technically, we could use different predictors in each of these four equations. But to keep it “simple”, we will use the same \(\mathbf{x}_i\) across all four.

Inference

To model a bounded continuous outcome on the unit interval, we must first rescale it. Therefore, we will transform the scale of AfD support to range from 0 to 1 (with midpoint 0.5) instead of -5 to +5 (with midpoint 0). No worries, we can always scale it back later on.

gles = gles %>%
  dplyr::mutate(
    sup_afd_unit = scales::rescale(sup_afd, to = c(0, 1))
  )

We can fit in brms (note that the corresponding brmsfit object will be pre-loaded):

if(!file.exists('zoib_brms.rda')){
  zoib_brms = brms::brm(
    formula = bf(
      sup_afd_unit ~ la_self * se_self, # Model for the mean of the beta
      phi ~ la_self * se_self,          # Model for the precision of the beta
      zoi ~ la_self * se_self,          # Model for zero/one inflation
      coi ~ la_self * se_self           # Conditional model for zero vs one infl.
    ),                             
    data = gles,                           # data
    family = zero_one_inflated_beta(       # family and links
      link = "logit",
      link_phi = "log",
      link_zoi = "logit",
      link_coi = "logit"
    ),
    prior = c(                             # priors
      brms::prior(normal(0, 5), class = b),
      brms::prior(normal(0, 5), class = Intercept)
    ),
    chains = 4,                            # number of chains
    cores = 4,                             # number of cores
    iter = 2000,                           # number of iterations per chain
    warmup = 1000,                         # number of warm-up samples per chain
    seed = 6886                            # seed
  )  
  
  save(zoib_brms, file='zoib_brms.rda')
} else {
  load('zoib_brms.rda') 
}

Posterior predictive checks

We first observe the distributional congruence of the ZOIB-generated outcome simulations.

brms::pp_check(zoib_brms, ndraws = 100, type = "dens_overlay")

We then turn to checking observation-level prediction errors.

brms::pp_check(zoib_brms, ndraws = 1, type = "error_hist")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

And lets recheck our substantive interpretation:

plot_predictions(
  zoib_brms, 
  condition=list(
    'la_self', 'se_self'=c(2, 8)
  )
)