This RMarkdown document outlines the steps to run a Bayesian workflow
using the brms
package in R. The brms
package
allows you to fit Bayesian generalized (non-)linear multivariate
multilevel models using Stan for full Bayesian inference. Everything
here is loosely based off of Visualization in Bayesian workflow by Gabry
et al: https://arxiv.org/pdf/1709.01449.pdf.
As a segue into model building in Stan, we will use a ‘canned solution’ for Bayesian inference via Stan’s HMC sampler in this lab.
Some of the major ‘canned solutions’ that allow users to estimate and process Stan models by calling pre-implemented R functions include:
Source: Brückner, Paul-Christian (2022). brms: An R Package for Bayesian Multilevel Models using Stan.
Newer versions of brms
now handle missing
values:
brms
will treat missing values as unknown
quantities and jointly estimate your structural model of interest along
with an imputation model (\(\rightarrow\) slow)First, you need to install the brms
package if it is not
already installed and then load it into your R session.
library(tidyverse)
library(modelsummary)
library(marginaleffects)
library(brms)
We’ll work with a dataset that measures support for the AfD party (right-wing populist) in Germany.
load('gles.rda')
brms
uses default priors for certain “classes” of
parameters. To check these defaults,supply the model formula, data, and
generative model (i.e., family and link function) to
brms::get_prior()
and investigate the resulting object.
# Get default priors
default_priors = brms::get_prior(
sup_afd ~ # outcome
la_self * # immigration preferences
se_self + # redistribution preferences
fem + # gender
east + # east/west residence
age, # age
data = gles, # data
family = gaussian(link = "identity") # family and link
)
default_priors
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b age
## (flat) b east
## (flat) b fem
## (flat) b la_self
## (flat) b la_self:se_self
## (flat) b se_self
## student_t(3, -5, 2.5) Intercept
## student_t(3, 0, 2.5) sigma 0
## source
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## default
## default
Note: Missing entries in the prior
column
denote flat/uniform priors.
Suppose you don’t like the default priors. You can create a
brmsprior
object by specifying the desired distributional
properties of parameters of various classes:
custom_priors = c(
brms::prior(normal(0, 5), class = b), # normal slopes
brms::prior(normal(0, 5), class = Intercept), # normal intercept
brms::prior(cauchy(0, 5), class = sigma) # half-cauchy SD
)
custom_priors
## prior class coef group resp dpar nlpar lb ub source
## normal(0, 5) b <NA> <NA> user
## normal(0, 5) Intercept <NA> <NA> user
## cauchy(0, 5) sigma <NA> <NA> user
Lastly, you can fit the model using brms::brm()
.
Familiarize yourself with the function arguments and relate them to the
concepts discussed in the previous two lectures.
Notes:
brms
throws an error when
trying to call rstan
on your machine, simply proceed to the
next exercise. The required brmsfit
object will be
pre-loaded.if(!file.exists('lm_brms.rda')){
lm_brms = brms::brm(
sup_afd ~ # outcome
la_self * # immigration preferences
se_self + # redistribution preferences
fem + # gender
east + # east/west residence
age, # age
data = gles, # data
family = gaussian(link = "identity"), # family and link
prior = custom_priors, # priors
chains = 4, # number of chains
cores = 4, # number of cpu cores
iter = 2000, # number of iterations per chain
warmup = 1000, # number of warm-up samples per chain
seed = 6886 # seed
)
save(lm_brms, file='lm_brms.rda')
} else {
load('lm_brms.rda')
}
Print the model summary an familiarize yourself with the output.
Check Rhat
for any signs of non-convergence.
lm_brms
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: sup_afd ~ la_self * se_self + fem + east + age
## Data: gles (Number of observations: 1321)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.81 0.43 -4.63 -2.94 1.00 2974 2891
## la_self 0.31 0.05 0.20 0.41 1.00 2467 2589
## se_self -0.12 0.06 -0.25 0.00 1.00 2374 2736
## fem -0.59 0.14 -0.87 -0.33 1.00 5393 2439
## east 0.42 0.15 0.13 0.72 1.00 4534 2692
## age -0.01 0.00 -0.02 -0.01 1.00 5985 3187
## la_self:se_self 0.01 0.01 -0.01 0.03 1.00 2250 2805
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 2.44 0.05 2.35 2.54 1.00 4482 2623
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Explore the following visualizations of common generic diagnostics:
brms::mcmc_plot(lm_brms, type = "acf") # Autocorrelation
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
## ℹ Please use the `rows` argument instead.
## ℹ The deprecated feature was likely used in the bayesplot package.
## Please report the issue at <https://github.com/stan-dev/bayesplot/issues/>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
brms::mcmc_plot(lm_brms, type = "trace") # Trace plots
## No divergences to plot.
modelsummary(lm_brms, gof_map = c('n_obs', 'r.squared', 'rmse'))
## Warning:
## `modelsummary` uses the `performance` package to extract goodness-of-fit
## statistics from models of this class. You can specify the statistics you wish
## to compute by supplying a `metrics` argument to `modelsummary`, which will then
## push it forward to `performance`. Acceptable values are: "all", "common",
## "none", or a character vector of metrics names. For example: `modelsummary(mod,
## metrics = c("RMSE", "R2")` Note that some metrics are computationally
## expensive. See `?performance::performance` for details.
## This warning appears once per session.
(1) | |
---|---|
b_Intercept | −3.812 |
b_la_self | 0.306 |
b_se_self | −0.120 |
b_fem | −0.594 |
b_east | 0.417 |
b_age | −0.015 |
b_la_self × se_self | 0.012 |
sigma | 2.440 |
R2 | 0.176 |
RMSE | 2.43 |
plot_predictions(
lm_brms,
condition=list('la_self', 'se_self'=c(2, 8))
)
plot_cme(
lm_brms,
variables='la_self',
condition='se_self'
)
And vice versa
plot_cme(
lm_brms,
variables='se_self',
condition='la_self'
)
For more flexibility, you can always extract the posterior samples of the coefficients and process them manually. We will illustrate this through a simple example.
You can extract posterior draws of the model parameters from a
brmsfit
object using the
brms::prepare_predictions()
function. The function returns
a nested list; the $dpars
sublist contains, among other,
the posterior draws and the model data. Check it out below:
str(brms::prepare_predictions(lm_brms)$dpars)
## List of 2
## $ mu :List of 10
## ..$ family:List of 11
## .. ..$ family : chr "gaussian"
## .. ..$ link : chr "identity"
## .. ..$ linkfun :function (mu)
## .. ..$ linkinv :function (eta)
## .. ..$ dpars : chr [1:2] "mu" "sigma"
## .. ..$ type : chr "real"
## .. ..$ ybounds : num [1:2] -Inf Inf
## .. ..$ closed : logi [1:2] NA NA
## .. ..$ ad : chr [1:7] "weights" "subset" "se" "cens" ...
## .. ..$ normalized: chr [1:5] "_time_hom" "_time_het" "_lagsar" "_errorsar" ...
## .. ..$ specials : chr [1:2] "residuals" "rescor"
## .. ..- attr(*, "class")= chr [1:2] "brmsfamily" "family"
## ..$ ndraws: int 4000
## ..$ nobs : int 1321
## ..$ fe :List of 2
## .. ..$ X: num [1:1321, 1:7] 1 1 1 1 1 1 1 1 1 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:1321] "1" "2" "3" "4" ...
## .. .. .. ..$ : chr [1:7] "Intercept" "la_self" "se_self" "fem" ...
## .. .. ..- attr(*, "assign")= int [1:7] 0 1 2 3 4 5 6
## .. ..$ b: num [1:4000, 1:7] -3.3 -2.65 -2.16 -4.22 -4.46 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ draw : chr [1:4000] "1" "2" "3" "4" ...
## .. .. .. ..$ variable: chr [1:7] "b_Intercept" "b_la_self" "b_se_self" "b_fem" ...
## .. .. ..- attr(*, "nchains")= int 4
## ..$ sp : list()
## ..$ cs : list()
## ..$ sm : list()
## ..$ gp : list()
## ..$ re : list()
## ..$ ac : list()
## ..- attr(*, "class")= chr "bprepl"
## $ sigma: num [1:4000] 2.5 2.47 2.42 2.43 2.46 ...
As you can see, the function output returns information pertaining to
the systematic component of the model (incl. data and coefficients) in
$dpars$mu$fe
, and everything pertaining to the auxiliary
scale parameter, \(\sigma\), in
$dpars$mu$sigma
.
Lets say we want to calculate the posterior distribution of the expected value of AfD support for a hypothetical male, 60 year-old resident of West Germany who holds strong anti-immigration and centrist redistribution preferences.
# Get posterior samples of coefficients
beta = brms::prepare_predictions(lm_brms)$dpars$mu$fe$b
# Check dimensions
dim(beta)
## [1] 4000 7
# Define covariate scenario
x_star = c(
1, # Leading 1 to multiply the intercept
10, # Very strong anti-immigration (la_self = 10)
5, # Centrist on redistribution (se_self = 5)
0, # male (fem = 0)
0, # West-German (east = 0)
60, # 60 years of age (age = 60)
10 * 5 # Multiplicative term (la_self * se_self)
)
# Posterior distribution of the expected value
exp_val = beta %*% x_star
# Quantile summary
quantile(exp_val, c(.5, .025, .975))
## 50% 2.5% 97.5%
## -1.647983 -1.967386 -1.330740
brms
compiles a Stan model program in C++ in the
rstan
backend according to the brm()
function
arguments supplied by the user.
We can also get the underlying Stan model code.
brms::stancode(lm_brms)
## // generated with brms 2.20.4
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## vector[N] Y; // response variable
## int<lower=1> K; // number of population-level effects
## matrix[N, K] X; // population-level design matrix
## int<lower=1> Kc; // number of population-level effects after centering
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## matrix[N, Kc] Xc; // centered version of X without an intercept
## vector[Kc] means_X; // column means of X before centering
## for (i in 2:K) {
## means_X[i - 1] = mean(X[, i]);
## Xc[, i - 1] = X[, i] - means_X[i - 1];
## }
## }
## parameters {
## vector[Kc] b; // regression coefficients
## real Intercept; // temporary intercept for centered predictors
## real<lower=0> sigma; // dispersion parameter
## }
## transformed parameters {
## real lprior = 0; // prior contributions to the log posterior
## lprior += normal_lpdf(b | 0, 5);
## lprior += normal_lpdf(Intercept | 0, 5);
## lprior += cauchy_lpdf(sigma | 0, 5)
## - 1 * cauchy_lccdf(0 | 0, 5);
## }
## model {
## // likelihood including constants
## if (!prior_only) {
## target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
## }
## // priors including constants
## target += lprior;
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept - dot_product(means_X, b);
## }
Posterior predictive checks involve simulating the data-generating process to obtain replicated data given the estimated model. They can help us determine how well our model fits the data.
This usually involves two questions:
sup_afd
(support for
the AfD on the -5 to +5 scale)?To check whether the generative model produces distributions of
replicated outcomes that match the distribution of the observed outcome,
we can compare the density of the observed outcome with those of, say,
ndraws = 100
simulations. Each simulation is based on one
post-warm-up sample from the posterior distribution.
brms::pp_check(lm_brms, ndraws = 100, type = "dens_overlay")
So, what do you think?
To check the predictive accuracy of the model, we can investigate the distribution of observation-level prediction errors. A model with perfect fit would produce an error of \(0\) for all \(N\) observations.
Below, you see the distribution of errors for our linear model. What do you think?
brms::pp_check(lm_brms, ndraws = 1, type = "error_hist")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Zero-one-inflated beta (ZOIB) regression models bounded continuous outcomes on the unit (i.e., \([0,1]\)) interval. The ZOIB model is a GLM with a multi-family likelihood, meaning that its likelihood is composed of a mixture of several constitutive likelihoods. Specifically, it supplements a beta pdf for values \(y \in ]0, 1[\) with additional pmfs for the boundary values \(y \in \{0,1\}\).
To fully understand the model, we must quickly return to the beta distribution, which we previously characterized in terms of two strictly positive shape parameters, \(a>0\) and \(b>0\).
These shape parameters can also be expressed in terms of a location parameter \(\mu\) and a scale parameter \(\phi\), where \(a = \mu \cdot \phi\) and \(b = (1 - \mu) \cdot \phi\), with \(\mu \in [0, 1]\) and \(\phi > 0\).
Next to this reparameterized beta pdf, the model accommodates two Bernoulli pmfs: The first models 0/1 observations as a function of a probability parameter \(\text{zoi}\) (short for zero-one-inflation), the second models 1’s (as opposed to 0’s), conditional on observing a 0/1 observation, as a function of a probability parameter \(\text{coi}\) (short for conditional-one-inflation).
Taken together, the zero-one-inflated beta model is given by:
\[ p(y_i| \mu_i, \phi_i, \text{zoi}_i, \text{coi}_i) = \begin{cases} \text{Bernoulli}(1 | \text{zoi}_i) \times \text{Bernoulli}(0| \text{coi}_i)& \text{if } y = 0 \\ \text{Bernoulli}(1 | \text{zoi}_i) \times \text{Bernoulli}(1| \text{coi}_i)& \text{if } y = 1 \\ \text{Bernoulli}(0 | \text{zoi}_i) \times \text{Beta}(y| \mu_i \cdot \phi_i, (1 - \mu_i) \cdot \phi_i)& \text{if } y_i \in ]0,1[ \end{cases} \]
where
\[ \begin{align} \text{zoi}_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\text{zoi}}) \\ \text{coi}_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\text{coi}}) \\ \mu_i = & \text{invlogit}(\mathbf{x}_i^\prime{\beta}_{\mu}) \\ \phi_i = & \log(\mathbf{x}_i^\prime{\beta}_{\phi}) \end{align} \]
Note: Technically, we could use different predictors in each of these four equations. But to keep it “simple”, we will use the same \(\mathbf{x}_i\) across all four.To model a bounded continuous outcome on the unit interval, we must first rescale it. Therefore, we will transform the scale of AfD support to range from 0 to 1 (with midpoint 0.5) instead of -5 to +5 (with midpoint 0). No worries, we can always scale it back later on.
gles = gles %>%
dplyr::mutate(
sup_afd_unit = scales::rescale(sup_afd, to = c(0, 1))
)
We can fit in brms
(note that the corresponding
brmsfit
object will be pre-loaded):
if(!file.exists('zoib_brms.rda')){
zoib_brms = brms::brm(
formula = bf(
sup_afd_unit ~ la_self * se_self, # Model for the mean of the beta
phi ~ la_self * se_self, # Model for the precision of the beta
zoi ~ la_self * se_self, # Model for zero/one inflation
coi ~ la_self * se_self # Conditional model for zero vs one infl.
),
data = gles, # data
family = zero_one_inflated_beta( # family and links
link = "logit",
link_phi = "log",
link_zoi = "logit",
link_coi = "logit"
),
prior = c( # priors
brms::prior(normal(0, 5), class = b),
brms::prior(normal(0, 5), class = Intercept)
),
chains = 4, # number of chains
cores = 4, # number of cores
iter = 2000, # number of iterations per chain
warmup = 1000, # number of warm-up samples per chain
seed = 6886 # seed
)
save(zoib_brms, file='zoib_brms.rda')
} else {
load('zoib_brms.rda')
}
We first observe the distributional congruence of the ZOIB-generated outcome simulations.
brms::pp_check(zoib_brms, ndraws = 100, type = "dens_overlay")
We then turn to checking observation-level prediction errors.
brms::pp_check(zoib_brms, ndraws = 1, type = "error_hist")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
And lets recheck our substantive interpretation:
plot_predictions(
zoib_brms,
condition=list(
'la_self', 'se_self'=c(2, 8)
)
)