How to fit hierarchical Bayesian models in R with BRMS: partial pooling explained | R bloggers


Hierarchical Bayesian modeling (also called multilevel modeling) is one of the most reliable ways to build predictive and inferential models when your data has a natural grouping: teams, players, seasons, leagues, referees, locations, or even game states. In sports analytics, this grouping is unavoidable. In R, hierarchical Bayesian models are commonly implemented via brms (Stan), rstanarm, or cmdstanr.

This tutorial focuses on partial pooling (also called shrinkage) and why it is the default choice for serious applied modeling: it reduces overfitting, improves out-of-sample performance, and yields honest uncertainty quantification. We’ll use a sports dataset as a concrete example, but the modeling principles generalize to many domains (education, marketing, medicine, A/B testing, and more).

Keywords discussed: hierarchical Bayesian models in R, partial pooling, multilevel modeling, Bayesian shrinkage, Bayesian inference in Stan, BRMS tutorial, posterior predictive checks, priors, model calibration, sports analytics in R.

Pooling strategies: no pooling, full pooling, partial pooling

Suppose we want to estimate team strength. There are three classic approaches:

  • No pooling: estimate a separate parameter for each team using only that team’s data. This is high variance (it overfits small samples).
  • Full pooling: ignore teams and estimate one global parameter. This is low variance but high bias (it misses real differences).
  • Partial pooling: estimate team parameters while sharing information through a population-level distribution. This is the hierarchical Bayesian compromise that tends to dominate in practice.

With partial pooling, each group effect is ‘pulled’ towards the global average in proportion to the amount of information that group has. Teams with few games shrink more; teams with many games shrink less. This is not a trick; it is what the posterior implies under a reasonable hierarchical prior structure.
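To build intuition for that pull toward the average, here is a toy base-R sketch under an assumed normal-normal setup (not the Poisson model fitted later): the partially pooled estimate is a precision-weighted average of the group mean and the grand mean, so groups with less data shrink more.

```r
# Toy shrinkage sketch (assumed normal-normal model): the partially pooled
# estimate is a precision-weighted average of the group mean and the grand
# mean; the weight on the group mean grows with its sample size n.
partial_pool <- function(group_mean, n, grand_mean, sigma2_within, tau2_between) {
  w <- (n / sigma2_within) / (n / sigma2_within + 1 / tau2_between)
  w * group_mean + (1 - w) * grand_mean
}

partial_pool(group_mean = 1.0, n = 4,  grand_mean = 0, sigma2_within = 1, tau2_between = 0.25)
# -> 0.5   (few games: strong shrinkage toward the grand mean)
partial_pool(group_mean = 1.0, n = 40, grand_mean = 0, sigma2_within = 1, tau2_between = 0.25)
# -> ~0.91 (many games: little shrinkage)
```

The hierarchical model learns the analogue of `tau2_between` (the group-level SD) from the data rather than fixing it.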


R Installation

We fit models using brms, which compiles Bayesian models to Stan and provides a high-level formula interface. The workflow below emphasizes reproducibility and good Bayesian practice: priors, diagnostics, posterior predictive checks, and principled comparison to simpler baselines.

# Core modeling stack
install.packages(c("brms", "tidyverse", "posterior", "bayesplot", "tidybayes"))
# Optional but recommended for speed (CmdStan backend)
install.packages("cmdstanr")

library(brms)
library(tidyverse)
library(posterior)
library(bayesplot)
library(tidybayes)

# If using cmdstanr (recommended), set backend once:
# cmdstanr::install_cmdstan()
options(brms.backend = "cmdstanr")

Sample data: Match results with team effects

To keep this tutorial self-contained, we’ll simulate a dataset that resembles soccer-style goal scoring. The same structure occurs in hockey, handball, indoor football, and other low-to-moderate-scoring sports. The key is that every match involves two teams and the outcome depends on their latent strengths.

set.seed(123)

n_teams <- 20
teams   <- paste0("Team_", seq_len(n_teams))

# True latent parameters (unknown in real life)
true_attack  <- rnorm(n_teams, 0, 0.35)
true_defense <- rnorm(n_teams, 0, 0.35)
true_home_adv <- 0.20

# Schedule: random pairings
n_matches <- 600
home_id <- sample(seq_len(n_teams), n_matches, replace = TRUE)
away_id <- sample(seq_len(n_teams), n_matches, replace = TRUE)
# Avoid self-matches
same <- home_id == away_id
while (any(same)) {
  away_id[same] <- sample(seq_len(n_teams), sum(same), replace = TRUE)
  same <- home_id == away_id
}

home_team <- teams[home_id]
away_team <- teams[away_id]

# Poisson rates for goals
log_lambda_home <- true_home_adv + true_attack[home_id] - true_defense[away_id]
log_lambda_away <-               true_attack[away_id] - true_defense[home_id]

lambda_home <- exp(log_lambda_home)
lambda_away <- exp(log_lambda_away)

home_goals <- rpois(n_matches, lambda_home)
away_goals <- rpois(n_matches, lambda_away)

matches <- tibble(
  match_id   = seq_len(n_matches),
  home_team  = factor(home_team),
  away_team  = factor(away_team),
  home_goals = home_goals,
  away_goals = away_goals
)

matches %>% glimpse()

In a real sports analytics pipeline you would replace simulation with data ingestion (CSV/APIs), feature engineering (rest days, injuries, xG proxies, travel, Elo, etc.), and train/test splits. The hierarchical core remains: group-level effects with partial pooling.
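As a minimal base-R sketch of that last step, a time-ordered split might look like the following; `match_id` stands in for a real date column, which is an assumption about your ingested data.

```r
# Minimal time-ordered train/test split sketch. In real data you would sort
# and split on an actual date column; match_id is a stand-in here.
demo <- data.frame(match_id = 1:600, home_goals = rpois(600, 1.4))

cutoff <- floor(0.8 * nrow(demo))           # hold out the last 20% of fixtures
train  <- demo[demo$match_id <= cutoff, ]
test   <- demo[demo$match_id >  cutoff, ]

nrow(train)  # -> 480
nrow(test)   # -> 120
```

Fit on `train` only, then score posterior predictions against `test`; a random split would leak future information in a season-structured dataset.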


Baseline: full pooling (no team effects)

As a baseline, model home goals using only a global intercept. This is deliberately oversimplified: it cannot learn that some teams are stronger than others.

m_pool <- brm(
  home_goals ~ 1,
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept")
  ),
  chains = 4, cores = 4, iter = 2000, seed = 1
)

summary(m_pool)

Full pooling is often insufficient: it yields well-behaved uncertainty but misses systematic structure. Next we consider no pooling, then partial pooling.


No pooling: separate team parameters (high variance)

No pooling estimates a fixed effect for each team without sharing information between teams. This can be unstable when some teams have few observations or the schedule is unbalanced.

m_nopool <- brm(
  home_goals ~ 1 + home_team + away_team,
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(normal(0, 0.5), class = "b")  # regularization, but still no pooling
  ),
  chains = 4, cores = 4, iter = 2000, seed = 2
)

summary(m_nopool)

Even with regularizing priors, no pooling is typically noisier than hierarchical partial pooling, especially in small samples. The multilevel model deals with this in a more principled way.


Partial pooling: hierarchical (multi-level) team effects

A standard approach models goals with random effects for home and away teams. In brms syntax, (1 | home_team) means that each home team gets its own intercept, drawn from a common distribution. The standard deviation of that distribution is learned from the data and determines the degree of shrinkage.

Below we fit a home-goals model with hierarchical intercepts for both home and away teams. This captures the home team’s attacking tendency (home-team effect) and the away team’s defensive vulnerability (away-team effect), albeit in a simplified way.

priors_hier <- c(
  prior(normal(0, 1.0), class = "Intercept"),
  # SD priors control how much team-to-team variation is plausible
  prior(exponential(1.0), class = "sd")
)

m_hier <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = poisson(),
  prior  = priors_hier,
  chains = 4, cores = 4, iter = 2500, seed = 3
)

summary(m_hier)

Interpretation: the model estimates an overall goal rate (intercept) plus deviations for each team, but these deviations are partially pooled. This is the practical meaning of partial pooling: group-level parameters borrow statistical strength from one another.


Model both scores together

A more realistic sports modeling setup estimates both home and away goals and separates the attack/defense structure. brms supports multivariate models. Here is a simple bivariate Poisson-style approach: fit two Poisson responses with team-level random effects (conceptually similar to attack/defense components).

bf_home <- bf(home_goals ~ 1 + (1 | home_team) + (1 | away_team))
bf_away <- bf(away_goals ~ 1 + (1 | away_team) + (1 | home_team))

m_mv <- brm(
  bf_home + bf_away + set_rescor(FALSE),
  data   = matches,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept", resp = "homegoals"),
    prior(normal(0, 1.0), class = "Intercept", resp = "awaygoals"),
    # In multivariate models, priors must name the response they apply to
    prior(exponential(1.0), class = "sd", resp = "homegoals"),
    prior(exponential(1.0), class = "sd", resp = "awaygoals")
  ),
  chains = 4, cores = 4, iter = 3000, seed = 4
)

summary(m_mv)

This multivariate model is already a meaningful step toward academic-quality sports inferences: it produces posterior distributions for team effects with coherent uncertainty, and it avoids overreacting to noisy short-term forms.


Diagnostics and posterior predictive checks (academic essentials)

A Bayesian model is only as credible as its diagnostics. Check at least R-hat (convergence), effective sample size, and posterior predictive fit.

# Convergence diagnostics
m_hier %>% summary()

# Posterior predictive checks: do simulated goals resemble observed?
pp_check(m_hier, type = "hist", ndraws = 100)

# Another useful view: a grouped check of the mean goal count by team
pp_check(m_hier, type = "stat_grouped", group = "home_team",
         stat = "mean", ndraws = 100)

Posterior predictive checks help detect misspecification: too many zeros, heavy tails, or systematic under-/overdispersion. If your sport shows extra dispersion, consider a negative binomial likelihood:
family = negbinomial().

m_hier_nb <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = negbinomial(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(exponential(1.0), class = "sd"),
    prior(exponential(1.0), class = "shape")  # NB dispersion
  ),
  chains = 4, cores = 4, iter = 2500, seed = 5
)

pp_check(m_hier_nb, type = "hist", ndraws = 100)

Visualization of partial pooling (shrinkage) in R

The cleanest way to “see” partial pooling is to compare group estimates from (a) no pooling and (b) hierarchical pooling. Teams with limited data shrink towards the global average in the hierarchical model.

# Extract team effects from both models
re_mat  <- ranef(m_hier)$home_team[, , "Intercept"]
re_hier <- re_mat %>%
  as_tibble(rownames = "team") %>%
  rename(estimate = Estimate, est_error = Est.Error, q2.5 = Q2.5, q97.5 = Q97.5)

# No pooling fixed effects: home_team coefficients (approx comparison)
fix_nopool <- fixef(m_nopool) %>% as.data.frame() %>% rownames_to_column("term") %>%
  filter(str_starts(term, "home_team")) %>%
  mutate(team = str_remove(term, "home_team")) %>%
  transmute(team, estimate = Estimate, q2.5 = Q2.5, q97.5 = Q97.5)

# Join and compare
comp <- re_hier %>%
  left_join(fix_nopool, by = "team", suffix = c("_hier", "_nopool"))

comp %>%
  ggplot(aes(x = estimate_nopool, y = estimate_hier)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(
    x = "No pooling estimate (fixed effects)",
    y = "Partial pooling estimate (hierarchical)",
    title = "Shrinkage: hierarchical estimates pull extreme values toward the mean"
  )

In applied work, this shrinkage is a feature, not a bug: it protects you from chasing noise, especially early in a season or when some teams have faced unusually strong or weak opponents.


Posterior predictions: from parameters to predictive distributions

Hierarchical Bayesian models excel when you need uncertainty-aware predictions. Instead of a single number, you get a full posterior predictive distribution, useful for prediction intervals, simulations, and decision analysis.

# Create a small set of future fixtures (example)
new_matches <- tibble(
  home_team = factor(c("Team_1", "Team_2", "Team_3"), levels = levels(matches$home_team)),
  away_team = factor(c("Team_4", "Team_5", "Team_6"), levels = levels(matches$away_team))
)

# Posterior expected goals (lambda) for home_goals model
epred <- posterior_epred(m_hier, newdata = new_matches, ndraws = 2000)
# epred is draws x rows. Summarize mean and interval per match:
pred_summary <- apply(epred, 2, function(x) {
  # unname() prevents quantile() names like "10%" leaking into column names
  c(mean = mean(x),
    q10  = unname(quantile(x, 0.10)),
    q90  = unname(quantile(x, 0.90)))
}) %>% t() %>% as_tibble()

bind_cols(new_matches, pred_summary)

If you need simulated goal numbers (not just expected value), use posterior_predict().

yrep <- posterior_predict(m_hier, newdata = new_matches, ndraws = 2000)

sim_summary <- apply(yrep, 2, function(x) {
  c(mean = mean(x),
    q10  = unname(quantile(x, 0.10)),
    q90  = unname(quantile(x, 0.90)))
}) %>% t() %>% as_tibble()

bind_cols(new_matches, sim_summary)
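One common downstream use is turning simulated scorelines into match-outcome probabilities. The sketch below fakes two draw matrices so it runs standalone; in practice they would come from posterior_predict() applied to the home and away responses of the multivariate model.

```r
# Fake posterior-predictive draws (2000 draws x 3 fixtures), standing in for
# posterior_predict() output on home and away goals.
set.seed(42)
yrep_home <- matrix(rpois(2000 * 3, 1.5), nrow = 2000)
yrep_away <- matrix(rpois(2000 * 3, 1.1), nrow = 2000)

# Outcome probabilities per fixture: the share of draws with each result.
p_home_win <- colMeans(yrep_home >  yrep_away)
p_draw     <- colMeans(yrep_home == yrep_away)
p_away_win <- colMeans(yrep_home <  yrep_away)

round(rbind(p_home_win, p_draw, p_away_win), 3)
```

Because win, draw, and loss partition every simulated match, the three probabilities sum to one for each fixture, and uncertainty from the posterior flows through automatically.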

Model comparison: why hierarchical often wins

A common academic question: does partial pooling actually improve predictive accuracy? One principled approach is approximate leave-one-out cross-validation (LOO) via loo().

# Requires: install.packages("loo")
library(loo)

loo_pool   <- loo(m_pool)
loo_nopool <- loo(m_nopool)
loo_hier   <- loo(m_hier)

loo_compare(loo_pool, loo_nopool, loo_hier)

In many real datasets, hierarchical models dominate no-pooling models because they reduce variance without imposing the large bias of full pooling. With large, balanced samples, no pooling can catch up, but it remains more vulnerable to distribution shifts (new seasons, schedule changes, unbalanced scheduling).


Priors for multilevel sports models: practical guidance

Priors are not ‘optional extras’; they formalize regularization and encode plausible scales. For sports scores, consider the following:

  • Intercept prior: reflects the typical goal rate (on the Poisson log scale).
  • Group SD priors: determine how much teams can plausibly deviate from the league average.
  • Likelihood choice: Poisson versus negative binomial for overdispersion.
# Example: informative-ish intercept prior based on typical goals per match
# If average home goals ~ 1.4, then log(1.4) ~ 0.336
priors_sporty <- c(
  prior(normal(log(1.4), 0.5), class = "Intercept"),
  prior(exponential(1.0), class = "sd")
)

m_hier2 <- brm(
  home_goals ~ 1 + (1 | home_team) + (1 | away_team),
  data   = matches,
  family = poisson(),
  prior  = priors_sporty,
  chains = 4, cores = 4, iter = 2500, seed = 6
)

If you’re publishing academic-style modeling content, a quick prior sensitivity check is a strong signal of credibility: refit with slightly broader SD priors and confirm that the conclusions are stable.
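A minimal sketch of such a check, assuming m_hier2 from above has been fitted (update() will recompile and resample with the new priors):

```r
# Sensitivity re-fit sketch: same model, broader group-level SD prior.
# exponential(0.5) has mean 2, twice as diffuse as exponential(1.0).
priors_wide <- c(
  prior(normal(log(1.4), 0.5), class = "Intercept"),
  prior(exponential(0.5), class = "sd")
)

m_hier2_wide <- update(m_hier2, prior = priors_wide)

# Compare the estimated team-to-team SDs; stable conclusions should show
# similar group-level SDs and team rankings across the two fits.
VarCorr(m_hier2)
VarCorr(m_hier2_wide)
```

If the posterior barely moves, the data, not the prior, is driving the shrinkage.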


Practical notes for real sports data

  • Unbalanced schedules: the hierarchical structure stabilizes estimates when teams face different opponent qualities.
  • Small samples: early in the season or in new leagues, partial pooling is most valuable.
  • Covariates: add rest days, travel, injuries, Elo, or rolling form as fixed effects, but keep team effects hierarchical.
  • Time dynamics: for long seasons, consider random walks or season-level hierarchical terms.
# Add a covariate example (simulated here):
matches2 <- matches %>%
  mutate(rest_diff = rnorm(n(), 0, 1))  # placeholder for real engineered feature

m_cov <- brm(
  home_goals ~ 1 + rest_diff + (1 | home_team) + (1 | away_team),
  data   = matches2,
  family = poisson(),
  prior  = c(
    prior(normal(0, 1.0), class = "Intercept"),
    prior(normal(0, 0.3), class = "b"),   # effect size prior for covariate
    prior(exponential(1.0), class = "sd")
  ),
  chains = 4, cores = 4, iter = 2500, seed = 7
)

summary(m_cov)
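The time-dynamics bullet above can be sketched with nested team-by-season effects. The simulated data has no season column, so we fabricate one purely for illustration; with real data you would use the actual season labels.

```r
# Hypothetical season extension (season labels fabricated for illustration):
# (1 | home_team:season) lets each team's strength drift season to season,
# while (1 | home_team) still pools toward the team's overall level.
matches3 <- matches2 %>%
  mutate(season = factor(sample(c("2022", "2023"), n(), replace = TRUE)))

m_season <- brm(
  home_goals ~ 1 + rest_diff +
    (1 | home_team) + (1 | home_team:season) + (1 | away_team),
  data   = matches3,
  family = poisson(),
  chains = 4, cores = 4, iter = 2500, seed = 8
)

summary(m_season)
```

The interaction term is itself partially pooled, so a team's season-specific deviation shrinks toward its overall strength when a season has few matches.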

Further reading and deeper projects (optional)

If you want to extend this tutorial into a complete sports analytics workflow (data acquisition, feature engineering, predictive evaluation, and model deployment), longer-form project guides are worth seeking out as optional deep dives. The core multilevel concepts in this article are self-contained and transfer directly to non-sports hierarchical modeling problems.


Conclusion

Partial pooling is the practical heart of hierarchical Bayesian modeling. In R, multilevel models with brms/Stan give you: (1) stable group estimates, (2) principled regularization, and (3) posterior predictive uncertainty that supports simulation, prediction, and evaluation. If your data has groups – and most real-world data does – hierarchical Bayes is often the most defensible baseline you can set.

