
Cracking the Code: A Guide to Bambi’s Hierarchical Formula Syntax
I am learning Bayesian inference, currently working my way through the fantastic “Statistical Rethinking” book and implementing the code examples in PyMC. For someone who has spent years using the frequentist approach, thinking in probabilities is a real but rewarding challenge. To ease this transition, I decided to start with Bambi, a library that makes it wonderfully simple to build complex Bayesian models. I quickly hit a new hurdle, though: the formula syntax felt like another cryptic language to learn. Expressions like (1 | group) and (predictor | group) were confusing, and I realized I wasn’t just learning code, but a new way to think about a model’s structure. This is why I made this guide: to break down that “secret code” into simple, intuitive ideas and help anyone else on this journey translate their assumptions into powerful hierarchical models.
The key is to think in terms of Intercepts and Slopes.
- Intercept: A baseline or starting value.
- Slope: The effect of a predictor; how much the outcome changes for a one-unit change in the predictor.
The vertical bar | is the magic operator. It means “this effect varies across the groups defined by the variable on the right.”
Part 1: Group-Level (“Random”) Effects — The | Operator
This is for when you believe an effect is not constant, but changes depending on some grouping factor (e.g., student, patient, store, country). This allows for partial pooling.
1. Varying Intercepts: (1 | group)
- What it means: “Each level of group gets its own baseline/intercept.”
- The Model Assumes: These intercepts are all drawn from a common group-level distribution (e.g., a Normal(mu=0, sigma=sigma_group)). The model learns sigma_group, which controls how much the intercepts are allowed to vary.
- When to use: When you believe there’s a baseline difference between your groups. This is the most common and fundamental group-level effect.
- “Students have different starting abilities.” -> (1 | student)
- “Some stores are naturally more popular than others.” -> (1 | store)
- “Patients have different baseline health levels.” -> (1 | patient)
- Example Formula: sales ~ advertising + (1 | store)
- Interpretation: We are modeling sales with a common slope for advertising, but we’re allowing each store to have its own unique baseline sales level.
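To make this concrete, here is a minimal, runnable sketch in Bambi using simulated store data (the store count, effect sizes, and noise levels are invented purely for illustration):

```python
import numpy as np
import pandas as pd
import bambi as bmb
import arviz as az

# Simulate 8 stores, each with its own baseline, sharing one advertising slope.
rng = np.random.default_rng(42)
n_stores = 8
store = rng.integers(0, n_stores, size=320)
store_baseline = rng.normal(100, 15, size=n_stores)
advertising = rng.uniform(0, 10, size=store.size)
sales = store_baseline[store] + 3.0 * advertising + rng.normal(0, 5, size=store.size)

df = pd.DataFrame({
    "sales": sales,
    "advertising": advertising,
    "store": pd.Categorical([f"store_{i}" for i in store]),
})

# Common slope for advertising; partially pooled, store-specific intercepts.
model = bmb.Model("sales ~ advertising + (1 | store)", df)
idata = model.fit(draws=1000, tune=1000, chains=2, random_seed=42)
print(az.summary(idata))
```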
2. Varying Slopes: (0 + predictor | group)
- What it means: “The effect of predictor changes for each level of group.” The 0 + (or -1) explicitly tells the model not to fit a varying intercept.
- The Model Assumes: These different slopes are drawn from a common group-level distribution.
- When to use: When you believe the relationship between a predictor and the outcome is different across your groups.
- “The effect of tutoring is stronger for some students than others.” -> (0 + tutoring_hours | student)
- “The effectiveness of an advertising campaign varies by store.” -> (0 + advertising | store)
- Example Formula: test_score ~ pre_test_score + (0 + tutoring_hours | student)
- Interpretation: We’re allowing the impact of tutoring_hours on test_score to be different for every student, but assuming a common intercept.
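A quick sketch of this model on simulated student data (again, all numbers are made up), just to show how the formula is passed to Bambi:

```python
import numpy as np
import pandas as pd
import bambi as bmb

# Simulate 30 students: a shared baseline, but student-specific tutoring effects.
rng = np.random.default_rng(1)
n_students = 30
student = np.repeat(np.arange(n_students), 5)
tutoring_effect = rng.normal(2.0, 0.8, size=n_students)
pre_test_score = rng.normal(60, 10, size=student.size)
tutoring_hours = rng.uniform(0, 20, size=student.size)
test_score = (
    20
    + 0.5 * pre_test_score
    + tutoring_effect[student] * tutoring_hours
    + rng.normal(0, 3, size=student.size)
)

df = pd.DataFrame({
    "test_score": test_score,
    "pre_test_score": pre_test_score,
    "tutoring_hours": tutoring_hours,
    "student": pd.Categorical(student),
})

# Varying slope for tutoring_hours only; no varying intercept for student.
model = bmb.Model("test_score ~ pre_test_score + (0 + tutoring_hours | student)", df)
model.build()   # compile the underlying PyMC model without sampling yet
print(model)    # lists the common and group-specific terms
```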
3. Varying Intercepts and Varying Slopes (Uncorrelated): (1 | group) + (0 + predictor | group)
- What it means: You are modeling both varying intercepts and varying slopes, but you are forcing the model to assume they are independent.
- When to use: This is less common. You would use it if you have a strong theoretical reason to believe there is no relationship between the baseline level and the effect’s slope. For example, “A student’s starting ability (intercept) has no bearing on how much they benefit from tutoring (slope).”
- Example Formula: sales ~ advertising + (1 | store) + (0 + advertising | store)
- Interpretation: Each store has its own baseline sales, AND the effect of advertising is different in each store. We assume a store’s baseline popularity is unrelated to its sensitivity to advertising.
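The same pattern in code, on simulated data where both the baseline and the advertising effect differ by store (values invented for illustration):

```python
import numpy as np
import pandas as pd
import bambi as bmb

# Simulate 6 stores, each with its own baseline AND its own advertising slope.
rng = np.random.default_rng(7)
store = np.repeat([f"store_{i}" for i in range(6)], 50)
advertising = rng.uniform(0, 10, size=store.size)
baselines = dict(zip(np.unique(store), rng.normal(100, 15, size=6)))
slopes = dict(zip(np.unique(store), rng.normal(3, 1, size=6)))
sales = (
    np.array([baselines[s] for s in store])
    + np.array([slopes[s] for s in store]) * advertising
    + rng.normal(0, 5, size=store.size)
)
df = pd.DataFrame({"sales": sales, "advertising": advertising, "store": store})

# Two separate group terms: intercepts and slopes both vary by store,
# but they are specified independently of each other.
model = bmb.Model("sales ~ advertising + (1 | store) + (0 + advertising | store)", df)
model.build()
print(model)
```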
4. Varying Intercepts and Varying Slopes (Correlated): (predictor | group)
- This is the most powerful and recommended default!
- What it means: This is the shorthand for (1 + predictor | group). It tells the model: “Fit a varying intercept AND a varying slope for predictor for each level of group, AND estimate the correlation between them.”
- When to use: Whenever you suspect both the baseline and the slope of an effect vary across groups. The correlation can be scientifically interesting.
- “Do students with lower starting scores (intercept) benefit more from tutoring (slope)?” (a negative correlation) -> (tutoring_hours | student)
- “Do already popular stores (intercept) get a bigger boost from advertising (slope)?” (a positive correlation) -> (advertising | store)
- Example Formula: sales ~ advertising + (advertising | store)
- Interpretation: Each store has its own baseline sales, the effect of advertising is different in each store, and we will estimate the relationship between these two effects.
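Here is a runnable sketch of this specification on simulated data in which popular stores also happen to respond more strongly to advertising (all numbers invented):

```python
import numpy as np
import pandas as pd
import bambi as bmb
import arviz as az

# Simulate 10 stores where a higher baseline tends to come with a bigger
# advertising effect (a positive intercept-slope correlation).
rng = np.random.default_rng(3)
n_stores = 10
baseline = rng.normal(100, 15, size=n_stores)
slope = 3 + 0.1 * (baseline - 100) + rng.normal(0, 0.3, size=n_stores)

store = np.repeat(np.arange(n_stores), 40)
advertising = rng.uniform(0, 10, size=store.size)
sales = baseline[store] + slope[store] * advertising + rng.normal(0, 5, size=store.size)

df = pd.DataFrame({
    "sales": sales,
    "advertising": advertising,
    "store": pd.Categorical(store),
})

# (advertising | store) is shorthand for (1 + advertising | store):
# store-specific intercepts AND store-specific advertising slopes.
model = bmb.Model("sales ~ advertising + (advertising | store)", df)
idata = model.fit(draws=1000, tune=1000, chains=2, random_seed=3)
print(az.summary(idata))
```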
Part 2: Common (“Fixed”) Effects — No | Operator
This is for effects you assume are constant across all observations: every observation informs a single, shared estimate (complete pooling).
5. Including an Intercept (Default)
- Syntax: Just include the variable, e.g., y ~ x.
- What it means: The model automatically includes a global intercept term (a 1 + is implicit at the start). The coefficients for categorical variables are then interpreted as differences from a reference level.
- Example: y ~ category (where category has levels A, B, C). The model estimates Intercept (for level A), category[B] (the difference B-A), and category[C] (the difference C-A).
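A quick sketch you can run to see reference-level coding in action (simulated data; the exact labels in the summary table may vary slightly across Bambi/ArviZ versions):

```python
import numpy as np
import pandas as pd
import bambi as bmb
import arviz as az

# Simulated data with true group means A=10, B=12, C=15.
rng = np.random.default_rng(0)
category = np.repeat(["A", "B", "C"], 100)
y = np.select([category == "A", category == "B"], [10.0, 12.0], default=15.0)
y = y + rng.normal(0, 1, size=category.size)
df = pd.DataFrame({"y": y, "category": category})

# Default (implicit intercept): coefficients are differences from level A.
# Expect roughly Intercept ~ 10, category[B] ~ 2, category[C] ~ 5.
model = bmb.Model("y ~ category", df)
idata = model.fit(draws=1000, chains=2, random_seed=0)
print(az.summary(idata))
print(df.groupby("category")["y"].mean())  # sanity check against the raw means
```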
6. Suppressing the Intercept: 0 + predictor or -1 + predictor
- What it means: “Do not fit a global intercept.”
- When to use: Very useful when you want to estimate the effect for each level of a categorical variable directly, rather than as differences from a reference. This is called “cell means” coding.
- Example Formula: y ~ 0 + category
- Interpretation: The model will now estimate three parameters: category[A], category[B], and category[C], which represent the mean of y for each category level directly. This is often much easier to interpret.
- Our use case: In the green pathway example later in this guide, we use 0 + green_pathway_label. If green_pathway_label is a categorical variable with “Green” and “Blue” levels, this estimates the mean log_release_time for Green and the mean for Blue directly.
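And here is the cell-means version on the same simulated data as the previous sketch, so you can compare the two parameterizations directly:

```python
import numpy as np
import pandas as pd
import bambi as bmb
import arviz as az

# Same simulated data as before: true group means A=10, B=12, C=15.
rng = np.random.default_rng(0)
category = np.repeat(["A", "B", "C"], 100)
y = np.select([category == "A", category == "B"], [10.0, 12.0], default=15.0)
y = y + rng.normal(0, 1, size=category.size)
df = pd.DataFrame({"y": y, "category": category})

# Suppressed intercept ("cell means" coding): one coefficient per level,
# each estimating that level's mean of y directly (roughly 10, 12, and 15).
model = bmb.Model("y ~ 0 + category", df)
idata = model.fit(draws=1000, chains=2, random_seed=0)
print(az.summary(idata))
```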
Going Deeper: More Useful Formula Tricks
Once you’ve mastered intercepts and slopes, you can add more nuance to your models with a few other powerful syntax operators. These help you handle categorical variables, interactions, and data transformations directly within the formula.
- C(variable) – Explicitly Categorical (No Pooling)
- What it does: C() forces Bambi to treat a variable as categorical, creating a separate “fixed” or “common” effect for each of its levels. The key here is that this approach assumes the effect of each category is independent and does not share information (no pooling).
- When to use it: This is the old-school, frequentist way of handling group effects. You might use it if you have very few groups (e.g., 2 or 3) with lots of data, or if you have a strong belief that the groups are completely unrelated. In general, the hierarchical approach with (1 | variable) is statistically more powerful and robust.
- var1:var2 – The Interaction Term
- What it does: This models an interaction, asking the question: “Does the effect of var1 on the outcome depend on the value of var2?” It estimates a single, additional parameter for this conditional effect.
- When to use it: Use this when you hypothesize that two variables work together. For example, in a sales model, an interaction advertising:store_location would test whether the impact of advertising is different for urban stores versus rural stores.
- var1*var2 – Main Effects Plus Interaction
- What it does: This is a convenient shortcut that automatically expands to var1 + var2 + var1:var2. It includes the main effect of var1, the main effect of var2, and their interaction all in one go.
- Why it’s important: It’s a statistical best practice to include the main effects of any variables you are interacting. This shortcut ensures you do so, making it the most common way to specify an interaction.
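Here is a small sketch that uses all three tricks on simulated sales data (column names and effect sizes are invented; the exact term labels Bambi prints may vary by version):

```python
import numpy as np
import pandas as pd
import bambi as bmb

# Simulated sales data where the advertising effect is stronger in urban stores.
rng = np.random.default_rng(5)
n = 300
store_location = rng.choice(["urban", "rural"], size=n)
store_id = rng.integers(0, 5, size=n)   # numeric codes for 5 stores
advertising = rng.uniform(0, 10, size=n)
sales = (
    100
    + 3 * advertising
    + np.where(store_location == "urban", 2, 0) * advertising  # interaction effect
    + rng.normal(0, 5, size=n)
)
df = pd.DataFrame({
    "sales": sales,
    "advertising": advertising,
    "store_location": store_location,
    "store_id": store_id,
})

# C(store_id): treat the numeric store code as categorical, giving one
# independent ("no pooling") common effect per store.
m1 = bmb.Model("sales ~ 0 + C(store_id) + advertising", df)
m1.build()

# advertising * store_location expands to
# advertising + store_location + advertising:store_location.
m2 = bmb.Model("sales ~ advertising * store_location", df)
m2.build()
print(m2)   # should list both main effects and the interaction term
```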
A Practical Example: The Green Pathway Project
Let’s tie all this theory together and apply it to the problem that started this journey: measuring the impact of a “green pathway” on shipment release times.
Our goal is to model log_release_time based on the green_pathway_label, while accounting for the fact that shipments go through different ports (Branch_x) and involve different product types (hs_code_grouped).
Here’s how we can build our model formula step-by-step:
- The Core Predictors: We start with the main effects. We want to know the release time for green vs. non-green pathways, and we need to control for shipment weight. Using 0 + gives us a direct estimate for each pathway type.
- log_release_time ~ 0 + green_pathway_label + log_weight
- Accounting for Ports (Branch_x): We have two beliefs about the ports: (1) some are inherently faster or slower than others (different baselines), and (2) the time-saving benefit of the green pathway might be larger or smaller depending on the port’s efficiency. This is the perfect scenario for a correlated varying intercept and slope model.
- … + (green_pathway_label | Branch_x)
- Accounting for Product Types (hs_code_grouped): We also suspect that the impact of the green pathway depends on the product. For some products, the “green” designation might allow it to bypass a lengthy inspection, offering a huge time savings. For others, the process might be the same regardless. We are mainly interested in how the effect of the green pathway changes, so a varying slope model is a great fit.
- … + (0 + green_pathway_label | hs_code_grouped)
Putting it all together, our final, powerful hierarchical model formula is:
log_release_time ~ 0 + green_pathway_label + log_weight + (green_pathway_label | Branch_x) + (0 + green_pathway_label | hs_code_grouped)
In plain English, this single line of code tells our model:
- Estimate the average release time for each pathway (0 + green_pathway_label).
- Control for the effect of shipment weight (log_weight).
- Estimate a unique baseline speed and a unique green pathway effect for each port, assuming these two things might be correlated ((green_pathway_label | Branch_x)).
- Estimate a unique green pathway effect for each product type ((0 + green_pathway_label | hs_code_grouped)).
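Finally, here is a sketch of what fitting this model could look like end to end. The DataFrame and the file name shipments.csv are placeholders; only the column names come from the example above, and sampler settings like target_accept=0.9 are just reasonable starting points:

```python
import pandas as pd
import bambi as bmb
import arviz as az

# "shipments.csv" is a placeholder; the columns are the ones used in this guide.
shipments = pd.read_csv("shipments.csv")

formula = (
    "log_release_time ~ 0 + green_pathway_label + log_weight"
    " + (green_pathway_label | Branch_x)"
    " + (0 + green_pathway_label | hs_code_grouped)"
)

model = bmb.Model(formula, shipments)
model.build()
print(model)   # sanity-check the common and group-specific terms before sampling

idata = model.fit(draws=2000, tune=2000, chains=4, target_accept=0.9)
print(az.summary(idata))
az.plot_forest(idata, combined=True)
```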