Why do Marketing Mix Models struggle when all your channels move together? #84
simba-quokka started this conversation in General
Multicollinearity is the MMM problem nobody talks about enough. If your TV spend, digital display, and paid social all ramp up in Q4 and pull back in Q1 together, the model faces a fundamental challenge: it can never observe TV without digital, so it cannot cleanly separate their individual effects. The result is unstable coefficient estimates, implausibly signed effects (spending more on a profitable channel appears to hurt revenue), and wide confidence intervals that make results hard to defend in a board meeting.
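To see the instability concretely, here is a small simulation. It is a sketch with made-up numbers, not output from a real MMM: two channels whose spend follows almost the same weekly plan, true effects of 1.5 and 0.5, and 200 ordinary least squares refits where only the revenue noise changes. The channel names, effect sizes, and noise levels are all hypothetical.

```python
import numpy as np

# Hypothetical data: TV and digital follow nearly the same weekly plan.
rng = np.random.default_rng(0)
n_weeks = 104
shared = rng.normal(100.0, 20.0, n_weeks)             # shared flighting plan
tv = shared + rng.normal(0.0, 2.0, n_weeks)           # TV tracks the plan closely
digital = shared + rng.normal(0.0, 2.0, n_weeks)      # and so does digital
X = np.column_stack([np.ones(n_weeks), tv, digital])  # intercept, TV, digital
true_beta = np.array([50.0, 1.5, 0.5])                # both true effects positive

estimates = []
for _ in range(200):
    # Spend stays fixed; only the revenue noise is redrawn for each refit.
    y = X @ true_beta + rng.normal(0.0, 30.0, n_weeks)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares
    estimates.append(beta_hat[1:])                    # keep the two channel effects
estimates = np.array(estimates)

print("corr(tv, digital):", round(np.corrcoef(tv, digital)[0, 1], 3))
print("sd of TV effect across refits:", estimates[:, 0].std().round(2))
print("sd of TV + digital across refits:", estimates.sum(axis=1).std().round(2))
print("share of refits with a negative channel effect:",
      (estimates < 0).any(axis=1).mean())
```

With this setup the individual coefficients swing widely from fit to fit, and a sizeable share of refits give one channel a negative effect, while the sum of the two stays tight. The data pin down the combined effect; they say very little about the split.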
Why it happens so often
Marketing budgets are seasonal by design. You flight campaigns together, cut spend in January, push hard in November. That coordination is sensible from a business standpoint but creates a statistical headache. When pairwise correlations between channels' spend exceed roughly 0.8, traditional regression can still estimate the channels' combined effect, but the split of credit between them is poorly determined: small changes in the data can shift credit from one channel to the other more or less arbitrarily.
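A quick way to see how a shared budget calendar produces this: simulate three channels that follow the same Q1-cut, Q4-push pattern with independent week-to-week noise, then flag every pair above the 0.8 rule of thumb. The season multipliers, spend levels, and noise below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = np.arange(156)                       # three 52-week years
week_of_year = weeks % 52
# Shared budget calendar: Q1 cut (weeks 0-12), Q4 push (weeks 39-51), flat otherwise.
season = np.where(week_of_year >= 39, 1.8, np.where(week_of_year < 13, 0.5, 1.0))
spend = {
    "tv": 400 * season + rng.normal(0, 20, weeks.size),
    "display": 150 * season + rng.normal(0, 10, weeks.size),
    "paid_social": 120 * season + rng.normal(0, 10, weeks.size),
}
names = list(spend)
corr = np.corrcoef(np.vstack([spend[k] for k in names]))  # rows = channels

# Flag channel pairs above the rough 0.8 threshold.
flagged = [(names[i], names[j], round(float(corr[i, j]), 2))
           for i in range(len(names)) for j in range(i + 1, len(names))
           if corr[i, j] > 0.8]
print(flagged)
```

Even though each channel gets its own noise, the shared calendar dominates, so every pair ends up flagged.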
A 2025 academic study by Opella (a pharmaceutical marketing analytics company), presented at the Mathematics with Industry Study Group in the Netherlands, focused specifically on this problem. Their key finding: the issue is not just about statistical efficiency. It affects the causal interpretation of the estimates. A model that cannot distinguish correlated channels will systematically misattribute contributions, leading to budget decisions that move money toward channels that happen to be correlated with a high-performing period rather than genuinely high-performing channels.
How Bayesian MMM handles it better
Under multicollinearity, frequentist regression (ordinary least squares) still produces unbiased estimates in principle, but their standard errors balloon and the fit becomes numerically unstable, because the spend matrix is close to singular. Bayesian MMM does something more useful: it preserves uncertainty. Instead of picking one plausible attribution between correlated channels, the model holds multiple possible attributions simultaneously in the posterior distribution.
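For intuition, the simplest Bayesian regression has a closed form: with a Gaussian prior on the coefficients and a known noise SD, the posterior is Gaussian too. The sketch below uses that conjugate shortcut on two hypothetical collinear channels; it is not the MCMC a production Bayesian MMM would run, and it does not use an InverseGamma prior. The quantity to watch is the posterior correlation between the two coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 104
shared = rng.normal(100.0, 20.0, n)
X = np.column_stack([shared + rng.normal(0, 2, n),    # TV
                     shared + rng.normal(0, 2, n)])   # digital
X = X - X.mean(axis=0)        # center spend so the intercept can be skipped
sigma = 30.0                  # noise SD, assumed known (a simplification)
y = X @ np.array([1.5, 0.5]) + rng.normal(0, sigma, n)

# Conjugate prior beta ~ N(mu0, tau^2 I) gives a Gaussian posterior in closed form.
mu0, tau = np.zeros(2), 2.0
prec_post = X.T @ X / sigma**2 + np.eye(2) / tau**2
cov_post = np.linalg.inv(prec_post)
mean_post = cov_post @ (X.T @ y / sigma**2 + mu0 / tau**2)

sd = np.sqrt(np.diag(cov_post))
post_corr = cov_post[0, 1] / (sd[0] * sd[1])
sd_sum = np.sqrt(cov_post.sum())   # Var(b_tv + b_digital) = sum of all covariance entries
print("posterior mean:", mean_post.round(2))
print("posterior corr(b_tv, b_digital):", round(post_corr, 2))
print("posterior sd of each:", sd.round(2), "sd of the sum:", round(sd_sum, 2))
```

A strongly negative posterior correlation is what "holding multiple attributions" looks like: posterior draws that give TV more credit give digital less, and the sum is far better determined than either part.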
This is where priors become practical rather than theoretical. An InverseGamma prior that constrains a channel effect to be positive prevents the model from flipping to a nonsensical negative estimate just because two channels are correlated. A relatively tight prior based on lift test evidence (say, paid search ROAS between 2x and 4x from a holdout experiment) gives the model enough signal to pin down that channel even when the time series is collinear with others.
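Turning a lift-test range into prior parameters is mostly arithmetic. The post mentions an InverseGamma prior; this sketch swaps in a LogNormal instead, because it is also strictly positive and its quantiles have a closed form. It treats "ROAS between 2x and 4x" as a central 95% interval, which is an assumption about how the lift test was summarized, and `lognormal_from_interval` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def lognormal_from_interval(low, high, mass=0.95):
    """Return (mu, sigma) of a LogNormal whose central `mass` interval is [low, high]."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)       # about 1.96 for 95%
    mu = (math.log(low) + math.log(high)) / 2      # log-scale midpoint
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma

mu, sigma = lognormal_from_interval(2.0, 4.0)
median_roas = math.exp(mu)    # geometric mean of 2 and 4, i.e. 2*sqrt(2)
# LogNormal support is (0, inf): a negative ROAS has zero prior probability.
log_roas = NormalDist(mu, sigma)
mass_in_range = log_roas.cdf(math.log(4.0)) - log_roas.cdf(math.log(2.0))
print(f"mu={mu:.3f}, sigma={sigma:.3f}, median ROAS={median_roas:.2f}, "
      f"P(2 <= ROAS <= 4)={mass_in_range:.3f}")
```

Inside a PyMC model, for example, these values would go to `pm.LogNormal("roas_search", mu=mu, sigma=sigma)`. An InverseGamma could be matched to the same two quantiles, but that needs a numerical solve.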
Practically, Bayesian shrinkage priors act like regularization: they pull uncertain estimates toward plausible values rather than letting them drift to offsetting extremes (one channel strongly positive, its correlated partner strongly negative). This does not eliminate the information problem created by multicollinearity, but it does prevent the worst artifacts.
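In the Gaussian case the regularization analogy is exact: the posterior mode (MAP) under a prior N(mu0, tau^2 I) is ridge regression pulled toward mu0, with penalty sigma^2 / tau^2. This sketch compares OLS with that MAP estimate over 200 noise draws on hypothetical collinear data; the prior center of 1.0 per channel and tau = 0.5 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 104, 30.0
shared = rng.normal(100.0, 20.0, n)
X = np.column_stack([shared + rng.normal(0, 2, n), shared + rng.normal(0, 2, n)])
X -= X.mean(axis=0)                      # centered spend, no intercept
beta_true = np.array([1.5, 0.5])

mu0, tau = np.array([1.0, 1.0]), 0.5     # hypothetical "plausible values" prior
lam = sigma**2 / tau**2                  # equivalent ridge penalty

def map_estimate(X, y, mu0, lam):
    """Posterior mode: solve (X'X + lam*I) b = X'y + lam*mu0."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * mu0)

ols_est, map_est = [], []
for _ in range(200):
    y = X @ beta_true + rng.normal(0, sigma, n)
    ols_est.append(np.linalg.lstsq(X, y, rcond=None)[0])
    map_est.append(map_estimate(X, y, mu0, lam))
ols_est, map_est = np.array(ols_est), np.array(map_est)

for name, est in [("OLS", ols_est), ("MAP", map_est)]:
    print(f"{name}: sd per channel {est.std(axis=0).round(2)}, "
          f"share with a negative effect {(est < 0).any(axis=1).mean():.2f}")
```

The MAP estimates should stop flipping sign and barely move between draws, but they settle near the prior's even split rather than the true 1.5 / 0.5 split: shrinkage removes the artifacts, not the information gap.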
Practical things you can do
Check your correlation matrix before modeling. Pairwise correlations above 0.8 between channels warrant investigation. A variance inflation factor (VIF) above 5-10 is a common diagnostic threshold.
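Both checks take a few lines of NumPy. This sketch flags pairs with |r| > 0.8 and computes VIFs from the inverse of the correlation matrix, which gives the same numbers as regressing each channel on the others. The `screen` helper, the cutoffs, and the simulated spend are illustrative assumptions.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (spend columns only, no intercept).

    Uses VIF_j = [inv(R)]_jj with R the predictors' correlation matrix, which
    equals 1 / (1 - R_j^2) from regressing column j on the others plus an intercept.
    """
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def screen(X, names, corr_cut=0.8, vif_cut=5.0):
    """Return correlated pairs, all VIFs, and channels whose VIF exceeds vif_cut."""
    R = np.corrcoef(X, rowvar=False)
    pairs = [(names[i], names[j], round(float(R[i, j]), 2))
             for i in range(len(names)) for j in range(i + 1, len(names))
             if abs(R[i, j]) > corr_cut]
    vifs = {k: round(float(v), 1) for k, v in zip(names, vif(X))}
    high_vif = [k for k, v in vifs.items() if v > vif_cut]
    return pairs, vifs, high_vif

# Hypothetical weekly spend: TV and display share one plan, search does not.
rng = np.random.default_rng(4)
plan = rng.normal(100.0, 25.0, 104)
X = np.column_stack([plan + rng.normal(0.0, 5.0, 104),        # tv
                     0.4 * plan + rng.normal(0.0, 2.0, 104),  # display
                     rng.normal(60.0, 15.0, 104)])            # search
pairs, vifs, high_vif = screen(X, ["tv", "display", "search"])
print("pairs above 0.8:", pairs)
print("VIFs:", vifs, "-> investigate:", high_vif)
```

If you prefer a library call, statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; it expects the design matrix to already include a constant column.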
Run experiments to break the correlation. A geo holdout test that turns off one channel in some markets while leaving others running creates the independent variation a time-series model cannot get from observational data alone.
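Why a geo holdout helps, in correlation terms: pooled across geos and weeks, two channels on the same national plan stay highly correlated, but switching one off in some geos for a few weeks creates cells where they move independently. The geo count, spend levels, and the holdout (TV off in a random half of geos for weeks 26 to 38) are made up; the sketch only shows the decorrelation, not how to estimate the lift itself.

```python
import numpy as np

rng = np.random.default_rng(5)
n_geos, n_weeks = 20, 52
week = np.arange(n_weeks)
season = np.where(week >= 39, 1.8, np.where(week < 13, 0.5, 1.0))  # Q4 push, Q1 cut

# Every geo follows the national plan, with its own scale factor per channel.
tv = 100 * season * rng.uniform(0.8, 1.2, (n_geos, 1)) + rng.normal(0, 5, (n_geos, n_weeks))
digital = 60 * season * rng.uniform(0.8, 1.2, (n_geos, 1)) + rng.normal(0, 3, (n_geos, n_weeks))

def pooled_corr(a, b):
    """Correlation over all geo-week cells."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

before = pooled_corr(tv, digital)

# Hypothetical holdout: TV off in half the geos for weeks 26-38; digital unchanged.
tv_test = tv.copy()
holdout_geos = rng.choice(n_geos, size=n_geos // 2, replace=False)
tv_test[np.ix_(holdout_geos, np.arange(26, 39))] = 0.0
after = pooled_corr(tv_test, digital)
print(f"pooled corr(tv, digital): before {before:.2f}, with holdout {after:.2f}")
```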
Collapse channels that are structurally correlated. If your programmatic display is always purchased alongside video at a fixed ratio, modeling them as separate variables is an illusion: the data contain no information about how to split credit between them. Combine them and interpret the combined effect.
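The "illusion" is literal when the ratio is exactly fixed: the display column is a multiple of the video column, so the design matrix loses rank and no amount of data separates the two coefficients. A sketch with a hypothetical 1:2 display-to-video ratio and simulated revenue:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 104
video = rng.gamma(shape=4.0, scale=25.0, size=n)
display = 0.5 * video                     # hypothetical: always bought at a 1:2 ratio
X_sep = np.column_stack([np.ones(n), display, video])
X_comb = np.column_stack([np.ones(n), display + video])
y = 20 + 0.8 * (display + video) + rng.normal(0, 10, n)

print("rank with separate columns:", np.linalg.matrix_rank(X_sep))   # 2, not 3
print("rank with combined column:", np.linalg.matrix_rank(X_comb))   # 2, full rank

# Any (b_display, b_video) with the same 0.5*b_display + b_video fits equally well,
# so the separate effects are not identified; the combined effect is.
b_comb = np.linalg.lstsq(X_comb, y, rcond=None)[0]
print("combined effect per unit of display + video spend:", b_comb[1].round(3))
```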
Use lift test results as calibration observations. Feeding a lift test result into the model as a likelihood observation (not just a prior) directly reduces the ambiguity in that channel's contribution, which also improves estimates for the correlated channels around it.
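In the Gaussian toy setting, "a lift test as a likelihood observation" means one extra row in the regression: the test reports the search coefficient directly, with its own standard error. This sketch assumes the lift result is expressed in the same units as the model coefficient (incremental revenue per unit of spend); the 3.1 ± 0.3 result is hypothetical, and `gaussian_posterior` is a conjugate shortcut, not a full MMM.

```python
import numpy as np

def gaussian_posterior(X, y, noise_sd, prior_sd):
    """Posterior mean/cov for y ~ N(X b, noise_sd^2) with prior b ~ N(0, prior_sd^2 I).

    noise_sd is a per-row array, so observations with different precision
    (such as a lift test) can be appended as extra rows.
    """
    w = 1.0 / np.asarray(noise_sd, dtype=float)
    Xw, yw = X * w[:, None], y * w        # whiten each row by its noise SD
    prec = Xw.T @ Xw + np.eye(X.shape[1]) / prior_sd**2
    cov = np.linalg.inv(prec)
    return cov @ (Xw.T @ yw), cov

rng = np.random.default_rng(7)
n, sigma = 104, 30.0
shared = rng.normal(100.0, 20.0, n)
X = np.column_stack([shared + rng.normal(0, 2, n),    # paid search
                     shared + rng.normal(0, 2, n)])   # display
X -= X.mean(axis=0)
y = X @ np.array([3.0, 1.0]) + rng.normal(0, sigma, n)

# Time series alone.
_, cov_ts = gaussian_posterior(X, y, np.full(n, sigma), prior_sd=5.0)

# Hypothetical lift test: 3.1 = 1*b_search + 0*b_display + noise with SD 0.3.
X_aug = np.vstack([X, [1.0, 0.0]])
y_aug = np.append(y, 3.1)
noise_aug = np.append(np.full(n, sigma), 0.3)
mean_cal, cov_cal = gaussian_posterior(X_aug, y_aug, noise_aug, prior_sd=5.0)

for name, cov in [("time series only", cov_ts), ("with lift test", cov_cal)]:
    print(name, "posterior sd (search, display):", np.sqrt(np.diag(cov)).round(2))
print("posterior mean with lift test:", mean_cal.round(2))
```

Display was never tested, yet its posterior SD should drop substantially too: once search is pinned down, the well-measured combined effect leaves little room for display to wander.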
Simba's Bayesian modeling approach and priors configuration are specifically designed to handle this: they combine smart default priors with the ability to inject lift test evidence where you have it, so the model does not have to rely on collinear spend patterns alone.
How do you handle multicollinearity in your models? Do you collapse channels, engineer spend experiments, or lean on priors?
- Quokka