Add ExperimentDesigner: posterior-aware experiment design for lift tests by drbenvincent · Pull Request #2356 · pymc-labs/pymc-marketing

drbenvincent · 2026-03-02T13:49:11Z

Summary

Adds ExperimentDesigner class that recommends which marketing experiment to run (channel, spend level, duration) based on a fitted MMM's posterior uncertainty about channel response functions
Implements adstock-aware lift prediction, Bayesian assurance (posterior-predictive power), weighted composite scoring across 5 dimensions, and 5 plotting methods
Includes fixture generator for creating realistic test posteriors, with both fast synthetic mode and full MCMC fitting

Details

Core computation: For each candidate experiment design (channel × spend change × duration), evaluates the posterior-predicted lift accounting for geometric adstock ramp-up, computes measurement noise σ = σ_residual · √T, and derives Bayesian assurance (expected power over the posterior distribution of the true effect).
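
A minimal sketch of the assurance calculation described above, not the PR's implementation. It assumes a one-sided z-test for a positive lift at level α and the IID noise model σ = σ_residual · √T; the function names and the posterior draws are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_one_sided(true_lift, sigma, alpha=0.05):
    """Power of a one-sided z-test to detect a lift of size true_lift."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(true_lift / sigma - z_crit)


def bayesian_assurance(lift_draws, residual_std, n_weeks, alpha=0.05):
    """Average the classical power over posterior draws of the total lift."""
    sigma = residual_std * n_weeks ** 0.5  # IID noise on the T-week total
    powers = [power_one_sided(d, sigma, alpha) for d in lift_draws]
    return sum(powers) / len(powers)


# Hypothetical posterior draws of total lift over an 8-week test.
draws = [120.0, 150.0, 90.0, 200.0, 160.0]
print(round(bayesian_assurance(draws, residual_std=20.0, n_weeks=8), 3))
```

Under these assumptions a posterior concentrated at zero lift yields assurance equal to α, and a posterior concentrated on very large lifts yields assurance close to 1, which is the calibration behaviour listed in the test plan.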

Scoring dimensions: Channels are ranked by a configurable weighted sum of: posterior uncertainty, spend correlation, saturation gradient, assurance, and cost efficiency — all min-max normalised.
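
As a rough illustration of a min-max normalised weighted sum, here is a sketch that assumes every dimension is oriented so that higher is better. The helper names, dimension names, and raw values are hypothetical and are not taken from designer.py.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(dimensions, weights):
    """Weighted sum of min-max normalised dimensions, one score per candidate."""
    total = sum(weights.values())
    normalised = {name: min_max(vals) for name, vals in dimensions.items()}
    n = len(next(iter(dimensions.values())))
    return [
        sum(weights[name] / total * normalised[name][i] for name in weights)
        for i in range(n)
    ]


# Hypothetical raw metrics for three candidate designs.
dims = {
    "assurance": [0.40, 0.80, 0.95],
    "cost_efficiency": [3.0, 2.0, 0.5],
}
scores = composite_scores(dims, {"assurance": 0.5, "cost_efficiency": 0.5})
best = max(range(len(scores)), key=lambda i: scores[i])
```

With these made-up numbers the middle candidate ranks first because it is reasonably good on both dimensions, while the other two each score 0 on one of them.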

v1 scope: National-level experiments with LogisticSaturation + GeometricAdstock (adstock_first=True). Geo-level designs, pulse/switchback experiments, and Fisher Information are deferred to v2.

New files

pymc_marketing/mmm/experiment_design/designer.py — ExperimentDesigner class
pymc_marketing/mmm/experiment_design/recommendation.py — ExperimentRecommendation dataclass
pymc_marketing/mmm/experiment_design/functions.py — Numpy logistic_saturation
pymc_marketing/mmm/experiment_design/fixture.py — generate_experiment_fixture()
3 test files with 65 tests covering all components

Test plan

65 new tests pass (functions, designer, fixture, plotting)
Pre-commit hooks pass (ruff, mypy, formatting)
Adstock ramp verified against analytic geometric series
Assurance calibrated (α for zero effects, ~1.0 for large effects)
NetCDF round-trip for fixtures verified
Integration test with fitted MMM (requires MCMC, deferred to CI)

Towards #2355

Made with Cursor

📚 Documentation preview 📚: https://pymc-marketing--2356.org.readthedocs.build/en/2356/

Implements a posterior-aware experiment designer that recommends which marketing experiment to run based on a fitted MMM's uncertainty about channel response functions. Computes adstock-aware lift predictions, Bayesian assurance (posterior-predictive power), and weighted composite scores across candidate experiments. Includes ExperimentDesigner class with recommend() and 5 plotting methods, ExperimentRecommendation dataclass, numpy response functions, fixture generator, and 65 tests. Closes #2355 Made-with: Cursor

codecov · 2026-03-02T13:56:56Z

Codecov Report

❌ Patch coverage is 96.49682% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.05%. Comparing base (7dfb966) to head (0fa1336).

Files with missing lines	Patch %	Lines
pymc_marketing/mmm/experiment_design/fixture.py	82.82%	17 Missing ⚠️
pymc_marketing/mmm/experiment_design/designer.py	98.88%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2356      +/-   ##
==========================================
+ Coverage   92.99%   93.05%   +0.06%     
==========================================
  Files          82       86       +4     
  Lines       13256    13884     +628     
==========================================
+ Hits        12327    12920     +593     
- Misses        929      964      +35


- Add end-to-end walkthrough notebook (docs/source/notebooks/mmm/)
- Add gallery entry under "Experiment Design" section
- Ship pre-built InferenceData fixture (simulated_3channel.nc)
- Add slow simulation-based assurance calibration tests
- Add tests for scoring weight redistribution and channel ranking
- Register 'slow' pytest marker in pyproject.toml

Made-with: Cursor

Made-with: Cursor

review-notebook-app · 2026-03-02T14:20:41Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

…ansformer Eliminate experiment_design/functions.py by delegating to pymc_marketing.mmm.transformers.logistic_saturation via a new _eval_saturation() static method that calls .eval() on the PyTensor result. Made-with: Cursor

…c-labs/pymc-marketing into fix/2355-experiment-designer

drbenvincent · 2026-03-17T11:57:44Z

Some review comments received from @carlosagostini, but restated in my own words

Point 1: The posterior-as-design-prior is epistemically fragile

The reviewer likes the concept of Bayesian assurance but raises a fundamental concern about where the design prior comes from. In the classical assurance literature (O'Hagan et al., 2005), the design prior is independently elicited — it represents genuine uncertainty about the effect. In your implementation, the "design prior" is the MMM's posterior, which may suffer from the very identifiability problems that motivate running experiments in the first place.

The dangerous case: if a channel appears effective due to confounding (e.g., TV spend correlates with seasonal demand), the posterior will be confidently wrong — concentrated around a large positive effect. Assurance will then be confidently high, and the tool will enthusiastically recommend an experiment designed around a misleading belief. The tool works well when the model is already well-identified, which is precisely when you need it least. When identification is weakest (the case where experiments matter most), the tool's recommendations are least trustworthy.

Point 2: Channels the model believes are null will never be recommended for testing

If a channel's posterior is concentrated near zero (the model thinks it does nothing), assurance is mathematically bounded by P(effect > 0 | posterior). No matter how long or large you make the experiment, assurance stays low because the posterior says the effect is probably zero. So the tool will never recommend testing these channels.

But these are exactly the channels you might most want to test — to confirm the model's belief and potentially stop wasting budget on them. The tool has a structural blind spot for "validating the null." If a channel truly is ineffective, that's enormously valuable to know with certainty, but the scoring system will always deprioritise it.
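
The bound in Point 2 can be written out under one assumption that the thread does not state explicitly: the test is one-sided for a positive lift at significance level α, so its power π(δ) is at most α whenever the true lift δ is not positive. Writing D for the data behind the MMM posterior:

```latex
\begin{aligned}
A &= \int \pi(\delta)\, p(\delta \mid D)\, d\delta \\
  &\le 1 \cdot P(\delta > 0 \mid D) + \alpha \cdot P(\delta \le 0 \mid D).
\end{aligned}
```

If the signal-to-noise ratio grows without bound as the experiment gets longer or larger, π(δ) tends to 1 for δ > 0 and to 0 for δ < 0, so A approaches P(δ > 0 | D) and cannot exceed it in the limit.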

Point 3: The IID residuals assumption likely leads to systematically overestimated assurance

The noise model uses sigma_d = residual_std * sqrt(T), which is only valid if weekly residuals are independent. In a time series context (which is what MMM data inherently is), residuals are likely autocorrelated — positive residuals tend to follow positive residuals. When residuals are positively autocorrelated, Var(sum of residuals) > T * sigma^2, meaning the true measurement noise is larger than the formula assumes. This causes the tool to systematically overestimate assurance across the board. The reviewer wants to confirm whether this was a deliberate simplifying assumption or an oversight.

Point 4: The weighted scoring system feels arbitrary and gameable

The 5-dimensional scoring system (uncertainty, correlation, gradient, assurance, cost efficiency) with configurable weights adds a layer of complexity that most users won't be able to navigate. The reviewer's concerns are:

No principled heuristic: How does a user decide that uncertainty should get 0.2 vs 0.3? There's no guidance for making that choice.
Gameability: Because the weights are fully configurable and the score is just a weighted average of min-max normalised values, a user can effectively reverse-engineer the weights to get whatever channel they want on top. This feels more problematic than in typical scoring systems because the dimensions themselves are somewhat opaque.
Accessibility: Only a handful of technically sophisticated users would be able to meaningfully tune these weights; for everyone else, it's an unexplained black box.

The upstream `logistic_saturation` transformer now requires xtensor dims via `as_xtensor()`, which rejects raw numpy arrays. Replace the PyTensor round-trip with the equivalent numpy formula so `_eval_saturation` works with plain arrays. Made-with: Cursor

drbenvincent · 2026-03-17T12:27:19Z

All good points. Basically these are by-products of this being a very early MVP. The goal is to determine the level of external interest in order to evaluate if this is something worth sinking extra developer time into.

Mulling over whether to just deal with some of the bigger problems at this point. Fair enough coming up with an MVP, but it does need to be actually useful.

…s, simplified scoring, identification warnings

- Fix systematic overestimation of assurance by adding AR(1) autocorrelation correction to the measurement noise model (sigma_d)
- Report null-confirmation candidates: channels the model believes have near-zero effect are flagged so they aren't silently ignored
- Simplify default scoring to two transparent dimensions (assurance + cost efficiency); advanced dimensions remain available via score_weights
- Add identification warning in rationale when spend correlation exceeds 0.7
- Add prominent docstring note that assurance is conditional on model identification quality
- 12 new tests (77 total), notebook updated

Made-with: Cursor

- Remove stale 5-dimension score_weights, use new 2-dimension defaults
- Add AR(1) correction visibility cell after fixture creation
- Add "Identification Safeguards" section with pathological fixture demonstrating null-confirmation candidates and correlation warnings
- Update recommendation table description to mention null candidates

Made-with: Cursor

drbenvincent · 2026-03-17T13:09:13Z

Response to @carlosagostini's review feedback

Thank you for the thorough and insightful review. All four points are well-taken. We've addressed each one in this PR, balancing pragmatic improvements now with a clear path to deeper solutions as the feature matures.

Point 1: Posterior-as-design-prior is epistemically fragile

What we did:

Identification warnings in rationale text. When a channel has high pairwise spend correlation (r >= 0.70) with any other channel, the auto-generated rationale now includes a "Caution" paragraph warning that the posterior may be influenced by confounding and that assurance should be interpreted with care.
Prominent documentation. The ExperimentDesigner class docstring, recommend() docstring, and the notebook all contain explicit warnings that assurance is conditional on the model being reasonably well-identified — a confidently wrong posterior produces confidently high assurance.
Notebook demonstration. A new "Identification Safeguards" section creates a pathological fixture with r = 0.85 between tv and search, and shows the correlation warning appearing in the rationale.
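
A rough sketch of how such a pairwise check could look. It applies the r >= 0.70 rule literally (no absolute value); the helper names and the spend series are hypothetical, and this is not the code in designer.py.

```python
import math

HIGH_CORRELATION_THRESHOLD = 0.7  # the r >= 0.70 rule described above


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_cautions(spend, threshold=HIGH_CORRELATION_THRESHOLD):
    """Map each channel to the other channels it is highly correlated with."""
    cautions = {}
    for ch in spend:
        partners = [
            other
            for other in spend
            if other != ch and pearson(spend[ch], spend[other]) >= threshold
        ]
        if partners:
            cautions[ch] = partners
    return cautions


# Hypothetical weekly spend: tv and search move together, radio does not.
spend = {
    "tv": [10.0, 12.0, 15.0, 11.0, 18.0, 20.0],
    "search": [5.0, 6.5, 7.0, 5.5, 9.0, 10.5],
    "radio": [3.0, 1.0, 4.0, 1.5, 2.0, 3.5],
}
cautions = correlation_cautions(spend)
```

Any channel that appears as a key in `cautions` would get the "Caution" paragraph appended to its rationale in this sketch.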

Deferred to v2: Prior sensitivity diagnostic (assurance under a skeptical alternative prior), which would directly quantify how much the recommendations change under different beliefs. Fisher Information-based scoring is also on the roadmap.

Point 2: Channels the model believes are null will never be recommended

What we did:

Null-confirmation candidates. After scoring, recommend() now identifies channels whose posterior mean contribution (|beta * saturation(x_current)|) falls below the residual noise floor. These are reported as null_confirmation_candidates on the ExperimentRecommendations object.
Visible in output. Null-confirmation candidates appear below the recommendation table in both repr and HTML rendering, with guidance that testing these channels can confirm the model's belief and justify budget reallocation.
Notebook demonstration. The "Identification Safeguards" section includes a channel with beta = 0.01, which is flagged as a null-confirmation candidate in the rendered output.
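
One possible reading of the null-confirmation rule, sketched under assumptions: the "noise floor" is taken to be the residual standard deviation, the comparison uses the absolute value of the posterior mean contribution, and the saturation is the logistic form quoted elsewhere in this PR. The data structure and function names are hypothetical.

```python
import math


def logistic_saturation(x, lam):
    """Logistic saturation without beta scaling."""
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))


def null_confirmation_candidates(channels, residual_std):
    """Channels whose |posterior mean of beta * saturation(x_current)| is below the floor."""
    flagged = []
    for name, ch in channels.items():
        contributions = [
            beta * logistic_saturation(ch["current_spend"], lam)
            for beta, lam in zip(ch["beta_draws"], ch["lam_draws"])
        ]
        mean_contribution = sum(contributions) / len(contributions)
        if abs(mean_contribution) < residual_std:
            flagged.append(name)
    return flagged


# Hypothetical posterior draws on a scaled target; display has beta = 0.01.
channels = {
    "tv": {"current_spend": 0.5, "beta_draws": [0.8, 1.0, 1.2], "lam_draws": [2.0] * 3},
    "display": {"current_spend": 0.5, "beta_draws": [0.01] * 3, "lam_draws": [2.0] * 3},
}
flagged = null_confirmation_candidates(channels, residual_std=0.1)
```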

Deferred to v2: A proper "confirmation value" scoring dimension and a two-mode recommendation system (discover vs. confirm).

Point 3: IID residuals assumption overestimates assurance

What we did:

AR(1) autocorrelation correction. _compute_residual_std() now also estimates the lag-1 autocorrelation (ρ) of the MMM residuals. A new _effective_sigma(T) method applies the AR(1) variance inflation factor (1 + ρ) / (1 - ρ) to the cumulative noise, so σ_d = σ_ε √(T · (1 + ρ) / (1 - ρ)).
Conservative by default. For ρ = 0 (no autocorrelation), the formula reduces to the original IID formula. For ρ > 0 (the typical case), assurance estimates become more conservative — and more honest.
Notebook demonstration. A new cell right after fixture creation prints the residual autocorrelation, the IID sigma, the corrected sigma, and the correction factor, so the user can see exactly what adjustment is being made.
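
The correction can be sketched as follows. The function names are hypothetical stand-ins for `_compute_residual_std()` and `_effective_sigma(T)`, and `exact_ar1_sigma` is included only for comparison: the factor (1 + ρ) / (1 - ρ) is the large-T limit of the exact variance of a sum of stationary AR(1) terms.

```python
import math


def lag1_autocorr(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    mean = sum(residuals) / len(residuals)
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dev[:-1], dev[1:])) / denom


def effective_sigma(residual_std, n_weeks, rho):
    """Noise on the T-week total using the asymptotic AR(1) inflation factor."""
    return residual_std * math.sqrt(n_weeks * (1.0 + rho) / (1.0 - rho))


def exact_ar1_sigma(residual_std, n_weeks, rho):
    """Exact standard deviation of a sum of n_weeks stationary AR(1) terms."""
    var = n_weeks + 2.0 * sum((n_weeks - k) * rho**k for k in range(1, n_weeks))
    return residual_std * math.sqrt(var)


iid = effective_sigma(1.0, 8, 0.0)        # reduces to sqrt(T)
corrected = effective_sigma(1.0, 8, 0.5)  # inflated by sqrt(3)
exact = exact_ar1_sigma(1.0, 8, 0.5)      # slightly below the asymptotic value
```

For ρ > 0 and finite T the asymptotic factor is a little larger than the exact value, so in this sketch the correction errs on the conservative side for short experiments.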

To directly answer the reviewer's question: the IID assumption was a deliberate simplifying assumption in the initial MVP, not an oversight. We've now replaced it with the AR(1) correction as the default.

Deferred to v2: Simulation-based sigma estimation (placebo-in-time), geo-specific sigma from synthetic control fit quality.

Point 4: Scoring system feels arbitrary and gameable

What we did:

Simplified defaults. The default score now uses only 2 dimensions with equal weight: assurance (0.5) and cost efficiency (0.5). These are intuitive and hard to game — one measures detectability, the other measures cost per unit of detectability.
Advanced dimensions opt-in. The three additional dimensions (uncertainty, correlation, gradient) are still computed and displayed in diagnostic plots, but they no longer contribute to the default score. Users who want them can pass them explicitly via score_weights.
Clear documentation. The notebook explains the two default dimensions in a simple table, and includes a callout inviting feedback on scoring — specifically asking whether a single "risk appetite" parameter would be more useful than configurable weights.

Deferred to v2: Fisher Information-based information gain as a single principled objective; Pareto frontier of information gain vs. cost; potentially a single-parameter tradeoff control.

All changes include unit tests and are demonstrated end-to-end in the updated notebook. Happy to discuss any of these further.

Improve patch coverage from ~82% to ~87% by testing:

- from_idata without residual_autocorr / spend_correlation
- normalize=False branches (steady-state, adstock ramp)
- recommend() with defaults and decrease direction
- single-channel correlation info
- ExperimentRecommendations equality protocol
- fixture default parameters
- plotting: existing axes, spend_levels, single-row/col layouts, no-correlation diagnostics, decrease direction markers
- null-confirmation HTML with non-empty recommendations

Made-with: Cursor

drbenvincent · 2026-03-17T15:09:03Z

The failing test notebooks are unrelated to this PR. The failure was introduced by PR #2361 (merged to main).

Add TestInitFromMockMMM class that exercises the __init__ path using lightweight stub classes (LogisticSaturation, GeometricAdstock) and a minimal mock MMM object, covering lines 106-217 in designer.py. Also add TestFindChannelDim for the static helper, fixture edge-case tests (_simulate_spend default rng, short-series autocorrelation), and edge-case tests for predict length mismatch and spend correlation failure. designer.py now at 100% coverage. Made-with: Cursor

… fix construction

Address 7 of 11 issues from code review:

- Extract _channel_metrics(), _ramp_fractions_matrix(), _evaluate_candidates() to eliminate duplicated computation across scoring and plotting
- Simplify scoring to only assurance + cost_efficiency (remove unused dimensions)
- Add _init_common() so __init__ and from_idata share attribute setup
- Replace silent exception swallowing with UserWarning
- Remove dead code: unused scalers, dead _SUPPORTED_SATURATION entry, unused recommendations param on plot_adstock_ramp
- Add 14 golden regression tests pinning public API numerical output

All 128 tests pass with zero numerical change (verified by golden tests).

Made-with: Cursor

drbenvincent · 2026-03-17T15:51:21Z

ExperimentDesigner Refactoring — Summary of Changes

This commit addresses the main structural issues identified during code review. All 14 golden regression tests pass, confirming zero numerical change.

Changes Made

1. Eliminated computation duplication

_channel_metrics() — New method that computes per-channel HDI width, mean spend correlation, saturation gradient, and mean alpha in one pass. plot_channel_diagnostics now consumes this dict instead of computing each metric inline.

_ramp_fractions_matrix() — Single source of truth for the adstock ramp fraction matrix (shape n_draws × T). Both _compute_ramp_fraction and plot_adstock_ramp now delegate to it, eliminating the duplicated geometric series calculation.
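
For intuition, a sketch of a ramp fraction matrix for normalised geometric adstock under a step change in spend. It assumes the ramp is measured on adstocked spend only (saturation ignored) and that alpha < 1; the function names are illustrative and not the PR's private methods.

```python
def ramp_fractions(alpha, n_weeks, l_max):
    """Fraction of steady-state adstocked spend reached in each week of a step change."""
    steady = sum(alpha**k for k in range(l_max))
    return [
        sum(alpha**k for k in range(min(t + 1, l_max))) / steady
        for t in range(n_weeks)
    ]


def ramp_fractions_matrix(alpha_draws, n_weeks, l_max):
    """One row per posterior draw of alpha, one column per experiment week."""
    return [ramp_fractions(a, n_weeks, l_max) for a in alpha_draws]


def analytic_fraction(alpha, t, l_max):
    """Closed-form geometric-series value for week t (0-indexed)."""
    return (1 - alpha ** min(t + 1, l_max)) / (1 - alpha**l_max)


# Two hypothetical posterior draws of alpha, a 6-week test, l_max of 4.
matrix = ramp_fractions_matrix([0.3, 0.7], n_weeks=6, l_max=4)
```

Each row rises monotonically and reaches 1 once the experiment has run for l_max weeks, which is the kind of property a test against the analytic geometric series can pin down.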

_get_uncertainty_ranks() — Now delegates to _channel_metrics() instead of recomputing HDI widths independently.

2. Simplified scoring system

Removed the three unused scoring dimensions (uncertainty, gradient, correlation) that were computed but never contributed to the default score. The _compute_scoring_dimensions method has been removed entirely — _compute_scores now works directly with assurance and cost_efficiency arrays.

The weight redistribution logic (for redistributing correlation weight when spend correlation is unavailable) has been removed. Unknown dimension keys in score_weights are silently ignored.

3. Fixed `from_idata` construction anti-pattern

Introduced _init_common() — a single method that sets all instance attributes from pre-extracted data. Both __init__ and from_idata funnel through it, so new attributes only need to be added in one place. This eliminates the maintenance trap where every attribute added to __init__ had to be manually mirrored in from_idata.

4. Fixed silent exception swallowing

_compute_residual_std now emits a UserWarning when mmm.predict() fails and the fallback residual_std=1.0 is used. Similarly, spend correlation computation failure now warns instead of silently falling back to None. _compute_residual_std is now a @staticmethod that returns (std, autocorr) instead of mutating instance attributes.
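
The warn-and-fall-back pattern might look roughly like this. It is a sketch with a hypothetical signature (a plain `predict` callable instead of an MMM object), and the fallback autocorrelation of 0.0 is an assumption, since the description above only specifies residual_std=1.0.

```python
import math
import warnings


def compute_residual_std(predict, X, y):
    """Return (std, lag-1 autocorr) of y - predict(X), warning if prediction fails."""
    try:
        y_hat = predict(X)
        if len(y_hat) != len(y):
            raise ValueError("prediction length does not match y")
    except Exception as exc:
        warnings.warn(
            f"Could not compute residuals ({exc}); using residual_std=1.0.",
            UserWarning,
            stacklevel=2,
        )
        return 1.0, 0.0  # assumed fallback for the autocorrelation
    resid = [a - b for a, b in zip(y, y_hat)]
    mean = sum(resid) / len(resid)
    dev = [r - mean for r in resid]
    ss = sum(d * d for d in dev)
    rho = sum(a * b for a, b in zip(dev[:-1], dev[1:])) / ss if ss > 0 else 0.0
    return math.sqrt(ss / len(dev)), rho
```

Returning a tuple instead of mutating attributes keeps the helper easy to test in isolation, for example with `warnings.catch_warnings(record=True)`.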

5. Removed unused scalers

_channel_scaler and _target_scaler were stored but never referenced. Removed entirely.

6. Decomposed `recommend()`

Extracted _evaluate_candidates() which returns a list of raw metric dicts (everything except score and rationale). The recommend() method now follows a clear 4-phase pipeline: evaluate → score → attach rationale → sort. Score is computed from arrays before creating ExperimentRecommendation objects.

7. Minor cleanups

_SUPPORTED_SATURATION: Removed dead "logistic" entry — type().__name__ always returns "LogisticSaturation".
plot_adstock_ramp: Removed unused recommendations parameter (the code never used it despite the docstring claiming otherwise).
plot_channel_diagnostics: Simplified from ~50 lines of inline computation to a loop over _channel_metrics() values.

Test Impact

Category	Result
Golden tests (`test_golden.py`, 14 tests)	All pass — zero numerical change
Public-API tests	All pass unchanged
Private-method tests	3 scaler tests removed, 2 exception tests updated to verify warnings
Total	128 passed, 2 skipped (slow)

Files changed: designer.py (+333 −313), test_designer.py (+40 −40), test_golden.py (new, 241 lines).

…iency only) Remove four references to the dropped scoring dimensions (uncertainty, correlation, gradient) that no longer exist after the scoring simplification. Made-with: Cursor

williambdean

Initial thoughts

williambdean · 2026-03-24T16:12:18Z

+        mmm = MMM(...)
+        mmm.fit(X, y)
+
+        designer = ExperimentDesigner(mmm)


Is it possible to just take actuals and posterior predictives? Or just the residuals?

Addressed in b36929c. We kept the y - mmm.predict(mmm.X) approach and added a docstring explaining why: predict() is the public API for point predictions, it avoids duplicating training data the model already holds, and we only need a point estimate of residual noise — not the full posterior predictive distribution. Falls back to residual_std=1.0 with a warning if predict() fails.

williambdean · 2026-03-24T16:13:13Z

+
+        channel_columns = list(mmm.channel_columns)
+        for channel in channel_columns:
+            sel = {channel_dim: channel} if channel_dim else {}
+            posterior_samples[channel] = {
+                "lam": posterior[sat_var_map["lam"]]
+                .sel(**sel)
+                .values.astype(np.float64),
+                "beta": posterior[sat_var_map["beta"]]
+                .sel(**sel)
+                .values.astype(np.float64),
+                "alpha": posterior[ads_var_map["alpha"]]
+                .sel(**sel)
+                .values.astype(np.float64),
+            }


What is actually needed from the media transformation? Maybe there is a way to leverage
the graph directly.

Addressed in b36929c. The entire parameter-extraction approach is gone. We now call extract_response_distribution to trace the model's channel_contribution graph, substitute posterior samples, and compile it into a reusable PyTensor function via _build_eval_fn_from_model. All evaluation (lift prediction, ramp, diagnostics, plotting) goes through this compiled graph — no manual access to lam, beta, or alpha needed.

williambdean · 2026-03-24T16:15:00Z

+    @staticmethod
+    def _eval_saturation(
+        x: np.ndarray | float,
+        lam: np.ndarray | float,
+        beta: np.ndarray | float,
+    ) -> np.ndarray:
+        """Evaluate the logistic saturation response in pure numpy.
+
+        Uses the same formula as
+        :func:`pymc_marketing.mmm.transformers.logistic_saturation`
+        but avoids a PyTensor round-trip so callers can pass raw arrays.
+        """
+        x = np.asarray(x)
+        lam = np.asarray(lam)
+        sat = (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))
+        return np.asarray(beta) * sat


We shouldn't have to do this.

Addressed in b36929c. This method (_eval_saturation) and the companion _compute_steady_state_spend / _compute_adstock_ramp are all deleted. Everything now evaluates through the compiled PyTensor graph — no reimplementation of transformation formulas.

…dback

Replace manual NumPy reimplementations of adstock/saturation with a compiled PyTensor graph (via extract_response_distribution), making the designer honest for any adstock/saturation combination.

- Replace geometric-only ramp helpers with graph-based ramp computation
- Replace "Adstock α" diagnostic with "Ramp @ Xw" (adstock-agnostic)
- Add __init__(mmm) test coverage (happy path, error, and warning paths)
- Add docstring justification for residual computation approach
- Update golden test ramp values (intentional math change)

Made-with: Cursor

drbenvincent · 2026-04-06T15:02:52Z

Addressing review feedback

Graph-based ramp fraction (scope/contract fix)

Deleted the three geometric-adstock-specific helpers (_get_alpha_samples, _ramp_fractions_matrix, _compute_ramp_fraction) and replaced them with graph-based equivalents that work for any adstock type:

_steady_state_per_week_lift() runs a long experiment through the compiled graph to get the true steady-state per-week lift.
_compute_graph_ramp_fraction() computes avg_per_week_lift / steady_state_per_week_lift, valid regardless of adstock formulation.
plot_adstock_ramp and _channel_metrics now use the graph rather than the geometric series formula.
The "Adstock α" diagnostic panel is replaced with "Ramp @ Xw (fraction of steady state)" — a more general and interpretable metric.
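
To illustrate the ratio avg_per_week_lift / steady_state_per_week_lift, here is a sketch in which a plain Python callable stands in for the compiled PyTensor graph. The stand-in response (geometric adstock followed by logistic saturation), its parameter values, and all function names are assumptions made for the example.

```python
import math


def channel_contribution(spend, alpha=0.6, l_max=8, lam=0.02, beta=100.0):
    """Stand-in for the compiled graph: geometric adstock, then logistic saturation."""
    weights = [alpha**k for k in range(l_max)]
    norm = sum(weights)
    out = []
    for t in range(len(spend)):
        lags = range(min(l_max, t + 1))
        adstocked = sum(weights[k] * spend[t - k] for k in lags) / norm
        z = math.exp(-lam * adstocked)
        out.append(beta * (1 - z) / (1 + z))
    return out


def weekly_lift(response, baseline, delta, n_weeks, warmup=52):
    """Per-week lift of a step change of size delta lasting n_weeks."""
    control = [baseline] * (warmup + n_weeks)
    treated = [baseline] * warmup + [baseline + delta] * n_weeks
    diff = [a - b for a, b in zip(response(treated), response(control))]
    return diff[warmup:]


def graph_ramp_fraction(response, baseline, delta, n_weeks, long_run=104):
    """Average weekly lift of a short test relative to the steady-state weekly lift."""
    steady = weekly_lift(response, baseline, delta, long_run)[-1]
    short = weekly_lift(response, baseline, delta, n_weeks)
    return (sum(short) / n_weeks) / steady


f4 = graph_ramp_fraction(channel_contribution, 50.0, 20.0, n_weeks=4)
f12 = graph_ramp_fraction(channel_contribution, 50.0, 20.0, n_weeks=12)
```

Because the ratio only needs a callable that maps a spend series to a contribution series, the same calculation applies to any adstock or saturation form, which is the point of moving away from the geometric series formula.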

Golden test ramp values updated (lift, assurance, and SNR are unchanged; the small ramp shift reflects the graph now accounting for saturation nonlinearity jointly with adstock).

`__init__(mmm)` test coverage

Added TestInitFromMMM with six tests covering the main constructor path:

Happy path (real lightweight PyMC model wired through _build_eval_fn_from_model)
ValueError for unfitted MMM
NotImplementedError for adstock_first=False
Warning fallback when predict() fails
Warning fallback when spend correlation is unavailable
End-to-end recommend() through the MMM constructor

Residual computation

Kept the y - predict(X) approach in _compute_residual_std and added a docstring explaining the rationale: predict() is the public API for point predictions, avoids duplicating training data, and only a point estimate is needed — not the full posterior predictive distribution.

…esigner

Update two cells that accessed the deleted _posterior_samples dict to use the xarray posterior Dataset directly. Made-with: Cursor

juanitorduz · 2026-04-20T22:27:20Z

@BugBot review

cursor · 2026-04-20T22:27:43Z

PR Summary

Medium Risk
Adds a new public mmm.experiment_design API that compiles/evaluates PyMC graphs and computes assurance/scoring; math/graph compilation and plotting paths could introduce subtle numerical or performance regressions despite good test coverage.

Overview
Adds a new mmm.experiment_design subpackage providing an ExperimentDesigner that ranks candidate lift-test designs (channel × spend change × duration) using posterior-predicted lift (including adstock ramp), an AR(1)-adjusted noise model, and Bayesian assurance, returning an ExperimentRecommendations container with rationale strings and notebook-friendly rendering.

Exports the new types from pymc_marketing.mmm, includes a fixture generator for InferenceData-based demos/tests, adds multiple plotting helpers, and updates the docs gallery to link an Experiment Designer notebook; comprehensive unit/golden/slow tests are added to pin expected numerical outputs and cover edge cases.

Reviewed by Cursor Bugbot for commit 0fa1336. Bugbot is set up for automated code reviews on this repo.

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0fa1336.

juanitorduz

Some initial comments :)

juanitorduz · 2026-04-20T22:33:24Z

+def _geometric_adstock_np(
+    x: np.ndarray, alpha: float, l_max: int, normalize: bool = True
+) -> np.ndarray:
+    """Apply geometric adstock to a 1-D series (numpy)."""
+    n = len(x)
+    weights = alpha ** np.arange(l_max)
+    if normalize:
+        weights = weights / weights.sum()
+
+    out = np.zeros(n)
+    for t in range(n):
+        for lag in range(min(l_max, t + 1)):
+            out[t] += weights[lag] * x[t - lag]
+    return out
+
+
+def _logistic_saturation_np(x: np.ndarray, lam: float) -> np.ndarray:
+    """Numpy logistic saturation (without beta scaling)."""
+    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))


This seems redundant, can't we use https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/mmm/transformers.py ?

juanitorduz · 2026-04-20T22:34:01Z

+    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))
+
+
+def generate_experiment_fixture(


Do we need this for the code or just for tests?

Could we reuse some of https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_data_generator.html ?

juanitorduz · 2026-04-20T22:35:20Z

+        parts.append(")")
+        return "".join(parts)
+
+    # -- Jupyter rendering ---------------------------------------------------


remove this comment

juanitorduz · 2026-04-20T22:36:08Z

+        return pd.DataFrame(records)
+
+
+_HIGH_CORRELATION_THRESHOLD = 0.7


at the top of the file?

drbenvincent added the Needs Triage label Mar 2, 2026

github-actions Bot added MMM tests labels Mar 2, 2026

drbenvincent marked this pull request as draft March 2, 2026 14:09

drbenvincent added 2 commits March 2, 2026 14:20

Remove issue reference file from tracking

915c0ba

Made-with: Cursor

github-actions Bot added the docs Improvements or additions to documentation label Mar 2, 2026

drbenvincent added 7 commits March 2, 2026 14:30

various improvements to docs and code

4fb46c9

new lines

8767413

Merge branch 'main' into fix/2355-experiment-designer

4a8c43f

Merge branch 'fix/2355-experiment-designer' of https://github.com/pym…

63ed769

many more docs improvements

b4bf65f

Merge branch 'main' into fix/2355-experiment-designer

9cc4aca

drbenvincent marked this pull request as ready for review March 11, 2026 14:35

drbenvincent mentioned this pull request Mar 11, 2026

Experimentation tools: How long should my experiment be? pymc-labs/CausalPy#721

Open

Merge branch 'main' into fix/2355-experiment-designer

dc17d4d

drbenvincent added 2 commits March 17, 2026 12:42

drbenvincent added 2 commits March 17, 2026 13:24

Merge branch 'main' into fix/2355-experiment-designer

d627073

drbenvincent added 2 commits March 17, 2026 15:57

Update notebook to reflect simplified scoring (assurance + cost_effic…

eff26b4

Merge branch 'main' into fix/2355-experiment-designer

f70cbfe

williambdean reviewed Mar 24, 2026

drbenvincent added 4 commits April 6, 2026 16:06

Merge remote-tracking branch 'origin/main' into fix/2355-experiment-d…

aa84009

Fix docs notebook: replace removed _posterior_samples with _posterior

97afbb4

Merge branch 'main' into fix/2355-experiment-designer

3c41ff7

Merge branch 'main' into fix/2355-experiment-designer

0fa1336

cursor Bot reviewed Apr 20, 2026

juanitorduz requested changes Apr 20, 2026

