Consolidate MMM data input into unified xr.Dataset representation #2596
Open
williambdean wants to merge 2 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2596       +/-   ##
===========================================
- Coverage   93.96%   73.10%   -20.87%
===========================================
  Files          95       96        +1
  Lines       14371    14432       +61
===========================================
- Hits        13504    10550     -2954
- Misses        867     3882     +3015
juanitorduz (Collaborator) reviewed on Jun 3, 2026 and left a comment:
Great initiative! Just a simple comment, as the tests are failing :)
Comment on lines +77 to +79:
    # ---------------------------------------------------------------------------
    # Public API — single entry point
    # ---------------------------------------------------------------------------
Can we remove these comments?
Summary
Consolidate all MMM data input paths into a single `to_mmm_dataset` entry point, using `xr.Dataset` as the canonical internal representation. Removes `self.X`/`self.y` from `MMM`, eliminates ~700 lines of scattered type-dispatching code, and opens the door to different granularities of data sources.

Context
This refactoring standardizes on `xr.Dataset` as the single internal data format. The `_channel`, `_target`, and `_control` variables now have consistent dims and coordinates, regardless of whether input comes from pandas, xarray, or numpy. This means:
- `dims` parameter threads through naturally: adding `dims=("country",)` or `dims=("region", "product")` just adds extra coordinates to the Dataset
- `build_model` and `fit` now accept `pd.DataFrame`, `xr.Dataset`, or `xr.DataArray`: users can pass data directly from xarray-based pipelines

Changes
`pymc_marketing/mmm/_data_conversion.py` (new, 423 lines)
- `to_mmm_dataset(X, y, ...)`: single `singledispatch` entry point that normalizes `pd.DataFrame`, `xr.Dataset`, and `xr.DataArray` inputs into a canonical `xr.Dataset`
- `dims` parameter
- `MultiIndex` + `sort_index` + `to_xarray` alignment for proper coordinate ordering

`pymc_marketing/mmm/mmm.py` (-364 net)
- `_normalize_target`, `_create_xarray_from_pandas` + 5 sub-methods, duplicate `create_fit_data` logic
- `_generate_and_preprocess_model_data`: single `to_mmm_dataset` call, no `self.X`/`self.y`
- `_posterior_predictive_data_transformation`: uses `to_mmm_dataset`
- `_apply_budget_distribution_pattern` / `_apply_carryover_effect`: operate on `xr.Dataset` directly
- `build_from_idata`: sets `self.idata = idata` so `sample_posterior_predictive` works after load
- `build_model` / `fit`: accept `pd.DataFrame`, `xr.Dataset`, or `xr.DataArray`
- `create_fit_data`: handles embedded target in `xr.Dataset`, cleans up underscore columns

`pymc_marketing/mmm/utils.py` (-102 net)
- `create_zero_dataset`: returns `xr.Dataset` with `_channel`/`_control` variables
- `add_noise_to_channel_allocation`: supports both `pd.DataFrame` and `xr.Dataset`

`pymc_marketing/model_builder.py` (+1 net)
- `RegressionModelBuilder.build_from_idata`: sets `self.idata = idata`

`tests/mmm/test_utils.py`: Updated fake model classes and assertions for `xr.Dataset` return type

Testing
`tests/mmm/` pass (230 in `test_mmm.py`, 151 in budget/optimizer/cost/utils, 774 in remaining suites)

📚 Documentation preview 📚: https://pymc-marketing--2596.org.readthedocs.build/en/2596/
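
For readers unfamiliar with the pattern named under Changes, here is a minimal sketch of a `singledispatch` entry point that turns a long-format `pd.DataFrame` into an `xr.Dataset` via `MultiIndex` + `sort_index` + `to_xarray`. It is not the PR's code. The function name `to_dataset_sketch`, its keyword parameters, the `"channel"` dimension name, and the example columns are all assumptions made for illustration; the real `to_mmm_dataset` signature is only shown in the description as `to_mmm_dataset(X, y, ...)`.

```python
from functools import singledispatch

import pandas as pd
import xarray as xr


@singledispatch
def to_dataset_sketch(X, *, date_column="date", channel_columns=(), dims=()):
    """Hypothetical stand-in for the entry point; rejects unknown input types."""
    raise TypeError(f"Unsupported input type: {type(X).__name__}")


@to_dataset_sketch.register
def _(X: pd.DataFrame, *, date_column="date", channel_columns=(), dims=()):
    # Index by date plus any extra dims, sort the rows, then let pandas
    # expand the MultiIndex into labelled xarray dimensions.
    ds = X.set_index([date_column, *dims]).sort_index().to_xarray()
    # Stack the per-channel variables along a new "channel" dimension
    # (the dimension name is an assumption of this sketch).
    channel = xr.concat(
        [ds[name] for name in channel_columns],
        dim=pd.Index(list(channel_columns), name="channel"),
    ).transpose(date_column, *dims, "channel")
    return xr.Dataset({"_channel": channel})


@to_dataset_sketch.register
def _(X: xr.Dataset, *, date_column="date", channel_columns=(), dims=()):
    # Already an xr.Dataset: only check that the expected variable exists.
    if "_channel" not in X:
        raise ValueError("expected a '_channel' variable")
    return X


# Example input: rows deliberately out of order, one extra dim ("country").
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-08", "2024-01-01", "2024-01-08", "2024-01-01"]
        ),
        "country": ["US", "US", "DE", "DE"],
        "tv": [4.0, 3.0, 2.0, 1.0],
        "radio": [40.0, 30.0, 20.0, 10.0],
    }
)
ds = to_dataset_sketch(df, channel_columns=["tv", "radio"], dims=("country",))
```

In this sketch, passing `dims=("country",)` simply adds one more index level before `to_xarray`, so the extra dimension appears as a coordinate on `_channel` without any type-specific branching; passing the resulting `ds` back into `to_dataset_sketch` returns it unchanged.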