Centralize materialized view registry and refactor refresh task #292
shlokgilda wants to merge 4 commits
How easily would this method handle a move of all materialized views to their own db schema?
I'm taking over this issue on behalf of shlok. I just rebased it and made an adjustment so the refresh jumps straight to a non-concurrent refresh when there is no unique index (the presence of a unique index is what indicates we can run concurrently).
Note to self: materialized views should be moved to a schema called |
need to also test an actual refresh cycle for this |
- add collectoss/application/db/materialized_views.py: single source of truth for all 15 existing materialized view definitions (SQL + unique index columns). get_refresh_sql() helper constructs REFRESH statements.
- refactor collectoss/tasks/db/refresh_materialized_views.py: replace 14 hardcoded try/except-pass blocks with a dynamic loop over the registry. uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY works correctly (it cannot run inside a transaction block). raises RuntimeError if any view fails both concurrent and non-concurrent refresh.
- wire alembic_utils into collectoss/application/schema/alembic/env.py: register all views from the registry so that alembic revision --autogenerate detects SQL definition changes automatically.
- add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.
closes chaoss#243 (partial; heatmap views follow in a separate PR)
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Replace the list-of-dicts registry with a frozen MaterializedView dataclass exposing fqn, refresh_sql(), and to_pg_view(). Brings the registry's shape in line with the declarative ORM style used elsewhere in the codebase and gives callers attribute access + type checking instead of string-keyed dict lookups. unique_index_columns is a tuple so frozen=True actually means immutable. __repr__ is overridden to keep the multi-hundred-line view SQL out of debug logs. Refresh task and alembic env.py updated to use the new API; get_refresh_sql free function removed (only two call sites). Emitted REFRESH SQL is byte-identical to the previous version.
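The shape described in this commit message can be illustrated with a minimal sketch. It is not the code from the PR: the field names schema, name, and definition, and the example view, are assumptions made for illustration. Only fqn, refresh_sql(), unique_index_columns, and the overridden __repr__ are named in the commit message. to_pg_view() is left out so the sketch needs nothing beyond the standard library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaterializedView:
    """One registry entry (hypothetical field layout, for illustration only)."""

    schema: str
    name: str
    definition: str  # the SELECT that defines the view
    # A tuple rather than a list, so a frozen instance cannot be mutated in place.
    unique_index_columns: Tuple[str, ...] = ()

    @property
    def fqn(self) -> str:
        """Schema-qualified view name."""
        return f"{self.schema}.{self.name}"

    def refresh_sql(self, concurrently: bool = True) -> str:
        """Build the REFRESH statement for this view."""
        keyword = "CONCURRENTLY " if concurrently else ""
        return f"REFRESH MATERIALIZED VIEW {keyword}{self.fqn}"

    def __repr__(self) -> str:
        # Deliberately omit the definition so long view SQL stays out of logs.
        return f"MaterializedView({self.fqn!r})"


# Hypothetical entry; the real registry holds the project's own views.
example = MaterializedView(
    schema="analytics",
    name="example_view",
    definition="SELECT 1 AS id",
    unique_index_columns=("id",),
)
print(example, example.refresh_sql())
```

Because the class body defines __repr__ explicitly, the dataclass decorator leaves it in place instead of generating one that would print every field.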
…e correct way based on whether indexes exist or not
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
I wonder if DBT may be better for this use case, since it would let us move the transformation of data from our messy internal schema to an external, presentation-ready schema. See #18. I think this could move a lot of our processing outside of Postgres, though I'm unsure how that would help with resource contention. That said, DBT seems to be a standard tool in very large data systems.
Description:
Introduces a single source of truth for all 15 existing PostgreSQL materialized view definitions and replaces a fragile hardcoded refresh task with a dynamic loop.
- Add collectoss/application/db/materialized_views.py: registry of all 15 existing materialized views (SQL definitions + unique index columns). get_refresh_sql() helper builds REFRESH MATERIALIZED VIEW statements.
- Refactor collectoss/tasks/db/refresh_materialized_views.py: replaces 14 hardcoded try/except: pass blocks with a dynamic loop over the registry. Uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY actually works (it cannot run inside a transaction block, which the old code silently violated). Raises RuntimeError at the end if any view failed both concurrent and non-concurrent refresh, so failures are visible in Celery instead of swallowed.
- Wire alembic_utils into collectoss/application/schema/alembic/env.py: registers all views from the registry so alembic revision --autogenerate detects SQL definition changes automatically (phase 2 of "Add new materialized views for heatmaps on 8knot (and make a system for it)" #243).
- Add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.
No schema changes in this PR. The 3 new heatmap views for 8Knot follow in a separate incremental PR on top of this one.
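The refresh behaviour described above (try a concurrent refresh only when a unique index is declared, fall back to a blocking refresh, and raise at the end if anything failed both ways) could look roughly like the sketch below. It is an illustration under assumptions, not the task from this PR: refresh_all, ViewSpec, and the injected execute callable are hypothetical names, chosen so the loop can be exercised without a database.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical registry shape: (schema-qualified view name, unique index columns).
ViewSpec = Tuple[str, Tuple[str, ...]]


def refresh_all(views: Iterable[ViewSpec], execute: Callable[[str], object]) -> None:
    """Refresh every view; raise RuntimeError listing views that could not be refreshed."""
    failed: List[str] = []
    for fqn, unique_index_columns in views:
        # PostgreSQL only allows CONCURRENTLY on a view that has a unique index,
        # so skip straight to the blocking refresh when none is declared.
        if unique_index_columns:
            try:
                execute(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {fqn}")
                continue
            except Exception:
                pass  # fall through to the blocking refresh below
        try:
            execute(f"REFRESH MATERIALIZED VIEW {fqn}")
        except Exception:
            failed.append(fqn)
    if failed:
        # Surface failures to the task runner instead of swallowing them.
        raise RuntimeError("Failed to refresh: " + ", ".join(failed))


# Possible wiring with SQLAlchemy (assumes an existing `engine` and a `VIEWS` list):
#
#     from sqlalchemy import text
#     with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
#         refresh_all(VIEWS, lambda sql: conn.execute(text(sql)))
```

Passing execute in as a callable keeps the fallback logic separate from the connection handling, which is where the AUTOCOMMIT setting described in this PR lives.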
This PR fixes (partial — heatmap views follow separately) #243
Notes for Reviewers:
The key correctness fix worth understanding: as described above, REFRESH MATERIALIZED VIEW CONCURRENTLY cannot run inside a transaction block, but the old code issued it through a call which wraps everything in engine.begin(), so every concurrent refresh was silently failing and falling back to a blocking refresh on every run. This PR fixes that by using engine.connect().execution_options(isolation_level="AUTOCOMMIT") directly.
The ; COMMIT; embedded in the old SQL strings has also been removed; those were breaking SQLAlchemy's transaction management.
The first post-deploy alembic revision --autogenerate may propose a no-op normalization revision for views whose SQL Postgres normalizes differently from what's in the registry (Postgres reformats SQL on storage). It is safe to discard that revision.
Signed commits
AI Disclosure: Claude Code was used to draft this PR description and write docstrings in the new materialized_views.py module. I reviewed and verified all code changes, SQL definitions, and fixes before committing.