Skip to content

Add reduce() list folding function#2435

Merged
jrgemignani merged 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function
Jun 26, 2026
Merged

Add reduce() list folding function#2435
jrgemignani merged 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function

Conversation

@jrgemignani

Copy link
Copy Markdown
Contributor

Implement the openCypher reduce(acc = init, var IN list | body) expression, which folds an arbitrary expression over a list, threading an accumulator across the elements in list order. This closes a long-standing gap (reduce() was previously unsupported) and works both at the SQL top level and inside a cypher() RETURN/WHERE.

Implementation

reduce() is desugared, at transform time, into a correlated scalar subquery over a new ordered aggregate rather than a new executor node, so no PostgreSQL core changes are required:

CASE WHEN list IS NULL THEN NULL
     ELSE COALESCE((SELECT ag_catalog.age_reduce(
                        <init>, '<serialized-body>'::text,
                        r.elem ORDER BY r.ord)
                    FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                   <init>)
END
  • A new cypher_reduce extensible node carries the accumulator/element names and the init/list/body expressions (grammar production, keyword, and the copy/out/read serialization plumbing).
  • The fold body is transformed against a throwaway two-column agtype namespace, its accumulator and element Var references are rewritten to PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a text argument.
  • age_reduce_transfn (a custom agtype aggregate transition function) deserializes and compiles the body once per group with ExecInitExpr, then evaluates it per element with ExecEvalExpr, rebinding the two params. The body is normalized to agtype at transform time so a boolean or other non-agtype result cannot be misread as a by-reference Datum.

Semantics

  • List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  • An empty list yields the initial value; a NULL list yields NULL.
  • The list and initial value may reference outer-query variables (e.g. reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference only the accumulator and element.
  • Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE, and element property-access bodies are all supported.
  • Outer-variable, query-parameter, nested-reduce, and aggregate references inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  • reduce is registered as a safe keyword so it remains usable as a property or map key, preserving backward compatibility.

Tests

Adds the age_reduce regression test (registered in the install SQL and the upgrade template so age_upgrade passes), covering: arithmetic/product/string folds; order sensitivity; empty/NULL list; NULL element and NULL init; list-building and CASE bodies; boolean and comparison bodies; element property access; multiple and nested (in list/init) reduce(); reduce() in boolean expressions, WHERE, and list comprehensions; folds over collected nodes and node list properties; the not-supported rejections; and reduce as a map key. All multi-row results are ordered. 38/38 installcheck pass.

Future work

The body restriction (accumulator and element only) is a property of the standalone expression evaluation and can be relaxed without core changes:

  • Allow loop-invariant outer-variable and cypher $parameter references in the body by capturing them as additional eager aggregate arguments bound to extra param slots.
  • Support a nested reduce() inside the body via an SPI-based evaluation fallback for subquery-bearing bodies. Aggregates inside the body remain intentionally unsupported, matching the openCypher specification.

Co-authored-by: Copilot copilot@github.com

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/age_reduce.out
new file: regress/sql/age_reduce.sql
modified: sql/age_aggregate.sql
modified: src/backend/nodes/ag_nodes.c
modified: src/backend/nodes/cypher_copyfuncs.c
modified: src/backend/nodes/cypher_outfuncs.c
modified: src/backend/nodes/cypher_readfuncs.c
modified: src/backend/parser/cypher_analyze.c
modified: src/backend/parser/cypher_clause.c
modified: src/backend/parser/cypher_gram.y
modified: src/backend/utils/adt/agtype.c
modified: src/include/nodes/ag_nodes.h
modified: src/include/nodes/cypher_copyfuncs.h
modified: src/include/nodes/cypher_nodes.h
modified: src/include/nodes/cypher_outfuncs.h
modified: src/include/nodes/cypher_readfuncs.h
modified: src/include/parser/cypher_kwlist.h

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds openCypher reduce(acc = init, var IN list | body) support by desugaring the expression at transform time into a correlated scalar subquery that folds over unnest(list) WITH ORDINALITY using a new ordered aggregate (ag_catalog.age_reduce) and a serialized standalone fold-body expression evaluated inside the aggregate transition function.

Changes:

  • Adds a new cypher_reduce ExtensibleNode plus grammar/keyword support for parsing reduce(...).
  • Implements transform_cypher_reduce() to rewrite reduce() into an ordered aggregate over unnest(... ) WITH ORDINALITY, serializing the fold body with accumulator/element rewritten to PARAM_EXEC slots.
  • Adds ag_catalog.age_reduce + age_reduce_transfn (C) and a comprehensive regression test suite (age_reduce).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
Makefile Registers age_reduce in the regression test list.
age--1.7.0--y.y.y.sql Adds upgrade-time creation of age_reduce_transfn and age_reduce aggregate.
regress/expected/age_reduce.out Expected output for the new reduce() regression coverage.
regress/sql/age_reduce.sql New regression tests covering semantics, order, null/empty behavior, and unsupported constructs.
sql/age_aggregate.sql Defines age_reduce_transfn and age_reduce aggregate in install SQL.
src/backend/nodes/ag_nodes.c Registers cypher_reduce in node name/method tables.
src/backend/nodes/cypher_copyfuncs.c Adds copy support for cypher_reduce.
src/backend/nodes/cypher_outfuncs.c Adds serialization support for cypher_reduce.
src/backend/nodes/cypher_readfuncs.c Adds deserialization support for cypher_reduce.
src/backend/parser/cypher_analyze.c Extends raw-expression walker to traverse cypher_reduce fields.
src/backend/parser/cypher_clause.c Implements transform_cypher_reduce() desugaring and fold-body param rewrite/validation.
src/backend/parser/cypher_gram.y Adds reduce(...) grammar production and marks REDUCE as safe keyword for map keys.
src/backend/utils/adt/agtype.c Implements age_reduce_transfn (aggregate transition fn) that deserializes/compiles and evaluates the fold body.
src/include/nodes/ag_nodes.h Adds cypher_reduce_t tag.
src/include/nodes/cypher_copyfuncs.h Declares copy_cypher_reduce.
src/include/nodes/cypher_nodes.h Defines the cypher_reduce ExtensibleNode struct.
src/include/nodes/cypher_outfuncs.h Declares out_cypher_reduce.
src/include/nodes/cypher_readfuncs.h Declares read_cypher_reduce.
src/include/parser/cypher_kwlist.h Adds reduce keyword token (REDUCE).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/parser/cypher_clause.c
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch 2 times, most recently from 36162be to 6c68874 Compare June 11, 2026 14:04
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch from 6c68874 to 80841fd Compare June 12, 2026 16:15
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch from 80841fd to 56fb282 Compare June 21, 2026 04:23
@jrgemignani jrgemignani requested a review from gregfelice June 21, 2026 04:30
@gregfelice

Copy link
Copy Markdown
Contributor

Reviewed the full diff and the changed files in context, with two deep passes on the executor/planner mechanics. I independently verified the two subtle correctness traps this kind of desugaring is prone to — the COALESCE-NULL ambiguity and PARAM_EXEC isolation — and both are genuinely handled. Solid work; no blockers or majors. (Source-level review; I could not run make here — the 38/38 installcheck is the author's.)

Verified correct

  • The empty-list vs. fold-to-null ambiguity is correctly resolved — the trap I most expected here. age_reduce_transfn never returns SQL NULL: a null fold result is normalized to an agtype 'null' varlena (agtype.c:11730-11733), and the accumulator is seeded the same way. So the aggregate returns SQL NULL only over zero rows. The desugaring CASE WHEN list IS NULL THEN NULL ELSE COALESCE(<agg>, init) END therefore reads correctly: empty list -> SQL NULL -> init; a legitimate fold-to-null -> agtype 'null' (not SQL NULL) -> passes through COALESCE. The [1, null, 3] -> null test (age_reduce.sql:186) exercises exactly this. The classic bug is not present.
  • By-value Datum safety holds. The body is normalized to agtype at transform time, and the transfn independently re-checks exprType(body_node) == AGTYPEOID at runtime (agtype.c:11680) because age_reduce is SQL-callable, so datumCopy(result, false, -1) can never misread a by-value bool/int as a varlena. Good defense in depth.
  • Accumulator survives transitionsdatumCopy'd into aggcontext, not the per-tuple context; PG_ARGISNULL(0) is a reliable first-element signal precisely because the state is never SQL NULL.
  • PARAM_EXEC 0/1 cannot collide with planner params. The body is evaluated in a private standalone ExprContext whose ecxt_param_exec_vals is the transfn's own 2-element array; the outer planner never sees ids 0/1, and no node type that would need a parent (Aggref/SubLink/WindowFunc) can reach the body — all are rejected before serialization.
  • Nested-reduce / outer-var / param rejection is enforced at transform time (a nested reduce lowers to a SubLink, which reduce_body_check_walker rejects; the varlevelsup == 0 guard prevents an outer RTE sharing a varno from being rewritten into the accumulator param). The three rejection tests confirm it.
  • Init-evaluated-once is correct. Wrapping init in CASE WHEN reduce_ordinality = 1 THEN <init> ELSE NULL::agtype END is sound because the aggregate-level ORDER BY delivers ordinality 1 first and the transfn only reads init when the state is NULL; PARALLEL UNSAFE + no combinefn prevents any partial-aggregate reordering.
  • Serialization plumbing is complete and symmetriccypher_reduce is wired into the tag enum, node_names[]/node_methods[], all five fields in copy/out/read with matching headers, and the raw-expr walker in cypher_analyze.c descends all three expr fields. No field missed.
  • Backward compat preservedreduce is added to the safe_keywords grammar production (like any/none/order), and {reduce: 1} still works as a map key.

Test coverage gaps (cheap to close, no bug implied)

The mechanisms below are correct by inspection, but the suite doesn't lock in the exact semantics that make them correct:

  1. A fold that legitimately produces null mid-way and then recovers (e.g. a coalesce/CASE body that climbs back out of null). Only "null propagates to a null result" is tested; the recovery path through the agtype-'null' state is the load-bearing one and isn't directly exercised.
  2. Empty list with a NULL init (reduce(s = null, x IN [] | ...)) — the COALESCE(NULL, null) path is unverified.
  3. A type error / runtime failure in the body (e.g. a non-coercible or arithmetic-invalid body) — the coerce_to_common_type(..., "reduce") and runtime error paths are unexercised.

(Large-list / interrupt-path stress and deeper nesting are nice-to-haves; the SubLink check makes nesting depth irrelevant.)

Adding #1#3 would be a few lines each and would pin down precisely the behavior this implementation gets right. Nice piece of work overall — the agtype-'null' state trick and the standalone-ExprContext param isolation are the right calls.

@gregfelice

Copy link
Copy Markdown
Contributor

Design check before this lands — desugar-to-aggregate as the foundation for reduce()

Flagging the core design decision for lazy consensus, since it'll likely become the template for future folding/comprehension work. reduce() here is implemented entirely at transform time — desugared into a correlated scalar subquery over a new ordered aggregate (age_reduce), with the fold body serialized via nodeToString() and re-evaluated per element through a standalone ExecInitExpr/ExecEvalExpr bound to PARAM_EXEC 0/1 — so no PostgreSQL core changes are needed. List order comes from the aggregate's ORDER BY; init is evaluated once via CASE WHEN ordinality = 1.

I've reviewed it closely and it's correct (the empty-list-vs-fold-to-null ambiguity and the param-isolation traps are both handled) and green under cassert PG18. Two things worth a second committer's eyes before merge:

  1. Is desugar-to-aggregate the foundation we want for reduce() (and as a precedent for reduce-adjacent features), versus a dedicated executor node?
  2. The body-scope restriction (accumulator + element only; outer vars / $params / nested reduce / aggregates rejected with FEATURE_NOT_SUPPORTED) — acceptable as a first cut given the author's documented path to relax it?

No objection from me on either; raising for lazy consensus rather than landing a design-setting change on a lone review.

@jrgemignani

jrgemignani commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

@gregfelice I will review this Monday to see what can be adjusted. BTW, it is -> my opinion <- that reduce is closer to an aggregate than to anything else.

Also, this is part 1 of 2, maybe 3 for reduce. Those features not supported are covered in the second PR. It was just going to be too big to create just one and review. This is also true for shortest path.

@jrgemignani

Copy link
Copy Markdown
Contributor Author

@gregfelice Per Opus, aggregate is the appropriate place as reduce is basically an aggregate function.

I am working on rebasing and applying the gap fixes now and will update this PR shortly.

Implement the openCypher reduce(acc = init, var IN list | body) expression,
which folds an arbitrary expression over a list, threading an accumulator
across the elements in list order. This closes a long-standing gap (reduce()
was previously unsupported) and works both at the SQL top level and inside a
cypher() RETURN/WHERE.

Implementation
--------------
reduce() is desugared, at transform time, into a correlated scalar subquery
over a new ordered aggregate rather than a new executor node, so no PostgreSQL
core changes are required:

    CASE WHEN list IS NULL THEN NULL
         ELSE COALESCE((SELECT ag_catalog.age_reduce(
                            <init>, '<serialized-body>'::text,
                            r.elem ORDER BY r.ord)
                        FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                       <init>)
    END

  - A new cypher_reduce extensible node carries the accumulator/element names
    and the init/list/body expressions (grammar production, keyword, and the
    copy/out/read serialization plumbing).
  - The fold body is transformed against a throwaway two-column agtype
    namespace, its accumulator and element Var references are rewritten to
    PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a
    text argument.
  - age_reduce_transfn (a custom agtype aggregate transition function)
    deserializes and compiles the body once per group with ExecInitExpr, then
    evaluates it per element with ExecEvalExpr, rebinding the two params. The
    body is normalized to agtype at transform time so a boolean or other
    non-agtype result cannot be misread as a by-reference Datum.

Semantics
---------
  - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  - An empty list yields the initial value; a NULL list yields NULL.
  - The list and initial value may reference outer-query variables (e.g.
    reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference
    only the accumulator and element.
  - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE,
    and element property-access bodies are all supported.
  - Outer-variable, query-parameter, nested-reduce, and aggregate references
    inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  - reduce is registered as a safe keyword so it remains usable as a property
    or map key, preserving backward compatibility.

Tests
-----
Adds the age_reduce regression test (registered in the install SQL and the
upgrade template so age_upgrade passes), covering: arithmetic/product/string
folds; order sensitivity; empty/NULL list; NULL element and NULL init;
list-building and CASE bodies; boolean and comparison bodies; element property
access; multiple and nested (in list/init) reduce(); reduce() in boolean
expressions, WHERE, and list comprehensions; folds over collected nodes and
node list properties; the not-supported rejections; and reduce as a map key.

Following reviewer feedback, three further semantics-coverage gaps are
pinned directly so the mechanisms that make the aggregate desugaring
correct are exercised by tests rather than only correct by inspection:
  - A fold body that produces null mid-fold and then recovers: the
    agtype 'null' running state is a readable value, so a later element
    folds back out of it (distinct from "null propagates to a null
    result", which was already covered).
  - An empty list with a NULL initial value: COALESCE(<no rows>, init)
    yields NULL, kept distinct from a body that legitimately folds to
    agtype 'null', which must not be resurrected to the initial value.
  - A type error and a runtime division-by-zero error in the body: both
    abort cleanly out of the standalone per-element evaluator rather
    than corrupting the running aggregate state.

All multi-row results are ordered. 42/42 installcheck pass.

Future work
-----------
The body restriction (accumulator and element only) is a property of the
standalone expression evaluation and can be relaxed without core changes:
  - Allow loop-invariant outer-variable and cypher $parameter references in
    the body by capturing them as additional eager aggregate arguments bound
    to extra param slots.
  - Support a nested reduce() inside the body via an SPI-based evaluation
    fallback for subquery-bearing bodies.
Aggregates inside the body remain intentionally unsupported, matching the
openCypher specification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/age_reduce.out
new file:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/nodes/ag_nodes.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_analyze.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_gram.y
modified:   src/backend/utils/adt/agtype.c
modified:   src/include/nodes/ag_nodes.h
modified:   src/include/nodes/cypher_copyfuncs.h
modified:   src/include/nodes/cypher_nodes.h
modified:   src/include/nodes/cypher_outfuncs.h
modified:   src/include/nodes/cypher_readfuncs.h
modified:   src/include/parser/cypher_kwlist.h
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch from 56fb282 to e3053e8 Compare June 22, 2026 22:11
@jrgemignani

Copy link
Copy Markdown
Contributor Author

@gregfelice Rebased and added the coverage for the gaps requested.

@gregfelice gregfelice left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — approving.

I diffed this against my local add-reduce-list-folding-function working branch: the reduce() implementation here is identical to mine (agtype.c fold logic, the node copy/out/read serialization, cypher_analyze.c, the reduce keyword, and age_aggregate.sql all match line-for-line). The only non-test differences are master drift picked up by rebasing onto current master (the hasTargetSRFs propagation from the shortest_path SRF work, and the %expect/make_exists_pattern_sublink grammar bits from pattern expressions).

The added test coverage is a real improvement — it nails down the subtle cases: empty list with a NULL initial value, the fold-to-null vs empty-list distinction, a null accumulator recovering mid-fold, and clean error propagation from the body (type error and division by zero). Good hardening.

@jrgemignani

Copy link
Copy Markdown
Contributor Author

@gregfelice I am going to merge this one in so that I can get the second part out today.

@jrgemignani jrgemignani merged commit 92e48e9 into apache:master Jun 26, 2026
6 checks passed
@jrgemignani jrgemignani mentioned this pull request Jul 2, 2026
jrgemignani added a commit that referenced this pull request Jul 2, 2026
* Make age extension usable from shared_preload_libraries (#2438)

When AGE is loaded via shared_preload_libraries, its hooks
(post_parse_analyze, set_rel_pathlist, object_access) are active
before CREATE EXTENSION age is run. This causes errors when
non-Cypher queries trigger those hooks and ag_catalog does not
yet exist.

Changes:

- Add is_age_extension_exist() with a relcache callback cache so that
  checking pg_extension is not repeated on every hook invocation.
- Guard post_parse_analyze, set_rel_pathlist, and object_access hooks
  with is_age_extension_exist() so they become no-ops when the
  extension is not installed.
- Refactor ag_ProcessUtility_hook to detect CREATE/DROP EXTENSION age
  and broadcast a relcache invalidation via
  CacheInvalidateRelcacheByRelid(ExtensionRelationId) so other
  backends update their cached extension state.
- Wrap DROP EXTENSION processing in PG_TRY/PG_CATCH to restore
  object_access_hook if the drop fails (e.g. dependent objects).
- Skip _PG_init during pg_upgrade (IsBinaryUpgrade) to avoid hook
  registration when the binary-upgrade machinery is running.
- Add regression tests that verify hooks do not error when ag_catalog
  schema is absent.

* feature: add create_subgraph() (#2441)

Add the feature create_subgraph() for materialized induced-subgraph
extraction.

Add ag_catalog.create_subgraph(new_graph, from_graph, node_filter,
relationship_filter) which materializes a new, persistent, fully
Cypher-queryable AGE graph as the induced subgraph of an existing graph.

Selection follows the graph-theory induced-subgraph definition as
operationalized by Neo4j GDS gds.graph.filter():
  * a vertex is kept iff node_filter holds ('*' keeps all);
  * an edge is kept iff relationship_filter holds AND both of its
    endpoints were kept (no dangling edges).

Filters are arbitrary Cypher predicates bound to `n` (nodes) and `r`
(relationships) and are evaluated by AGE's own Cypher engine against the
source graph, so the full predicate language is available; label
selection uses label(n)/label(r) since the match pattern is fixed.

Implementation notes:
  * Result is a real, ACID, registered graph (create_graph + create_v/
    elabel), not a virtual view; it composes with cypher() and itself.
  * Entity graphids are reassigned from the destination labels' own
    sequences (graphid encodes a per-graph label id), and edge endpoints
    are remapped through an old->new vertex map, enforcing the induced
    rule via inner joins.
  * Source label tables are read with FROM ONLY to avoid double-copying
    children under PostgreSQL table inheritance.
  * Properties of any agtype are preserved; self-loops and parallel
    edges (multigraph structure) are retained.
  * SECURITY INVOKER: reads respect the caller's table privileges and
    RLS; the new graph is owned by the caller.
  * Validates NULL/identical graph names, missing source, pre-existing
    destination, and a reserved dollar-quote token in predicates.

Wire-up:
  * sql/age_subgraph.sql (new) registered in sql/sql_files after
    age_pg_upgrade; identical body added to age--1.7.0--y.y.y.sql so the
    upgrade-path catalog comparison matches.
  * regress/sql/subgraph.sql + expected output (new), added to REGRESS.
    Covers full copy, vertex-induced, node+rel, label-only edge drop,
    bipartite, empty result, composability, self-loops/parallel edges,
    property fidelity, and error cases over a ~4500-vertex / 2000-edge
    source graph.

All 38 regression tests pass against PostgreSQL 18.

Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/subgraph.out
new file:   regress/sql/subgraph.sql
new file:   sql/age_subgraph.sql
modified:   sql/sql_files

* cypher_vle: add ORDER BY to non-deterministic RETURN queries (#2434)

Several VLE regression queries RETURN multiple rows without an ORDER BY,
so their row order depends on traversal/scan order and can vary between
runs and platforms. Add ORDER BY ASC to those queries (on the path,
edge-list, or graphid as appropriate) so the expected output is stable.
The queries are pinned by path (p), edge list (e), or graphid
(id(u)/id(v)/id(e[n])) depending on what each RETURN projects.

Full audit of cypher_vle: all 38 multi-row result blocks were checked.
After this change, every multi-row RETURN is deterministically ordered
except the two SELECT * FROM show_list_use_vle('list01') calls, which are
already deterministic because the function body orders its results with
RETURN v ORDER BY id(v) (added in #2417); their result blocks are
unchanged by this commit.

This is a test-only change (regress/sql/cypher_vle.sql and
regress/expected/cypher_vle.out); no extension C code or SQL is modified.
Row counts are unchanged (pure reordering).

All 37 regression tests pass (installcheck) on PostgreSQL 18.3.

Co-authored-by: GitHub Copilot <noreply@github.com>

modified:   regress/expected/cypher_vle.out
modified:   regress/sql/cypher_vle.sql

* age_global_graph: stabilize regression tests (#2431)

age_global_graph: stabilize regression tests under concurrent xid load

Wrap both vertex_stats() context-building phases in a single
BEGIN ISOLATION LEVEL REPEATABLE READ; ... COMMIT; transaction so the
three calls share one snapshot. This prevents the snapshot-fallback path
in is_ggctx_invalid() from purging an already-built graph context when
concurrent xid activity (autovacuum, parallel installcheck, replication,
shared CI) advances the snapshot between calls, which would otherwise
make the targeted delete_global_graphs(name) checks return false instead
of the expected true. Read Committed is insufficient because it acquires
a fresh snapshot per statement; REPEATABLE READ pins one snapshot for the
whole transaction.

Also add explicit ORDER BY id to the three direct-SQL label-table SELECTs
(_ag_label_vertex x2, _ag_label_edge) that return multiple rows, so their
output no longer depends on heap scan order.

This is a test-only change (regress/sql/age_global_graph.sql and
regress/expected/age_global_graph.out); no extension C code or SQL is
modified.

All 37 regression tests pass (installcheck) on PostgreSQL 18.3.

Co-authored-by: GitHub Copilot <noreply@github.com>

modified:   regress/expected/age_global_graph.out
modified:   regress/sql/age_global_graph.sql

* Makefile: add installcheck-existing target and improve readability (#2437)

Add an "installcheck-existing" target that runs the regression suite
against an already-running PostgreSQL server, complementing the default
"installcheck" (which builds a private temp instance). It points
pg_regress at the server via the standard libpq variables
(PGHOST/PGPORT/PGUSER; PGDATABASE defaults to contrib_regression) and
lets pg_regress create the database and load the extension through
--load-extension=age. It deliberately avoids --use-existing -- that
option skips database creation and disables --load-extension -- so no
manual CREATE EXTENSION step is required. The upgrade test (age_upgrade)
is excluded because it stages synthetic extension files into the local
sharedir that a running server would not see.

Readability and maintainability improvements (no behavior change):
- Derive age_sql from AGE_CURR_VER (read from age.control) so the
  version number is defined in exactly one place.
- Add section banners and a top-of-file layout/target index; group the
  scattered upgrade-test pieces and move the ag_scanner flex rule in
  with the other parser-generation rules.
- Wrap the long REGRESS_OPTS and EXTRA_CLEAN assignments across lines.
- Fix the DATA filter-out pattern to use a double dash
  (age--%--y.y.y.sql) matching the actual template filename; the prior
  single-dash pattern only matched via greedy '%' expansion.
- Anchor the age.control version regex (/^default_version/).
- Replace the hardcoded "31 tests" comment with generic wording and add
  a "help" target listing the common targets.

Verified on PostgreSQL 18: make installcheck (temp instance) and
make installcheck-existing both pass; clean rebuild and make clean
unaffected.

Co-authored-by: GitHub Copilot <noreply@github.com>

modified:   Makefile

* Make ag_catalog ownership and built-in resolution explicit (#2440)

AGE places all of its objects in the ag_catalog schema. Make the
assumptions around that schema explicit so installs and upgrades behave
predictably regardless of how a database is provisioned:

- Ownership-checked install: CREATE EXTENSION age installs into
  ag_catalog only when that schema does not already exist under a
  different owner, keeping ownership of AGE's catalog well-defined.
- Deterministic name resolution: the pg_upgrade helper functions resolve
  built-ins from pg_catalog first and schema-qualify their
  format()/hashtext() calls, so their behavior does not depend on what
  else is defined in ag_catalog.
- README note describing ag_catalog ownership and the install-time check.

The upgrade script applies the same helper changes so existing
installations get them on ALTER EXTENSION UPDATE. Adds an
extension_security regression test covering the ownership check and the
qualified-call / search_path properties.

Assisted-by: GitHub Copilot (Claude Opus 4.8)

modified:   Makefile
modified:   README.md
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/extension_security.out
new file:   regress/sql/extension_security.sql
modified:   sql/age_main.sql
modified:   sql/age_pg_upgrade.sql

Resolved Conflicts: Makefile

* cypher_with: add ORDER BY to non-deterministic RETURN queries (#2436)

Several cypher_with regression queries RETURN multiple rows without an
ORDER BY, so their row order depends on heap/scan order and can vary
between runs, build types, and platforms. Add ORDER BY ASC to those
queries so the expected output is stable. Ordering keys use id() (a
single int64 that bypasses the locale-sensitive string comparison path
and is reproducible from the test's deterministic setup order), or the
projected path/scalar where that is what the query returns. Where the
underlying vertex/edge was dropped by a WITH projection, its id is
threaded through as an alias rather than reordering the projection.

Full audit of cypher_with: all 23 multi-row result blocks were checked.
After this change, every multi-row, non-EXPLAIN RETURN is deterministically
ordered. The two remaining unordered multi-row blocks are left as-is:
- "RETURN lbl" returns two identical "Person" rows, so order cannot drift;
- the 13 EXPLAIN (VERBOSE, COSTS OFF) plan blocks emit a fixed serial plan
  (no parallel/gather nodes), so their row order is already deterministic.

This is a test-only change (regress/sql/cypher_with.sql and
regress/expected/cypher_with.out); no extension C code or SQL is modified.
Row counts are unchanged (pure reordering).

All 37 regression tests pass (installcheck) on PostgreSQL 18.3.

Co-authored-by: GitHub Copilot <noreply@github.com>

modified:   regress/expected/cypher_with.out
modified:   regress/sql/cypher_with.sql

* Add shortest_path / all_shortest_paths SRFs (#2430)

Add two C set-returning functions that compute unweighted (hop-count)
shortest paths over the cached global graph adjacency via BFS, callable
both at the SQL top level and inside a cypher() RETURN:

  - age_shortest_path(...)       -> the single shortest path (0 or 1 rows)
  - age_all_shortest_paths(...)  -> every shortest path, one per row

The signature follows the natural Cypher argument order
(graph, start, end, edge_types, direction, min_hops, max_hops), registered
in sql/agtype_typecast.sql (install) and age--1.7.0--y.y.y.sql (upgrade).
Unimplemented parameters fail loudly: multiple relationship types and a
non-zero min_hops raise ERRCODE_FEATURE_NOT_SUPPORTED. A single edge type
(string or one-element array) is honored, and a NULL endpoint yields no
rows per Cypher null semantics (wrong-typed endpoints / NULL graph still
error).

To call the SRFs inside a cypher() RETURN, transform_cypher_return now sets
query->hasTargetSRFs (it was the only results-producing clause that didn't,
so the planner never added a ProjectSet node), and transform_FuncCall
auto-prepends the graph name for snake_case shortest_path /
all_shortest_paths. camelCase names are reserved for the future native
grammar.

Robustness:
  - BFS guards against non-existent endpoints (returns 0 rows instead of
    crashing) and honors CHECK_FOR_INTERRUPTS.
  - An unknown edge label now matches no edges instead of silently
    traversing all of them (get_label_relation returns InvalidOid).

Adds the age_shortest_path regression test (directed/undirected, label
filtering, parallel edges, self-loops, max_hops, the not-supported stubs,
NULL and non-existent endpoint/graph guards).

38/38 installcheck pass.

Co-authored-by: Copilot <copilot@github.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
modified:   sql/agtype_typecast.sql
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_expr.c
modified:   src/backend/utils/adt/age_vle.c
new file:   regress/expected/age_shortest_path.out
new file:   regress/sql/age_shortest_path.sql

* Support pattern expressions as boolean expressions (#2360)

* Support pattern expressions in WHERE clause via GLR parser (issue #1577)

Enable bare graph patterns as boolean expressions in WHERE clauses:

  MATCH (a:Person), (b:Person)
  WHERE (a)-[:KNOWS]->(b)        -- now valid, equivalent to EXISTS(...)
  RETURN a.name, b.name

Previously, this required wrapping in EXISTS():
  WHERE EXISTS((a)-[:KNOWS]->(b))

The bare pattern syntax is standard openCypher and is used extensively
in Neo4j.  Its absence was the most frequently cited migration blocker.

Implementation approach:
- Switch the Cypher parser from LALR(1) to Bison GLR mode.  GLR handles
  the inherent ambiguity between parenthesized expressions '(' expr ')'
  and graph path nodes '(' var_name label_opt props ')' by forking the
  parse stack and discarding the failing path.
- Add anonymous_path as an expr_atom alternative with %dprec 1 (lower
  priority than expression path at %dprec 2).  The action wraps the
  pattern in a cypher_sub_pattern + EXISTS SubLink, reusing the same
  transform_cypher_sub_pattern() machinery as explicit EXISTS().
- Extract make_exists_pattern_sublink() helper shared by both
  EXISTS(pattern) and bare pattern rules.
- Fix YYLLOC_DEFAULT to use YYRHSLOC() for GLR compatibility.
- %dprec annotations on expr_var/var_name_opt resolve the reduce/reduce
  conflict between expression variables and pattern node variables.

Conflict budget: 7 shift/reduce (path extension vs arithmetic on -/<),
3 reduce/reduce (expr_var vs var_name_opt on )/}/=).  All are expected
and handled correctly by GLR forking + %dprec disambiguation.

All 32 regression tests pass (31 existing + 1 new).  New
pattern_expression test covers: bare patterns, NOT patterns, labeled
nodes, AND/OR combinations, left-directed patterns, anonymous nodes,
multi-hop patterns, EXISTS() backward compatibility, and non-pattern
expression regression checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address Copilot review: comment placement, %expect docs, test wording

1. Move "Helper function to create an ExplainStmt node" comment from
   above make_exists_pattern_sublink() to above make_explain_stmt()
   where it belongs.

2. Add block comment documenting the %expect/%expect-rr conflict
   budget: 7 S/R from path vs arithmetic on - and <, 3 R/R from
   expr_var vs var_name_opt on ) } =.

3. Clarify test comment: "Regular expressions" -> "Regular (non-pattern)
   expressions" to avoid confusion with regex.

Regression test: pattern_expression OK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address Copilot round 3: broaden scope, remove %expect fragility

- Pattern expressions are now accepted anywhere an expr is valid
  (RETURN, WITH, SET, CASE, boolean combinations), not only WHERE.
  This matches openCypher semantics and documents the broader surface
  area that was already implicitly enabled by adding anonymous_path
  to expr_atom.  Added regression tests for each new context:
  RETURN projection (bare and AS-aliased), mixed with other
  projections, CASE WHEN, boolean AND/OR combinators, SET to
  persist a computed boolean property, and WITH ... WHERE pipeline.

- Remove the hardcoded `%expect 7` / `%expect-rr 3` conflict budget
  from cypher_gram.y.  The exact conflict counts can drift across
  Bison versions and distros, which would break builds even though
  the grammar is correct (GLR handles the conflicts at runtime via
  fork + %dprec).  Instead, pass -Wno-conflicts-sr / -Wno-conflicts-rr
  via BISONFLAGS in the Makefile so the build stays clean without
  binding us to a specific Bison release.  Kept a block comment in
  the grammar explaining why GLR conflicts are expected and how
  they resolve.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address jrgemignani review: keep -Werror, restore %expect budget

Reverts the broad `-Werror` drop and the no-%expect approach from the
prior round on jrgemignani's request.  The earlier framing — that
conflict counts drift across Bison versions, so %expect is fragile —
overcorrected: it removed the only build-time alarm bell for unintended
new conflicts.

Makefile: keep -Werror so any unexpected Bison warning (unused rules,
undeclared types, etc.) still fails the build; downgrade only the two
conflict categories to plain warnings via -Wno-error=conflicts-sr
-Wno-error=conflicts-rr.  pgxs auto-adds -Wno-deprecated, so existing
%name-prefix= / %pure-parser directives remain non-erroring.

cypher_gram.y: add `%expect 7` and `%expect-rr 3` matching the
Bison 3.8.2 totals.  Bison treats %expect as exact-match, not as a
ceiling — any deviation fails the build and forces an audit of the new
conflicts.  Comment updated to reflect that future Bison versions
reporting different counts should bump the numbers explicitly with a
version note in the commit message, rather than removing the directive.

No grammar or runtime change.  Cassert installcheck 34/34 AGE tests
green.

* Add follow-up regression coverage for pattern expressions (#2360)

Addresses the non-blocking test-coverage follow-ups from the review:
pattern expressions in additional contexts opened up by allowing
anonymous_path as an expr_atom.

New cases (all verified against a PG18 build):
- Single-node pattern on a bound variable (a:Label). Documented as an
  EXISTS existence check, NOT an openCypher label predicate: a matching
  label is always true, and a non-matching label hits AGE's pre-existing
  "multiple labels for variable" restriction (captured as expected error).
- Pattern expressions inside list and map literals.
- Pattern expressions as function arguments: collect() shows correct
  per-row booleans; count() counts all rows (non-null bool) -- documented
  so the value is not mistaken for a bug.
- Pattern expression in OPTIONAL MATCH ... WHERE (null-preserving).
- EXISTS() and a bare pattern combined in one predicate.

make installcheck: 33/33 green.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add reduce() list folding function (#2435)

Implement the openCypher reduce(acc = init, var IN list | body) expression,
which folds an arbitrary expression over a list, threading an accumulator
across the elements in list order. This closes a long-standing gap (reduce()
was previously unsupported) and works both at the SQL top level and inside a
cypher() RETURN/WHERE.

Implementation
--------------
reduce() is desugared, at transform time, into a correlated scalar subquery
over a new ordered aggregate rather than a new executor node, so no PostgreSQL
core changes are required:

    CASE WHEN list IS NULL THEN NULL
         ELSE COALESCE((SELECT ag_catalog.age_reduce(
                            <init>, '<serialized-body>'::text,
                            r.elem ORDER BY r.ord)
                        FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                       <init>)
    END

  - A new cypher_reduce extensible node carries the accumulator/element names
    and the init/list/body expressions (grammar production, keyword, and the
    copy/out/read serialization plumbing).
  - The fold body is transformed against a throwaway two-column agtype
    namespace, its accumulator and element Var references are rewritten to
    PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a
    text argument.
  - age_reduce_transfn (a custom agtype aggregate transition function)
    deserializes and compiles the body once per group with ExecInitExpr, then
    evaluates it per element with ExecEvalExpr, rebinding the two params. The
    body is normalized to agtype at transform time so a boolean or other
    non-agtype result cannot be misread as a by-reference Datum.

Semantics
---------
  - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  - An empty list yields the initial value; a NULL list yields NULL.
  - The list and initial value may reference outer-query variables (e.g.
    reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference
    only the accumulator and element.
  - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE,
    and element property-access bodies are all supported.
  - Outer-variable, query-parameter, nested-reduce, and aggregate references
    inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  - reduce is registered as a safe keyword so it remains usable as a property
    or map key, preserving backward compatibility.

Tests
-----
Adds the age_reduce regression test (registered in the install SQL and the
upgrade template so age_upgrade passes), covering: arithmetic/product/string
folds; order sensitivity; empty/NULL list; NULL element and NULL init;
list-building and CASE bodies; boolean and comparison bodies; element property
access; multiple and nested (in list/init) reduce(); reduce() in boolean
expressions, WHERE, and list comprehensions; folds over collected nodes and
node list properties; the not-supported rejections; and reduce as a map key.

Following reviewer feedback, three further semantics-coverage gaps are
pinned directly so the mechanisms that make the aggregate desugaring
correct are exercised by tests rather than only correct by inspection:
  - A fold body that produces null mid-fold and then recovers: the
    agtype 'null' running state is a readable value, so a later element
    folds back out of it (distinct from "null propagates to a null
    result", which was already covered).
  - An empty list with a NULL initial value: COALESCE(<no rows>, init)
    yields NULL, kept distinct from a body that legitimately folds to
    agtype 'null', which must not be resurrected to the initial value.
  - A type error and a runtime division-by-zero error in the body: both
    abort cleanly out of the standalone per-element evaluator rather
    than corrupting the running aggregate state.

All multi-row results are ordered. 42/42 installcheck pass.

Future work
-----------
The body restriction (accumulator and element only) is a property of the
standalone expression evaluation and can be relaxed without core changes:
  - Allow loop-invariant outer-variable and cypher $parameter references in
    the body by capturing them as additional eager aggregate arguments bound
    to extra param slots.
  - Support a nested reduce() inside the body via an SPI-based evaluation
    fallback for subquery-bearing bodies.
Aggregates inside the body remain intentionally unsupported, matching the
openCypher specification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/age_reduce.out
new file:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/nodes/ag_nodes.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_analyze.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_gram.y
modified:   src/backend/utils/adt/agtype.c
modified:   src/include/nodes/ag_nodes.h
modified:   src/include/nodes/cypher_copyfuncs.h
modified:   src/include/nodes/cypher_nodes.h
modified:   src/include/nodes/cypher_outfuncs.h
modified:   src/include/nodes/cypher_readfuncs.h
modified:   src/include/parser/cypher_kwlist.h

* resolve subgraph staging sequences via regclass (#2446)

The vertex/edge staging copies in create_subgraph() generated new
graphids with nextval(%L), which binds the sequence as a string literal
and invokes the nextval(text) overload. That re-resolves the
schema-qualified sequence name on each call.

Cast the literal to regclass (nextval(%L::regclass)) so the sequence is
resolved once to its OID, matching how AGE defines its label id defaults
(nextval('schema.seq'::regclass)). Applied to both the vertex and edge
staging queries, in sql/age_subgraph.sql and the identical body in the
age--1.7.0--y.y.y.sql upgrade template so the upgrade-path catalog
comparison still matches.

Behavior is unchanged; all 38 regression tests pass against PostgreSQL 18.

Addresses Copilot review feedback on #2441.

Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]>

modified:   age--1.7.0--y.y.y.sql
modified:   sql/age_subgraph.sql

* Support relationship-type filters and a minimum hop count (#2442)

Support relationship-type filters and a minimum hop count in shortest_path SRFs

age_shortest_path / age_all_shortest_paths gain two related capabilities,
both following openCypher / Neo4j semantics.

Relationship-type filtering: the edge_types argument now accepts an array
of types; an edge matches when its label is any one of the requested
types. A bare string or a one-element array keeps the single-type
behaviour, an empty string/array or NULL means no filter, and an unknown
type matches nothing. sp_run_bfs takes an Oid set rather than a single
oid, and sp_compute_paths resolves the argument into that set.

Minimum hop count: the new min_hops argument is a lower bound on the path
length. When it does not exceed the true shortest distance it imposes no
constraint, so the normal BFS shortest-path result is returned. When it
exceeds the shortest distance, BFS cannot produce a qualifying path, so
the search falls back to the variable-length-edge depth-first engine
(sp_minhops_fallback), which enumerates edge-distinct paths
(relationship-uniqueness / trail semantics) and returns the shortest
path(s) whose length is at least min_hops. This regime permits revisiting
a vertex and closed walks back to the start, but never reusing an edge. A
private memory context bounds the search and a cost guard caps the number
of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the
search with a maximum hop count) when the cap is exceeded. The hard regime
combined with multiple relationship types is unsupported, because the VLE
engine matches a single label; that case raises FEATURE_NOT_SUPPORTED.

Regression coverage spans single- and multi-type filters, directed and
undirected reachability, multiplicity of equal-length paths, max_hops
bounds, NULL and non-existent endpoints, and both min_hops regimes,
including a vertex-revisiting longer path (sp_revisit) and a closed-walk
cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are
exercised as well.

Review feedback addressed:

1. Error messages now report the function actually called. age_shortest_path
   and age_all_shortest_paths share their argument-resolution helpers, which
   hard-coded an "age_shortest_path" prefix regardless of the caller; the
   caller's name is now threaded through so each function reports its own
   (this also corrects a mislabeled multi-type min_hops error). A new
   regression case (sp_errname) pins the behaviour for both functions.

2. age_all_shortest_paths now bounds the number of materialized result paths.
   The shortest-path DAG can contain exponentially many equal-length paths,
   all built up front before the first row streams; enumeration is capped at
   SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint
   to narrow the search, mirroring the existing min-hops candidate cap.

3. The BFS search state (visited table, frontier queue, predecessor multiset,
   and intermediate path arrays) now lives in a private scratch memory context
   that is deleted once the surviving result Datums are built in the SRF
   context, rather than persisting in multi_call_memory_ctx for the life of
   the SRF. This bounds peak memory to the result set plus one search and
   matches the pattern sp_minhops_fallback already used.

4. A second review round tightened memory hygiene and reporting: the
   pnstrdup'd relationship-type name is freed once resolved to an oid (it was
   retained for the life of the SRF) in both the array and scalar cases; the
   invalid-direction error now carries the called function's name like the
   other argument errors; the min-hops fallback's private context is renamed
   to a caller-neutral "age shortest path minhops" (it is shared by both
   SRFs); and the multi-type label-filter comment is corrected to note that an
   unknown type merely contributes no matches -- known types in the same set
   still match, and only an all-unknown set leaves just the zero-length path.

41/41 installcheck.

Co-authored-by: Copilot <copilot@github.com>

modified:   regress/expected/age_shortest_path.out
modified:   regress/sql/age_shortest_path.sql
modified:   src/backend/utils/adt/age_vle.c

* Fix single-node labeled pattern expressions not filtering by label (#2443) (#2444)

A single-node labeled pattern used as a boolean expression -- e.g.
`WHERE (a:Person)`, `WHERE EXISTS((a:Person))` -- was accepted but did not
test the bound vertex's label. It desugars to an EXISTS sub-pattern, and
make_path_join_quals() returned early for vertex-only patterns
(list_length(entities) < 3), emitting no quals. With no edge to carry a
correlation, the sub-pattern referenced nothing from the enclosing query,
so the planner produced an uncorrelated one-time InitPlan that was trivially
true whenever any vertex of that label existed -- the predicate matched every
outer row.

Emit an explicit label-id filter for a vertex-only pattern whose vertex
carries a non-default label and whose variable is declared in an ancestor
parse state (i.e. a correlated reference). make_qual() builds a name-based id
reference that resolves to the outer variable, so the filter both correlates
the sub-pattern to that variable and enforces the label. Freshly scanned,
non-correlated vertices (no ancestor binding) are untouched, so plain
MATCH (a:Person) and "does any X exist" EXISTS checks are unaffected.

Add regression coverage in pattern_expression: WHERE (a:Person),
WHERE NOT (a:Person), and EXISTS((a:Company)) against a graph with a
non-Person vertex. All 41 regression tests pass.

* Preserve null-valued keys in map literals (#2391) (#2412)

* Preserve null-valued keys in map literals (#2391)

Map literals such as RETURN {a: null} previously dropped keys whose
values were null, producing {} instead of {"a": null}. This
diverged from the openCypher / Neo4j semantics where map literals
preserve every key the user wrote, including those bound to null.

Root cause: cypher_map.keep_null defaulted to false (zero-initialised),
so the grammar-produced node fed agtype_build_map_nonull, which strips
null entries. Call sites that legitimately need strip-null semantics
(CREATE node/edge property maps and SET = assignments) already set
keep_null=false explicitly, and the MATCH pattern path sets it to true
explicitly. Flipping the grammar default to true therefore only affects
the cases that were buggy (bare map expressions and nested map values),
and leaves CREATE/SET behaviour unchanged.

Two preexisting tests encoded the old buggy output and are updated:
expr.out (bare RETURN maps now keep the null value) and agtype.out
(a nested map inside an orderability test no longer drops its null
entry, shifting one row in the ORDER BY result). Dedicated regression
coverage for #2391 is added to regress/sql/expr.sql.

* Address review: move 'End of tests' marker and drop order claim

- Move 'End of tests' to after the Issue 2391 test block so it marks
  the actual end of the file.
- Remove 'in order' from the mixed-values comment since map key
  ordering is not guaranteed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update accessor EXPLAIN expected output for null-preserving map literals

Map literals now build with agtype_build_map (keep_null) instead of
agtype_build_map_nonull, so the accessor-optimization EXPLAIN plan in
expr.out must reflect the new function name. Also strip a stray trailing
CRLF on the last line of expr.sql/expr.out that leaked a carriage return
into the regression output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nested-map write coverage and clarify top-level strip wording

A reviewer noted the keep_null=true default reaches further than the
PR summary stated: CREATE / SET = only override keep_null=false on the
top-level property map, not recursively. A nested map value is its own
cypher_map node, so it now inherits the new default and preserves its
null-valued keys (e.g. CREATE (n {a: {b: null}}) stores a -> {"b": null}).

- regress/sql/expr.sql, regress/expected/expr.out: pin the nested case
  under a write with CREATE (n:Nested {a: {b: null}}), MATCH ... SET
  n = {a: {b: null}}, and a MATCH verify.
- src/backend/parser/cypher_gram.y: clarify the map: rule comment to
  state the CREATE / SET override is top-level only and nested map
  values keep the null-preserving default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Node.js driver CI build broken by @types/node drift (#2452)

The Node.js driver CI (npm install -> npm run build -> tsc) failed with
parser errors in node_modules/@types/node/ffi.d.ts (TS1139/TS1005/TS1109/
TS1128). package-lock.json is gitignored, so CI resolves dependencies
purely from package.json. @types/node was only pulled transitively via a
wildcard range (@types/pg and jest depend on @types/node@*), so a fresh
install grabbed the latest (26.x). That version uses `const` type
parameters (a TypeScript 5.0 feature) in ffi.d.ts, which typescript@4.9
cannot parse. skipLibCheck does not suppress these parser-level errors.

The runtime Node version is unrelated: @types/node is resolved from the
npm dependency graph, not the Node.js runtime.

Fix:
- Add a bounded direct devDependency "@types/node": "^20.19.0" so a fresh
  install constrains the typings to the Node 20 LTS line, which is
  compatible with typescript@4.9 and keeps the toolchain consistent
  (eslint 7 / typescript-eslint 4 / TS 4.9 / Node 20 typings).
- Pin CI to Node 20 (setup-node@v4, node-version: 20) for reproducibility
  and to match the pinned typings; replaces the deprecated setup-node@v3
  and floating node-version: latest.

Verified: a clean, no-lockfile install (matching CI) now resolves
@types/node@20.19.43 and tsc builds successfully.

Co-authored-by: Copilot <copilot@github.com>

modified:   .github/workflows/nodejs-driver.yaml
modified:   drivers/nodejs/package.json

* Fix stack overflow and precision loss in toFloatList() conversion (#2451)

toFloatList()'s AGTV_FLOAT branch formatted each element with
sprintf(buffer, "%f", ...) into a fixed 64-byte stack buffer and then
re-parsed the string back into a float. This had two defects:

  1. Stack overflow. "%f" prints the full integer part with no width
     limit, so a large magnitude overflows the 64-byte buffer. The value
     is query-reachable: RETURN toFloatList([1.0e308]) needs ~317 bytes
     (309 integer digits + ".000000") and smashes the stack. This is the
     issue reported in #2410.

  2. Precision loss. "%f" emits only 6 fractional digits, so the
     format-and-reparse round trip was lossy -- toFloatList([0.123456789])
     returned 0.123457.

The element is already a float8, so the whole format/reparse step is
unnecessary. Assign elem->val.float_value directly. This removes the
stack buffer entirely (no magic buffer size to justify) and fixes both
the overflow and the precision loss at once.

Also harden toStringList(): its "%.*g"/"%ld" conversions use bounded
formats and were never overflow-prone, but switch them from sprintf to
snprintf as defensive depth.

Add regression coverage to regress/sql/expr.sql for both the large
magnitude case (no overflow) and precision preservation.

This reimplements the fix originally proposed by David Christensen in
#2410, whose report identified the sprintf overflow.

Co-authored-by: David Christensen <david.christensen@snowflake.com>

* Support outer references in reduce() fold bodies (#2448)

Allow a reduce(acc = init, var IN list | body) fold body to reference
loop-invariant values from the enclosing query -- outer-query variables and
cypher() parameters -- in addition to the accumulator and element. These were
previously rejected with ERRCODE_FEATURE_NOT_SUPPORTED.

How it works
------------
The fold body is still compiled to a standalone expression evaluated by
age_reduce_transfn, so an outer reference (which cannot be evaluated there)
is captured at transform time and supplied as a value:

  - After the accumulator and element are rewritten to PARAM_EXEC params 0 and
    1, transform_cypher_reduce() walks the body and replaces each loop-invariant
    outer reference -- one that references an outer Var or a cypher() $parameter
    but not the accumulator/element -- with a new PARAM_EXEC param 2, 3, ... in
    body order. Capture is at leaf granularity: only the bare outer value is
    hoisted out of the body, while the operators, function calls and
    CASE/AND/OR/coalesce branches around it stay in the serialized body.
    Because a captured value becomes an aggregate argument that the executor
    evaluates eagerly and unconditionally, hoisting a whole computed subtree
    (for example "1/z" under a never-taken CASE branch) would defeat the fold's
    short-circuiting; capturing only the leaf keeps evaluation under the body's
    own control flow. The one exception is an outer reference that is not itself
    agtype-typed -- most commonly the graphid inside a graph vertex/edge
    variable -- whose smallest enclosing agtype-typed subtree is captured whole,
    since it cannot stand alone as an agtype[] extra.
  - The captured expressions are passed to the aggregate as a trailing
    agtype[] argument; age_reduce(agtype, text, agtype, agtype[]) and its
    transition function gain this argument.
  - age_reduce_transfn sizes its param array to 2 + the number of captures and
    binds the captured values to params 2.. on every row. Because the captures
    are evaluated in the outer query context as ordinary aggregate arguments, a
    correlated capture is re-evaluated per group, so an outer value that varies
    per row (for example under UNWIND) is folded with the correct value.
    Each capture slot is rebound on every row, and the trailing extras
    argument is read only when the aggregate actually passes it (PG_NARGS),
    keeping the transition safe under direct age_reduce() SQL calls and an
    older 4-argument signature.

This keeps the no-core-patch design: the body is still a serialized standalone
expression, and the only new machinery is the captured-value plumbing.

Still rejected
--------------
Subqueries in the body (including a nested reduce()) and aggregate functions
remain unsupported and raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error: a
subquery cannot be planned as a plain aggregate argument, and an aggregate in a
per-element fold is undefined per the openCypher specification.

Tests
-----
age_reduce gains an "Outer references in the fold body" section covering a
plain outer variable, an outer variable used as a multiplier, two distinct
outer variables, a property of an outer graph variable, the same outer variable
referenced more than once, a property of an outer map, a subexpression that
mixes an outer reference with the element (only the loop-invariant part is
captured), an outer reference inside a CASE branch of the body, a NULL outer
value propagating through the fold, multiple captures mixing a NULL and a
non-NULL outer value, an outer variable that changes per row (captured per
group), and a cypher() parameter supplied via a prepared statement.

A "Short-circuit evaluation is preserved for outer references in the body"
section verifies that a guarded outer sub-expression is not evaluated on a
branch that is not taken: a never-taken CASE THEN branch, a never-taken CASE
ELSE branch, an OR and an AND that short-circuit, and a coalesce -- each of
which would divide by zero if the outer "1/w" were hoisted into an eagerly
evaluated aggregate argument -- plus a guarded branch that is taken and
evaluates its outer division normally.

The previously-rejected outer-variable case is moved out of the not-supported
section, which now covers a nested reduce() (any subquery in the body is
unsupported) and an aggregate in the body.

The same change also broadens the base reduce() coverage with value-type folds
(a float accumulator, negative numbers, a map accumulator passed through
unchanged, and list elements indexed in the body), function calls in the fold
body (a scalar function over the element and the list itself produced by a
function), reduce() composed with surrounding expressions (consumed by another
function and used in a comparison), and syntax-error checks for each required
piece of the form -- the "= init", ", var IN list", and "| body" clauses, plus
a rejected qualified iterator variable. 42/42 installcheck pass.

Co-authored-by: Copilot <copilot@github.com>

modified:   age--1.7.0--y.y.y.sql
modified:   regress/expected/age_reduce.out
modified:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/utils/adt/agtype.c

* Fix segfault and out-of-bounds reads in file loaders on malformed rows (#2453)

* Fix segfault and out-of-bounds reads in file loaders on malformed rows

load_edges_from_file() and load_labels_from_file() build their COPY
parser with only format=csv and header=false, so COPY uses its default
comma delimiter. A file delimited by anything else (or a malformed row)
then parses with an unexpected column count, and the loaders indexed the
parsed fields without validating that count:

- process_edge_row() reads the four fixed fields fields[0..3]
  unconditionally. A non-comma-delimited edge file parses as a single
  column, so fields[1..3] are out of bounds -> segfault (issue #2449).
- create_agtype_from_list()/_i() pair header[i] with fields[i] for all
  i < nfields, so a row with more fields than the header reads header[i]
  out of bounds.

Add bounds validation that turns these into clear errors:

- Edge header must have >= 4 columns; a smaller count almost always
  means the wrong delimiter, so the error carries a hint.
- Each edge row must have >= 4 columns and no more than the header's.
- Each label row must have no more than the header's column count.

Rows with fewer trailing columns than the header remain allowed, matching
existing behavior (exercised by the existing conversion tests).

This closes the segfault and out-of-bounds reads. The silent mis-parsing
of a non-comma file whose header and rows share the same (wrong) column
count is not detectable here; adding a delimiter option to the load
functions is a separate follow-up.

Adds a regression test in age_load using a pipe-delimited edge file.

Addresses #2449.

* loader guards: clarify error wording and add per-row regression coverage

Address review feedback on the nfields guards:

- Error messages now say "the header's %d columns" (was "the header's %d"),
  making the count's unit explicit.
- Add regression cases exercising the per-row guards, which previously only
  had coverage for the mis-delimited-header path:
    * an edge row with fewer than 4 columns
    * an edge row with more columns than the header
    * a label row with more columns than the header
  Each asserts a clean ERROR (these were the out-of-bounds reads the guards
  now catch).

* ci: pin Build/Regression runner to ubuntu-24.04 and guard Bison version for GLR grammar (#2445)

* ci: pin runner to ubuntu-24.04 + guard Bison version for GLR grammar

The Cypher GLR grammar pins exact shift/reduce and reduce/reduce conflict
counts via %expect / %expect-rr in cypher_gram.y, and Bison treats %expect
as exact-match: a different Bison version can report different counts and
break the build. ubuntu-latest floats to new Ubuntu releases (and new Bison
versions), which would silently shift those counts.

Pin runs-on to ubuntu-24.04 to freeze Bison at 3.8.x, and add a guard step
that fails loudly with a pointer to cypher_gram.y if Bison ever drifts off
3.8.x. Reproducibility comes from pinning the variable rather than widening
the conflict-count tolerance, keeping the exact-match alarm for genuinely
new grammar conflicts intact.

* ci: make Bison version parse robust (awk + explicit empty guard)

Address Copilot review on #2445: the previous grep-based parse could
silently yield an empty version and fail with a confusing
'Bison  != 3.8.x' message. Parse the version field with awk and error
explicitly when it can't be determined.

Resolved conflict: .github/workflows/installcheck.yaml

* Add automatic header-dependency tracking to the Makefile (#2454)

AGE lists OBJS explicitly and relies on PGXS, whose built-in dependency
tracking only runs when the server was built with --enable-depend (often
disabled). Consequently, editing a header did not recompile the .c files
that include it, leaving stale .o files. This is especially dangerous for
node/struct headers: a stale ag_nodes.o keeps an outdated node_size, so
_readExtensibleNode under-allocates and readNode corrupts the heap
("unrecognized node type").

Emit a .d file beside each object via -MMD -MP and -include them, deriving
DEPFILES from OBJS. The mechanism is self-contained (independent of
--enable-depend): -MMD skips system headers and -MP tolerates deleted
headers. On servers built with --enable-depend, PGXS appends its own -MF
after CFLAGS (last -MF wins), so this degrades cleanly to PGXS's tracking.
Add DEPFILES to EXTRA_CLEAN and *.d to .gitignore.

Co-authored-by: Copilot <copilot@github.com>

modified:   .gitignore
modified:   Makefile

---------

Co-authored-by: serdarmumcu <serdar.mumcu@udemy.com>
Co-authored-by: Greg Felice <gregfelice@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Prashant Chinnam <5108573+crprashant@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: David Christensen <david.christensen@snowflake.com>
jrgemignani added a commit that referenced this pull request Jul 2, 2026
Implement the openCypher reduce(acc = init, var IN list | body) expression,
which folds an arbitrary expression over a list, threading an accumulator
across the elements in list order. This closes a long-standing gap (reduce()
was previously unsupported) and works both at the SQL top level and inside a
cypher() RETURN/WHERE.

Implementation
--------------
reduce() is desugared, at transform time, into a correlated scalar subquery
over a new ordered aggregate rather than a new executor node, so no PostgreSQL
core changes are required:

    CASE WHEN list IS NULL THEN NULL
         ELSE COALESCE((SELECT ag_catalog.age_reduce(
                            <init>, '<serialized-body>'::text,
                            r.elem ORDER BY r.ord)
                        FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                       <init>)
    END

  - A new cypher_reduce extensible node carries the accumulator/element names
    and the init/list/body expressions (grammar production, keyword, and the
    copy/out/read serialization plumbing).
  - The fold body is transformed against a throwaway two-column agtype
    namespace, its accumulator and element Var references are rewritten to
    PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a
    text argument.
  - age_reduce_transfn (a custom agtype aggregate transition function)
    deserializes and compiles the body once per group with ExecInitExpr, then
    evaluates it per element with ExecEvalExpr, rebinding the two params. The
    body is normalized to agtype at transform time so a boolean or other
    non-agtype result cannot be misread as a by-reference Datum.

Semantics
---------
  - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  - An empty list yields the initial value; a NULL list yields NULL.
  - The list and initial value may reference outer-query variables (e.g.
    reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference
    only the accumulator and element.
  - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE,
    and element property-access bodies are all supported.
  - Outer-variable, query-parameter, nested-reduce, and aggregate references
    inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  - reduce is registered as a safe keyword so it remains usable as a property
    or map key, preserving backward compatibility.

Tests
-----
Adds the age_reduce regression test (registered in the install SQL and the
upgrade template so age_upgrade passes), covering: arithmetic/product/string
folds; order sensitivity; empty/NULL list; NULL element and NULL init;
list-building and CASE bodies; boolean and comparison bodies; element property
access; multiple and nested (in list/init) reduce(); reduce() in boolean
expressions, WHERE, and list comprehensions; folds over collected nodes and
node list properties; the not-supported rejections; and reduce as a map key.

Following reviewer feedback, three further semantics-coverage gaps are
pinned directly so the mechanisms that make the aggregate desugaring
correct are exercised by tests rather than only correct by inspection:
  - A fold body that produces null mid-fold and then recovers: the
    agtype 'null' running state is a readable value, so a later element
    folds back out of it (distinct from "null propagates to a null
    result", which was already covered).
  - An empty list with a NULL initial value: COALESCE(<no rows>, init)
    yields NULL, kept distinct from a body that legitimately folds to
    agtype 'null', which must not be resurrected to the initial value.
  - A type error and a runtime division-by-zero error in the body: both
    abort cleanly out of the standalone per-element evaluator rather
    than corrupting the running aggregate state.

All multi-row results are ordered. 42/42 installcheck pass.

Future work
-----------
The body restriction (accumulator and element only) is a property of the
standalone expression evaluation and can be relaxed without core changes:
  - Allow loop-invariant outer-variable and cypher $parameter references in
    the body by capturing them as additional eager aggregate arguments bound
    to extra param slots.
  - Support a nested reduce() inside the body via an SPI-based evaluation
    fallback for subquery-bearing bodies.
Aggregates inside the body remain intentionally unsupported, matching the
openCypher specification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/age_reduce.out
new file:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/nodes/ag_nodes.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_analyze.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_gram.y
modified:   src/backend/utils/adt/agtype.c
modified:   src/include/nodes/ag_nodes.h
modified:   src/include/nodes/cypher_copyfuncs.h
modified:   src/include/nodes/cypher_nodes.h
modified:   src/include/nodes/cypher_outfuncs.h
modified:   src/include/nodes/cypher_readfuncs.h
modified:   src/include/parser/cypher_kwlist.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants