feat(cache): Stage 4b — pgvector semantic cache backend (DP) by moonming · Pull Request #86 · api7/aisix

moonming · 2026-05-05T09:20:30Z

Summary

DP half of Stage 4b. When a chat request matches a CachePolicy with backend = pgvector, the proxy now embeds the prompt locally (provider creds stay on DP) and looks it up against the dp-manager /dp/cache/{lookup,put} endpoints (cosine-ANN over pgvector). Misses fall through to the upstream and the response is PUT back into the index on success.
- New aisix-cache::pgvector HTTP wire client + aisix-cache::embed helper that resolves embedding creds from the env's first OpenAI Model and reuses the existing Bridge::embed surface with the policy's embedding_model swapped onto the wire.
- New CacheBackend::Pgvector enum variant for cache_policy.rs; ProxyState::pgvector_cache field + with_pgvector_cache builder; chat dispatch path with fail-open on every error edge (transport, embed, decode) → cache_status = Disabled.
- Bootstrap wiring in aisix-server: piggybacks on the heartbeat mTLS bundle and dpmgr origin, exactly like BudgetClient. No new config knob; self-hosted dev simply runs without the pgvector path.
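
The lookup, fall-through, and write-back flow described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `SemanticCache`, `ExactCache`, and `handle_chat` are hypothetical names, the in-memory cache matches embeddings exactly rather than by cosine-ANN, and ignoring a failed PUT is an assumption. Only the `Disabled` fail-open status comes from the PR text.

```rust
/// Cache outcome for one request. `Disabled` is the fail-open status named in the PR.
#[derive(Debug, PartialEq)]
enum CacheStatus {
    Hit,
    Miss,
    Disabled,
}

/// Hypothetical stand-in for the dp-manager client (the real one speaks HTTP).
trait SemanticCache {
    fn lookup(&self, embedding: &[f32]) -> Result<Option<String>, String>;
    fn put(&mut self, embedding: &[f32], response: &str) -> Result<(), String>;
}

/// In-memory illustration only: exact vector match instead of cosine-ANN.
#[derive(Default)]
struct ExactCache {
    entries: Vec<(Vec<f32>, String)>,
    fail: bool, // simulates a transport error when set
}

impl SemanticCache for ExactCache {
    fn lookup(&self, embedding: &[f32]) -> Result<Option<String>, String> {
        if self.fail {
            return Err("transport error".to_string());
        }
        Ok(self
            .entries
            .iter()
            .find(|(e, _)| e.as_slice() == embedding)
            .map(|(_, r)| r.clone()))
    }

    fn put(&mut self, embedding: &[f32], response: &str) -> Result<(), String> {
        if self.fail {
            return Err("transport error".to_string());
        }
        self.entries.push((embedding.to_vec(), response.to_string()));
        Ok(())
    }
}

fn handle_chat<C: SemanticCache>(
    cache: &mut C,
    prompt: &str,
    embed: impl Fn(&str) -> Result<Vec<f32>, String>,
    upstream: impl Fn(&str) -> Result<String, String>,
) -> Result<(String, CacheStatus), String> {
    // Embedding failure: fall open and serve from upstream.
    let embedding = match embed(prompt) {
        Ok(v) => v,
        Err(_) => return upstream(prompt).map(|r| (r, CacheStatus::Disabled)),
    };
    match cache.lookup(&embedding) {
        Ok(Some(hit)) => Ok((hit, CacheStatus::Hit)),
        Ok(None) => {
            let response = upstream(prompt)?;
            // Write back only after upstream success.
            // Assumption: a failed PUT is ignored rather than surfaced.
            let _ = cache.put(&embedding, &response);
            Ok((response, CacheStatus::Miss))
        }
        // Lookup (transport/decode) failure: fall open.
        Err(_) => upstream(prompt).map(|r| (r, CacheStatus::Disabled)),
    }
}
```

In this sketch a cache-side failure never turns into a request failure; only an upstream error propagates to the caller.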

Companion PRs

- AISIX-Cloud #144: schema (cache_entries_semantic + HNSW cosine index) + internal/dpmgr/api/cache.go handlers
- AISIX-Cloud-dashboard #145: backend dropdown polish + SemanticBackendNote

Test plan

- cargo clippy --workspace --all-targets -- -D warnings clean
- aisix-cache: 20 tests (5 pgvector wiremock + 3 embed snapshot) green
- aisix-core: 94 tests green (cache_policy parser + Pgvector variant)
- aisix-proxy: 105 tests green (chat applies_to matcher coverage)
- aisix-server: 31 tests green (bootstrap unaffected by new wiring)
- live e2e: similar-prompt round trip against dpmgr + pgvector (follow-up PR)

Notes

- Streaming requests bypass semantic cache (existing behaviour).
- Embedding-cred sourcing is intentionally implicit: first OpenAI Model in the snapshot. Per Stage 4b design (option C), there is no embedding_provider_key_id field on CachePolicy yet; that's a follow-up if operators need to bill embedding against a different key.
- Pre-existing aisix-guardrails::bedrock test failures are environmental (TLS root cert parsing) and untouched by this PR.
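
To make the implicit credential sourcing concrete, here is a rough sketch of "first OpenAI Model wins, policy's embedding model goes on the wire". The `Model` fields, the `Anthropic` variant, `EmbeddingTarget`, and `resolve_embedding_target` are simplified assumptions for illustration; the real code works on snapshot entries and `Bridge::embed`, which are not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    Openai,
    Anthropic, // hypothetical second provider, only here to show the filter
}

/// Simplified stand-in for a snapshot model entry (fields are assumptions).
#[derive(Debug, PartialEq)]
struct Model {
    name: String,
    provider: Option<Provider>,
    api_key: String,
}

/// What the embed call would use: creds from the model, name from the policy.
#[derive(Debug, PartialEq)]
struct EmbeddingTarget<'a> {
    api_key: &'a str,
    model: &'a str,
}

/// First model whose provider is OpenAI, in snapshot order.
fn first_openai_model(models: &[Model]) -> Option<&Model> {
    models
        .iter()
        .find(|m| matches!(m.provider, Some(Provider::Openai)))
}

/// Borrow the first OpenAI model's credentials but swap in the policy's
/// embedding model. Returns None when the snapshot has no OpenAI model,
/// which a caller would presumably treat as a fail-open case.
fn resolve_embedding_target<'a>(
    models: &'a [Model],
    policy_embedding_model: &'a str,
) -> Option<EmbeddingTarget<'a>> {
    let model = first_openai_model(models)?;
    Some(EmbeddingTarget {
        api_key: model.api_key.as_str(),
        model: policy_embedding_model,
    })
}
```

One consequence visible in the sketch: with two OpenAI models in the snapshot, embedding is always billed to the first one, which is the limitation the embedding_provider_key_id follow-up would address.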

Tracks #90.

Stage 4b data-plane half: route chat requests with a matched `backend = pgvector` policy through dp-manager's `/dp/cache/{lookup,put}` endpoints for cosine-ANN cache hits.

- aisix-cache: new `pgvector` module: `PgvectorCache` HTTP client for the dp-manager handler, with fail-open `SemanticCacheError` and `SemanticHit { response, prompt_tokens, completion_tokens, similarity }`.
- aisix-cache: new `embed` module: `embed_prompt` reuses the env's first OpenAI Model's provider creds via `Hub::get(Provider::Openai)` and `Bridge::embed`, swapping in the policy's `embedding_model` on the wire.
- aisix-core: `CacheBackend::Pgvector` enum variant.
- aisix-proxy::state: `pgvector_cache: Option<Arc<PgvectorCache>>` field + `with_pgvector_cache` builder.
- aisix-proxy::chat: dispatch path. When the matched policy's backend is `pgvector`, embed the last user message and look it up against the vector index; on a miss, continue to upstream and PUT the result on success. Embedding/transport failures fall open with `cache_status = Disabled` per the Stage 4b design note. Streaming requests bypass semantic cache (existing behaviour). The embed call reuses the chat request id so it correlates in upstream logs.

Tests: pgvector wiremock coverage (hit, miss, handler-error, put, base-url normalisation), embed snapshot resolver coverage, proxy cache-policy applies_to matcher coverage. All 219 tests green across aisix-cache (20), aisix-core (94), aisix-proxy (105).
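
The "embed the last user message" step can be illustrated as below. The `Role` and `Message` types and the `last_user_message` helper are hypothetical simplifications of a chat request body, not the proxy's actual types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

/// Simplified chat message (an assumption; the real request type is richer).
struct Message {
    role: Role,
    content: String,
}

/// Pick the text that would be embedded: the most recent user turn.
/// Returns None when the conversation has no user message at all.
fn last_user_message(messages: &[Message]) -> Option<&str> {
    messages
        .iter()
        .rev() // scan from the newest message backwards
        .find(|m| m.role == Role::User)
        .map(|m| m.content.as_str())
}
```

Under this reading, earlier turns and the system prompt do not contribute to the lookup vector, so two conversations ending in the same user question would map to the same embedding.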

Build a `PgvectorCache` against the dpmgr origin reusing the heartbeat mTLS bundle (same pattern as `BudgetClient`), and attach it to the `ProxyState`. In self-hosted dev (no heartbeat_cfg) and on bundle-build failure the proxy falls back to surfacing matched pgvector policies as `cache_status = Disabled` — no traffic impact.
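
A minimal sketch of the optional wiring, assuming simplified shapes for `ProxyState` and `PgvectorCache`: the field name and builder name come from the PR, while `bootstrap`, `Plan`, `plan`, the `base_url` field, and the trailing-slash trimming are illustrative assumptions.

```rust
use std::sync::Arc;

/// Stand-in for the HTTP client; only the origin is modelled here.
struct PgvectorCache {
    base_url: String,
}

#[derive(Default)]
struct ProxyState {
    pgvector_cache: Option<Arc<PgvectorCache>>,
}

impl ProxyState {
    /// Builder-style attach, mirroring the `with_pgvector_cache` name from the PR
    /// (the real signature may differ).
    fn with_pgvector_cache(mut self, cache: Arc<PgvectorCache>) -> Self {
        self.pgvector_cache = Some(cache);
        self
    }
}

/// What the chat path would do for a request (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Plan {
    SemanticLookup,
    Disabled,
    NoSemanticPolicy,
}

fn plan(state: &ProxyState, matched_pgvector_policy: bool) -> Plan {
    match (matched_pgvector_policy, state.pgvector_cache.as_ref()) {
        (true, Some(_)) => Plan::SemanticLookup,
        // Policy matched but no client was attached: report Disabled, keep serving.
        (true, None) => Plan::Disabled,
        (false, _) => Plan::NoSemanticPolicy,
    }
}

/// Managed mode passes the dpmgr origin; self-hosted dev passes None.
fn bootstrap(dpmgr_origin: Option<&str>) -> ProxyState {
    match dpmgr_origin {
        Some(origin) => ProxyState::default().with_pgvector_cache(Arc::new(PgvectorCache {
            // Assumed normalisation: drop trailing slashes before joining paths.
            base_url: origin.trim_end_matches('/').to_string(),
        })),
        None => ProxyState::default(),
    }
}
```

Keeping the handle as an `Option` means the absence of a dpmgr origin needs no separate config knob: the chat path simply sees `None`.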

Copilot

Pull request overview

Adds the DP-side pieces for Stage 4b semantic caching: pgvector-backed cache policy handling, local embedding generation, and dp-manager lookup/put wiring in the proxy/server bootstrap.

Changes:

Added a new pgvector cache-policy backend plus a DP HTTP client for /dp/cache/lookup and /dp/cache/put.
Wired semantic-cache dispatch into chat handling, including local prompt embedding and fail-open behavior.
Bootstrapped the new client in aisix-server and exposed it through ProxyState.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `crates/aisix-server/src/main.rs` | Bootstraps and attaches the pgvector cache client from managed-mode mTLS config. |
| `crates/aisix-proxy/src/state.rs` | Extends shared proxy state with an optional pgvector cache handle. |
| `crates/aisix-proxy/src/chat.rs` | Adds policy matching, semantic lookup/writeback flow, and cache-status handling. |
| `crates/aisix-core/src/models/cache_policy.rs` | Introduces the `Pgvector` cache-policy backend variant and related docs. |
| `crates/aisix-cache/src/pgvector.rs` | Implements the dp-manager wire client for semantic cache lookup and put. |
| `crates/aisix-cache/src/lib.rs` | Exports the new embedding helper and pgvector client types. |
| `crates/aisix-cache/src/embed.rs` | Adds helper logic for selecting credentials and calling the embedding bridge. |
| `crates/aisix-cache/Cargo.toml` | Adds `reqwest` and `wiremock` dependencies needed by the new cache client/tests. |
| `Cargo.lock` | Locks the added crate dependencies. |


    Memory,
    Redis,
    RedisSemantic,
+    Pgvector,


+    let matched_policy = snapshot
        .cache_policies
        .entries()
-        .iter()
-        .any(|entry| {
+        .into_iter()
+        .find(|entry| {
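
The switch from `.any(...)` to `.find(...)` in the hunk above matters because the chat path now needs the matched policy itself, not just a yes/no answer. A small sketch of the difference follows; the `CachePolicy` shape and the string-equality `applies_to` check are assumptions, since the real matcher is not shown in this excerpt.

```rust
#[derive(Debug, PartialEq)]
enum CacheBackend {
    Memory,
    Redis,
    RedisSemantic,
    Pgvector,
}

/// Simplified policy: `applies_to` is modelled as a plain model name (assumption).
#[derive(Debug, PartialEq)]
struct CachePolicy {
    applies_to: String,
    backend: CacheBackend,
}

/// Old shape: only tells us that some policy matched.
fn has_match(policies: &[CachePolicy], model: &str) -> bool {
    policies.iter().any(|p| p.applies_to == model)
}

/// New shape: returns the first matching policy so its backend can be inspected.
fn matched_policy<'a>(policies: &'a [CachePolicy], model: &str) -> Option<&'a CachePolicy> {
    policies.iter().find(|p| p.applies_to == model)
}

/// Per-backend dispatch becomes possible once the policy is in hand.
fn uses_semantic_path(policies: &[CachePolicy], model: &str) -> bool {
    matches!(
        matched_policy(policies, model),
        Some(p) if p.backend == CacheBackend::Pgvector
    )
}
```

Both iterator adapters short-circuit on the first match, so the change adds no extra scanning; it only keeps the entry that `any` would have discarded.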


+    // For the moka path: only consult the cache when the matched
+    // policy's backend is Memory (or the matched policy isn't
+    // pgvector AND the moka cache is configured). Stage 2's
+    // any-policy gate is replaced here by the per-backend dispatch.
+    let cache_active_by_policy = matched_policy
+        .as_ref()
+        .map(|p| matches!(p.value.backend, CacheBackend::Memory))


+fn first_openai_model(snapshot: &AisixSnapshot) -> Option<Arc<ResourceEntry<Model>>> {
+    snapshot
+        .models
+        .entries()
+        .into_iter()
+        .find(|entry| matches!(entry.value.provider(), Some(Provider::Openai)))


+    if let (Some(matched), Some(pgvector)) =
+        (pgvector_match, state.pgvector_cache.as_ref())
+    {


moonming added 2 commits May 5, 2026 17:15

Copilot AI review requested due to automatic review settings May 5, 2026 09:20

Copilot started reviewing on behalf of moonming May 5, 2026 09:21

Copilot AI reviewed May 5, 2026

moonming mentioned this pull request May 5, 2026

feat(cache): pgvector semantic cache backend (Stage 4b) #90

Open

6 tasks

jarvis9443 mentioned this pull request Jun 11, 2026

feat(cache): respect per-policy backend; unavailable redis fails visible, not silent-memory #587

Merged

moonming wants to merge 2 commits into main from feat/cache-policies-stage4b-dp
