feat(cache): Stage 4b — pgvector semantic cache backend (DP) #86

Open
moonming wants to merge 2 commits into main from feat/cache-policies-stage4b-dp

moonming (Collaborator) commented May 5, 2026

Summary

  • DP half of Stage 4b. When a chat request matches a CachePolicy with backend = pgvector, the proxy now embeds the prompt locally (provider creds stay on DP) and looks it up through the dp-manager /dp/cache/lookup endpoint (cosine-ANN over pgvector). Misses fall through to the upstream, and a successful response is written back to the index through /dp/cache/put.
  • New aisix-cache::pgvector HTTP wire client + aisix-cache::embed helper that resolves embedding creds from the env's first OpenAI Model and reuses the existing Bridge::embed surface with the policy's embedding_model swapped onto the wire.
  • New CacheBackend::Pgvector enum variant for cache_policy.rs; ProxyState::pgvector_cache field + with_pgvector_cache builder; chat dispatch path with fail-open on every error edge (transport, embed, decode) → cache_status = Disabled.
  • Bootstrap wiring in aisix-server: piggybacks on the heartbeat mTLS bundle and dpmgr origin, exactly like BudgetClient. No new config knob; self-hosted dev simply runs without the pgvector path.
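
The lookup, fall-through, and write-back flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SemanticCache` trait, `dispatch`, `ExactMatchCache`, and the `Hit`/`Miss` variant names are hypothetical stand-ins, and only `Disabled` as the fail-open status comes from the description above. It also assumes a failed write-back does not fail the request.

```rust
// Sketch only: every type and function here is a hypothetical stand-in,
// not the aisix-cache or aisix-proxy API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheStatus {
    Hit,      // assumed variant name
    Miss,     // assumed variant name
    Disabled, // named in the PR: reported on every fail-open edge
}

#[allow(dead_code)]
#[derive(Debug)]
enum SemanticCacheError {
    Embed,
    Transport,
    Decode,
}

trait SemanticCache {
    fn lookup(&self, embedding: &[f32]) -> Result<Option<String>, SemanticCacheError>;
    fn put(&mut self, embedding: &[f32], response: &str) -> Result<(), SemanticCacheError>;
}

/// Embed, look up, and fall open to the upstream on any cache-side error.
fn dispatch<C, E, U>(
    cache: &mut C,
    embed: E,
    upstream: U,
    prompt: &str,
) -> (Result<String, String>, CacheStatus)
where
    C: SemanticCache,
    E: Fn(&str) -> Result<Vec<f32>, SemanticCacheError>,
    U: Fn(&str) -> Result<String, String>,
{
    // Embedding failure: skip the cache entirely and serve from upstream.
    let embedding = match embed(prompt) {
        Ok(v) => v,
        Err(_) => return (upstream(prompt), CacheStatus::Disabled),
    };
    match cache.lookup(&embedding) {
        Ok(Some(cached)) => (Ok(cached), CacheStatus::Hit),
        Ok(None) => {
            let response = upstream(prompt);
            if let Ok(body) = &response {
                // Assumption: a failed write-back is ignored, not surfaced.
                let _ = cache.put(&embedding, body);
            }
            (response, CacheStatus::Miss)
        }
        // Transport or decode failure on lookup: fall open.
        Err(_) => (upstream(prompt), CacheStatus::Disabled),
    }
}

/// Toy in-memory cache that matches on exact embedding equality.
/// A real backend would do an approximate nearest-neighbour search.
#[derive(Default)]
struct ExactMatchCache {
    entries: Vec<(Vec<f32>, String)>,
    broken: bool,
}

impl SemanticCache for ExactMatchCache {
    fn lookup(&self, embedding: &[f32]) -> Result<Option<String>, SemanticCacheError> {
        if self.broken {
            return Err(SemanticCacheError::Transport);
        }
        Ok(self
            .entries
            .iter()
            .find(|(e, _)| e.as_slice() == embedding)
            .map(|(_, r)| r.clone()))
    }

    fn put(&mut self, embedding: &[f32], response: &str) -> Result<(), SemanticCacheError> {
        if self.broken {
            return Err(SemanticCacheError::Transport);
        }
        self.entries.push((embedding.to_vec(), response.to_string()));
        Ok(())
    }
}
```

In this sketch the caller always gets the upstream result when the cache path fails, so the status value is the only visible difference between a healthy miss and a degraded cache.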

Companion PRs

Test plan

  • cargo clippy --workspace --all-targets -- -D warnings clean
  • aisix-cache: 20 tests (5 pgvector wiremock + 3 embed snapshot) green
  • aisix-core: 94 tests green (cache_policy parser + Pgvector variant)
  • aisix-proxy: 105 tests green (chat applies_to matcher coverage)
  • aisix-server: 31 tests green (bootstrap unaffected by new wiring)
  • live e2e: similar-prompt round trip against dpmgr + pgvector — follow-up PR

Notes

  • Streaming requests bypass semantic cache (existing behaviour).
  • Embedding-cred sourcing is intentionally implicit: first OpenAI Model in the snapshot. Per Stage 4b design (option C), there is no embedding_provider_key_id field on CachePolicy yet — that's a follow-up if operators need to bill embedding against a different key.
  • Pre-existing aisix-guardrails::bedrock test failures are environmental (TLS root cert parsing) and untouched by this PR.
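
To make the implicit credential sourcing concrete, here is a small sketch of "first OpenAI Model wins, policy's embedding_model goes on the wire". The `Model`, `Provider`, and `EmbedRequest` shapes and the `build_embed_request` helper are assumptions made for illustration; the real snapshot and bridge types in this PR are richer, and the model name in the usage note is only an example value.

```rust
// Sketch only: simplified stand-ins for the snapshot and bridge types.

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    Openai,
    Anthropic,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Model {
    name: String,
    provider: Option<Provider>,
    api_key: String, // hypothetical: stands in for the provider creds
}

#[derive(Debug, PartialEq)]
struct EmbedRequest {
    model: String,
    api_key: String,
    input: String,
}

/// Pick the first model in snapshot order whose provider is OpenAI.
fn first_openai_model(models: &[Model]) -> Option<&Model> {
    models
        .iter()
        .find(|m| matches!(m.provider, Some(Provider::Openai)))
}

/// Reuse that model's creds, but send the policy's embedding model name.
/// Returns None when the snapshot has no OpenAI model at all.
fn build_embed_request(
    models: &[Model],
    embedding_model: &str,
    prompt: &str,
) -> Option<EmbedRequest> {
    let creds = first_openai_model(models)?;
    Some(EmbedRequest {
        model: embedding_model.to_string(),
        api_key: creds.api_key.clone(),
        input: prompt.to_string(),
    })
}
```

With two OpenAI models in the snapshot, a call such as `build_embed_request(&models, "text-embedding-3-small", "hello")` would bill against the first one's key, which is the behaviour a future embedding_provider_key_id field would let operators override.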

Tracks #90.

moonming added 2 commits May 5, 2026 17:15
Stage 4b data-plane half: route chat requests with a matched
`backend = pgvector` policy through dp-manager's
`/dp/cache/{lookup,put}` endpoints for cosine-ANN cache hits.

- aisix-cache: new `pgvector` module — `PgvectorCache` HTTP client
  for the dp-manager handler, with fail-open `SemanticCacheError`
  and `SemanticHit { response, prompt_tokens, completion_tokens,
  similarity }`.
- aisix-cache: new `embed` module — `embed_prompt` reuses the env's
  first OpenAI Model's provider creds via `Hub::get(Provider::Openai)`
  and `Bridge::embed`, swapping in the policy's `embedding_model`
  on the wire.
- aisix-core: `CacheBackend::Pgvector` enum variant.
- aisix-proxy::state: `pgvector_cache: Option<Arc<PgvectorCache>>`
  field + `with_pgvector_cache` builder.
- aisix-proxy::chat: dispatch path — when matched policy's backend
  is `pgvector`, embed the last user message, lookup against the
  vector index; on miss continue to upstream and PUT the result on
  success. Embedding/transport failures fall open with
  `cache_status = Disabled` per the Stage 4b design note.

Streaming requests bypass semantic cache (existing behaviour). The
embed call reuses the chat request id so it correlates in upstream
logs.

Tests: pgvector wiremock coverage (hit, miss, handler-error,
put, base-url normalisation), embed snapshot resolver coverage,
proxy cache-policy applies_to matcher coverage. All 219 tests
green across aisix-cache (20), aisix-core (94), aisix-proxy (105).

Build a `PgvectorCache` against the dpmgr origin reusing the
heartbeat mTLS bundle (same pattern as `BudgetClient`), and
attach it to the `ProxyState`. In self-hosted dev (no
heartbeat_cfg) and on bundle-build failure the proxy falls
back to surfacing matched pgvector policies as
`cache_status = Disabled` — no traffic impact.
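
The optional-client wiring in the second commit can be pictured with the sketch below. The `pgvector_cache` field and `with_pgvector_cache` builder names come from the commit message; `HeartbeatCfg`, `build_cache`, `bootstrap`, and `pgvector_policy_is_disabled` are hypothetical, and the "empty bundle" check merely simulates a bundle-build failure.

```rust
use std::sync::Arc;

// Sketch only: simplified stand-ins, not the aisix-server bootstrap code.

#[allow(dead_code)]
struct PgvectorCache {
    base_url: String,
}

// Hypothetical shape of the managed-mode heartbeat config.
struct HeartbeatCfg {
    dpmgr_origin: String,
    mtls_bundle_pem: String,
}

#[derive(Default)]
struct ProxyState {
    pgvector_cache: Option<Arc<PgvectorCache>>,
}

impl ProxyState {
    fn with_pgvector_cache(mut self, cache: Arc<PgvectorCache>) -> Self {
        self.pgvector_cache = Some(cache);
        self
    }
}

/// Simulated client construction: fails when the mTLS bundle is unusable.
fn build_cache(cfg: &HeartbeatCfg) -> Result<PgvectorCache, String> {
    if cfg.mtls_bundle_pem.trim().is_empty() {
        return Err("empty mTLS bundle".to_string());
    }
    Ok(PgvectorCache {
        base_url: cfg.dpmgr_origin.clone(),
    })
}

/// Attach the client only in managed mode and only if it built cleanly.
fn bootstrap(cfg: Option<&HeartbeatCfg>) -> ProxyState {
    let state = ProxyState::default();
    match cfg.map(build_cache) {
        Some(Ok(cache)) => state.with_pgvector_cache(Arc::new(cache)),
        // Self-hosted dev (no config) or bundle failure: leave it unset.
        Some(Err(_)) | None => state,
    }
}

/// With no client attached, matched pgvector policies report Disabled.
fn pgvector_policy_is_disabled(state: &ProxyState) -> bool {
    state.pgvector_cache.is_none()
}
```

Keeping the handle as an `Option` means the chat path needs a single presence check rather than a separate feature flag, which matches the "no new config knob" note in the PR description.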
Copilot AI review requested due to automatic review settings May 5, 2026 09:20

Copilot AI left a comment

Pull request overview

Adds the DP-side pieces for Stage 4b semantic caching: pgvector-backed cache policy handling, local embedding generation, and dp-manager lookup/put wiring in the proxy/server bootstrap.

Changes:

  • Added a new pgvector cache-policy backend plus a DP HTTP client for /dp/cache/lookup and /dp/cache/put.
  • Wired semantic-cache dispatch into chat handling, including local prompt embedding and fail-open behavior.
  • Bootstrapped the new client in aisix-server and exposed it through ProxyState.
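
As a rough illustration of how a client might derive the two endpoint URLs from the dpmgr origin, the sketch below trims surrounding whitespace and trailing slashes before appending the path. The `CacheOp`, `normalise_base_url`, and `endpoint_url` names are hypothetical, and the exact normalisation rules the PR's client applies are not shown on this page, so treat the trimming behaviour as an assumption.

```rust
// Sketch only: hypothetical helpers, not the aisix-cache::pgvector client.

#[derive(Debug, Clone, Copy)]
enum CacheOp {
    Lookup,
    Put,
}

impl CacheOp {
    /// Endpoint paths as named in the PR description.
    fn path(self) -> &'static str {
        match self {
            CacheOp::Lookup => "/dp/cache/lookup",
            CacheOp::Put => "/dp/cache/put",
        }
    }
}

/// Assumed normalisation: strip whitespace and any trailing slashes.
fn normalise_base_url(base: &str) -> &str {
    base.trim().trim_end_matches('/')
}

/// Join the normalised origin with the endpoint path.
fn endpoint_url(base: &str, op: CacheOp) -> String {
    format!("{}{}", normalise_base_url(base), op.path())
}
```

Normalising once at construction time avoids double slashes in request URLs regardless of how the origin was written in configuration.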

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| crates/aisix-server/src/main.rs | Bootstraps and attaches the pgvector cache client from managed-mode mTLS config. |
| crates/aisix-proxy/src/state.rs | Extends shared proxy state with an optional pgvector cache handle. |
| crates/aisix-proxy/src/chat.rs | Adds policy matching, semantic lookup/writeback flow, and cache-status handling. |
| crates/aisix-core/src/models/cache_policy.rs | Introduces the Pgvector cache-policy backend variant and related docs. |
| crates/aisix-cache/src/pgvector.rs | Implements the dp-manager wire client for semantic cache lookup and put. |
| crates/aisix-cache/src/lib.rs | Exports the new embedding helper and pgvector client types. |
| crates/aisix-cache/src/embed.rs | Adds helper logic for selecting credentials and calling the embedding bridge. |
| crates/aisix-cache/Cargo.toml | Adds reqwest and wiremock dependencies needed by the new cache client/tests. |
| Cargo.lock | Locks the added crate dependencies. |

Memory,
Redis,
RedisSemantic,
Pgvector,
Comment on lines +450 to +454
  let matched_policy = snapshot
      .cache_policies
      .entries()
-     .iter()
-     .any(|entry| {
+     .into_iter()
+     .find(|entry| {
Comment on lines +605 to +611
// For the moka path: only consult the cache when the matched
// policy's backend is Memory (or the matched policy isn't
// pgvector AND the moka cache is configured). Stage 2's
// any-policy gate is replaced here by the per-backend dispatch.
let cache_active_by_policy = matched_policy
    .as_ref()
    .map(|p| matches!(p.value.backend, CacheBackend::Memory))
Comment on lines +99 to +104
fn first_openai_model(snapshot: &AisixSnapshot) -> Option<Arc<ResourceEntry<Model>>> {
    snapshot
        .models
        .entries()
        .into_iter()
        .find(|entry| matches!(entry.value.provider(), Some(Provider::Openai)))
Comment on lines +488 to +490
if let (Some(matched), Some(pgvector)) =
    (pgvector_match, state.pgvector_cache.as_ref())
{