feat(cache): Stage 4b — pgvector semantic cache backend (DP) #86
Open
moonming wants to merge 2 commits into
Stage 4b data-plane half: route chat requests with a matched
`backend = pgvector` policy through dp-manager's
`/dp/cache/{lookup,put}` endpoints for cosine-ANN cache hits.
- aisix-cache: new `pgvector` module — `PgvectorCache` HTTP client
for the dp-manager handler, with fail-open `SemanticCacheError`
and `SemanticHit { response, prompt_tokens, completion_tokens,
similarity }`.
- aisix-cache: new `embed` module — `embed_prompt` reuses the env's
first OpenAI Model's provider creds via `Hub::get(Provider::Openai)`
and `Bridge::embed`, swapping in the policy's `embedding_model`
on the wire.
- aisix-core: `CacheBackend::Pgvector` enum variant.
- aisix-proxy::state: `pgvector_cache: Option<Arc<PgvectorCache>>`
field + `with_pgvector_cache` builder.
- aisix-proxy::chat: dispatch path. When the matched policy's backend
is `pgvector`, embed the last user message and look it up in the
vector index; on a miss, continue to upstream and PUT the result on
success. Embedding/transport failures fall open with
`cache_status = Disabled` per the Stage 4b design note.
Streaming requests bypass semantic cache (existing behaviour). The
embed call reuses the chat request id so it correlates in upstream
logs.
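The dispatch path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `SemanticBackend` trait, the `InMemoryBackend` stand-in, the `dispatch` signature, and the `Hit`/`Miss` variants are all hypothetical, and the real client talks HTTP to dp-manager rather than holding entries in memory. The sketch also assumes that a failed write-back leaves the status at `Miss`, which the commit message does not state.

```rust
use std::cell::RefCell;

// Hypothetical stand-ins for the PR's types; names and shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum CacheStatus {
    Hit,
    Miss,
    Disabled,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct SemanticHit {
    response: String,
    similarity: f32,
}

#[allow(dead_code)]
#[derive(Debug)]
struct SemanticCacheError(String);

/// The three steps the dispatch path needs from a semantic cache (assumed interface).
trait SemanticBackend {
    fn embed(&self, prompt: &str) -> Result<Vec<f32>, SemanticCacheError>;
    fn lookup(&self, embedding: &[f32]) -> Result<Option<SemanticHit>, SemanticCacheError>;
    fn put(&self, embedding: &[f32], response: &str) -> Result<(), SemanticCacheError>;
}

/// Embed, look up, and on a miss call upstream and write the result back.
/// Any embed or lookup error falls open: the request is served from
/// upstream and reported as `Disabled`.
fn dispatch<B: SemanticBackend>(
    backend: &B,
    last_user_message: &str,
    upstream: impl FnOnce() -> Result<String, String>,
) -> Result<(String, CacheStatus), String> {
    let embedding = match backend.embed(last_user_message) {
        Ok(v) => v,
        Err(_) => return upstream().map(|r| (r, CacheStatus::Disabled)),
    };
    match backend.lookup(&embedding) {
        Ok(Some(hit)) => Ok((hit.response, CacheStatus::Hit)),
        Ok(None) => {
            let response = upstream()?;
            // Assumption: a failed write-back is ignored and the status stays Miss.
            let _ = backend.put(&embedding, &response);
            Ok((response, CacheStatus::Miss))
        }
        Err(_) => upstream().map(|r| (r, CacheStatus::Disabled)),
    }
}

/// Toy in-memory backend used only to exercise `dispatch`.
struct InMemoryBackend {
    fail_embed: bool,
    entries: RefCell<Vec<(Vec<f32>, String)>>,
}

impl SemanticBackend for InMemoryBackend {
    fn embed(&self, prompt: &str) -> Result<Vec<f32>, SemanticCacheError> {
        if self.fail_embed {
            return Err(SemanticCacheError("embed failed".into()));
        }
        // Toy embedding: the prompt length. Real embeddings come from the provider.
        Ok(vec![prompt.len() as f32])
    }

    fn lookup(&self, embedding: &[f32]) -> Result<Option<SemanticHit>, SemanticCacheError> {
        let entries = self.entries.borrow();
        // Exact match only; a real index would do nearest-neighbour search.
        let hit = entries
            .iter()
            .find(|(e, _)| e.as_slice() == embedding)
            .map(|(_, r)| SemanticHit { response: r.clone(), similarity: 1.0 });
        Ok(hit)
    }

    fn put(&self, embedding: &[f32], response: &str) -> Result<(), SemanticCacheError> {
        self.entries
            .borrow_mut()
            .push((embedding.to_vec(), response.to_string()));
        Ok(())
    }
}
```

With this stand-in, a first call for a prompt goes upstream and is written back, a second call for the same prompt is served from the cache without touching upstream, and a backend whose `embed` fails still returns the upstream response.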
Tests: pgvector wiremock coverage (hit, miss, handler-error,
put, base-url normalisation), embed snapshot resolver coverage,
proxy cache-policy applies_to matcher coverage. All 219 tests
green across aisix-cache (20), aisix-core (94), aisix-proxy (105).
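The base-url normalisation covered by the tests is not shown in this conversation. One plausible shape, assuming the goal is simply to avoid a doubled slash when the endpoint path is appended, is sketched below; `normalize_base_url` and `cache_endpoint` are hypothetical names, and the PR may normalise differently.

```rust
/// Strip surrounding whitespace and trailing slashes so that appending
/// a path never produces `//` (assumed behaviour, not taken from the PR).
fn normalize_base_url(base: &str) -> String {
    base.trim().trim_end_matches('/').to_string()
}

/// Build the URL for one of the dp-manager cache operations,
/// e.g. `lookup` or `put`.
fn cache_endpoint(base: &str, op: &str) -> String {
    format!("{}/dp/cache/{}", normalize_base_url(base), op)
}
```

Under this assumption, `https://dpmgr.example` and `https://dpmgr.example/` resolve to the same lookup URL.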
Build a `PgvectorCache` against the dpmgr origin reusing the heartbeat mTLS bundle (same pattern as `BudgetClient`), and attach it to the `ProxyState`. In self-hosted dev (no heartbeat_cfg) and on bundle-build failure the proxy falls back to surfacing matched pgvector policies as `cache_status = Disabled` — no traffic impact.
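The optional wiring described in this commit might look something like the sketch below. Only the `pgvector_cache` field and the `with_pgvector_cache` builder name come from the PR; the struct contents, the `bootstrap` helper, its parameters, and `surfaces_disabled` are assumptions made for illustration, and the mTLS bundle is reduced to a boolean.

```rust
use std::sync::Arc;

// Placeholder client: the real one carries an HTTP client and mTLS identity.
#[allow(dead_code)]
#[derive(Debug)]
struct PgvectorCache {
    base_url: String,
}

#[derive(Debug, Default)]
struct ProxyState {
    pgvector_cache: Option<Arc<PgvectorCache>>,
}

impl ProxyState {
    /// Builder-style setter, mirroring the name used in the PR.
    fn with_pgvector_cache(mut self, cache: Arc<PgvectorCache>) -> Self {
        self.pgvector_cache = Some(cache);
        self
    }
}

/// Hypothetical bootstrap: attach the client only when a dpmgr origin is
/// configured and the mTLS bundle was built; otherwise leave it unset.
fn bootstrap(dpmgr_origin: Option<&str>, bundle_built: bool) -> ProxyState {
    let state = ProxyState::default();
    match dpmgr_origin {
        Some(origin) if bundle_built => state.with_pgvector_cache(Arc::new(PgvectorCache {
            base_url: origin.to_string(),
        })),
        _ => state,
    }
}

/// Without a client, a matched pgvector policy is reported as Disabled.
fn surfaces_disabled(state: &ProxyState) -> bool {
    state.pgvector_cache.is_none()
}
```

Keeping the handle as an `Option` means the chat path can check for the client once and fall back without any separate configuration flag.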
Pull request overview
Adds the DP-side pieces for Stage 4b semantic caching: pgvector-backed cache policy handling, local embedding generation, and dp-manager lookup/put wiring in the proxy/server bootstrap.
Changes:
- Added a new `pgvector` cache-policy backend plus a DP HTTP client for `/dp/cache/lookup` and `/dp/cache/put`.
- Wired semantic-cache dispatch into chat handling, including local prompt embedding and fail-open behavior.
- Bootstrapped the new client in `aisix-server` and exposed it through `ProxyState`.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `crates/aisix-server/src/main.rs` | Bootstraps and attaches the pgvector cache client from managed-mode mTLS config. |
| `crates/aisix-proxy/src/state.rs` | Extends shared proxy state with an optional pgvector cache handle. |
| `crates/aisix-proxy/src/chat.rs` | Adds policy matching, semantic lookup/writeback flow, and cache-status handling. |
| `crates/aisix-core/src/models/cache_policy.rs` | Introduces the Pgvector cache-policy backend variant and related docs. |
| `crates/aisix-cache/src/pgvector.rs` | Implements the dp-manager wire client for semantic cache lookup and put. |
| `crates/aisix-cache/src/lib.rs` | Exports the new embedding helper and pgvector client types. |
| `crates/aisix-cache/src/embed.rs` | Adds helper logic for selecting credentials and calling the embedding bridge. |
| `crates/aisix-cache/Cargo.toml` | Adds reqwest and wiremock dependencies needed by the new cache client/tests. |
| `Cargo.lock` | Locks the added crate dependencies. |

        Memory,
        Redis,
        RedisSemantic,
        Pgvector,

Comment on lines +450 to +454

      let matched_policy = snapshot
          .cache_policies
          .entries()
    -     .iter()
    -     .any(|entry| {
    +     .into_iter()
    +     .find(|entry| {

Comment on lines +605 to +611

    // For the moka path: only consult the cache when the matched
    // policy's backend is Memory (or the matched policy isn't
    // pgvector AND the moka cache is configured). Stage 2's
    // any-policy gate is replaced here by the per-backend dispatch.
    let cache_active_by_policy = matched_policy
        .as_ref()
        .map(|p| matches!(p.value.backend, CacheBackend::Memory))

Comment on lines +99 to +104

    fn first_openai_model(snapshot: &AisixSnapshot) -> Option<Arc<ResourceEntry<Model>>> {
        snapshot
            .models
            .entries()
            .into_iter()
            .find(|entry| matches!(entry.value.provider(), Some(Provider::Openai)))

Comment on lines +488 to +490

    if let (Some(matched), Some(pgvector)) =
        (pgvector_match, state.pgvector_cache.as_ref())
    {

Summary
- When a chat request matches a `CachePolicy` with `backend = pgvector`, the proxy now embeds the prompt locally (provider creds stay on DP) and looks it up against the dp-manager `/dp/cache/{lookup,put}` endpoints (cosine-ANN over pgvector). Misses fall through to the upstream, and the response is PUT back into the index on success.
- `aisix-cache::pgvector` HTTP wire client + `aisix-cache::embed` helper that resolves embedding creds from the env's first OpenAI Model and reuses the existing `Bridge::embed` surface with the policy's `embedding_model` swapped onto the wire.
- `CacheBackend::Pgvector` enum variant for `cache_policy.rs`; `ProxyState::pgvector_cache` field + `with_pgvector_cache` builder; chat dispatch path with fail-open on every error edge (transport, embed, decode) → `cache_status = Disabled`.
- `aisix-server`: piggybacks on the heartbeat mTLS bundle and dpmgr origin, exactly like `BudgetClient`. No new config knob; self-hosted dev simply runs without the pgvector path.

Companion PRs
- … `cache_entries_semantic` + HNSW cosine index) + `internal/dpmgr/api/cache.go` handlers
- … `SemanticBackendNote`

Test plan
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- … `Pgvector` variant)

Notes
- No `embedding_provider_key_id` field on `CachePolicy` yet; that's a follow-up if operators need to bill embedding against a different key.
- `aisix-guardrails::bedrock` test failures are environmental (TLS root cert parsing) and untouched by this PR.

Tracks #90.
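
For reference, the `similarity` value a cosine lookup reports can be sketched as below. This is an exact, brute-force computation for illustration only; the actual ranking happens inside dp-manager over the HNSW index and is not part of this PR, and `cosine_similarity` is a hypothetical helper. Note that pgvector's `<=>` operator returns cosine distance, which is one minus this value.

```rust
/// Cosine similarity of two equal-length vectors, in [-1, 1].
/// Returns `None` for mismatched lengths, empty input, or a zero vector,
/// where the value is undefined.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Identical directions score close to 1 and orthogonal vectors score 0, so a policy-level similarity threshold decides how close a cached prompt must be to count as a hit.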