Improve token caching. #1366

Merged
jsourcebot merged 8 commits into main from jminnetian/improve-token-caching on Jun 24, 2026

Conversation

jsourcebot commented Jun 23, 2026
Contributor

Improved Ask Sourcebot prompt caching by splitting the prompt into static and dynamic sections and by advancing cache breakpoints after every agent step instead of only after each message.

Summary by CodeRabbit

  • New Features
    • Enhanced Enterprise “Ask Sourcebot” prompt caching with provider-aware strategies, separating byte-stable static vs dynamic prompt content.
    • Added server controls to enable caching, set static TTL, and optionally detect cache misses/breakpoints.
    • Cache breakpoints now advance after each agent step (not just per message).
  • Bug Fixes
    • Stabilized prompt/tool byte layout by making MCP tool/server and repository ordering deterministic.
  • Tests
    • Expanded coverage for cache markers, TTL behavior, no-op providers, and static prompt byte identity across repo selections.
  • Chores
    • Updated changelog to document the improved caching behavior.

coderabbitai Bot commented Jun 23, 2026
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a PromptCacheStrategy abstraction to the EE chat agent that separates static and dynamic prompt sections, enforces deterministic byte ordering in MCP tool/client queries, and attaches Anthropic ephemeral cache-control markers to the static system block and per-step tail message. The strategy is wired through all chat entry points via three new env flags.
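
As a rough illustration of the strategy abstraction described in the walkthrough, the sketch below models a no-op strategy and an Anthropic ephemeral strategy behind one interface. `CacheTtl`, `PromptCacheStrategy`, `getPromptCacheStrategy`, and `mergeProviderOptions` are names taken from this PR, but the signatures and bodies shown here are assumptions, as are the `ProviderOptions` type and the exact marker shape. The real module in promptCaching.ts may differ.

```typescript
export type CacheTtl = '5m' | '1h';

// Assumed shape of a provider-options bag: provider name -> provider-specific options.
export type ProviderOptions = Record<string, Record<string, unknown>>;

export interface PromptCacheStrategy {
  /** Returns provider options marking a cache breakpoint, or undefined for a no-op. */
  cacheControl(opts?: { ttl?: CacheTtl }): ProviderOptions | undefined;
}

// Providers without explicit cache breakpoints: never emit a marker.
export const noopStrategy: PromptCacheStrategy = {
  cacheControl: () => undefined,
};

// Anthropic: emit an ephemeral cache-control marker, adding the TTL only when given.
export const anthropicEphemeralStrategy: PromptCacheStrategy = {
  cacheControl: (opts) => ({
    anthropic: {
      cacheControl: { type: 'ephemeral', ...(opts?.ttl ? { ttl: opts.ttl } : {}) },
    },
  }),
};

// Assumed signature: a provider id plus the global enable flag.
export function getPromptCacheStrategy(provider: string, enabled: boolean): PromptCacheStrategy {
  return enabled && provider === 'anthropic' ? anthropicEphemeralStrategy : noopStrategy;
}

// Hypothetical merge: shallow-merges per provider so existing options survive
// when a cache marker is added to the same message.
export function mergeProviderOptions(
  base: ProviderOptions | undefined,
  extra: ProviderOptions | undefined,
): ProviderOptions | undefined {
  if (!base) return extra;
  if (!extra) return base;
  const merged: ProviderOptions = { ...base };
  for (const [provider, options] of Object.entries(extra)) {
    merged[provider] = { ...merged[provider], ...options };
  }
  return merged;
}
```

With this shape, call sites never branch on the provider: they always ask the strategy for a marker and merge whatever comes back, which is `undefined` for non-Anthropic providers.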

Changes

EE Ask Sourcebot Prompt Caching

| Layer / File(s) | Summary |
| --- | --- |
| PromptCacheStrategy module and env flags: packages/shared/src/env.server.ts, packages/web/src/ee/features/chat/promptCaching.ts | Defines CacheTtl, PromptCacheStrategy interface, no-op and Anthropic ephemeral strategies, getPromptCacheStrategy, mergeProviderOptions, detectPromptCacheBreak, and detectUnexpectedCacheMiss. Adds three env schema fields for static prefix enable, TTL selection (5m/1h), and break-detection toggle. |
| Deterministic MCP and tool ordering: packages/web/src/ee/features/chat/mcp/mcpClientFactory.ts, packages/web/src/ee/features/chat/mcp/mcpToolRegistry.ts | Adds orderBy: { serverId: 'asc' } to the getConnectedMcpClients Prisma query. Sorts tool entries by name in buildMcpToolRegistry before mapping to stabilize byte layout across requests. |
| Agent prompt split and caching wiring: packages/web/src/ee/features/chat/agent.ts | Extends CreateMessageStreamResponseProps and AgentOptions with promptCacheStrategy. Sorts selectedRepos for byte stability. Refactors createPrompt into staticPrompt/dynamicPrompt via a dynamicSections array. Implements static-prefix mode with activationToolMarker, applies tailMarker per step in prepareStep, and gates observability helpers behind env flags. |
| Strategy wiring at chat entry points: packages/web/src/app/api/(server)/ee/chat/route.ts, packages/web/src/ee/features/mcp/askCodebase.ts | Computes promptCacheStrategy from the selected provider and SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED flag in both chat/route.ts and askCodebase.ts, then passes it into createMessageStream. |
| Tests and changelog: packages/web/src/ee/features/chat/promptCaching.test.ts, packages/web/src/ee/features/chat/agent.test.ts, CHANGELOG.md | Adds promptCaching.test.ts covering strategy and merge behavior across providers and TTL variants. Extends agent.test.ts with a full caching suite asserting static-block markers, per-step tail-marker relocation, non-Anthropic no-ops, tool marker presence, and static prompt byte identity. Updates changelog. |
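
To make the byte-stability idea in the change summary concrete, here is a minimal sketch of deterministic ordering plus a static/dynamic prompt split. `createPrompt`, `staticPrompt`, `dynamicPrompt`, and `dynamicSections` are names from the summary; the prompt text, the `ToolEntry` type, the `compareStrings` helper, and the function signatures are hypothetical and exist only for illustration.

```typescript
// Code-unit comparison: unlike localeCompare, the result does not depend on the host locale,
// so the resulting order is the same on every machine.
export function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical tool shape for this sketch.
export interface ToolEntry {
  name: string;
  description: string;
}

// Returns a new array sorted by tool name so serialized tool schemas keep a stable byte order.
export function sortToolsByName(tools: readonly ToolEntry[]): ToolEntry[] {
  return [...tools].sort((x, y) => compareStrings(x.name, y.name));
}

export interface PromptParts {
  staticPrompt: string;
  dynamicPrompt: string;
}

// The static part never reads request-specific input, so it is byte-identical across
// chats and repo selections. Everything request-specific goes into dynamicSections.
export function createPrompt(selectedRepos: readonly string[]): PromptParts {
  const staticPrompt = [
    'You are a code assistant.', // placeholder text, not the real system prompt
    'Cite files when you reference code.',
  ].join('\n');

  const sortedRepos = [...selectedRepos].sort(compareStrings);
  const dynamicSections: string[] = [];
  if (sortedRepos.length > 0) {
    const repoLines = sortedRepos.map((repo) => `- ${repo}`).join('\n');
    dynamicSections.push(`Selected repositories:\n${repoLines}`);
  }

  return { staticPrompt, dynamicPrompt: dynamicSections.join('\n\n') };
}
```

Because a provider-side prefix cache only matches on identical leading bytes, any request-dependent text or unstable ordering ahead of the breakpoint would defeat it. That is why the sketch keeps the static block free of inputs and sorts everything that feeds the serialized request.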

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ChatRoute as chat/route.ts
  participant Agent as createAgentStream
  participant Prompt as createPrompt
  participant CacheStrategy as PromptCacheStrategy
  participant StreamText as streamText (AI SDK)

  Client->>ChatRoute: POST /api/ee/chat
  ChatRoute->>CacheStrategy: getPromptCacheStrategy(provider, enabled)
  CacheStrategy-->>ChatRoute: strategy (Anthropic or no-op)
  ChatRoute->>Agent: createMessageStream({ promptCacheStrategy })
  Agent->>Prompt: createPrompt(sortedRepos)
  Prompt-->>Agent: { staticPrompt, dynamicPrompt }
  Agent->>CacheStrategy: strategy.cacheControl({ ttl: staticTtl })
  CacheStrategy-->>Agent: staticMarker (providerOptions)
  Agent->>StreamText: systemMessages[0] with staticMarker + dynamicPrompt
  loop each agent step
    StreamText->>Agent: prepareStep(stepMessages)
    Agent->>Agent: move tailMarker onto last message
    Agent-->>StreamText: messages with tailMarker applied
    StreamText-->>Agent: stepResult { cacheReadTokens }
    Agent->>Agent: detectUnexpectedCacheMiss(stepIndex, cacheReadTokens)
  end
  StreamText-->>Client: streamed response
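
The last step of the loop in the diagram, `detectUnexpectedCacheMiss(stepIndex, cacheReadTokens)`, could look roughly like the sketch below. Only the function name and its two arguments come from the diagram. The rule used here (any step after the first that reads zero cached tokens counts as a miss) and the `findMissedSteps` helper are assumptions, not the PR's actual logic.

```typescript
// Assumed rule: the first step of a turn may legitimately start cold, but once a tail
// marker has been written, later steps are expected to read at least some cached tokens.
export function detectUnexpectedCacheMiss(
  stepIndex: number,
  cacheReadTokens: number | undefined,
): boolean {
  if (stepIndex === 0) return false;
  return (cacheReadTokens ?? 0) === 0;
}

// Hypothetical helper: given the cache-read token count reported for each step,
// return the indices of the steps that look like unexpected misses.
export function findMissedSteps(cacheReadsPerStep: readonly (number | undefined)[]): number[] {
  const missed: number[] = [];
  cacheReadsPerStep.forEach((cacheReadTokens, stepIndex) => {
    if (detectUnexpectedCacheMiss(stepIndex, cacheReadTokens)) missed.push(stepIndex);
  });
  return missed;
}
```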

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • sourcebot-dev/sourcebot#1278: Implements the initial Anthropic prompt caching by wiring providerOptions.anthropic.cacheControl into the chat agent's streamText call and adding token-cache metrics to the UI — the direct predecessor to this PR's strategy abstraction and break-detection layer.

Suggested reviewers

  • brendan-kellam
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title "Improve token caching" is vague and generic, using non-specific language that doesn't convey the specific changes made to prompt caching architecture. | Consider a more descriptive title that captures the key changes, such as "Split static/dynamic prompt sections and advance cache breakpoints per agent step" or "Improve prompt caching with static prefix separation and per-step breakpoints". |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment
Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 11: In the CHANGELOG.md file, locate the line containing the Ask
Sourcebot prompt caching improvement entry under the [Unreleased] section.
Replace both instances of the placeholder `<id>` in the markdown link reference
`[#<id>](https://github.com/sourcebot-dev/sourcebot/pull/<id>)` with the actual
pull request number for this change. The same numeric PR id should be used in
both the link text and the URL to create a valid GitHub PR link.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70bb4df8-9bdb-46d6-994c-b4004e501537

📥 Commits

Reviewing files that changed from the base of the PR and between 889e2b1 and 05e306d.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • packages/shared/src/env.server.ts
  • packages/web/src/app/api/(server)/ee/chat/route.ts
  • packages/web/src/ee/features/chat/agent.test.ts
  • packages/web/src/ee/features/chat/agent.ts
  • packages/web/src/ee/features/chat/mcp/mcpClientFactory.ts
  • packages/web/src/ee/features/chat/mcp/mcpToolRegistry.ts
  • packages/web/src/ee/features/chat/promptCaching.test.ts
  • packages/web/src/ee/features/chat/promptCaching.ts
  • packages/web/src/ee/features/mcp/askCodebase.ts

Comment thread CHANGELOG.md Outdated
Adds a divergence-proof static front checkpoint (cross-chat reuse of
tool + static-system bytes) and an MCP activation-resilience breakpoint
on top of the existing moving tail marker, behind a provider-aware
resolver that is a no-op for non-Anthropic providers. Splits the system
prompt into static/dynamic blocks and hardens MCP ordering for byte
stability, all gated by new env flags.

The moving tail breakpoint was set once on the last input message before
streamText's loop, so a turn's tool calls and outputs accumulated past it
and were reprocessed uncached on each later step. Apply it in prepareStep
to the last message of every step instead, caching the growing in-turn
delta incrementally. prepareStep now runs without MCP too, and stays a
no-op for non-Anthropic providers.
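
The per-step relocation of the tail breakpoint described in the commit above can be pictured with the following sketch. It is a simplified model, not the code in agent.ts: the `ChatMessage` and `ProviderOptions` types and the `applyTailMarker` and `withoutCacheControl` helpers are hypothetical, and the real change hooks into the AI SDK's prepareStep callback rather than a standalone function.

```typescript
export type ProviderOptions = Record<string, Record<string, unknown>>;

// Hypothetical, simplified message shape for this sketch.
export interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  providerOptions?: ProviderOptions;
}

// Returns a copy of the message with any Anthropic cacheControl entry removed.
function withoutCacheControl(message: ChatMessage): ChatMessage {
  const anthropic = message.providerOptions?.['anthropic'];
  if (!anthropic || !('cacheControl' in anthropic)) return message;
  const rest: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(anthropic)) {
    if (key !== 'cacheControl') rest[key] = value;
  }
  return { ...message, providerOptions: { ...message.providerOptions, anthropic: rest } };
}

// Called once per agent step: clears stale tail markers and puts the marker on the
// last message, so tool calls and outputs from earlier steps fall inside the cached prefix.
export function applyTailMarker(
  messages: readonly ChatMessage[],
  marker: ProviderOptions | undefined,
): ChatMessage[] {
  if (!marker || messages.length === 0) return [...messages]; // no-op strategy
  const lastIndex = messages.length - 1;
  return messages.map((message, index) => {
    if (message.role === 'system') return message; // static checkpoint stays untouched
    const cleaned = withoutCacheControl(message);
    if (index !== lastIndex) return cleaned;
    const providerOptions: ProviderOptions = { ...cleaned.providerOptions };
    for (const [provider, options] of Object.entries(marker)) {
      providerOptions[provider] = { ...providerOptions[provider], ...options };
    }
    return { ...cleaned, providerOptions };
  });
}
```

Moving the marker, rather than adding a new one each step, also keeps the number of breakpoints per request fixed, which matters because Anthropic limits how many cache breakpoints a single request may carry.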
…ignature

cacheBreakSnapshots was keyed by chatId and never evicted, so with cache-break detection enabled it grew with the cumulative number of distinct chats served. Add a FIFO cap that drops the oldest entry on overflow, and replace the hand-rolled djb2 signature hash with a sha256 slice matching getOAuthScopeHash (observability-only and compared in-process, so determinism is all it needs).

Marker 1 only saved re-writing the built-in tool schemas on mid-turn MCP
activation steps, and only when those schemas cleared the model's minimum
cacheable size. The static-system checkpoint and moving tail carry the
value, so this collapses the scheme to two breakpoints and removes the
activeTools insertion-order reasoning it required.

Remove stale references to the dropped tools-block breakpoint and tighten verbose prompt-caching comments. Comments only, no code changes.
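
The FIFO cap and sha256-slice signature mentioned in the cacheBreakSnapshots commit above might be sketched as follows. The `FifoCappedMap` class, the `signatureOf` helper, the cap value, and the 16-character slice length are all assumptions for illustration; the commit text does not state the actual cap or slice length used.

```typescript
import { createHash } from 'node:crypto';

// Deterministic short signature of a string. The 16-character slice is an assumed length.
export function signatureOf(input: string): string {
  return createHash('sha256').update(input).digest('hex').slice(0, 16);
}

// A Map iterates in insertion order, so the first key is always the oldest entry.
// Assumes maxEntries >= 1.
export class FifoCappedMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  set(key: K, value: V): void {
    // Overwriting an existing key keeps its original position (FIFO, not LRU).
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, value);
  }

  get(key: K): V | undefined {
    return this.entries.get(key);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: remember the last prompt signature per chat without unbounded growth.
const snapshots = new FifoCappedMap<string, string>(1000);
snapshots.set('chat-1', signatureOf('static prompt bytes'));
```

A plain FIFO is enough here because the map only backs an observability check: evicting a still-active chat costs at most one missed comparison, not a correctness problem.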
jsourcebot force-pushed the jminnetian/improve-token-caching branch from 7de1198 to 8610781 on June 24, 2026 01:24
Comment thread packages/shared/src/env.server.ts Outdated

SOURCEBOT_CHAT_MAX_STEP_COUNT: numberSchema.default(100),
SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED: booleanSchema.default('true'),
// Phased-rollout lever for the static checkpoint. Set to 'false' to fall

Contributor

nit: can we use /** */ comments so that we get inline JSDoc rendering when hovering over these symbols in the IDE?

brendan-kellam previously approved these changes Jun 24, 2026
Comment thread packages/shared/src/env.server.ts Outdated
// Phased-rollout lever for the static checkpoint. Set to 'false' to fall
// back to the single moving tail marker. Only takes effect when prompt
// caching is enabled.
SOURCEBOT_CHAT_PROMPT_CACHE_STATIC_PREFIX_ENABLED: booleanSchema.default('true'),

Contributor

Is there an advantage to making this configurable?

Contributor Author

Mainly a safeguard: if some issue keeps the static portion from caching properly, you can disable it and stop paying the extra cost for no benefit. But it's maybe overly defensive. I'm leaning towards removing it and keeping the env vars to a minimal required set.


const logger = createLogger('prompt-caching');

export type CacheTtl = '5m' | '1h';

Contributor

Can the TTL only be 5m or 1h?

Contributor Author

Yes, at least for Anthropic.

Remove the SOURCEBOT_CHAT_PROMPT_CACHE_STATIC_PREFIX_ENABLED lever so the static checkpoint is always emitted, and switch the remaining cache env vars to JSDoc comments so their descriptions render on IDE hover.
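
For readers unfamiliar with the JSDoc point raised in review, the sketch below shows how `/** ... */` comments attach to env fields so that editors display them on hover, which plain `//` comments do not. It is a standalone approximation that avoids the repo's schema helpers: `ChatCacheEnv`, `parseChatCacheEnv`, the `EXAMPLE_STATIC_CACHE_TTL` variable name, and its '5m' default are hypothetical. Only `SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED` and its 'true' default appear in the diff context above.

```typescript
export type CacheTtl = '5m' | '1h';

export interface ChatCacheEnv {
  /** Enables prompt caching for Ask Sourcebot. Defaults to true. */
  SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED: boolean;
  /**
   * TTL applied to the static checkpoint.
   * Hypothetical variable name and default, used only in this sketch.
   */
  EXAMPLE_STATIC_CACHE_TTL: CacheTtl;
}

// Parses raw string env values into typed settings and rejects unsupported TTLs early.
export function parseChatCacheEnv(raw: Record<string, string | undefined>): ChatCacheEnv {
  const enabled = (raw['SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED'] ?? 'true') === 'true';
  const ttl = raw['EXAMPLE_STATIC_CACHE_TTL'] ?? '5m';
  if (ttl !== '5m' && ttl !== '1h') {
    throw new Error(`Unsupported TTL "${ttl}"; expected "5m" or "1h"`);
  }
  return { SOURCEBOT_CHAT_PROMPT_CACHING_ENABLED: enabled, EXAMPLE_STATIC_CACHE_TTL: ttl };
}
```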
jsourcebot merged commit 5e1b8ee into main on Jun 24, 2026
9 checks passed

2 participants