
Redo summarization infrastructure — simplify background/foreground compaction #307860

@bhavyaus

The current summarization/compaction infrastructure in Copilot Chat has grown complex and fragile:

Current state

  • Three summarization modes: Full (with tool schemas + full history), Simple (truncated text fallback), and Inline (appended user message)
  • Two execution paths: Foreground (blocking, triggered on BudgetExceededError) and Background (async, kicked off at ≥80% context window usage, applied at ≥95% or on the next iteration)
  • Multiple threshold checks: pre-render (previous iteration's token count), post-render (current iteration's token count), and the BudgetExceededError catch
  • Cascading fallbacks: Background → foreground → Full → Simple → renderWithoutSummarization → hard error
  • Retry guards: per-turn metadata tracking to prevent repeated failed foreground attempts
  • Budget mismatches: between the summarization endpoint and the main render endpoint, in tool schema filtering for deferred tools, and in behavior that differs per ChatLocation
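
The TypeScript sketch below models the triggers and fallback order listed above, to make explicit which path runs in which situation. It is a simplified, hypothetical model: `decideCompaction`, `runCascade`, the `BackgroundState` values, and the step names are illustrative assumptions rather than the actual Copilot Chat code. Only the 80%/95% thresholds, the Failed state, and the order of the cascade come from this issue.

```typescript
// Hypothetical model of the current compaction triggers. All names here are
// illustrative assumptions, not actual Copilot Chat identifiers.
type BackgroundState = "Idle" | "InProgress" | "Completed" | "Failed";
type Action = "none" | "startBackground" | "applyBackground" | "foreground";

const BACKGROUND_START_RATIO = 0.8; // kick off async summarization
const BACKGROUND_APPLY_RATIO = 0.95; // swap in the finished summary

function decideCompaction(
  usedTokens: number,
  contextWindow: number,
  background: BackgroundState,
  isNextIteration: boolean,
  budgetExceeded: boolean,
): Action {
  // Budget-exceeded catch: use a finished background summary if there is one,
  // otherwise block on a foreground summarization.
  if (budgetExceeded) {
    return background === "Completed" ? "applyBackground" : "foreground";
  }
  const ratio = usedTokens / contextWindow;
  // A completed background summary is applied at >=95% or on the next iteration.
  if (background === "Completed" && (ratio >= BACKGROUND_APPLY_RATIO || isNextIteration)) {
    return "applyBackground";
  }
  // At >=80%, (re)start background work; restarting from Failed is uncapped.
  if (ratio >= BACKGROUND_START_RATIO && (background === "Idle" || background === "Failed")) {
    return "startBackground";
  }
  return "none";
}

// The cascade: each step either yields a rendered prompt or fails, and the
// next step is tried. Only when every step fails does the caller see an error.
interface Step {
  name: string;
  run: () => string | undefined;
}

function runCascade(steps: Step[]): { step: string; prompt: string } {
  for (const step of steps) {
    try {
      const prompt = step.run();
      if (prompt !== undefined) {
        return { step: step.name, prompt };
      }
    } catch {
      // Swallow the failure and fall through to the next step.
    }
  }
  throw new Error("hard error: all compaction steps failed");
}

// Usage: no background summary, Full throws, Simple succeeds.
const outcome = runCascade([
  { name: "background", run: () => undefined },
  { name: "foreground (Full)", run: () => { throw new Error("over budget"); } },
  { name: "foreground (Simple)", run: () => "truncated summary" },
  { name: "renderWithoutSummarization", run: () => "unsummarized prompt" },
]);
console.log(outcome.step); // foreground (Simple)
```

Even in this reduced form, the outcome depends on five inputs plus which cascade step happens to fail, and none of the per-step failure metadata or telemetry is modeled yet.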

Problems

  • Hard to reason about which code path runs in which scenario
  • Failure metadata, telemetry, and fallback logic duplicated across many branches
  • Budget calculation for summarization is coupled to the main render's tool-token reduction logic, but the two have fundamentally different needs
  • Background compaction retries from the Failed state have no limit
  • The relationship between toolTokens, safeBudget, baseBudget, contextRatio, and postRenderRatio is non-obvious
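
One direction the title suggests is a single owner for background compaction state with an explicit cap on attempts. The sketch below is hypothetical: `BackgroundCompaction`, `tryStart`, `finish`, `consume`, and `maxAttempts` are illustrative names, not a proposed API, and the cap value is an assumption. It only shows how a bounded retry from the Failed state could look.

```typescript
// Hypothetical sketch: one owner for background compaction state with a capped
// number of failed attempts. Names and the cap are illustrative assumptions.
type CompactionState = "Idle" | "InProgress" | "Completed" | "Failed";

class BackgroundCompaction {
  private state: CompactionState = "Idle";
  private failedAttempts = 0;
  private readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  get current(): CompactionState {
    return this.state;
  }

  /** Starts a background run if allowed; returns whether one was started. */
  tryStart(): boolean {
    if (this.state === "InProgress" || this.state === "Completed") {
      return false; // already running, or a summary is waiting to be applied
    }
    if (this.failedAttempts >= this.maxAttempts) {
      return false; // retry budget exhausted; leave it to the foreground path
    }
    this.state = "InProgress";
    return true;
  }

  /** Records the outcome of the run started by tryStart(). */
  finish(succeeded: boolean): void {
    if (this.state !== "InProgress") {
      throw new Error(`finish() called in state ${this.state}`);
    }
    if (succeeded) {
      this.state = "Completed";
    } else {
      this.state = "Failed";
      this.failedAttempts++;
    }
  }

  /** Hands the finished summary to the renderer and resets for the next cycle. */
  consume(): boolean {
    if (this.state !== "Completed") {
      return false;
    }
    this.state = "Idle";
    this.failedAttempts = 0;
    return true;
  }
}

// Usage: two failures exhaust a cap of 2, so no third run starts.
const compaction = new BackgroundCompaction(2);
compaction.tryStart();
compaction.finish(false);
compaction.tryStart();
compaction.finish(false);
console.log(compaction.tryStart(), compaction.current); // false Failed
```

When the cap is reached, `tryStart()` keeps returning `false` and the caller would fall back to the foreground path. A real design would also need to decide when the counter resets (for example per turn), which this sketch does not model.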

Related: microsoft/vscode-copilot-chat#4981

Labels

debt (Code quality issues)
