The current summarization/compaction infrastructure in Copilot Chat has grown complex and fragile.
Current state
- Three summarization modes: Full (with tool schemas + full history), Simple (truncated text fallback), and Inline (appended user message)
- Two execution paths: Foreground (blocking, triggered on BudgetExceededError) and Background (async, kicked off at ≥80% context, applied at ≥95% or next iteration)
- Multiple threshold checks: pre-render (previous iteration token count), post-render (current iteration), and budget-exceeded catch
- Cascading fallbacks: Background → foreground → Full → Simple → renderWithoutSummarization → hard error
- Retry guards: per-turn metadata tracking to prevent repeated failed foreground attempts
- Budget mismatches: summarization endpoint vs main render endpoint, tool schema filtering for deferred tools, different ChatLocation behavior
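
As a rough illustration of the background path's thresholds described above, the decision could be sketched as follows. Only the 80% and 95% figures come from this description. The type names, the function name, and the reading of "applied at ≥95% or next iteration" as "apply a finished summary when either condition holds" are assumptions for illustration, not the extension's actual code.

```typescript
// Hypothetical sketch; names are illustrative, not the extension's API.
type BackgroundAction = "none" | "start" | "apply";

const START_RATIO = 0.8;  // background summarization is kicked off at >= 80% context
const APPLY_RATIO = 0.95; // a finished summary is applied at >= 95% context

interface BackgroundState {
  started: boolean;   // a background summarization has been kicked off
  completed: boolean; // its result is ready to be applied
}

function decideBackgroundAction(
  usedTokens: number,
  contextWindow: number,
  state: BackgroundState,
  isNewIteration: boolean,
): BackgroundAction {
  const ratio = usedTokens / contextWindow;
  // Assumed reading: a completed summary is applied at >= 95% or on the next iteration.
  if (state.completed && (ratio >= APPLY_RATIO || isNewIteration)) {
    return "apply";
  }
  // Start at most one background run once the context is >= 80% full.
  if (!state.started && ratio >= START_RATIO) {
    return "start";
  }
  return "none";
}
```

Even this reduced form has three inputs besides the token count, and it does not yet cover the foreground path or the pre-render, post-render, and budget-exceeded checks.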
Problems
- Hard to reason about which code path runs in which scenario
- Failure metadata, telemetry, and fallback logic duplicated across many branches
- Budget calculation for summarization is coupled to the main render's tool-token reduction logic, even though the two have fundamentally different needs
- Background compaction can retry from the Failed state without any limit
- The relationship between toolTokens, safeBudget, baseBudget, contextRatio, and postRenderRatio is non-obvious
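
To make the duplication problem concrete, here is a minimal sketch of the fallback order listed under Current state, written as a single linear chain with one place that records failures. The stage names would mirror that order. `Stage`, `Attempt`, and `runCascade` are hypothetical names, and the stages are synchronous to keep the example short. This is not how the current code is structured.

```typescript
// Hypothetical sketch; not the extension's actual API.
interface Stage {
  name: string;      // e.g. "background", "foreground-full", "foreground-simple"
  run: () => string; // returns the rendered or summarized prompt, or throws
}

interface Attempt {
  stage: string;
  ok: boolean;
  error?: string;
}

function runCascade(stages: Stage[]): { result: string; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  for (const stage of stages) {
    try {
      const result = stage.run();
      attempts.push({ stage: stage.name, ok: true });
      return { result, attempts };
    } catch (err) {
      // Single place where failure metadata is collected for every stage.
      const message = err instanceof Error ? err.message : String(err);
      attempts.push({ stage: stage.name, ok: false, error: message });
    }
  }
  // Corresponds to the final "hard error" step of the cascade.
  throw new Error(`All stages failed: ${attempts.map(a => a.stage).join(" -> ")}`);
}
```

In this shape, the `attempts` array is the only source for failure metadata and telemetry, instead of each branch recording its own.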
Related: microsoft/vscode-copilot-chat#4981