Skip to content

chore: add more accuracy models via Grove MCP-491 #1140

Open
nirinchev wants to merge 2 commits into main from ni/accuracy-models

@nirinchev (Collaborator) commented May 4, 2026

Proposed changes

This adds some Grove-backed models for the accuracy test runner to use.

Copilot AI review requested due to automatic review settings May 4, 2026 13:53
@nirinchev requested a review from a team as a code owner May 4, 2026 13:53
@nirinchev requested review from jeroenvervaeke and removed request for a team May 4, 2026 13:53
Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the accuracy test runner to use Grove-backed LLM models (plus Anthropic via Grove), adds a model allowlist mechanism, and adjusts accuracy test infrastructure/scripts to better support local and CI runs.

Changes:

  • Replace direct OpenAI/Gemini accuracy models with multiple Grove-backed models and add optional MDB_ACCURACY_MODEL_ALLOWLIST filtering.
  • Strip MDB_MCP_* environment variables from the spawned test server process, and skip run-status updates when the results file does not exist.
  • Update the scripts, CI workflow, and editor config to support the new providers and higher test concurrency.
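The PR does not show how MDB_ACCURACY_MODEL_ALLOWLIST is parsed. The TypeScript below is a minimal sketch that assumes the variable holds a comma-separated list of model names, and that an unset or empty value means "run every model". The AccuracyModel interface, the filterByAllowlist helper, and the model names are hypothetical and are not the code in tests/accuracy/sdk/models.ts.

```typescript
// Hypothetical model descriptor; the real type in tests/accuracy/sdk/models.ts may differ.
interface AccuracyModel {
  modelName: string;
}

// Filters models by a comma-separated allowlist (assumed format).
// An unset or empty allowlist keeps every model.
function filterByAllowlist<T extends AccuracyModel>(
  models: T[],
  allowlist: string | undefined,
): T[] {
  const allowed = new Set(
    (allowlist ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
  if (allowed.size === 0) {
    return models; // no allowlist configured: keep all models
  }
  return models.filter((model) => allowed.has(model.modelName));
}

// Usage with placeholder model names; in a test runner the second argument
// would typically come from process.env.MDB_ACCURACY_MODEL_ALLOWLIST.
const candidates: AccuracyModel[] = [
  { modelName: "model-a" },
  { modelName: "model-b" },
  { modelName: "model-c" },
];
const selected = filterByAllowlist(candidates, "model-a, model-c");
console.log(selected.map((m) => m.modelName)); // [ 'model-a', 'model-c' ]
```

Using a Set keeps each lookup constant-time, and treating an empty allowlist as "no filter" means runs that never set the variable still exercise every configured model.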

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/accuracy/sdk/models.ts | Switches to Grove-backed model providers, adds allowlist-based model filtering. |
| tests/accuracy/sdk/accuracyTestingClient.ts | Strips MDB_MCP_* env vars when spawning the test server process. |
| tests/accuracy/sdk/accuracyResultStorage/diskStorage.ts | Avoids updating run status when the results file doesn’t exist. |
| tests/accuracy/createDeployment.test.ts | Loosens expected matcher for imageTag parameter. |
| scripts/accuracy/runAccuracyTests.sh | Documents Grove env vars/allowlist and increases Vitest worker count. |
| package.json | Adds @ai-sdk/anthropic dev dependency. |
| pnpm-lock.yaml | Locks @ai-sdk/anthropic and related transitive dependencies. |
| .vscode/launch.json | Adds a VS Code launch config for debugging accuracy tests. |
| .github/workflows/accuracy-tests.yml | Switches CI accuracy runs to use MDB_GROVE_API_KEY instead of OpenAI/Gemini keys. |
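The summary says accuracyTestingClient.ts strips MDB_MCP_* variables before spawning the test server. The TypeScript below is a minimal sketch of that filtering step only. The Env alias, the stripMdbMcpEnv helper, and the MDB_MCP_EXAMPLE_SETTING variable are hypothetical, and the real client may build the child environment differently.

```typescript
// Hypothetical alias matching the shape of a Node-style environment map.
type Env = Record<string, string | undefined>;

// Returns a copy of `env` with every MDB_MCP_* variable removed.
// Entries whose value is undefined are dropped too, so the result
// contains only concrete string values.
function stripMdbMcpEnv(env: Env): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (key.startsWith("MDB_MCP_") || value === undefined) {
      continue; // skip MCP server settings and unset entries
    }
    result[key] = value;
  }
  return result;
}

// Usage with a made-up environment; in a Node process the input would
// typically be process.env.
const childEnv = stripMdbMcpEnv({
  PATH: "/usr/bin",
  MDB_MCP_EXAMPLE_SETTING: "from-developer-shell",
});
console.log(Object.keys(childEnv)); // [ 'PATH' ]
```

The sketch builds a full copy rather than deleting keys because the env option of Node's child_process.spawn replaces the child's environment instead of merging with the parent's, so the filtered object must still carry PATH and any other variables the server needs.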
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread tests/accuracy/sdk/models.ts Outdated
Comment thread tests/accuracy/sdk/accuracyTestingClient.ts
Comment thread tests/accuracy/sdk/accuracyResultStorage/diskStorage.ts Outdated
Comment thread scripts/accuracy/runAccuracyTests.sh Outdated
Comment thread .vscode/launch.json
@coveralls (Collaborator)

Coverage Report for CI Build 25323040638

Warning

No base build found for commit 83cc145 on main.
Coverage changes can't be calculated without a base build.
If a base build is processing, this comment will update automatically when it completes.

Coverage: 81.733%

Details

  • Patch coverage: No coverable lines changed in this PR.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

Requires a base build to compare against.


Coverage Stats

Relevant Lines: 3668
Covered Lines: 3190
Line Coverage: 86.97%
Relevant Branches: 2299
Covered Branches: 1687
Branch Coverage: 73.38%
Branches in Coverage %: Yes
Coverage Strength: 172.44 hits per line

💛 - Coveralls

@nirinchev changed the title from "chore: add more accuracy models via Grove" to "chore: add more accuracy models via Grove MCP-491" on May 4, 2026
@nirinchev (Collaborator, Author)

@copilot resolve the merge conflicts in this pull request

3 participants