feat(gitlab): sync repository files (code/docs)#4864

Merged
waleedlatif1 merged 9 commits into staging from waleedlatif1/gitlab-repo-files on Jun 3, 2026

@waleedlatif1
Collaborator

Summary

  • adds repository file/code sync to the GitLab connector — previously it only synced wiki + issues, so (unlike the GitHub connector) a user couldn't index their READMEs, /docs/*.md, or source. This closes that parity gap.
  • new contentTypes options: Code, Wiki & Issues / Code only / Wiki only / Issues only / Wiki & Issues (legacy both preserved = wiki+issues, so existing connectors are unchanged)
  • repo files: lists the recursive repository tree (keyset pagination via page_token), filters by pathPrefix + fileExtensions, and lazily fetches content per file
  • new advanced config: Branch (ref, defaults to the project default branch), Path Filter, File Extensions
  • change detection uses the git blob SHA (tree entry.id on listing == blob_id on fetch) — identical stub↔getDocument hash, no content fetch during listing (mirrors the GitHub connector)
  • skips binary (NUL-byte heuristic) and oversized (>10 MB) files; new path + size tags
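
A minimal sketch of the binary/oversize guard described in the last bullet, assuming the file content arrives as a base64 string. The function names and the sniff window are hypothetical; only the NUL-byte heuristic and the 10 MB limit come from this description.

```typescript
// Limit taken from the PR description (>10 MB files are skipped).
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024

// Assumption: only the leading bytes are sniffed for NUL. The window size is illustrative.
const SNIFF_BYTES = 8000

// Decode a base64 payload into raw bytes (hypothetical helper).
function base64ToBytes(b64: string): Uint8Array {
  // Strip whitespace defensively before decoding.
  const binary = atob(b64.replace(/\s+/g, ''))
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  return bytes
}

// NUL-byte heuristic: text files essentially never contain a 0x00 byte.
function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, SNIFF_BYTES)
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true
  }
  return false
}

// Returns the decoded text, or null when the file should be skipped.
function decodeTextFile(b64: string): string | null {
  const bytes = base64ToBytes(b64)
  if (bytes.length > MAX_FILE_SIZE_BYTES) return null
  if (looksBinary(bytes)) return null
  return new TextDecoder('utf-8').decode(bytes)
}
```

Returning null rather than throwing lets a caller treat a skipped file as "no document" without aborting the rest of the sync; whether the connector does exactly this is not stated here.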

Type of Change

  • New feature

Testing

Verified every endpoint/field against the live GitLab API docs (repository tree keyset pagination + Link page_token; files endpoint blob_id/base64 content; blob-SHA identity between tree and files). Type-check clean, lint clean, 100 connector tests pass. Not exercised against a live GitLab project (no test token).
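
For reference, following the keyset `Link` header could look roughly like the sketch below. `parseNextLink` is a hypothetical helper, not necessarily what the connector does, and the URL in the usage note is illustrative.

```typescript
// Extract the URL tagged rel="next" from an HTTP Link header, or null if absent.
function parseNextLink(linkHeader: string | null): string | null {
  if (!linkHeader) return null
  // Each entry looks like: <url>; rel="next". Capture the URL and its parameters.
  const entry = /<([^>]+)>([^<]*)/g
  let m: RegExpExecArray | null
  while ((m = entry.exec(linkHeader)) !== null) {
    const url = m[1] ?? ''
    const params = m[2] ?? ''
    if (url !== '' && /;\s*rel\s*=\s*"?next\b/i.test(params)) return url
  }
  return null
}
```

Given a header such as `<https://gitlab.example.com/api/v4/projects/1/repository/tree?pagination=keyset&page_token=abc>; rel="next"`, the helper returns the URL between the angle brackets, which can then be requested verbatim for the next page.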

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 3, 2026 7:19pm

@cursor

cursor Bot commented Jun 3, 2026

PR Summary

Medium Risk
Large new sync surface (full repo trees, many API calls, possible secrets in indexed source) with PAT-scoped access; behavior is gated by config and skips binary/oversized files, but misconfiguration could index broad paths.

Overview
Adds repository file indexing to the GitLab knowledge connector so projects can sync READMEs, docs, and source—not only wiki pages and issues. Existing both configs still mean wiki + issues only; new options include code only and code + wiki + issues.
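
As a rough illustration of that mapping, the content-type choice can be turned into an ordered list of sync phases. The value names (`all`, `repo`, `wiki`, `issues`, `both`) follow the Greptile summary in this thread; `phasesFor` and the handling of unknown values are assumptions.

```typescript
type Phase = 'repo' | 'wiki' | 'issues'

// Map a contentTypes choice to the phases it enables, in listing order
// (repo first, then wiki, then issues).
function phasesFor(choice: string): Phase[] {
  switch (choice) {
    case 'all':
      return ['repo', 'wiki', 'issues']
    case 'repo':
      return ['repo']
    case 'wiki':
      return ['wiki']
    case 'issues':
      return ['issues']
    case 'both':
      // Legacy value: still wiki + issues only, so existing connectors are unchanged.
      return ['wiki', 'issues']
    default:
      // Assumption: an unrecognized choice enables nothing.
      return []
  }
}
```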

Listing runs a new repo phase first (then wiki, then issues), walking the recursive repository tree with GitLab keyset pagination (Link rel="next" stored in the cursor). Files are listed as deferred stubs and content is fetched in getDocument; change detection uses the git blob SHA so listing does not need file bodies. Branch/ref, path prefix, and comma-separated extensions are new advanced settings; binary (NUL sniff) and >10 MB files are skipped. Wiki/tree 403/404 now skip that phase instead of failing the whole sync. Config validation checks a user-supplied ref when code sync is enabled; path and size tags were added for file documents.
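
A sketch of the client-side filter this implies, assuming tree entries carry `id`, `path`, and `type` fields. The helper names are hypothetical, and treating the prefix as a directory boundary is an assumption based on a commit title in this PR.

```typescript
interface TreeEntry {
  id: string // git blob SHA for files
  path: string
  type: 'blob' | 'tree' | 'commit'
}

// Normalize "docs", "/docs" or "docs/" to "docs/" so "docs-old/..." does not match.
function normalizePrefix(raw: string): string {
  const trimmed = raw.trim().replace(/^\/+/, '').replace(/\/+$/, '')
  return trimmed === '' ? '' : `${trimmed}/`
}

// Turn "md, .TXT,ts" into [".md", ".txt", ".ts"].
function parseExtensions(raw: string): string[] {
  return raw
    .split(',')
    .map((e) => e.trim().toLowerCase())
    .filter((e) => e !== '')
    .map((e) => (e.startsWith('.') ? e : `.${e}`))
}

// Keep only files under the prefix whose extension is allowed (empty list = allow all).
function shouldIndex(entry: TreeEntry, prefix: string, extensions: string[]): boolean {
  if (entry.type !== 'blob') return false
  if (prefix !== '' && !entry.path.startsWith(prefix)) return false
  if (extensions.length === 0) return true
  const lower = entry.path.toLowerCase()
  return extensions.some((ext) => lower.endsWith(ext))
}
```

Because the filter only needs the path and type from the tree listing, it can run before any file content is fetched.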

Reviewed by Cursor Bugbot for commit 3e103b4. Configure here.

Comment thread apps/sim/connectors/gitlab/gitlab.ts
@greptile-apps
Contributor

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

This PR adds repository-file sync (code/docs) to the GitLab connector, closing a feature-parity gap with the GitHub connector. Five content-type options are introduced (repo, wiki, issues, both, all) with a backwards-compatible both default, and three new advanced config fields (ref, pathPrefix, fileExtensions) control which files are indexed.

  • Listing phase: the repository tree is fetched recursively with keyset pagination (following rel="next" links verbatim), filtered client-side by path prefix and extension, and turned into deferred stubs using the git blob SHA as the change-detection hash.
  • Content fetch (getDocument): files are lazily decoded from base64, rejected if binary (NUL-byte heuristic) or oversized (> 10 MB); resolveRef caches the default branch in syncContext across pages.
  • Phase state machine: a CursorState struct serialized to base64url drives phase transitions (repo → wiki → issues); 403/404 on any phase gracefully skips to the next.
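
The cursor handling in the last bullet might be sketched as follows. The `CursorState` field names (`nextUrl`, `page`) and all helper names are assumptions; only the base64url serialization, the phase order, and the reset-on-invalid behavior come from this review.

```typescript
type Phase = 'repo' | 'wiki' | 'issues'

// Hypothetical shape: the real struct may carry different fields.
interface CursorState {
  phase: Phase
  nextUrl?: string // keyset "next" link for the repo phase
  page?: number // page number for the issues phase
}

function toBase64Url(text: string): string {
  const bytes = new TextEncoder().encode(text)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i] ?? 0)
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '')
}

function fromBase64Url(encoded: string): string {
  const b64 = encoded.replace(/-/g, '+').replace(/_/g, '/')
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4)
  const binary = atob(padded)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  return new TextDecoder().decode(bytes)
}

function encodeCursor(state: CursorState): string {
  return toBase64Url(JSON.stringify(state))
}

// Decode a cursor; fall back to the first enabled phase if it is missing or invalid.
function decodeCursor(cursor: string | undefined, phases: Phase[]): CursorState | null {
  const first = phases[0]
  if (first === undefined) return null // nothing enabled
  const initial: CursorState = { phase: first }
  if (!cursor) return initial
  try {
    const parsed = JSON.parse(fromBase64Url(cursor)) as Partial<CursorState>
    if (parsed.phase !== undefined && phases.indexOf(parsed.phase) >= 0) {
      return parsed as CursorState
    }
  } catch {
    // Malformed cursor: reset below.
  }
  return initial
}

// Next enabled phase after `current`, or null when the sync is finished.
function nextPhase(current: Phase, phases: Phase[]): Phase | null {
  const idx = phases.indexOf(current)
  if (idx < 0) return null
  return phases[idx + 1] ?? null
}
```

Keeping the whole state in one opaque token means the caller only has to hand back the string it last received, regardless of which phase is active.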

Confidence Score: 4/5

Safe to merge for all listing and content-fetch paths; the one rough edge is that file documents fetched via getDocument may expose a raw API URL instead of the human-friendly web UI URL when projectPath has not been populated by a prior listDocuments call.

The phase state machine, keyset pagination, binary/size guards, and blob-SHA change detection are all implemented correctly. The two issues flagged in the previous review round are confirmed fixed. The outstanding concern — projectPath being read from syncContext before resolveRef can populate it inside getDocument — means the sourceUrl on lazily-fetched file documents will fall back to the raw API endpoint rather than the web UI link whenever getDocument runs without a pre-warmed syncContext.

apps/sim/connectors/gitlab/gitlab.ts — specifically the getDocument handler for FILE_PREFIX documents, where projectPath is captured before resolveRef runs.

Important Files Changed

Filename | Overview
apps/sim/connectors/gitlab/gitlab.ts | Adds ~300 lines for repo-file sync: phase state machine, keyset pagination via Link header, binary/size guards, blob-SHA change detection, and ref resolution with syncContext caching. pathPrefix normalization and resolveRef logging (flagged in prior review) are addressed. A previously reported issue (projectPath read from syncContext before resolveRef can populate it in getDocument) remains unresolved.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([listDocuments called]) --> B{phases from choice?}
    B -- empty --> Z([return empty])
    B -- non-empty --> C[decode cursor / reset if invalid]
    C --> D{state.phase}
    D -- repo --> E[resolveRef]
    E --> F[fetch tree page]
    F --> G{ok?}
    G -- 403/404 --> H[warn + advance]
    G -- error --> I([throw])
    G -- ok --> J[filter pathPrefix + ext]
    J --> K[applyMaxItemsCap]
    K -- capped --> L([hasMore=false])
    K -- ok --> M{rel=next link?}
    M -- yes --> N([nextCursor=repo+url])
    M -- no --> O[advance phase]
    D -- wiki --> P[fetch wiki pages]
    P --> Q{ok?}
    Q -- 403/404 --> H
    Q -- error --> I
    Q -- ok --> R[build docs] --> O
    D -- issues --> S[fetch issues page]
    S --> T[build docs]
    T --> U{more pages?}
    U -- yes --> V([nextCursor=issues+page])
    U -- no --> W([hasMore=false])
    O --> X{next phase?}
    X -- yes --> Y([nextCursor=phase])
    X -- no --> W

Reviews (6): Last reviewed commit: "fix(gitlab): skip repo phase on tree 403..." | Re-trigger Greptile

Comment thread apps/sim/connectors/gitlab/gitlab.ts
Comment thread apps/sim/connectors/gitlab/gitlab.ts Outdated
…fallback, normalize pathPrefix to directory boundary
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 46b7c0e. Configure here.

@waleedlatif1
Collaborator Author

Fixed in c3bd177. buildFileSourceUrl now encodes the ref per slash-delimited segment (ref.split('/').map(encodeURIComponent).join('/')) for the web UI blob link, so GitFlow branches like feature/my-branch keep their raw slashes (/-/blob/feature/my-branch/…) while other special characters are still encoded — same approach already used for the file path. The API ?ref= query-param usages keep full encodeURIComponent (correct for a query value). Good catch — that was the one issue gating the score.
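
For illustration, the per-segment encoding described above could look like this. `encodePathSegments` and `buildBlobUrl` are hypothetical stand-ins rather than the actual `buildFileSourceUrl` code, and the host and project values are made up.

```typescript
// Encode each slash-delimited segment separately so "/" survives as a path separator.
function encodePathSegments(value: string): string {
  return value.split('/').map(encodeURIComponent).join('/')
}

// Build a web UI blob link of the form <host>/<project>/-/blob/<ref>/<path>.
function buildBlobUrl(host: string, projectPath: string, ref: string, filePath: string): string {
  return `https://${host}/${projectPath}/-/blob/${encodePathSegments(ref)}/${encodePathSegments(filePath)}`
}

// Slashes are kept, other special characters are percent-encoded.
console.log(encodePathSegments('feature/my-branch')) // feature/my-branch
console.log(encodePathSegments('feature/my branch#1')) // feature/my%20branch%231
```

By contrast, a ref passed as a `?ref=` query value would go through plain `encodeURIComponent`, which turns `/` into `%2F`.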

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/connectors/gitlab/gitlab.ts Outdated
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/connectors/gitlab/gitlab.ts Outdated
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/connectors/gitlab/gitlab.ts
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 3e103b4. Configure here.

The Ollama BYOK icon rendered blank because its SVG path had spaces
stripped between arc-command flags (e.g. `a5.05 5.05 0 12.05-.636`),
producing invalid tokens. Replaced with the canonical Ollama path.

Also added a dedicated FalIcon (was falling back to the generic
ImageIcon) and wired it into the BYOK provider list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The leftmost spark of the Fireworks icon never rendered because its
third subpath used a corrupted arc command (`a34.59 34.59 0 17.15 37.65`)
with collapsed flags, yielding an invalid sweep-flag of 7 that aborts
the path parse. Replaced with the canonical lobehub Fireworks source.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@waleedlatif1 waleedlatif1 merged commit cd66774 into staging Jun 3, 2026
9 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/gitlab-repo-files branch June 3, 2026 19:23