fix: make GitHub PR-description ticket selection deterministic#2422
fix: make GitHub PR-description ticket selection deterministic#2422dellch wants to merge 2 commits into
Conversation
…tion extract_ticket_links_from_pr_description() accumulates issue URLs in a set and caps via list(set)[:3], so when a PR description references more than 3 issues the selected subset is arbitrary and varies between runs. These tests assert first-seen ordering and a deterministic capped subset. The cap test fails against the current set-based implementation on every hash seed (verified across PYTHONHASHSEED 0-11); the fix follows in the next commit. Co-Authored-By: Claude
Review Summary by QodoMake GitHub PR-description ticket selection deterministic
WalkthroughsDescription• Fix nondeterministic ticket selection in PR description parsing - Replace set-based accumulation with ordered list tracking - Preserve first-seen order while de-duplicating URLs - Cap selection now returns deterministic first 3 tickets • Add comprehensive test suite for ticket extraction - Test first-seen ordering preservation - Test deterministic capped subset selection (reliable regression guard) Diagramflowchart LR
A["PR Description"] -->|extract URLs| B["Ordered List + Seen Set"]
B -->|deduplicate| C["Preserve First-Seen Order"]
C -->|cap at 3| D["Deterministic Subset"]
E["Tests"] -->|verify| D
File Changes1. pr_agent/tools/ticket_pr_compliance_check.py
|
Code Review by Qodo
Context used 1.
|
Track issue URLs in first-seen order while de-duplicating (a seen set plus an ordered list), then apply the 3-ticket cap by slicing the ordered list, instead of accumulating in a set and slicing list(set)[:3]. Selection is now stable and predictable across runs. Behaviour is unchanged for 3 or fewer issues. Makes the tests added in the previous commit pass. Co-Authored-By: Claude
8e08857 to
03979aa
Compare
|
Code review by qodo was updated up to the latest commit 03979aa |
|
Good catch — fixed in 03979aa. To clarify the origin: those two single-quoted f-strings are verbatim from the existing code on _add(f"{base_url_html.strip(chr(47))}/{owner}/{repo}/issues/{issue_number}")(inner quotes flipped to single so they do not clash with the outer double quotes). Tests still pass. |
Fixes #2421
Summary
extract_ticket_links_from_pr_description()inpr_agent/tools/ticket_pr_compliance_check.pyaccumulates GitHub issue URLs in aset, then enforces the 3-ticket cap by slicing a list built from that set:Because a
sethas no defined iteration order (and Python randomizes string hashing per process viaPYTHONHASHSEED),list(github_tickets)[:3]selects an arbitrary 3 of the referenced issues, and the selection can vary between runs. When a PR description references more than 3 issues, which issues get fetched for review context is therefore nondeterministic.Fix
Track URLs in first-seen order while de-duplicating (a
seenset plus an ordered list), then apply the cap by slicing the ordered list — mirroring whatfind_jira_tickets()in the same file already does. Selection is now stable and predictable. Behaviour is unchanged for 3 or fewer issues.Tests
Structured as two commits — failing tests first, then the fix:
test:addsTestExtractTicketLinksFromPrDescriptionwith two tests.fix:makes them pass.test_cap_selects_deterministic_first_seen_subsetis the reliable regression guard: with more than 3 issues, the old set-based code returns an arbitrary subset that does not equal the first-seen subset. I verified it fails against the previous implementation on every hash seedPYTHONHASHSEED0–11, and passes with the fix on all of them. (test_preserves_first_seen_orderdocuments the intended ordering; its docstring notes it is not a reliable seed-independent guard on its own, which is why the cap test exists.)Notes
This was originally surfaced by the Qodo review bot on a separate Jira-support PR, but the function predates that work and is unrelated to Jira, so it is raised here on its own.
Test plan
pytest tests/unittest/test_extract_issue_from_branch.pyPYTHONHASHSEED0–11) and pass with the fixCo-Authored-By: Claude