[LLM Classification] Skip duplicate LLM calls on no-op feed updates #470

Open
tcx4c70 wants to merge 4 commits into FreshRSS:main from tcx4c70:llm-duplicate-update

tcx4c70 commented Jun 1, 2026

Some RSS sources re-publish existing articles with no semantic change (reformatted author, tweaked date, new enclosure attribute, etc.). FreshRSS detects these as updated entries via hash mismatch and calls EntryBeforeInsert again, which previously triggered another LLM API call for every such pseudo-update.

The extension now stores a SHA-1 of the prompt it sent (plus the exact list of tags it assigned) on each classified entry, under the 'llm_classification' attribute namespace. On a feed update of an already-classified entry:

  • if the new prompt hashes to the same value, the prior tags are restored and no LLM call is made;
  • otherwise, behaviour follows the new 'Re-classify when content changes' toggle (default on): call the LLM and refresh, or keep the prior tags untouched.
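The decision described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the extension's PHP code: `StoredClassification`, `classify_on_update`, `call_llm`, and `reclassify_on_change` are hypothetical names, and keeping the stored hash unchanged when the toggle is off is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StoredClassification:
    """What is remembered per entry (hypothetical structure)."""
    prompt_hash: str
    tags: List[str]


def prompt_hash(prompt: str) -> str:
    # SHA-1 of the exact prompt text that would be sent to the LLM.
    return hashlib.sha1(prompt.encode("utf-8")).hexdigest()


def classify_on_update(
    prompt: str,
    stored: Optional[StoredClassification],
    call_llm: Callable[[str], List[str]],
    reclassify_on_change: bool = True,
) -> StoredClassification:
    new_hash = prompt_hash(prompt)
    if stored is not None:
        if stored.prompt_hash == new_hash:
            # No-op update: same prompt, so restore the prior tags without an LLM call.
            return stored
        if not reclassify_on_change:
            # Content changed but the toggle is off: keep the prior tags untouched.
            return stored
    # New entry, or changed content with the toggle on: classify and remember the hash.
    return StoredClassification(new_hash, call_llm(prompt))
```

With a stub in place of `call_llm`, a second call with an identical prompt returns the stored result and never invokes the stub.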

When re-classifying, the prior tag list is used to remove only those exact tags (instead of the previous prefix-based heuristic), so manual tags sharing the prefix are preserved.
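To illustrate why exact-match removal preserves manual tags where a prefix match does not, here is a small Python sketch. The function names and the `ai/` prefix are hypothetical and are not taken from the extension.

```python
from typing import List


def remove_by_prefix(tags: List[str], prefix: str) -> List[str]:
    # Prefix-based heuristic: drops every tag that starts with the prefix,
    # including tags the user added by hand.
    return [t for t in tags if not t.startswith(prefix)]


def remove_exact(tags: List[str], previously_assigned: List[str]) -> List[str]:
    # Exact removal: drops only the tags recorded as assigned by the classifier.
    assigned = set(previously_assigned)
    return [t for t in tags if t not in assigned]


def replace_llm_tags(
    current: List[str], previously_assigned: List[str], new_tags: List[str]
) -> List[str]:
    # Remove the prior classifier tags, then append the new ones without duplicates.
    kept = remove_exact(current, previously_assigned)
    return kept + [t for t in new_tags if t not in kept]


current = ["ai/manual", "ai/news", "linux"]   # "ai/manual" was added by the user
print(remove_by_prefix(current, "ai/"))       # the manual tag is lost
print(replace_llm_tags(current, ["ai/news"], ["ai/research"]))  # the manual tag survives
```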

tcx4c70 commented Jun 1, 2026

Author

After the fix, LLM API requests dropped from 4,000-8,000 per day to about 800 per day, which matches the number of new articles in my RSS sources.

[Screenshot 2026-06-01 22:19:25]

Alkarex commented Jun 29, 2026

Member

Thanks, and sorry for the delay.
I had another change I wanted to address first. My first impression was that this is quite a lot of changes for what should be a simple option, but I am taking a look.

Alkarex added 2 commits June 29, 2026 15:30
* Remove option (for now)
* Re-use existing information instead of additional attribute
* Reduce code

Alkarex commented Jun 29, 2026

Member

Thanks again. I have made some simplifications, though I have not tested them much yet. Could you please check whether this still works for you?

return $entry;
$classification = null;

assert($this->getEntrypoint() !== ''); // For PHPStan // TODO: Fix in parent method

2 participants