[spark] Record the write operation type in snapshot properties#8236
Zouxxyy wants to merge 4 commits into
```diff
 /** Compact the manifest entries. Generates a snapshot with {@link CommitKind#COMPACT}. */
 void compactManifests();
 
+BatchTableCommit withCommitProperties(Map<String, String> properties);
```
[P2] BatchTableCommit is annotated @Public, so adding a new abstract method breaks existing third-party implementations at compile time (and can surface as AbstractMethodError for already-compiled implementations). Since this extension is optional for older implementations, could we make it a Java 8 default method and keep the concrete override in TableCommitImpl?
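A minimal sketch of the reviewer's suggestion, using simplified stand-in types (`CommitSketch`, `LegacyCommit` and `CommitImplSketch` are hypothetical, not Paimon classes). The new method gets a Java 8 `default` body, so an implementation written against the old interface still compiles and links, while the concrete implementation (the role `TableCommitImpl` plays) overrides it. Whether the default should silently ignore the properties, as assumed here, or throw is a design choice the sketch does not settle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a @Public interface such as BatchTableCommit.
interface CommitSketch {
    void commit(String message);

    // New optional method with a default body: older implementations keep compiling
    // and linking. Assumption: the default ignores the properties and returns this.
    default CommitSketch withCommitProperties(Map<String, String> properties) {
        return this;
    }
}

// A third-party implementation written before withCommitProperties existed.
class LegacyCommit implements CommitSketch {
    @Override
    public void commit(String message) {
        // commits without any snapshot properties
    }
}

// Concrete implementation that overrides the default and keeps the properties.
class CommitImplSketch implements CommitSketch {
    private final Map<String, String> properties = new HashMap<>();

    @Override
    public CommitImplSketch withCommitProperties(Map<String, String> properties) {
        this.properties.putAll(properties);
        return this;
    }

    @Override
    public void commit(String message) {
        // a real implementation would attach `properties` to the generated snapshot(s)
    }

    Map<String, String> properties() {
        return Collections.unmodifiableMap(properties);
    }
}
```

With a no-op default, a caller that happens to use an older implementation gets a snapshot without `operation` instead of a failed commit.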
```diff
   performNonPrimaryKeyDelete(sparkSession)
 }
-writer.commit(commitMessages)
+writer.commit(commitMessages, SnapshotOperation.asProperties(SnapshotOperation.DELETE))
```
[P2] This only covers row-rewrite deletes. Metadata-only deletes are optimized before this command runs (for example DELETE FROM t or a partition-only DELETE becomes TruncatePaimonTableWithFilterExec), and that path calls truncateTable / truncatePartitions without any snapshot properties. Those DELETE snapshots would still miss operation=DELETE, so we need to carry SnapshotOperation.DELETE through the truncate path as well.
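One way to address this, sketched with hypothetical types: `SnapshotOperationSketch`, `TruncateSketch` and the extra `properties` parameter on `truncateTable`/`truncatePartitions` are assumptions, not the real Paimon signatures. The truncate entry points accept the commit properties from their caller and forward them to the commit, so a metadata-only DELETE can pass `operation=DELETE` just like the row-rewrite path.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SnapshotOperation helper used in the diff above.
enum SnapshotOperationSketch {
    WRITE, OVERWRITE, DELETE, UPDATE, MERGE;

    // Snapshot property key named in the PR description.
    static final String KEY = "operation";

    static Map<String, String> asProperties(SnapshotOperationSketch op) {
        return Collections.singletonMap(KEY, op.name());
    }
}

// Hypothetical truncate path: both entry points take the commit properties from the
// caller (e.g. the exec node a metadata-only DELETE is rewritten into) and forward them.
class TruncateSketch {
    private Map<String, String> lastProperties = Collections.emptyMap();

    void truncateTable(Map<String, String> properties) {
        commitTruncate(properties);
    }

    void truncatePartitions(List<Map<String, String>> partitions, Map<String, String> properties) {
        // drop the given partitions, then commit with the same properties
        commitTruncate(properties);
    }

    private void commitTruncate(Map<String, String> properties) {
        // stand-in for the real truncate commit that would carry snapshot properties
        lastProperties = properties;
    }

    Map<String, String> lastCommitProperties() {
        return lastProperties;
    }
}
```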
…ELETE Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Purpose
A Paimon snapshot only records the physical `CommitKind` (`APPEND`/`COMPACT`/`OVERWRITE`/...), not the logical operation that produced it, so an `APPEND` from `INSERT INTO` cannot be told apart from one produced by `MERGE INTO`.

This PR records the logical operation type in the snapshot `properties` map under the key `operation`. There is no format change: `Snapshot` already has a `properties: Map<String, String>` field.

Core: add `InnerTableCommit#withCommitProperties(...)`, applied in `TableCommitImpl` so the properties land on every snapshot the commit generates (both the append and overwrite paths, since `FileStoreCommitImpl` sources snapshot properties from `committable.properties()`).

Spark (both v1 and v2 write paths):
- `WRITE`
- `OVERWRITE`
- `DELETE`
- `UPDATE`
- `MERGE`
- `CREATE TABLE AS SELECT`
- `REPLACE TABLE AS SELECT` / `CREATE OR REPLACE TABLE AS SELECT`

Tests
Added `SnapshotOperationTest` (`paimon-spark-ut`) asserting the recorded `operation` for INSERT/OVERWRITE/UPDATE/DELETE/MERGE under both `spark.paimon.write.use-v2-write=true` and `false`, plus CTAS/RTAS.
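To illustrate the core mechanism described under Purpose, here is a heavily simplified model; all types are hypothetical, not Paimon's `Snapshot` or `TableCommitImpl`. Commit-level properties are copied onto every snapshot a commit produces, so two snapshots with the same `CommitKind` can still be told apart by their `operation` property. Reporting a missing key as `UNKNOWN` (for snapshots written before this change) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical physical commit kinds, mirroring a subset of CommitKind.
enum CommitKindSketch { APPEND, COMPACT, OVERWRITE }

// Hypothetical snapshot: an id, a physical kind and a free-form properties map.
final class SnapshotSketch {
    final long id;
    final CommitKindSketch kind;
    final Map<String, String> properties;

    SnapshotSketch(long id, CommitKindSketch kind, Map<String, String> properties) {
        this.id = id;
        this.kind = kind;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }
}

final class TableSketch {
    final List<SnapshotSketch> snapshots = new ArrayList<>();

    // Every snapshot produced by one commit receives the same commit-level properties.
    void commit(List<CommitKindSketch> kinds, Map<String, String> commitProperties) {
        for (CommitKindSketch kind : kinds) {
            snapshots.add(new SnapshotSketch(snapshots.size() + 1, kind, commitProperties));
        }
    }

    // Logical operation of a snapshot; older snapshots lack the key entirely.
    static String operationOf(SnapshotSketch s) {
        return s.properties.getOrDefault("operation", "UNKNOWN");
    }
}
```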