
[spark] Record the write operation type in snapshot properties#8236

Open
Zouxxyy wants to merge 4 commits into
apache:masterfrom
Zouxxyy:xinyu/paimon-operation


Zouxxyy commented Jun 14, 2026

Contributor

Purpose

A Paimon snapshot only records the physical CommitKind (APPEND/COMPACT/OVERWRITE/...), not the logical operation that produced it — so an APPEND from INSERT INTO cannot be told apart from one produced by MERGE INTO.

This PR records the logical operation type in the snapshot properties map under the key operation. No format change — Snapshot already has a properties: Map<String, String> field.
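A minimal sketch of what the new property lets a reader of snapshot metadata do. `SnapshotView` is a hypothetical stand-in, not Paimon's real `Snapshot` class, and treating a missing map or key as `UNKNOWN` (for snapshots written before this change) is an assumption of the sketch:

```java
import java.util.Map;

/** Hypothetical stand-in for a snapshot; not Paimon's real Snapshot class. */
final class SnapshotView {
    final String commitKind;
    // Assumption: snapshots written before this change may carry no properties at all.
    final Map<String, String> properties;

    SnapshotView(String commitKind, Map<String, String> properties) {
        this.commitKind = commitKind;
        this.properties = properties;
    }

    /** Returns the logical operation, or "UNKNOWN" when the property is absent. */
    String operation() {
        if (properties == null) {
            return "UNKNOWN";
        }
        return properties.getOrDefault("operation", "UNKNOWN");
    }
}
```

Two snapshots that both report the commit kind `APPEND` can then be told apart by `operation()`, for example `WRITE` for INSERT INTO versus `MERGE` for MERGE INTO.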

Core: add InnerTableCommit#withCommitProperties(...), applied in TableCommitImpl so the properties land on every snapshot the commit generates (both the append and overwrite paths, since FileStoreCommitImpl sources snapshot properties from committable.properties()).

Spark (both v1 and v2 write paths):

| SQL | operation |
| --- | --- |
| INSERT INTO | WRITE |
| INSERT OVERWRITE | OVERWRITE |
| DELETE | DELETE |
| UPDATE | UPDATE |
| MERGE INTO | MERGE |
| CREATE TABLE AS SELECT | CREATE TABLE AS SELECT |
| (CREATE OR) REPLACE TABLE AS SELECT | REPLACE TABLE AS SELECT / CREATE OR REPLACE TABLE AS SELECT |
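The mapping above could be modelled roughly as follows. The diff in this PR calls `SnapshotOperation.asProperties(SnapshotOperation.DELETE)`, but the real definition is not shown in this thread, so the enum layout, the constant names `CTAS`/`RTAS`/`CORTAS`, and the `label` field are all assumptions made for illustration:

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch; the PR's actual SnapshotOperation is not shown in this thread. */
enum SnapshotOperation {
    WRITE("WRITE"),
    OVERWRITE("OVERWRITE"),
    DELETE("DELETE"),
    UPDATE("UPDATE"),
    MERGE("MERGE"),
    CTAS("CREATE TABLE AS SELECT"),
    RTAS("REPLACE TABLE AS SELECT"),
    CORTAS("CREATE OR REPLACE TABLE AS SELECT");

    /** Property key named in the PR description. */
    static final String KEY = "operation";

    private final String label;

    SnapshotOperation(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }

    /** Builds the single-entry map that would be attached to the commit. */
    static Map<String, String> asProperties(SnapshotOperation op) {
        return Collections.singletonMap(KEY, op.label);
    }
}
```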

Tests

Added SnapshotOperationTest (paimon-spark-ut) asserting the recorded operation for INSERT/OVERWRITE/UPDATE/DELETE/MERGE under both spark.paimon.write.use-v2-write=true and false, plus CTAS/RTAS.

/** Compact the manifest entries. Generates a snapshot with {@link CommitKind#COMPACT}. */
void compactManifests();

BatchTableCommit withCommitProperties(Map<String, String> properties);


[P2] BatchTableCommit is annotated @Public, so adding a new abstract method breaks existing third-party implementations at compile time (and can surface as AbstractMethodError for already-compiled implementations). Since this extension is optional for older implementations, could we make it a Java 8 default method and keep the concrete override in TableCommitImpl?
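A sketch of the suggested shape, using simplified stand-in types rather than Paimon's real `BatchTableCommit` and `TableCommitImpl`. Having the default simply return `this` without recording anything is an assumption about what "optional" would mean here:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the public interface; not Paimon's real BatchTableCommit. */
interface BatchTableCommitSketch {
    void compactManifests();

    /** Default no-op, so implementations written before this method existed still compile and link. */
    default BatchTableCommitSketch withCommitProperties(Map<String, String> properties) {
        return this;
    }
}

/** Simulates a third-party implementation that predates the new method. */
class LegacyCommit implements BatchTableCommitSketch {
    @Override
    public void compactManifests() {}
}

/** Stand-in for the concrete implementation that overrides the default and keeps the properties. */
class PropertyAwareCommit implements BatchTableCommitSketch {
    private Map<String, String> commitProperties = Collections.emptyMap();

    @Override
    public void compactManifests() {}

    @Override
    public BatchTableCommitSketch withCommitProperties(Map<String, String> properties) {
        this.commitProperties = new HashMap<>(properties);
        return this;
    }

    Map<String, String> commitProperties() {
        return commitProperties;
    }
}
```

With this shape `LegacyCommit` needs no changes, while callers that go through `PropertyAwareCommit` still get the properties recorded.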

performNonPrimaryKeyDelete(sparkSession)
}
- writer.commit(commitMessages)
+ writer.commit(commitMessages, SnapshotOperation.asProperties(SnapshotOperation.DELETE))


[P2] This only covers row-rewrite deletes. Metadata-only deletes are optimized before this command runs (for example DELETE FROM t or a partition-only DELETE becomes TruncatePaimonTableWithFilterExec), and that path calls truncateTable / truncatePartitions without any snapshot properties. Those DELETE snapshots would still miss operation=DELETE, so we need to carry SnapshotOperation.DELETE through the truncate path as well.
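The gap can be illustrated with a toy model. `ToyCommit` and `ToyDelete` are hypothetical and do not mirror Paimon's real commit classes or `TruncatePaimonTableWithFilterExec`; the sketch only assumes that a truncate produces a snapshot carrying whatever properties were set on the commit beforehand:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy commit that records the properties of each snapshot it would generate. */
final class ToyCommit {
    private Map<String, String> commitProperties = Collections.emptyMap();
    final List<Map<String, String>> snapshots = new ArrayList<>();

    ToyCommit withCommitProperties(Map<String, String> properties) {
        this.commitProperties = properties;
        return this;
    }

    /** Row-rewrite path: commits rewritten files. */
    void commit() {
        snapshots.add(commitProperties);
    }

    /** Metadata-only path: drops data without rewriting rows. */
    void truncateTable() {
        snapshots.add(commitProperties);
    }
}

final class ToyDelete {
    static final Map<String, String> DELETE_PROPS =
            Collections.singletonMap("operation", "DELETE");

    /** Path covered by the current diff. */
    static void rowRewriteDelete(ToyCommit commit) {
        commit.withCommitProperties(DELETE_PROPS).commit();
    }

    /** Gap described in the review: truncate runs with no properties set. */
    static void truncateDeleteUntagged(ToyCommit commit) {
        commit.truncateTable();
    }

    /** Suggested direction: set the same properties before truncating. */
    static void truncateDeleteTagged(ToyCommit commit) {
        commit.withCommitProperties(DELETE_PROPS).truncateTable();
    }
}
```

In this model only the tagged variant leaves a snapshot whose properties contain `operation=DELETE`, which is the behaviour the review asks for on the truncate path.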

…ELETE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>