
[SPARK-21529][SQL] Improve the error message for unsupported Hive union type #56775

Open
AgenticSpark wants to merge 3 commits into apache:master from AgenticSpark:agenticspark/SPARK-21529-uniontype-error

Conversation

@AgenticSpark

Contributor

What changes were proposed in this pull request?

Detect the unsupported Hive uniontype<...> type when converting Hive FieldSchema column types to Spark SQL types, and raise a dedicated UNSUPPORTED_HIVE_TYPE error instead of the generic CANNOT_RECOGNIZE_HIVE_TYPE parser error.
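The following is a minimal sketch of the described flow that does not depend on Spark. It assumes the union check runs only after the normal type parser fails, which is what the commit messages describe. Every name here is a hypothetical illustration, not Spark's actual HiveClientImpl or QueryExecutionErrors API. That includes HiveUnionTypeCheck, containsUnionType, toSparkType, the two exception classes, and their message text. The `parse` argument stands in for Spark's Hive type-string parser.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for Spark's error classes; the message text is illustrative only.
final class UnsupportedHiveTypeException(val hiveType: String, val column: String)
  extends RuntimeException(
    s"[UNSUPPORTED_HIVE_TYPE] Hive type '$hiveType' of column '$column' is not supported.")

final class CannotRecognizeHiveTypeException(
    val hiveType: String, val column: String, cause: Throwable)
  extends RuntimeException(
    s"[CANNOT_RECOGNIZE_HIVE_TYPE] Cannot recognize Hive type '$hiveType' of column '$column'.",
    cause)

object HiveUnionTypeCheck {
  // Matches "uniontype" as a whole word followed by '<', case-insensitively and anywhere
  // in the string, so nested uses such as array<uniontype<int,string>> are caught too,
  // while a struct field merely named "uniontype" (struct<uniontype:int>) is not.
  private val UnionTypePattern = """(?i)\buniontype\s*<""".r

  def containsUnionType(hiveType: String): Boolean =
    UnionTypePattern.findFirstIn(hiveType).isDefined

  // Try the normal parser first. Only when it fails, check for uniontype;
  // otherwise fall back to the generic "cannot recognize" error as the last statement.
  def toSparkType[T](hiveType: String, column: String)(parse: String => T): T =
    try {
      parse(hiveType)
    } catch {
      case NonFatal(e) =>
        if (containsUnionType(hiveType)) {
          throw new UnsupportedHiveTypeException(hiveType, column)
        }
        throw new CannotRecognizeHiveTypeException(hiveType, column, e)
    }
}
```

The check sits inside the failure path, so valid types never reach the regex. The generic fallback stays on its own line after the uniontype check, which matches the ordering described in the final commit.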

Why are the changes needed?

Spark SQL does not support Hive union types. Currently, the failure surfaces from the generic type-parser path, and the message does not clearly state that Hive union types are unsupported.

Does this PR introduce any user-facing change?

Yes. Reading a Hive table with a column of type uniontype<...> now fails with UNSUPPORTED_HIVE_TYPE, which reports the offending Hive type and the column name.

How was this patch tested?

  • SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "core/testOnly *SparkThrowableSuite -- -t \"Error conditions are correctly formatted\""
  • build/sbt "hive/testOnly *HiveClientImplSuite"

Was this patch authored or co-authored using generative AI tooling?

Yes. GitHub Copilot assisted with preparing and validating this change.


QueryExecutionErrors.scala (diff context for the review comment below):
- def timestampNanosEpochNanosOverflowError(
-     value: TimestampNanosVal, isNtz: Boolean, sink: String): SparkArithmeticException = {
+ def parquetTimestampNanosOverflowError(

Member

The PR renames timestampNanosEpochNanosOverflowError(value, isNtz, sink) to parquetTimestampNanosOverflowError(value, isNtz) and hardcodes "Parquet INT64". However, it does NOT update the 3 call sites (ArrowWriter.scala:406 and :426, ParquetWriteSupport.scala:199), so the build will likely NOT compile. This rename is also unrelated to SPARK-21529 and looks like scope creep or an accidental edit. It should most likely be reverted.

…on type


Hive uniontype<...> is not supported by Spark SQL. Detect it on the Hive type parse-failure path and raise UNSUPPORTED_HIVE_TYPE so the unsupported type and column are reported directly.

Tests:
- SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "core/testOnly *SparkThrowableSuite -- -t \"Error conditions are correctly formatted\""
- build/sbt "hive/testOnly *HiveClientImplSuite"
@AgenticSpark AgenticSpark force-pushed the agenticspark/SPARK-21529-uniontype-error branch from 48f49ec to ee5fb78 June 25, 2026 14:19
Apply the unsupported Hive union type check in HiveClientImpl so uniontype<...> raises UNSUPPORTED_HIVE_TYPE instead of falling through to the generic parser error.

Tests:
- build/sbt "hive/testOnly *HiveClientImplSuite"
Keep the generic Hive type parse fallback on its own line after the uniontype check.
@AgenticSpark

Contributor Author

Thanks, fixed. I rebuilt the branch on current upstream master and removed the accidental unrelated QueryExecutionErrors.scala changes; the PR diff is back to the Hive union type change only.


Labels

None yet

Projects

None yet


2 participants