[AURON #2015] Add Native Scan Support for Apache Iceberg Copy-On-Write Tables. #2016
slfan1989 wants to merge 7 commits into apache:master
Conversation
…n-Write Tables. Signed-off-by: slfan1989 <slfan1989@apache.org>
Force-pushed from f954ce5 to 41e9318.
```bash
# Check or format all code, including third-party code, with spark-3.4
sparkver=spark-3.5
```

Should the comment be spark-3.5?
```diff
 done

-sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.4)
+sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.5)
```

This change seems unnecessary.
Pull request overview
This PR adds native scan support for Apache Iceberg Copy-On-Write (COW) tables to the Auron execution engine, enabling direct reads of Iceberg data files through Auron's native path for improved performance. The implementation follows the established SPI (Service Provider Interface) pattern used by other data source integrations like Paimon, with automatic fallback to Spark's execution path for unsupported scenarios.
Changes:
- Adds IcebergConvertProvider SPI extension to detect and convert Iceberg BatchScanExec nodes to native execution (see the sketch after this list)
- Implements validation logic to determine COW table eligibility (no delete files, no metadata columns, supported data types)
- Creates NativeIcebergTableScanExec to execute native Iceberg scans with Parquet/ORC format support
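To make the detect-convert-fallback flow concrete, here is a minimal sketch of the shape such a provider could take. Only IcebergConvertProvider, BatchScanExec, and NativeIcebergTableScanExec are names from this PR and Spark; the method names and `???` eligibility helpers below are hypothetical placeholders, not the actual implementation.

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec

// Hypothetical sketch of the SPI provider pattern described above,
// with illustrative method names; the real checks live in IcebergScanSupport.
class IcebergConvertProviderSketch {

  // Detect an Iceberg BatchScanExec and convert it to a native scan,
  // returning None to fall back to Spark's execution path.
  def convert(plan: SparkPlan): Option[SparkPlan] = plan match {
    case scan: BatchScanExec if isEligibleCowScan(scan) =>
      Some(toNativeIcebergTableScan(scan)) // would build a NativeIcebergTableScanExec
    case _ => None // unsupported: leave the plan to Spark
  }

  // Eligibility mirrors the validation rules listed above: COW only
  // (no delete files), no metadata columns, supported data types.
  private def isEligibleCowScan(scan: BatchScanExec): Boolean =
    hasNoDeleteFiles(scan) && hasNoMetadataColumns(scan) && hasSupportedTypes(scan)

  // Placeholders standing in for the reflection-based FileScanTask inspection.
  private def hasNoDeleteFiles(scan: BatchScanExec): Boolean = ???
  private def hasNoMetadataColumns(scan: BatchScanExec): Boolean = ???
  private def hasSupportedTypes(scan: BatchScanExec): Boolean = ???
  private def toNativeIcebergTableScan(scan: BatchScanExec): SparkPlan = ???
}
```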
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergConvertProvider.scala | SPI provider that checks version compatibility and delegates to IcebergScanSupport |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala | Core validation logic to determine native scan eligibility and extract FileScanTask metadata via reflection |
| thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeIcebergTableScanExec.scala | Native execution node that converts Iceberg tasks to FilePartitions and generates protobuf scan plans |
| thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala | Integration tests covering COW tables, projections, partitioning, ORC format, and fallback scenarios |
| thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/BaseAuronIcebergSuite.scala | Test base configuration with Auron and Iceberg extensions enabled |
| thirdparty/auron-iceberg/src/main/resources/META-INF/services/org.apache.spark.sql.auron.AuronConvertProvider | SPI registration file for IcebergConvertProvider |
| thirdparty/auron-iceberg/pom.xml | Maven enforcer rules to validate Iceberg version (1.10.1) and Spark version (3.4-4.0) compatibility |
| spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java | Adds ENABLE_ICEBERG_SCAN configuration option (see the usage sketch after this table) |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Adds default value handling for shuffle manager configuration |
| spark-extension/pom.xml | Adds arrow-memory-core and arrow-memory-netty dependencies |
| pom.xml | Adds Iceberg version properties and enforcer rules for all Spark version profiles |
| auron-build.sh | Updates Iceberg version support to 1.10.1 and Spark version range to 3.4-4.0 |
| dev/reformat | Updates formatting script to include Iceberg module with version 1.10.1 |
| .github/workflows/iceberg.yml | CI workflow for testing Iceberg integration across Spark 3.4, 3.5, 4.0 with multiple Java versions |
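For reference, the new flag is toggled like any other Spark conf. A hypothetical usage sketch follows; only the config key spark.auron.enable.iceberg.scan comes from this PR, while the session setup and table name are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative session setup; only the config key is from this PR.
val spark = SparkSession.builder()
  .appName("auron-iceberg-demo")
  .config("spark.auron.enable.iceberg.scan", "true") // set "false" to force Spark's path
  .getOrCreate()

// With the flag on, an eligible COW table scan should show up as a native
// Iceberg scan node in the physical plan.
spark.sql("select * from local.db.t_cow").explain()
```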
```diff
 done

-sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.4)
+sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.5)
```
The sparkvers array should still include "spark-3.4" to match the array iteration below. The comment on line 51 and the code on line 52 both reference spark-3.5, so it appears spark-3.4 was removed from the array when spark-3.5 should have been removed instead, since spark-3.5 is already handled in the celeborn loop above. As written, the code would not be formatted for spark-3.4.
Suggested change:

```diff
-sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.5)
+sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.4)
```
The PR description claims "Empty table handling" is tested in the suite, but there is no explicit test case for empty tables. While the code in IcebergScanSupport.scala (line 61-62) handles empty partitions, adding an explicit integration test would provide better coverage and documentation of this behavior. Consider adding a test like: test("iceberg native scan is applied for empty COW table") that creates an empty table and verifies NativeIcebergTableScan is used.
| test("iceberg native scan is applied for empty COW table") { | |
| withTable("local.db.t_empty") { | |
| sql(""" | |
| |create table local.db.t_empty (id int, v string) | |
| |using iceberg | |
| |tblproperties ( | |
| | 'format-version' = '2' | |
| |) | |
| |""".stripMargin) | |
| val df = sql("select * from local.db.t_empty") | |
| df.collect() | |
| val plan = df.queryExecution.executedPlan.toString() | |
| assert(plan.contains("NativeIcebergTableScan")) | |
| } | |
| } |
```scala
  Some(IcebergPartitionView(fileTasks))
} catch {
  case _: ReflectiveOperationException => None
```
Catching all ReflectiveOperationException here silently swallows important debugging information. When reflection fails, it could be due to Iceberg version incompatibility or internal API changes. Consider logging a warning with the exception details to help diagnose issues in production environments, similar to how IcebergConvertProvider logs warnings for unsupported Spark versions.
Suggested change:

```diff
-      case _: ReflectiveOperationException => None
+      case e: ReflectiveOperationException =>
+        logWarning(
+          "Failed to read Iceberg SparkInputPartition via reflection; falling back to None. " +
+            "This may indicate an Iceberg or Spark version incompatibility.",
+          e)
+        None
```
```scala
    Seq.empty
  }
} catch {
  case _: Throwable => Seq.empty
```
The code at line 111 swallows all exceptions from the reflection operation. For better observability, consider logging exceptions similar to the pattern used in NativeRDD.scala where setAccessible failures are logged. This would help diagnose issues when accessing Iceberg's internal partition structure fails.
Suggested change:

```diff
-      case _: Throwable => Seq.empty
+      case t: Throwable =>
+        logWarning(
+          "Failed to plan input partitions via DataSource V2 batch API for " +
+            s"${exec.getClass.getName}; falling back to reflective methods.",
+          t)
+        Seq.empty
```
```diff
 fi

-# Check or format all code, including third-party code, with spark-3.5
+# Check or format all code, including third-party code, with spark-3.4
```
The comment says "spark-3.4" but the code uses "spark-3.5". Update the comment to accurately reflect the Spark version being used.
Suggested change:

```diff
-# Check or format all code, including third-party code, with spark-3.4
+# Check or format all code, including third-party code, with spark-3.5
```
Which issue does this PR close?
Closes #2015
Rationale for this change
This PR adds native scan support for Apache Iceberg Copy-On-Write (COW) tables to improve query performance. Currently, Auron has no direct integration with Iceberg, so all Iceberg queries go through Spark's default execution path and miss opportunities for native engine acceleration.
Key Motivations:
What changes are included in this PR?
Core Implementation:
Build & Configuration:
- pom.xml with Iceberg version management and Maven enforcer rules
- auron-build.sh to support Iceberg build parameters
- spark.auron.enable.iceberg.scan (default: true)

Supported Features:
Version Support:
Are there any user-facing changes?
No Breaking Changes: Existing functionality remains unchanged. Iceberg support is additive and automatically falls back to Spark's execution path in unsupported scenarios.
How was this patch tested?
Unit & Integration Tests:
Test Environment: