6 changes: 6 additions & 0 deletions .github/workflows/style.yml
@@ -46,5 +46,11 @@ jobs:
java-version: 8
cache: 'maven'
check-latest: false
- name: Setup JDK 17
uses: actions/setup-java@v5
with:
distribution: 'adopt-hotspot'
java-version: 17
Copilot AI commented on Feb 25, 2026:
The JDK 17 setup is missing the cache configuration that JDK 8 has (cache: 'maven' on line 47). When using multiple setup-java actions, only the last one's cache is active. Consider consolidating both JDK setups into a single action using a matrix or ensuring cache is configured for the version that will be used most. Alternatively, if the reformat script needs to switch between JDKs, the cache configuration should be on the primary JDK version used.

Suggested change
java-version: 17
java-version: 17
cache: 'maven'

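The consolidation suggested in the comment above can be sketched as a single step. This is only a sketch, assuming the multi-line `java-version` input of `actions/setup-java` (which installs each listed version and leaves the last one as the job default) fits how `dev/reformat` switches JDKs; verify the resulting default before adopting it:

```yaml
# Sketch: install both JDKs in one setup-java step so the single
# `cache: 'maven'` entry stays active for the whole job. setup-java
# installs every listed version and exports the last one (here 8)
# as the default JAVA_HOME.
- name: Setup JDK 8 and 17
  uses: actions/setup-java@v5
  with:
    distribution: 'adopt-hotspot'
    java-version: |
      17
      8
    cache: 'maven'
    check-latest: false
```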
check-latest: false
- run: |
./dev/reformat --check
11 changes: 9 additions & 2 deletions dev/reformat
@@ -52,12 +52,19 @@ fi
sparkver=spark-3.5
for celebornver in celeborn-0.5 celeborn-0.6
do
run_maven -P"${sparkver}" -Pceleborn,"${celebornver}" -Puniffle,uniffle-0.10 -Ppaimon,paimon-1.2 -Pflink,flink-1.18 -Piceberg,iceberg-1.9
run_maven -P"${sparkver}" -Pceleborn,"${celebornver}" -Puniffle,uniffle-0.10 -Ppaimon,paimon-1.2 -Pflink-1.18 -Piceberg-1.9

done

sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.4)
sparkvers=(spark-3.0 spark-3.1 spark-3.2 spark-3.3 spark-3.4 spark-4.0 spark-4.1)
for sparkver in "${sparkvers[@]}"
do
if [[ $sparkver == spark-4.* ]]; then
SCALA_PROFILE=scala-2.13
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
Copilot AI commented on Feb 25, 2026:

The reformat script switches to JDK 17 for Spark 4.x, but the TPC-DS CI workflow (.github/workflows/tpcds.yml:91-107) uses JDK 21 for Spark 4.0 and 4.1. While Spark 4.x requires JDK 17 as a minimum, using JDK 21 in CI provides better consistency and forward compatibility. Consider aligning the reformat script to use JDK 21 for Spark 4.x to match the testing environment, or document why different JDK versions are used for formatting vs testing.

Suggested change
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
# Use JDK 21 for Spark 4.x to match the TPC-DS CI configuration
export JAVA_HOME=$(/usr/libexec/java_home -v 21)

else
SCALA_PROFILE=scala-2.12
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
Comment on lines +64 to +67
Copilot AI commented on Feb 25, 2026:

The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.

Suggested change
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
else
SCALA_PROFILE=scala-2.12
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
fi
else
SCALA_PROFILE=scala-2.12
if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
fi

fi
Comment on lines +64 to +68
Copilot AI commented on Feb 25, 2026:

The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.

Suggested change
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
else
SCALA_PROFILE=scala-2.12
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
fi
else
SCALA_PROFILE=scala-2.12
fi
if [[ "$(uname)" == "Darwin" ]] && command -v /usr/libexec/java_home >/dev/null 2>&1; then
if [[ $sparkver == spark-4.* ]]; then
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
else
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
fi
fi

run_maven -P"${sparkver}"
done
@@ -424,7 +424,7 @@ object AuronConverters extends Logging {
assert(
!exec.requiredSchema.exists(e => existTimestampType(e.dataType)),
s"Parquet scan with timestamp type is not supported for table: ${tableIdentifier
.getOrElse("unknown")}. " +
.getOrElse("unknown")}. " +
"Set spark.auron.enable.scan.parquet.timestamp=true to enable timestamp support " +
"or remove timestamp columns from the query.")
}
@@ -435,15 +435,15 @@ object AuronConverters extends Logging {
assert(
!exec.requiredSchema.exists(e => existTimestampType(e.dataType)),
s"ORC scan with timestamp type is not supported for tableIdentifier: ${tableIdentifier
.getOrElse("unknown")}. " +
.getOrElse("unknown")}. " +
"Set spark.auron.enable.scan.orc.timestamp=true to enable timestamp support " +
"or remove timestamp columns from the query.")
}
addRenameColumnsExec(Shims.get.createNativeOrcScanExec(exec))
case p =>
throw new NotImplementedError(
s"Cannot convert FileSourceScanExec tableIdentifier: ${tableIdentifier.getOrElse(
"unknown")}, class: ${p.getClass.getName}")
"unknown")}, class: ${p.getClass.getName}")
}
}

@@ -91,7 +91,7 @@ case class AuronColumnarOverrides(sparkSession: SparkSession) extends ColumnarRu
dumpSimpleSparkPlanTreeNode(sparkPlanTransformed)

logInfo(s"Transformed spark plan after preColumnarTransitions:\n${sparkPlanTransformed
.treeString(verbose = true, addSuffix = true)}")
.treeString(verbose = true, addSuffix = true)}")

// post-transform
Shims.get.postTransform(sparkPlanTransformed, sparkSession.sparkContext)
@@ -74,7 +74,7 @@ object NativeHelper extends Logging {
val heapMemory = Runtime.getRuntime.maxMemory()
val offheapMemory = totalMemory - heapMemory
logWarning(s"memory total: ${Utils.bytesToString(totalMemory)}, onheap: ${Utils.bytesToString(
heapMemory)}, offheap: ${Utils.bytesToString(offheapMemory)}")
heapMemory)}, offheap: ${Utils.bytesToString(offheapMemory)}")
offheapMemory
}

@@ -47,7 +47,7 @@ object TaskContextHelper extends Logging {
val thread = Thread.currentThread()
val threadName = if (context != null) {
s"auron native task ${context.partitionId()}.${context.attemptNumber()} in stage ${context
.stageId()}.${context.stageAttemptNumber()} (TID ${context.taskAttemptId()})"
.stageId()}.${context.stageAttemptNumber()} (TID ${context.taskAttemptId()})"
} else {
"auron native task " + thread.getName
}
@@ -69,9 +69,10 @@ abstract class NativeParquetInsertIntoHiveTableBase(
.filterKeys(Set("stage_id", "output_rows", "elapsed_compute"))
.toSeq
:+ ("io_time", SQLMetrics.createNanoTimingMetric(sparkContext, "Native.io_time"))
:+ ("bytes_written",
SQLMetrics
.createSizeMetric(sparkContext, "Native.bytes_written")): _*)
:+ (
"bytes_written",
SQLMetrics
.createSizeMetric(sparkContext, "Native.bytes_written")): _*)

def check(): Unit = {
val hadoopConf = sparkContext.hadoopConfiguration
@@ -166,7 +166,7 @@ class AuronUniffleShuffleReader[K, C](
}
if (!emptyPartitionIds.isEmpty) {
logDebug(s"Found ${emptyPartitionIds
.size()} empty shuffle partitions: ${emptyPartitionIds.asScala.mkString(",")}")
.size()} empty shuffle partitions: ${emptyPartitionIds.asScala.mkString(",")}")
}
iterators = shuffleDataIterList.iterator()
if (iterators.hasNext) {