
[AURON #2017] [BUILD] Add Spark 4.x support to dev/reformat script.#2018

Open
slfan1989 wants to merge 2 commits into apache:master from
slfan1989:auron-2017

Conversation

@slfan1989
Contributor

Which issue does this PR close?

Closes #2017

Rationale for this change

With Spark 4.0 and 4.1 support added to the project, the dev/reformat script needs to be updated to handle formatting and style checks for these new versions. Spark 4.x requires JDK 17+ and Scala 2.13, while Spark 3.x uses JDK 8 and Scala 2.12. The script should automatically switch between these environments.

What changes are included in this PR?

1. Fix Flink Maven profile

  • Before: -Pflink,flink-1.18
  • After: -Pflink-1.18
  • Reason: Avoid activating non-existent flink profile

2. Add Spark 4.x support

  • Add spark-4.0 and spark-4.1 to the version sweep list
  • Auto-switch to scala-2.13 profile for Spark 4.x (Spark 4.x requires Scala 2.13)
  • Auto-switch to JDK 17 for Spark 4.x (Spark 4.x requires JDK 17+)
  • Auto-switch back to JDK 8 for Spark 3.x versions
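The switching behavior above can be sketched as follows. This is illustrative only: the real dev/reformat resolves an actual JAVA_HOME (e.g. via /usr/libexec/java_home, as the quoted hunks below show) and runs Maven for each version; the version list here is a stand-in for the full sweep list.

```shell
#!/usr/bin/env bash
# Sketch of the per-version auto-switching described above (illustrative;
# the real script exports JAVA_HOME and invokes Maven with these profiles).
for sparkver in spark-3.5 spark-4.0 spark-4.1; do
  if [[ $sparkver == spark-4.* ]]; then
    SCALA_PROFILE=scala-2.13   # Spark 4.x requires Scala 2.13
    JDK=17                     # and JDK 17+
  else
    SCALA_PROFILE=scala-2.12   # Spark 3.x stays on Scala 2.12
    JDK=8                      # and JDK 8
  fi
  echo "$sparkver -> $SCALA_PROFILE, JDK $JDK"
done
```

For spark-3.5 this prints `scala-2.12, JDK 8`; for spark-4.0 and spark-4.1 it prints `scala-2.13, JDK 17`.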

3. Update CI workflow (.github/workflows/style.yml)

  • Add JDK 17 setup alongside existing JDK 8
  • Enable style check to work with both Spark 3.x and Spark 4.x versions

Are there any user-facing changes?

No.

How was this patch tested?

Verified that automatic JDK switching works for Spark 3.x (JDK 8) and Spark 4.x (JDK 17).

…ipt.

Signed-off-by: slfan1989 <slfan1989@apache.org>
Contributor

Copilot AI left a comment


Pull request overview

This pull request adds support for Spark 4.0 and 4.1 to the dev/reformat script, enabling automated code formatting and style checks for the new Spark versions. The script is updated to automatically switch between JDK and Scala versions based on the Spark version being processed, as Spark 4.x requires JDK 17+ and Scala 2.13, while Spark 3.x uses JDK 8 and Scala 2.12.

Changes:

  • Fixed Maven profile references for Flink and Iceberg to use correct versioned profile IDs
  • Added Spark 4.0 and 4.1 to the version sweep array in the reformat script
  • Implemented automatic JDK and Scala version switching logic for Spark 4.x vs 3.x
  • Updated CI workflow to set up both JDK 8 and JDK 17
  • Applied formatting changes to Scala files (indentation adjustments from running the reformatter)

Reviewed changes

Copilot reviewed 3 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • dev/reformat: core changes; fixed Flink/Iceberg profiles, added Spark 4.0/4.1 to the version array, added JDK/Scala switching logic
  • .github/workflows/style.yml: added JDK 17 setup to support Spark 4.x formatting in CI
  • thirdparty/auron-uniffle/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/uniffle/AuronUniffleShuffleReader.scala: indentation formatting adjustment
  • spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala: indentation formatting adjustment
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/util/TaskContextHelper.scala: indentation formatting adjustment
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala: indentation formatting adjustment
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronSparkSessionExtension.scala: indentation formatting adjustment
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala: indentation formatting adjustments


do
  if [[ $sparkver == spark-4.* ]]; then
    SCALA_PROFILE=scala-2.13
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)

Copilot AI Feb 25, 2026


The reformat script switches to JDK 17 for Spark 4.x, but the TPC-DS CI workflow (.github/workflows/tpcds.yml:91-107) uses JDK 21 for Spark 4.0 and 4.1. While Spark 4.x requires JDK 17 as a minimum, using JDK 21 in CI provides better consistency and forward compatibility. Consider aligning the reformat script to use JDK 21 for Spark 4.x to match the testing environment, or document why different JDK versions are used for formatting vs testing.

Suggested change:
-export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+# Use JDK 21 for Spark 4.x to match the TPC-DS CI configuration
+export JAVA_HOME=$(/usr/libexec/java_home -v 21)

Comment on lines +64 to +67
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
  else
    SCALA_PROFILE=scala-2.12
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

Copilot AI Feb 25, 2026


The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.

Suggested change:
-export JAVA_HOME=$(/usr/libexec/java_home -v 17)
-else
-  SCALA_PROFILE=scala-2.12
-  export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
+  export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+fi
+else
+  SCALA_PROFILE=scala-2.12
+  if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
+    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+  fi
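
A platform-neutral variant of this idea could lean on the environment variables that actions/setup-java exports on the runners (JAVA_HOME_<version>_X64 for each installed JDK), falling back to /usr/libexec/java_home only on macOS. A minimal sketch; pick_java_home is a hypothetical helper, not part of this PR, and the path assigned below is a stand-in for the demo:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in this PR): resolve a JAVA_HOME for the given
# major version in a platform-neutral way.
pick_java_home() {
  local ver=$1
  if [[ "$(uname)" == "Darwin" && -x /usr/libexec/java_home ]]; then
    /usr/libexec/java_home -v "$ver"   # macOS: ask the system resolver
  else
    # GitHub's setup-java action exports JAVA_HOME_<version>_X64 for each
    # JDK it installs; fall back to the current JAVA_HOME otherwise.
    local var="JAVA_HOME_${ver}_X64"
    echo "${!var:-$JAVA_HOME}"
  fi
}

JAVA_HOME_17_X64=/opt/hostedtoolcache/jdk-17   # stand-in path for the demo
export JAVA_HOME="$(pick_java_home 17)"
echo "$JAVA_HOME"
```

Note that /usr/libexec/java_home spells JDK 8 as "1.8" while setup-java's variable is JAVA_HOME_8_X64, so a real helper would need to normalize the version string per platform.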

Comment on lines +64 to +68
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
  else
    SCALA_PROFILE=scala-2.12
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
  fi

Copilot AI Feb 25, 2026


The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.

Suggested change:
-export JAVA_HOME=$(/usr/libexec/java_home -v 17)
-else
-  SCALA_PROFILE=scala-2.12
-  export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
-fi
+else
+  SCALA_PROFILE=scala-2.12
+fi
+if [[ "$(uname)" == "Darwin" ]] && command -v /usr/libexec/java_home >/dev/null 2>&1; then
+  if [[ $sparkver == spark-4.* ]]; then
+    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+  else
+    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+  fi
+fi

uses: actions/setup-java@v5
with:
  distribution: 'adopt-hotspot'
  java-version: 17

Copilot AI Feb 25, 2026


The JDK 17 setup is missing the cache configuration that JDK 8 has (cache: 'maven' on line 47). When using multiple setup-java actions, only the last one's cache is active. Consider consolidating both JDK setups into a single action using a matrix or ensuring cache is configured for the version that will be used most. Alternatively, if the reformat script needs to switch between JDKs, the cache configuration should be on the primary JDK version used.

Suggested change:
-  java-version: 17
+  java-version: 17
+  cache: 'maven'

@slfan1989
Contributor Author

@cxzl25 Thanks for reviewing this PR! I'll continue refining it.

