[AURON #2017] [BUILD] Add Spark 4.x support to dev/reformat script. #2018
slfan1989 wants to merge 2 commits into apache:master
Conversation
…ipt. Signed-off-by: slfan1989 <slfan1989@apache.org>
Pull request overview
This pull request adds support for Spark 4.0 and 4.1 to the dev/reformat script, enabling automated code formatting and style checks for the new Spark versions. The script is updated to automatically switch between JDK and Scala versions based on the Spark version being processed, as Spark 4.x requires JDK 17+ and Scala 2.13, while Spark 3.x uses JDK 8 and Scala 2.12.
Changes:
- Fixed Maven profile references for Flink and Iceberg to use correct versioned profile IDs
- Added Spark 4.0 and 4.1 to the version sweep array in the reformat script
- Implemented automatic JDK and Scala version switching logic for Spark 4.x vs 3.x
- Updated CI workflow to set up both JDK 8 and JDK 17
- Applied formatting changes to Scala files (indentation adjustments from running the reformatter)
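
The switching behavior described above can be sketched as follows. This is a simplified illustration of the logic, not the exact contents of dev/reformat; the `select_profiles` helper name and the hard-coded version list are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: map a Spark version string to the Scala profile and JDK major
# version used for formatting (Spark 4.x -> Scala 2.13 / JDK 17,
# Spark 3.x -> Scala 2.12 / JDK 8).
select_profiles() {
  local sparkver="$1"
  if [[ $sparkver == spark-4.* ]]; then
    echo "scala-2.13 17"
  else
    echo "scala-2.12 8"
  fi
}

# Illustrative sweep over a few versions, mirroring the version array.
for sparkver in spark-3.5 spark-4.0 spark-4.1; do
  read -r SCALA_PROFILE JDK_VERSION <<<"$(select_profiles "$sparkver")"
  echo "$sparkver -> profile=$SCALA_PROFILE jdk=$JDK_VERSION"
done
```

In the real script, the selected JDK would then be exported as JAVA_HOME before invoking Maven with the chosen Scala profile.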
Reviewed changes
Copilot reviewed 3 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| dev/reformat | Core changes: fixed Flink/Iceberg profiles, added Spark 4.0/4.1 to version array, added JDK/Scala switching logic |
| .github/workflows/style.yml | Added JDK 17 setup to support Spark 4.x formatting in CI |
| thirdparty/auron-uniffle/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/uniffle/AuronUniffleShuffleReader.scala | Indentation formatting adjustment |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableBase.scala | Indentation formatting adjustment |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/util/TaskContextHelper.scala | Indentation formatting adjustment |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeHelper.scala | Indentation formatting adjustment |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronSparkSessionExtension.scala | Indentation formatting adjustment |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Indentation formatting adjustments |
```shell
do
  if [[ $sparkver == spark-4.* ]]; then
    SCALA_PROFILE=scala-2.13
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
```
The reformat script switches to JDK 17 for Spark 4.x, but the TPC-DS CI workflow (.github/workflows/tpcds.yml:91-107) uses JDK 21 for Spark 4.0 and 4.1. While Spark 4.x requires JDK 17 as a minimum, using JDK 21 in CI provides better consistency and forward compatibility. Consider aligning the reformat script to use JDK 21 for Spark 4.x to match the testing environment, or document why different JDK versions are used for formatting vs testing.
```diff
-    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+    # Use JDK 21 for Spark 4.x to match the TPC-DS CI configuration
+    export JAVA_HOME=$(/usr/libexec/java_home -v 21)
```
```shell
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
  else
    SCALA_PROFILE=scala-2.12
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
```
The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.
```diff
-    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
-  else
-    SCALA_PROFILE=scala-2.12
-    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+    if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
+      export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+    fi
+  else
+    SCALA_PROFILE=scala-2.12
+    if [[ "$(uname)" == "Darwin" ]] && [[ -x /usr/libexec/java_home ]]; then
+      export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+    fi
```
```shell
    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
  else
    SCALA_PROFILE=scala-2.12
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
  fi
```
The command /usr/libexec/java_home is macOS-specific and will fail on Linux (ubuntu-24.04) where the CI runs. GitHub Actions' setup-java already sets JAVA_HOME automatically when multiple Java versions are installed, so this line may not be necessary. If explicit JAVA_HOME switching is needed, consider a cross-platform approach like checking if the command exists first, or relying on GitHub Actions' automatic JAVA_HOME management. See dev/mvn-build-helper/build-native.sh:35-44 for an example of how the codebase handles platform-specific logic with uname checks.
```diff
-    export JAVA_HOME=$(/usr/libexec/java_home -v 17)
-  else
-    SCALA_PROFILE=scala-2.12
-    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
-  fi
+  else
+    SCALA_PROFILE=scala-2.12
+  fi
+  if [[ "$(uname)" == "Darwin" ]] && command -v /usr/libexec/java_home >/dev/null 2>&1; then
+    if [[ $sparkver == spark-4.* ]]; then
+      export JAVA_HOME=$(/usr/libexec/java_home -v 17)
+    else
+      export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
+    fi
+  fi
```
```yaml
uses: actions/setup-java@v5
with:
  distribution: 'adopt-hotspot'
  java-version: 17
```
The JDK 17 setup is missing the cache configuration that JDK 8 has (cache: 'maven' on line 47). When using multiple setup-java actions, only the last one's cache is active. Consider consolidating both JDK setups into a single action using a matrix or ensuring cache is configured for the version that will be used most. Alternatively, if the reformat script needs to switch between JDKs, the cache configuration should be on the primary JDK version used.
```diff
-  java-version: 17
+  java-version: 17
+  cache: 'maven'
```
@cxzl25 Thanks for reviewing this PR! I'll continue refining it.
Which issue does this PR close?
Closes #2017
Rationale for this change
With Spark 4.0 and 4.1 support added to the project, the dev/reformat script needs to be updated to handle formatting and style checks for these new versions. Spark 4.x requires JDK 17+ and Scala 2.13, while Spark 3.x uses JDK 8 and Scala 2.12, so the script should switch between these environments automatically.
What changes are included in this PR?
1. Fix the Flink Maven profile.
2. Add Spark 4.x support.
3. Update the CI workflow (.github/workflows/style.yml).
Are there any user-facing changes?
No.
How was this patch tested?
Verified that automatic JDK switching works for Spark 3.x (JDK 8) and Spark 4.x (JDK 17).
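
One quick local sanity check for that switching is to exercise the `spark-4.*` glob the script keys off of. This is an illustrative check, not part of the patch; the `is_spark4` helper and the version strings are hypothetical:

```shell
#!/usr/bin/env bash
# Sanity check for the version detection used by the JDK/Scala switch:
# is_spark4 mirrors the `[[ $sparkver == spark-4.* ]]` test in dev/reformat.
is_spark4() { [[ $1 == spark-4.* ]]; }

# The new 4.x entries should match...
for v in spark-4.0 spark-4.1; do
  is_spark4 "$v" || echo "unexpectedly rejected: $v"
done
# ...and none of the 3.x entries should.
for v in spark-3.0 spark-3.3 spark-3.5; do
  is_spark4 "$v" && echo "unexpectedly matched: $v"
done
echo "spark-4.* glob check done"
```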