[AURON #2011] History Server fails when BuildInfo event is missing. #2012

Open
slfan1989 wants to merge 2 commits into apache:master from slfan1989:auron-2011
@slfan1989

Which issue does this PR close?

Closes #2011

Rationale for this change

The History Server plugin currently crashes during initialization when the AuronBuildInfoUIData record is missing from the KVStore. This causes applications without BuildInfo events to either fail plugin initialization or show no Auron tab.

What changes are included in this PR?

  1. AuronSQLAppStatusStore: Changed buildInfo() to return Option[AuronBuildInfoUIData]. It now catches NoSuchElementException and other non-fatal exceptions and returns None instead of throwing.
  2. AuronSQLHistoryServerPlugin: Removed the null check so that the Auron tab is always created, leaving the UI to handle the empty state.
  3. AuronAllExecutionsPage: Added buildInfoSummary() method to handle Option[AuronBuildInfoUIData]:
    • Some: displays BuildInfo table as before
    • None: shows user-friendly message "Auron build information is not available for this application."
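
The pattern described in the three changes above can be sketched roughly as follows. This is a minimal, Spark-free illustration rather than the code in this PR: `BuildInfoUIData`, `InMemoryStore`, `StatusStore`, and `BuildInfoPage` are hypothetical stand-ins for `AuronBuildInfoUIData`, the KVStore, `AuronSQLAppStatusStore`, and `AuronAllExecutionsPage`, and the summary is rendered as plain text instead of an HTML table.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for AuronBuildInfoUIData; the real fields are not shown in this PR.
final case class BuildInfoUIData(info: Seq[(String, String)])

// Hypothetical stand-in for the KVStore: read throws NoSuchElementException
// when no record exists under the given key.
final class InMemoryStore(records: Map[String, BuildInfoUIData]) {
  def read(key: String): BuildInfoUIData =
    records.getOrElse(key, throw new NoSuchElementException(key))
}

// Hypothetical stand-in for AuronSQLAppStatusStore.
final class StatusStore(store: InMemoryStore) {
  // Returns None instead of throwing when the record is missing or unreadable.
  def buildInfo(): Option[BuildInfoUIData] =
    try {
      Option(store.read(classOf[BuildInfoUIData].getName))
    } catch {
      case _: NoSuchElementException => None
      case NonFatal(_) => None
    }
}

// Hypothetical stand-in for the page-side helper.
object BuildInfoPage {
  val Unavailable: String =
    "Auron build information is not available for this application."

  // Some: render the key/value pairs; None: render the fallback message.
  def buildInfoSummary(info: Option[BuildInfoUIData]): String = info match {
    case Some(data) => data.info.map { case (k, v) => s"$k: $v" }.mkString("\n")
    case None => Unavailable
  }
}
```

Because the store returns an `Option`, the caller no longer needs a null check before creating the tab; the page decides what to show for the empty case.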

Are there any user-facing changes?

Yes. When BuildInfo is not available:

  • Before: Plugin initialization fails or no Auron tab appears
  • After: Auron tab displays with a clear warning message explaining BuildInfo is unavailable

How was this patch tested?

  • Existing unit tests pass

…ing.

Signed-off-by: slfan1989 <slfan1989@apache.org>

Copilot AI left a comment

Pull request overview

This PR fixes an issue where the History Server plugin crashes during initialization when BuildInfo event data is missing from the KVStore. The fix implements graceful handling of missing BuildInfo by converting the buildInfo() method to return an Option type and displaying a user-friendly message in the UI when data is unavailable.

Changes:

  • Modified AuronSQLAppStatusStore.buildInfo() to return Option[AuronBuildInfoUIData] with exception handling
  • Removed conditional tab creation check in AuronSQLHistoryServerPlugin, allowing the Auron tab to always be created
  • Added buildInfoSummary() method in AuronAllExecutionsPage to handle both present and absent BuildInfo scenarios with appropriate UI rendering

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronSQLAppStatusStore.scala | Changed buildInfo() to return Option[AuronBuildInfoUIData] with try-catch handling for missing records |
| auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronSQLHistoryServerPlugin.scala | Removed null check to allow unconditional tab creation, delegating empty state handling to the UI layer |
| auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronAllExecutionsPage.scala | Refactored render methods to use new buildInfoSummary() helper that displays a warning message when BuildInfo is unavailable |

    Option(store.read(kClass, kClass.getName))
  } catch {
    case _: NoSuchElementException => None
    case NonFatal(_) => None
Copilot AI Feb 25, 2026

The NonFatal exception is being silently swallowed without logging. This makes debugging difficult if unexpected errors occur when reading from the KVStore. Consider logging the exception at a warning or debug level to help diagnose issues. For example: case NonFatal(e) => logWarning("Failed to read BuildInfo from KVStore", e); None

Note that this would require AuronSQLAppStatusStore to extend the Logging trait.
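
A rough sketch of what this suggestion could look like is shown below. To keep it runnable without Spark, `WarnLogging` is a hypothetical stand-in for Spark's `Logging` trait that records warnings in memory, and the `read` function passed to the hypothetical `LoggingStatusStore` stands in for `store.read(kClass, kClass.getName)`.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for Spark's Logging trait: it only records warnings,
// so the sketch can run without a Spark dependency.
trait WarnLogging {
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty

  def logWarning(msg: => String, e: Throwable): Unit =
    warnings += s"$msg: ${e.getMessage}"
}

// `read` is a hypothetical stand-in for store.read(kClass, kClass.getName).
final class LoggingStatusStore[T](read: () => T) extends WarnLogging {
  def buildInfo(): Option[T] =
    try {
      Option(read())
    } catch {
      // A missing record is the expected case, so it stays silent.
      case _: NoSuchElementException => None
      // Any other non-fatal error is unexpected: record it, then fall back to None.
      case NonFatal(e) =>
        logWarning("Failed to read BuildInfo from KVStore", e)
        None
    }
}
```

Keeping the `NoSuchElementException` branch silent avoids a warning for every application that simply has no BuildInfo event, while unexpected read failures still leave a trace.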

Contributor Author

Thank you very much for your valuable suggestion. I will further refine and improve the relevant content according to your comments.


@ShreyeshArangath ShreyeshArangath left a comment

LGTM, thanks for fixing this.

…ing.

Signed-off-by: slfan1989 <slfan1989@apache.org>
@slfan1989
slfan1989 commented Feb 26, 2026

@cxzl25 @richox We frequently encountered Rust crashes when running TPC-DS. I’ve submitted a fix in PR #2023. Could you please take a look and review it?

https://github.com/apache/auron/actions/runs/22424571173/job/64931203277?pr=2012

thread 'auron-native-stage-512-part-0-tid-379' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:723:35:
output_with_sender: send error: channel closed
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
   1: core::panicking::panic_fmt
             at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
   2: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}::{{closure}}
   3: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}
   4: datafusion_ext_plans::sort_exec::send_output_batch::{{closure}}
   5: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
   6: datafusion_ext_plans::common::execution_context::ExecutionContext::output_with_sender_impl::{{closure}}
   7: tokio::runtime::task::core::Core<T,S>::poll
   8: tokio::runtime::task::harness::poll_future
.......
