
test: Start phase1 benchmarks #3081

Draft
alpe wants to merge 1 commit into main from alex/bench_phase1_start

Conversation


@alpe alpe commented Feb 18, 2026

Summary

Introduces the first phase of function-level benchmarking for the block executor's ProduceBlock path.

  • Adds BenchmarkProduceBlock with sub-benchmarks for "empty batch" and "single tx" scenarios.
  • Wires the new benchmark into CI via .github/workflows/benchmark.yml.


github-actions bot commented Feb 18, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build ✅ passed · Format ⏩ skipped · Lint ✅ passed · Breaking ✅ passed · Updated (UTC): Feb 19, 2026, 11:30 AM


claude bot commented Feb 18, 2026

Claude finished @alpe's task in 2m 20s.


Code Review

  • Gather context and read changed files
  • Analyze benchmark test file
  • Analyze CI workflow file
  • Post comprehensive review

Good addition for phase 1 benchmarking. The structure is solid overall. A few issues worth addressing before merge.


block/internal/executing/executor_benchmark_test.go

Non-deterministic StartTime (line 78) — Medium

StartTime: time.Now().Add(-time.Hour),

Using time.Now() makes benchmark results non-reproducible across runs and machines. Prefer a fixed timestamp so any timing-sensitive assertions in the executor remain stable:

StartTime: time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC),



lastBatchTime closure is not concurrency-safe (lines 109–118) — Low

lastBatchTime := gen.StartTime
mockSeq.EXPECT()...RunAndReturn(func(...) {
    lastBatchTime = lastBatchTime.Add(...)  // shared mutable state
    ...
})

The variable is captured and mutated inside the mock callback without any synchronisation. This is safe for the current sequential benchmark loop, but will silently race if a future b.RunParallel call (or any other concurrent use of the mock) exercises the callback from multiple goroutines. Deriving each timestamp from an atomic.Int64 counter would guard against future breakage; at minimum, a comment documenting the single-goroutine assumption is warranted.


Duplicated signer creation vs. buildTestSigner — Low

buildTestSigner in executor_logic_test.go (line 32) is *testing.T-only, so the benchmark re-implements the same key generation inline (lines 63–69). Promoting buildTestSigner to accept testing.TB would eliminate the duplication and let both unit tests and benchmarks share the helper. This is a minor DX improvement.



Non-deterministic sub-benchmark ordering — Low

specs := map[string]struct{ txs [][]byte }{...}
for name, spec := range specs { ... }

Go map iteration is randomised. For benchmarks this is usually harmless, but a []struct{ name string; txs [][]byte } slice keeps output consistent and makes results easier to compare across runs.
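The slice-based layout could be sketched as follows; the spec names mirror the review comment, while the tx payloads are illustrative placeholders rather than the PR's actual data.

```go
package main

import "fmt"

// spec mirrors the suggested []struct{ name string; txs [][]byte } layout,
// which fixes sub-benchmark order, unlike map iteration.
type spec struct {
	name string
	txs  [][]byte
}

func benchSpecs() []spec {
	return []spec{
		{name: "empty batch", txs: nil},
		{name: "single tx", txs: [][]byte{[]byte("tx1")}},
	}
}

func main() {
	// In the real benchmark this loop would call b.Run(s.name, ...);
	// slice order guarantees identical output ordering on every run.
	for _, s := range benchSpecs() {
		fmt.Printf("%s: %d txs\n", s.name, len(s.txs))
	}
}
```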


Positive observations:

  • RunAndReturn with lastBatchTime incrementing per call is the right pattern — monotonic timestamps prevent false failures from the executor's ordering checks. (Notably, this already addresses Gemini's suggestion.)
  • b.ResetTimer() and b.ReportAllocs() placement is correct — setup work is outside the timed region.
  • b.Context() for context propagation and b.Cleanup for cancellation are idiomatic Go 1.24+.
  • b.Helper() on newBenchExecutor is correctly applied.
  • Direct field access (exec.ctx, exec.cancel) is acceptable here because the test is in the same package, and bypassing Start() to avoid launching the execution goroutine is the right call for a unit-level benchmark.

.github/workflows/benchmark.yml

Block executor output uses > instead of | tee (lines 46–47) — Medium

go test -bench=BenchmarkProduceBlock -benchmem -run='^$' \
  ./block/internal/executing/... > block_executor_output.txt

Shell redirection via > silences all output in the CI log, making failures hard to diagnose. The EVM step already uses tee correctly. Align them:

go test -bench=BenchmarkProduceBlock -benchmem -run='^$' \
  ./block/internal/executing/... | tee block_executor_output.txt



Missing -count=N for stable CI measurements — Low

Single-run benchmarks (-count=1, the default) can be noisy in CI environments with shared runners. Consider -count=5 (or at least -count=3) for the block executor benchmark to get a more reliable signal before fail-on-alert triggers.


Job name evm-benchmark covers both benchmarks — Very Low

The job is named evm-benchmark but now also runs the block executor benchmark. A name like benchmarks or block-and-evm-benchmarks would be more accurate as the suite grows.


Positive observations:

  • Action hashes are pinned — good supply chain hygiene.
  • permissions: {} at the workflow level with narrowly-scoped per-job permissions is the right pattern.
  • alert-threshold: '150%' and fail-on-alert: true will catch regressions automatically.
  • timeout-minutes: 30 is a sensible guard for a benchmark that spins up Docker containers.

@alpe alpe changed the base branch from main to alex/bench_e2e_start February 18, 2026 14:29
@gemini-code-assist

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request initiates the first phase of performance benchmarking for EVM contract operations within the ev-node system. It establishes a framework for measuring transaction roundtrip latency and integrates OpenTelemetry for granular performance analysis. The changes also involve making existing test utilities more versatile for use in both testing and benchmarking contexts, alongside necessary dependency updates.

Highlights

  • New EVM Contract Benchmark: Introduced a new benchmark, BenchmarkEvmContractRoundtrip, to measure the store-retrieve latency of EVM contracts against a real Reth node, providing performance insights into contract interactions.
  • OpenTelemetry Tracing Integration: Integrated OpenTelemetry tracing into the new benchmark to collect detailed performance data, allowing for a hierarchical timing report of ev-node components like Engine API calls, executor, and sequencer.
  • In-process OTLP/HTTP Collector: Implemented a lightweight, in-process OTLP/HTTP receiver (otlpCollector) to efficiently collect trace spans from ev-node during benchmark execution.
  • Refactored Test Helpers for Benchmarks: Updated numerous test helper functions across execution/evm/test, test/e2e, and test/e2e/sut_helper.go to accept testing.TB instead of *testing.T, making them reusable for both tests and benchmarks.
  • Dependency Updates: Adjusted go.mod files to explicitly include go.uber.org/zap and go.opentelemetry.io/proto/otlp as direct dependencies, supporting new logging and tracing functionalities.


Changelog
  • block/internal/executing/executor_benchmark_test.go
    • Added newBenchExecutor function to create a mocked Executor for benchmarking.
    • Added BenchmarkProduceBlock function to benchmark block production with varying transaction batches.
  • execution/evm/test/go.mod
    • Added go.uber.org/zap as a direct dependency.
    • Removed go.uber.org/zap from indirect dependencies.
  • execution/evm/test/test_helpers.go
    • Imported go.uber.org/zap and go.uber.org/zap/zaptest.
    • Updated getTestScopedDockerSetup to accept testing.TB.
    • Updated SetupTestRethNode to accept testing.TB and integrated zap logger for Reth node setup.
    • Updated waitForRethContainer to accept testing.TB.
  • execution/evm/test_helpers.go
    • Updated GetRandomTransaction to accept testing.TB.
  • test/e2e/evm_contract_bench_test.go
    • Added new file for BenchmarkEvmContractRoundtrip to measure contract store/retrieve latency.
    • Implemented otlpCollector for in-process OpenTelemetry trace collection.
    • Added printCollectedTraceReport to analyze and log trace breakdowns.
    • Added waitForReceipt helper function.
  • test/e2e/evm_contract_e2e_test.go
    • Updated setupTestSequencer to accept testing.TB and extraArgs.
    • Updated deployContract to accept testing.TB.
  • test/e2e/evm_test_common.go
    • Updated mustGetAvailablePort to accept testing.TB.
    • Updated createPassphraseFile to accept testing.TB.
    • Updated createJWTSecretFile to accept testing.TB.
    • Updated getNodeP2PAddress to accept testing.TB.
    • Updated setupSequencerNode to accept testing.TB and extraArgs.
    • Updated setupSequencerNodeLazy to accept testing.TB.
    • Updated setupFullNode to accept testing.TB.
    • Updated submitTransactionAndGetBlockNumber to accept testing.TB.
    • Updated setupCommonEVMTest to accept testing.TB.
    • Updated checkBlockInfoAt to accept testing.TB.
    • Updated setupSequencerOnlyTest to accept testing.TB and extraArgs.
    • Updated restartDAAndSequencer to accept testing.TB.
    • Updated restartDAAndSequencerLazy to accept testing.TB.
    • Updated restartSequencerNode to accept testing.TB.
    • Updated verifyNoBlockProduction to accept testing.TB.
  • test/e2e/go.mod
    • Added go.opentelemetry.io/proto/otlp as a direct dependency.
    • Removed go.opentelemetry.io/proto/otlp from indirect dependencies.
  • test/e2e/sut_helper.go
    • Updated SystemUnderTest struct field t type to testing.TB.
    • Updated NewSystemUnderTest constructor to accept testing.TB.
    • Updated AwaitNodeUp to accept testing.TB.
    • Updated AwaitNodeLive to accept testing.TB.
    • Updated AwaitNBlocks to accept testing.TB.
    • Updated MustCopyFile to accept testing.TB.
    • Updated NodeID to accept testing.TB.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/benchmark.yml
Activity
  • The pull request is currently marked as [WIP], indicating ongoing development.
  • The description body is empty, suggesting that detailed context and rationale are yet to be provided by the author.


codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.19%. Comparing base (dc473e5) to head (8efaece).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3081   +/-   ##
=======================================
  Coverage   61.18%   61.19%           
=======================================
  Files         113      113           
  Lines       11444    11444           
=======================================
+ Hits         7002     7003    +1     
+ Misses       3654     3652    -2     
- Partials      788      789    +1     
Flag: combined · Coverage: 61.19% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces benchmarks for the ProduceBlock function. The overall structure is good, but there are a couple of areas for improvement to ensure the benchmarks are robust and deterministic. I've pointed out that using time.Now() can lead to non-deterministic tests and should be replaced with a fixed time. More critically, the mock setup for GetNextBatch uses a static timestamp, which could cause benchmark failures if timestamp monotonicity is required for blocks. I've suggested a fix to generate an increasing timestamp for each block produced in the benchmark loop.

Comment on lines 84 to 89

Severity: high

The mock for GetNextBatch is configured to return a response with Timestamp: time.Now(). This timestamp is evaluated only once when the mock expectation is set up. Consequently, every call to ProduceBlock within the benchmark loop will receive the same timestamp. If there is a requirement for block timestamps to be monotonically increasing, this will cause ProduceBlock to fail after the first iteration.

To fix this, use RunAndReturn to generate a new, increasing timestamp for each call. This ensures the benchmark correctly simulates the production of a sequence of valid blocks.

Suggested change

Before:

    mockSeq.EXPECT().GetNextBatch(mock.Anything, mock.AnythingOfType("sequencer.GetNextBatchRequest")).
        Return(&coreseq.GetNextBatchResponse{
            Batch:     &coreseq.Batch{Transactions: txs},
            Timestamp: time.Now(),
            BatchData: txs,
        }, nil)

After:

    var blockNum uint64
    mockSeq.EXPECT().GetNextBatch(mock.Anything, mock.AnythingOfType("sequencer.GetNextBatchRequest")).
        RunAndReturn(func(_ context.Context, _ coreseq.GetNextBatchRequest) (*coreseq.GetNextBatchResponse, error) {
            blockNum++
            timestamp := gen.StartTime.Add(time.Duration(blockNum) * cfg.Node.BlockTime.Duration)
            return &coreseq.GetNextBatchResponse{
                Batch:     &coreseq.Batch{Transactions: txs},
                Timestamp: timestamp,
                BatchData: txs,
            }, nil
        })

@alpe alpe force-pushed the alex/bench_phase1_start branch 2 times, most recently from d118247 to 7150135 Compare February 19, 2026 08:47
Base automatically changed from alex/bench_e2e_start to main February 19, 2026 09:22
@alpe alpe force-pushed the alex/bench_phase1_start branch from 7150135 to 8efaece Compare February 19, 2026 09:56
@alpe alpe changed the title [WIP] Start phase1 benchmarks test: Start phase1 benchmarks Feb 19, 2026
@alpe alpe force-pushed the alex/bench_phase1_start branch from 53311f4 to 7dea42f Compare February 19, 2026 11:26
