feat(sequencer): catchup from base by julienrbrt · Pull Request #3057 · evstack/ev-node

julienrbrt · 2026-02-09T14:35:48Z

Closes: #3016

Opus did a pretty good job. I'll be improving it.

github-actions · 2026-02-09T14:36:01Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Feb 17, 2026, 3:08 PM

claude · 2026-02-09T14:36:06Z

Claude finished @julienrbrt's task in 1m 57s

Code Review

Tasks

Gather context and read changed files
Review sequencer catch-up logic
Review syncer DAHeight changes
Review DA client/interface changes
Review tests
Post comprehensive review

This PR adds a catch-up mechanism for the single sequencer after extended downtime, and a complementary DA-height management fix in the syncer. The overall architecture is sound and the approach is correct, but there are several issues — some critical — to address before merging.

Critical Issues

1. Non-monotonic timestamps during multi-block catch-up per epoch

pkg/sequencers/single/sequencer.go:376

During catch-up, all blocks produced for the same DA epoch get timestamps derived from c.currentDAEndTime, adjusted by remainingForcedTxs * time.Millisecond. Consider the scenario: an epoch has 3 forced inclusion transactions but only 2 fit in the first block. With the formula below, the first batch leaves 1 transaction remaining and gets timestamp T - 1ms; the second leaves 0 and gets T - 0ms = T. That ordering holds only while remainingForcedTxs shrinks on every call. If c.checkpoint.TxIndex moves in any other way after checkpoint advancement, the math can produce an earlier timestamp than the previous block.

More fundamentally, this millisecond-offset approach is fragile. If remainingForcedTxs == 0 for the last tx (it was just consumed), the timestamp equals daEndTime exactly. The next DA epoch will then have a timestamp close to but potentially not strictly greater than the previous block's timestamp — breaking the monotonic timestamp invariant required by EVM and most execution layers.

// Current approach: fragile millisecond offsets
timestamp = daEndTime.Add(-time.Duration(remainingForcedTxs) * time.Millisecond)

A robust fix: track a per-epoch block counter and use daEndTime.Add(-time.Duration(blocksRemainingInEpoch) * time.Millisecond), or use a strict monotonic counter seeded from the previous block timestamp. The test TestSequencer_CatchUp_UsesDATimestamp only covers the single-tx case and doesn't validate this invariant.

2. Data race on `currentDAEndTime`

pkg/sequencers/single/sequencer.go:71 and 370

catchUpState is correctly guarded by atomic.Int32, but currentDAEndTime time.Time is a plain struct field accessed from both GetNextBatch (read at line 370) and fetchNextDAEpoch (write at line 482). time.Time is a struct with multiple fields — concurrent read/write is a data race that can produce corrupted timestamps and lead to undefined behavior.

While GetNextBatch and fetchNextDAEpoch are called sequentially in the block production loop, isCatchingUp() reads catchUpState and may be called from other goroutines. If other code paths (e.g., RPC handlers, metrics) ever call GetNextBatch concurrently, the race is live. At minimum, this should be documented with // Not safe for concurrent use if single-goroutine use is guaranteed, or wrapped in a mutex/atomic value if not.

The Gemini review flagged this correctly: currentDAEndTime should be stored in an atomic.Pointer[time.Time] or guarded by a mutex.

Medium Issues

3. Syncer epoch advancement: `nextEpochStart` can skip multiple epochs

block/internal/syncing/syncer.go:789-795

_, epochEnd, _ := types.CalculateEpochBoundaries(
    newState.DAHeight, s.genesis.DAStartHeight, epochSize,
)
nextEpochStart := epochEnd + 1

CalculateEpochBoundaries returns the end of the current epoch. If newState.DAHeight is mid-epoch (e.g., height 5 with epoch size 10, start 0), epochEnd would be 9, so nextEpochStart = 10. This correctly advances to the start of the next epoch. However, if the sequencer produced a catch-up block that spans the epoch boundary in a non-standard way (e.g., TxIndex offsets), there could be a misalignment. The comment says "advance by exactly one epoch" but the actual advance is from newState.DAHeight to the end of its epoch + 1, which is a full epochSize DA heights only when newState.DAHeight sits at the start of an epoch and fewer otherwise. This logic is correct if the invariant holds that newState.DAHeight tracks epoch start boundaries; confirm this is always the case.
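The arithmetic can be checked with a small stand-in. `epochBounds` below is hypothetical and only assumes that epochs are contiguous ranges of epochSize DA heights starting at the DA start height; the real types.CalculateEpochBoundaries may differ in signature and edge cases.

```go
package main

import "fmt"

// epochBounds is a hypothetical stand-in for types.CalculateEpochBoundaries.
// Assumption: epoch n covers [daStart + n*epochSize, daStart + (n+1)*epochSize - 1].
func epochBounds(daHeight, daStart, epochSize uint64) (start, end uint64) {
	if epochSize == 0 || daHeight < daStart {
		return daStart, daStart
	}
	n := (daHeight - daStart) / epochSize
	start = daStart + n*epochSize
	end = start + epochSize - 1
	return start, end
}

func main() {
	// The size of the advance depends on where DAHeight sits inside its epoch.
	for _, h := range []uint64{0, 5, 9, 10} {
		_, end := epochBounds(h, 0, 10)
		next := end + 1
		fmt.Printf("DAHeight=%d nextEpochStart=%d advance=%d\n", h, next, next-h)
	}
}
```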

4. `updateCatchUpState` is called on every `fetchNextDAEpoch`, but the short-circuit only prevents the expensive `GetLatestDAHeight` call

pkg/sequencers/single/sequencer.go:531-533

func (c *Sequencer) updateCatchUpState(ctx context.Context) {
    if c.catchUpState.Load() != catchUpUnchecked {
        return
    }

This is correct — the check is only done once. But since updateCatchUpState is called inside fetchNextDAEpoch (which runs on every epoch), every call after the first is a no-op via early return. This is fine for correctness but slightly misleading — fetchNextDAEpoch always calls updateCatchUpState regardless, even during normal steady-state operation. Consider adding a fast path in fetchNextDAEpoch to skip the call entirely once state is determined:

if c.catchUpState.Load() == catchUpUnchecked {
    c.updateCatchUpState(ctx)
}

5. Catch-up detection is height-based only, not time-based

pkg/sequencers/single/sequencer.go:564-578

The Gemini review mentioned a 30-second time threshold; this was apparently revised in the final implementation to use epoch count (>1 missed epoch). This is better. However, the comment in the struct doc says "once per lifecycle". This is enforced by the catchUpUnchecked → catchUpDone state machine, but if GetLatestDAHeight fails (line 549), the state stays catchUpDone (already set optimistically at line 537), so later calls in the same process won't retry. If the failure was transient, the sequencer will skip catch-up detection entirely until it is restarted. Consider resetting to catchUpUnchecked on GetLatestDAHeight failure so the check can be retried.

// Current: on failure, state stays catchUpDone (optimistic)
c.logger.Warn().Err(err).
    Msg("failed to get latest DA height for catch-up detection, skipping check")
return
// Could reset: c.catchUpState.Store(catchUpUnchecked) to allow retry

6. `isCatchingUp()` is unexported but the comment says "public `IsCatchingUp()` method"

pkg/sequencers/single/sequencer.go:429

The Gemini summary and the code comment reference a "public IsCatchingUp() method" but the actual method is isCatchingUp() (lowercase, unexported). This is fine if external callers don't need it, but the documentation is misleading. Either rename to IsCatchingUp() and export it, or update the comment.

Minor Issues

7. Missing `RetrieveForcedInclusion` tracing

block/internal/da/tracing.go

The tracedClient delegates RetrieveForcedInclusion via the embedded FullClient but the concrete client struct has RetrieveForcedInclusion — it's not in the FullClient interface. The new GetLatestDAHeight is correctly traced. However, HasForcedInclusionNamespace and GetForcedInclusionNamespace are non-instrumented pass-throughs (lines 145-147), which is acceptable since they're pure accessors.

8. Test logger inconsistency

Some catch-up tests use zerolog.New(zerolog.NewConsoleWriter()) while others use zerolog.New(zerolog.NewTestWriter(t)) or zerolog.Nop(). The Gemini review caught this. Using NewTestWriter(t) for all tests is better practice (logs only appear on test failure).

Affected tests: TestSequencer_CatchUp_EntersOnEpochGap (~line 1248) and a few others.

9. Test script `scripts/test-catchup.sh` (393 lines) is a manual bash script

This is fine for development but shouldn't be expected to run in CI without the full binary. Make sure it's clearly documented as a development tool, not a CI requirement.

10. Duplicate pending-map clearing pattern in `VerifyForcedInclusionTxs`

block/internal/syncing/syncer.go:1123-1126

s.pendingForcedInclusionTxs.Range(func(key, value any) bool {
    s.pendingForcedInclusionTxs.Delete(key)
    return true
})

This pattern clears the entire map before repopulating, which is functional but involves two sync.Map traversals (one to clear, one to repopulate). A more efficient approach would swap to a regular map and replace atomically. This is pre-existing code, not introduced by this PR, but worth noting.
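One way to read "swap to a regular map and replace atomically" is sketched below. `pendingSet` and its methods are hypothetical and assume the pending entries can be keyed by string; the syncer's actual key and value types are not shown in this thread.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pendingSet is a hypothetical replacement for a sync.Map that is always
// rebuilt in full. Each snapshot is immutable once published.
type pendingSet struct {
	m atomic.Pointer[map[string]struct{}]
}

// Replace builds a fresh map and publishes it with a single atomic store,
// so readers see either the old snapshot or the new one, never a mix.
func (p *pendingSet) Replace(keys []string) {
	next := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		next[k] = struct{}{}
	}
	p.m.Store(&next)
}

// Has reports whether key is in the current snapshot.
func (p *pendingSet) Has(key string) bool {
	m := p.m.Load()
	if m == nil {
		return false
	}
	_, ok := (*m)[key]
	return ok
}

func main() {
	var pending pendingSet
	pending.Replace([]string{"tx-a", "tx-b"})
	pending.Replace([]string{"tx-c"}) // old entries vanish without a Range/Delete pass
	fmt.Println(pending.Has("tx-a"), pending.Has("tx-c"))
}
```

This only works if a published map is never mutated afterwards. If the sync.Map is kept, Go 1.23 and later also provide a Clear method on it.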

Strengths

The state machine approach (catchUpUnchecked → catchUpInProgress → catchUpDone) is clean and well-documented. Using atomic.Int32 for the state enum is the right call.
The syncer's epoch-by-epoch DAHeight advancement (instead of jumping to event.DaHeight) is the correct fix for forced inclusion verification alignment.
The comment block at syncer.go:763-779 is excellent — it clearly explains the "why" behind the non-obvious epoch advancement logic.
Test coverage for catch-up scenarios is thorough (8 test cases covering detection, mempool skipping, timestamp usage, exit conditions, multi-epoch replay, checkpoint advancement, no forced inclusion).
The GetLatestDAHeight interface addition is small and well-placed.
The tracing additions for the DA client (GetLatestDAHeight, namespace accessors) are correct and follow existing patterns.

Summary

The core concept is correct and well-implemented. The two critical issues are:

Timestamp monotonicity during multi-block catch-up per epoch — needs a robust solution or a clear proof that at most one block is produced per epoch (which seems unlikely given the existing multi-tx-per-epoch code paths).
Data race on currentDAEndTime — needs synchronization.

Both should be fixed before merge to avoid potential chain halts in production.

gemini-code-assist · 2026-02-09T14:36:18Z

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a critical "catch-up" feature for the sequencer, designed to enhance its resilience and consistency following periods of inactivity or downtime. By intelligently replaying missed Data Availability (DA) epochs, the sequencer can resynchronize its state, ensuring that all forced inclusion transactions are processed in the correct order and with accurate timestamps, thereby maintaining a consistent view of the blockchain's history, even when operating in a decentralized environment. This mechanism allows the sequencer to seamlessly transition back to normal operation without compromising data integrity.

Highlights

Sequencer Catch-up Mechanism: Introduced a new mechanism allowing the sequencer to "catch up" on missed Data Availability (DA) epochs after extended downtime.
Forced Inclusion Priority: During catch-up, the sequencer temporarily processes only forced inclusion transactions, ensuring consistency with base sequencing nodes and temporarily skipping mempool transactions.
DA Height Synchronization: Modified syncer.go to incrementally advance the DAHeight by one epoch during catch-up, preventing issues with forced inclusion transaction verification.
Timestamp Alignment: Block timestamps generated during catch-up are now aligned with the DA epoch's end timestamp for historical accuracy.
Comprehensive Testing: Added extensive unit tests covering various catch-up scenarios, including detection, mempool skipping, timestamp usage, exit conditions, and multi-epoch replay.

Changelog

block/internal/syncing/syncer.go
- Updated the logic for DAHeight updates to handle large discrepancies between event.DaHeight and newState.DAHeight. When a significant gap (more than one epoch) is detected, newState.DAHeight is now advanced by exactly one epoch per block, rather than jumping directly to event.DaHeight. This ensures correct verification of forced inclusion transactions during sequencer catch-up.
- Added detailed comments explaining the new DAHeight update strategy and its importance for catch-up blocks and forced inclusion verification.
pkg/sequencers/single/sequencer.go
- Added catchingUp (boolean) and currentDAEndTime (time.Time) to the Sequencer struct to manage the catch-up state and store the DA epoch's end timestamp.
- Modified GetNextBatch to conditionally skip fetching transactions from the mempool when the sequencer is in catchingUp mode. This ensures that only forced inclusion transactions are included in batches during catch-up.
- Updated GetNextBatch to use sequencer.currentDAEndTime as the block timestamp when catchingUp is true, aligning block timestamps with the historical DA epoch.
- Introduced a public IsCatchingUp() method to query the sequencer's current catch-up status.
- Modified fetchNextDAEpoch to store the forcedTxsEvent.Timestamp in currentDAEndTime and to call a new updateCatchUpState method to manage the catchingUp flag. It also now exits catch-up mode if StatusHeightFromFuture is returned by the DA client.
- Implemented a new private method updateCatchUpState that determines whether the sequencer should be in catch-up mode based on the timestamp of the fetched DA epoch. It enters catch-up if the epoch timestamp is significantly in the past (more than one epoch's wall-clock duration) and exits if it's recent.
pkg/sequencers/single/sequencer_test.go
- Added a comprehensive suite of tests (TestSequencer_CatchUp_...) to validate the new catch-up functionality. These tests cover:
  - Detection of old epochs and entering catch-up mode.
  - Skipping mempool transactions during catch-up.
  - Using DA epoch timestamps for blocks during catch-up.
  - Exiting catch-up mode when the DA head is reached (either a recent epoch or HeightFromFuture).
  - Scenarios where catch-up is not triggered (recent epoch, no forced inclusion configured).
  - Multi-epoch replay scenarios.
  - Correct advancement of the internal checkpoint (DAHeight) during catch-up.

Activity

The pull request addresses and closes issue https://github.com/evstack/ev-node/issues/3016.
The author indicates that this work builds upon a previous attempt by "Opus", aiming for improvement.

codecov · 2026-02-09T14:39:58Z

Codecov Report

❌ Patch coverage is 64.86486% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.23%. Comparing base (d4394f6) to head (325f1fc).

Files with missing lines	Patch %	Lines
block/internal/syncing/syncer.go	22.22%	13 Missing and 1 partial ⚠️
block/internal/da/tracing.go	0.00%	10 Missing ⚠️
block/internal/da/client.go	0.00%	9 Missing ⚠️
pkg/sequencers/single/sequencer.go	91.89%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3057      +/-   ##
==========================================
+ Coverage   61.11%   61.23%   +0.12%     
==========================================
  Files         113      113              
  Lines       11444    11550     +106     
==========================================
+ Hits         6994     7073      +79     
- Misses       3661     3688      +27     
  Partials      789      789

Flag	Coverage Δ
combined	`61.23% <64.86%> (+0.12%)`	⬆️

gemini-code-assist

Code Review

This pull request introduces a robust catch-up mechanism for the sequencer, designed to handle restarts after extended downtime. While the overall approach for consuming and verifying catch-up blocks in the syncer, including the incremental advancement of DAHeight, is well-implemented and tested, the implementation of catch-up mode in the single sequencer has significant flaws. Specifically, it produces non-monotonic block timestamps when multiple blocks are generated for a single DA epoch or when empty epochs are encountered, which will likely cause the execution layer to reject blocks and halt the chain. Additionally, there is a data race on the new catch-up state fields due to a lack of synchronization primitives. Minor suggestions for code clarity and testing experience were also noted.

gemini-code-assist · 2026-02-09T14:41:17Z

pkg/sequencers/single/sequencer.go

+	timestamp := time.Now()
+	if c.catchingUp && !c.currentDAEndTime.IsZero() {
+		timestamp = c.currentDAEndTime
+	}


During catch-up mode, the sequencer uses the timestamp from the last fetched DA epoch as the block timestamp. However, multiple blocks can be produced for a single DA epoch (e.g., if the epoch contains many transactions that don't fit in one block). In this case, all blocks produced for that epoch will have the exact same timestamp (c.currentDAEndTime). Furthermore, if an empty DA epoch is encountered, c.currentDAEndTime is not updated, causing the next block to reuse the timestamp from the previous non-empty epoch.

Most blockchain execution layers (like EVM) require that each new block has a timestamp strictly greater than its parent. If the sequencer produces multiple blocks with the same timestamp, the execution layer will reject them, halting the block production process and causing a Denial of Service.

gemini-code-assist · 2026-02-09T14:41:18Z

pkg/sequencers/single/sequencer.go

+	catchingUp bool
+	// currentDAEndTime is the DA epoch end timestamp from the last fetched epoch.
+	// Used as the block timestamp during catch-up to match based sequencing behavior.
+	currentDAEndTime time.Time


The fields catchingUp and currentDAEndTime are accessed concurrently without proper synchronization. GetNextBatch and fetchNextDAEpoch (which update these fields) can be called by the block production loop, while IsCatchingUp() can be called concurrently by other components (e.g., RPC, monitoring, or metrics).

In Go, concurrent read/write access to non-atomic fields is a data race, which leads to undefined behavior. Specifically, a race on time.Time (which is a struct) can result in a corrupted timestamp being read, potentially leading to invalid blocks and a Denial of Service.

gemini-code-assist · 2026-02-09T14:41:18Z

pkg/sequencers/single/sequencer.go

+	if catchUpThreshold < 30*time.Second {
+		catchUpThreshold = 30 * time.Second
+	}


The value 30*time.Second is used as a minimum threshold for the catch-up logic. To improve readability and maintainability, consider defining this as a named constant at the package level, for example: const minCatchUpThreshold = 30 * time.Second, and using that constant here. This makes the intent of the value clearer and centralizes its definition.

gemini-code-assist · 2026-02-09T14:41:18Z

pkg/sequencers/single/sequencer_test.go

 // This test uses maxBytes to limit how many txs are fetched, triggering the unprocessed txs scenario.
+func TestSequencer_CatchUp_DetectsOldEpoch(t *testing.T) {
+	ctx := context.Background()
+	logger := zerolog.New(zerolog.NewConsoleWriter())


Using zerolog.New(zerolog.NewConsoleWriter()) in tests can lead to verbose output in the console during test runs. It's generally better to use a logger that integrates with the testing framework, such as zerolog.New(zerolog.NewTestWriter(t)), which only shows logs for failed tests, or zerolog.Nop() if logs are not needed. This practice is followed in other new tests in this file (e.g., TestSequencer_CatchUp_UsesDATimestamp). Applying this consistently would improve the developer experience. This comment also applies to lines 1309, 1457, and 1676.

Suggested change

-	logger := zerolog.New(zerolog.NewConsoleWriter())
+	logger := zerolog.New(zerolog.NewTestWriter(t))

feat(sequencer): catchup from base

17fcc48

julienrbrt requested a review from alpe February 9, 2026 14:35

github-actions bot assigned julienrbrt Feb 9, 2026

gemini-code-assist bot reviewed Feb 9, 2026

julienrbrt added 12 commits February 9, 2026 17:53

fetch DA height

c963984

Merge branch 'main' into julien/catchup-base

3e93c9a

Merge branch 'main' into julien/catchup-base

b01ab73

cl

42f0405

align timestamping

014510b

Merge branch 'main' into julien/catchup-base

98e546c

updates

a30ea68

Merge branch 'main' into julien/catchup-base

90b5ee9

Merge branch 'main' into julien/catchup-base

bc630c2

updates

97f055f

fixes

9288b29

Merge branch 'main' into julien/catchup-base

325f1fc

julienrbrt wants to merge 13 commits into main from julien/catchup-base
