Conversation
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @alpe's task in 2m 9s

Code Review
Summary

This PR implements a catchup/recovery phase for aggregator (sequencer) nodes, allowing them to sync from DA and P2P before starting block production. The implementation is clean and the integration tests are well-structured. Below are findings grouped by severity.

Architecture & Design

1. Rename clarity: The PR description and original issue refer to "sequencer recovery", but the implementation uses "catchup" terminology throughout (e.g. `NewAggregatorWithCatchupComponents`, `runCatchupPhase`, `catchupEnabled`).

2. In block/components.go:349-379, the aggregator components constructor always creates a Syncer:

```go
// block/components.go:349-379 - always creates a Syncer...
func NewAggregatorWithCatchupComponents(...) (*Components, error) { ... }
```

...then discards it if not needed:

```go
// node/failover.go:162-165
if isAggregator && !catchupEnabled {
bc.Syncer = nil
}
```

This means every aggregator node allocates a syncer unnecessarily. Consider restoring the original …

3. Incomplete comment:

```go
// Catchup only applies to aggregator nodes that need to sync before
```

The comment is truncated — missing "before producing blocks" or similar.

4. Stale comment:

```go
// stopNodeAndCleanup stops a running node and calls its cleanup function.
```

This comment references a function that doesn't exist. This was flagged by Gemini but remains in the latest commit.

Correctness & Concurrency

5. P2P client not started in … In the original code, …

6. Deferred Stop vs. nil assignment:

```go
// node/failover.go:257-268
func (f *failoverState) runCatchupPhase(ctx context.Context) error {
if err := f.bc.Syncer.Start(ctx); err != nil { ... }
defer f.bc.Syncer.Stop() // captures bc.Syncer at this point
caughtUp, err := f.waitForCatchup(ctx)
...
f.bc.Syncer = nil // ← sets to nil after defer captures non-nil value
return nil
}
```

The deferred call evaluates f.bc.Syncer when the defer statement executes, so the later nil assignment does not change which syncer gets stopped; it does leave the field claiming "no syncer" while the deferred Stop has yet to run (see the sketch at the end of this review).

7. waitForCatchup swallows cancellation:

```go
// node/failover.go:292-294
case <-ctx.Done():
	return false, nil
```

Then at the call site:

```go
// node/failover.go:259-265
caughtUp, err := f.waitForCatchup(ctx)
if err != nil { return err }
if !caughtUp {
return ctx.Err() // ← could be nil if ctx.Err() == nil but still !caughtUp
}
```

If the context is done, ctx.Err() is non-nil and the caller surfaces it; but whenever waitForCatchup returns !caughtUp without the context being done, ctx.Err() is nil (as the inline note says) and the failed catchup is silently reported as success.

8. Forced P2P caught-up condition:

```go
// node/failover.go:312-315
p2pCaughtUp := ignoreP2P || (maxP2PHeight > 0 && storeHeight >= maxP2PHeight)
if !ignoreP2P && f.catchupTimeout == 0 && maxP2PHeight == 0 {
p2pCaughtUp = true
}
```

When !ignoreP2P, f.catchupTimeout == 0, and maxP2PHeight == 0, p2pCaughtUp is forced to true, i.e. a node with no configured timeout that has observed no P2P height is treated as already caught up.

Error Handling

9. The Stop error is discarded:

```go
defer f.bc.Syncer.Stop() // nolint:errcheck // not critical
```

10. DA head latch:

```go
// block/internal/syncing/syncer.go:411
s.daHeadReached.Store(true)
```

The field is documented as "stays true once DA head is seen." This is intentional but means a second call to …

Testing

11. Magic number:

```go
// node/sequencer_recovery_integration_test.go:329,333
if res.Code == 1 { // StatusSuccess
```

Gemini flagged this and it's still present. Use the datypes.StatusSuccess constant instead.

12. Unstable peer ID formatting:

```go
// node/sequencer_recovery_integration_test.go:253
return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))
```

Gemini flagged this: prefer peerID.String() over the Loggable() map.

13. Best-effort P2P verification:

```go
// node/sequencer_recovery_integration_test.go:205-219
if allMatch {
t.Log("recovery node synced original blocks from P2P — all hashes verified")
} else {
t.Log("recovery node produced its own blocks (P2P sync was not completed in time)")
}
```

This test doesn't fail when P2P hashes don't match — it merely logs. The intent seems to be that P2P sync is best-effort, but this makes the test not actually verify P2P recovery. At minimum, consider adding a …

14. Fixed sleep in test:

```go
// node/sequencer_recovery_integration_test.go:311
time.Sleep(waitTime)
```

Using …

Cleanup Items (from prior reviews, still outstanding)

15. The latest commit (…

16. Stale comment at line 305 (mentioned in item 4 above — the …

Minor Nits
Overall Assessment

The core implementation is solid. The catchup phase logic in …
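For items 6, 7, and 9 above, here is a minimal sketch of one way runCatchupPhase could be restructured. The type shapes (syncer, components, failoverState) are assumptions reconstructed from the snippets in this review, not the PR's actual code:

```go
package failover

import (
	"context"
	"errors"
	"fmt"
)

// Assumed shapes; the real definitions live in node/failover.go and
// block/components.go and will differ.
type syncer interface {
	Start(ctx context.Context) error
	Stop() error
}

type components struct{ Syncer syncer }

type failoverState struct{ bc *components }

// waitForCatchup is a stand-in for the real polling loop.
func (f *failoverState) waitForCatchup(ctx context.Context) (bool, error) {
	return false, ctx.Err()
}

// runCatchupPhase captures the syncer once (item 6), surfaces the Stop
// error instead of discarding it (item 9), and never returns nil for a
// failed catchup (item 7).
func (f *failoverState) runCatchupPhase(ctx context.Context) (err error) {
	s := f.bc.Syncer
	if startErr := s.Start(ctx); startErr != nil {
		return fmt.Errorf("start syncer: %w", startErr)
	}
	defer func() {
		// The deferred Stop always acts on the captured s, so the nil
		// assignment below cannot skip it.
		if stopErr := s.Stop(); stopErr != nil && err == nil {
			err = fmt.Errorf("stop syncer: %w", stopErr)
		}
	}()

	caughtUp, waitErr := f.waitForCatchup(ctx)
	if waitErr != nil {
		return waitErr
	}
	if !caughtUp {
		if ctxErr := ctx.Err(); ctxErr != nil {
			return ctxErr // cancelled or deadline exceeded
		}
		return errors.New("catchup ended before reaching target height")
	}
	f.bc.Syncer = nil
	return nil
}
```

This keeps a catchup failure visible to the failover loop instead of relying on ctx.Err() at the call site.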
Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the resilience of sequencer nodes by implementing a dedicated recovery mechanism. It enables a sequencer to gracefully recover from data loss by first synchronizing its state from the Data Availability layer and the P2P network, and then seamlessly resuming its role as an aggregator. This ensures continuous operation and data integrity, particularly in environments where traditional consensus mechanisms like Raft are not employed.

Highlights
Activity
Codecov Report

❌ Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #3061      +/-   ##
==========================================
- Coverage   61.16%   61.03%   -0.14%
==========================================
  Files         113      113
  Lines       11444    11544     +100
==========================================
+ Hits         7000     7046      +46
- Misses       3655     3698      +43
- Partials      789      800      +11
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Code Review
This pull request introduces a sequencer recovery mechanism, a valuable feature for disaster recovery. The implementation allows a sequencer to start in sync mode, catch up from DA and P2P, and then transition to an aggregator. The logic is well-structured within the new sequencerRecoveryElector, and the catch-up conditions are robust, including a P2P timeout. The addition of comprehensive integration tests for both DA-only and P2P-assisted recovery scenarios is commendable. I've identified a few minor areas for improvement in the test code to enhance maintainability and robustness.
```go
t.Helper()
peerID, err := peer.IDFromPrivateKey(nodeKey.PrivKey)
require.NoError(t, err)
return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))
```
The peer.ID.Loggable() method is primarily for logging purposes, and its output format may not be stable across library versions. It's more robust to use peer.ID.String(), which is the public API for getting the canonical string representation of a peer ID.
```diff
- return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.Loggable()["peerID"].(string))
+ return fmt.Sprintf("%s/p2p/%s", listenAddr, peerID.String())
```
```go
	return recNode.(*FullNode), recStopTicker
}

// stopNodeAndCleanup stops a running node and calls its cleanup function.
```
```go
if res.Code == 1 { // StatusSuccess
	blobsFound += len(res.Data)
	t.Logf("DIAG: found %d header blob(s) at DA height %d (Success)", len(res.Data), h)
}
res = sharedDA.Retrieve(context.Background(), h, sharedDA.GetDataNamespace())
if res.Code == 1 { // StatusSuccess
	blobsFound += len(res.Data)
	t.Logf("DIAG: found %d data blob(s) at DA height %d (Success)", len(res.Data), h)
}
```
The magic number 1 is used to check for success status. For better readability and maintainability, please use the datypes.StatusSuccess constant. You'll need to add the following import:

```go
import (
	// ... other imports
	datypes "github.com/evstack/ev-node/pkg/da/types"
)
```

```diff
- if res.Code == 1 { // StatusSuccess
+ if res.Code == datypes.StatusSuccess { // StatusSuccess
  	blobsFound += len(res.Data)
  	t.Logf("DIAG: found %d header blob(s) at DA height %d (Success)", len(res.Data), h)
  }
  res = sharedDA.Retrieve(context.Background(), h, sharedDA.GetDataNamespace())
- if res.Code == 1 { // StatusSuccess
+ if res.Code == datypes.StatusSuccess { // StatusSuccess
  	blobsFound += len(res.Data)
  	t.Logf("DIAG: found %d data blob(s) at DA height %d (Success)", len(res.Data), h)
  }
```
pkg/config/config.go (outdated)

Nit, can we make nodeConfig.Node.SequencerRecovery.Duration and Raft.Enable mutually exclusive? These are Cobra options.
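If helpful, Cobra can enforce this directly at flag registration with MarkFlagsMutuallyExclusive. A minimal sketch; the flag names below are hypothetical stand-ins, not the repo's actual flags:

```go
package main

import "github.com/spf13/cobra"

func newStartCmd() *cobra.Command {
	cmd := &cobra.Command{Use: "start"}

	// Hypothetical flag names for illustration only.
	cmd.Flags().Duration("node.sequencer_recovery", 0,
		"catch-up window before the node starts producing blocks")
	cmd.Flags().Bool("raft.enable", false, "enable Raft failover")

	// Cobra fails flag validation when both are set on one invocation.
	cmd.MarkFlagsMutuallyExclusive("node.sequencer_recovery", "raft.enable")
	return cmd
}
```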
* main:
  * refactor: reuse P2P client across node modes (#3065)
  * refactor: store pending block separately (#3073)
  * chore: Fix mismatched comment in TestCache_WithNilStore function (#3074)
  * build(deps): Bump github.com/pion/dtls/v3 from 3.0.6 to 3.0.11 (#3068)
  * feat: block Pruning (#2984)
  * refactor: omit unnecessary reassignment (#3067)
  * refactor: use WaitGroup.Go to simplify code (#3059)
  * chore: prep rc.4 (#3062)
  * fix(syncing): skip draining when exec client unavailable (#3060)
  * build(deps): Bump the all-go group across 4 directories with 5 updates (#3058)
…re block production.
julienrbrt
left a comment
nice! lgtm. left some cleanup nits.
Resolves #2521
Overview
This PR implements the Sequencer Recovery mode, enabling a node to restore its state from the Data Availability (DA) layer and P2P network before assuming the role of a sequencer.
Key Changes:
* node.sequencer_recovery start param, allowing configurable timeouts for the catch-up phase.