
Conversation

@camilamacedo86
Contributor

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

Copilot AI review requested due to automatic review settings February 5, 2026 13:15
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2026
@netlify

netlify bot commented Feb 5, 2026

Deploy Preview for olmv1 ready!

🔨 Latest commit: efbe1f6
🔍 Latest deploy log: https://app.netlify.com/projects/olmv1/deploys/698b804fb743d10008cb22b5
😎 Deploy Preview: https://deploy-preview-2486--olmv1.netlify.app

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

@codecov

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.44%. Comparing base (28102fa) to head (efbe1f6).
⚠️ Report is 1 commits behind head on main.

Files with missing lines: test/utils.go — Patch 0.00%, 12 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2486      +/-   ##
==========================================
- Coverage   73.74%   69.44%   -4.30%     
==========================================
  Files         102      102              
  Lines        8496     8504       +8     
==========================================
- Hits         6265     5906     -359     
- Misses       1747     2129     +382     
+ Partials      484      469      -15     
Flag Coverage Δ
e2e 45.59% <100.00%> (-0.65%) ⬇️
experimental-e2e 12.72% <100.00%> (-41.32%) ⬇️
unit 57.89% <27.77%> (-0.09%) ⬇️


Copilot AI review requested due to automatic review settings February 5, 2026 15:03
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings February 5, 2026 15:59
@camilamacedo86 camilamacedo86 changed the title WIP: 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) Feb 5, 2026
@camilamacedo86 camilamacedo86 changed the title 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) WIP 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) Feb 5, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated no new comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.

// Track in a future issue: "Generate apply configurations and migrate to typed Apply methods"
//
// nolint:staticcheck // SA1019: server-side apply required, needs generated apply configurations
return bc.Client.Patch(ctx, rev, client.Apply, client.FieldOwner(bc.FieldOwner), client.ForceOwnership)
Contributor Author

TODO: Let's look into this in a follow-up; it proved too complex to tackle in this PR.

@camilamacedo86 camilamacedo86 changed the title WIP 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) 🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) Feb 6, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2026
test/utils.go Outdated
if lastErr != nil {
log.Printf("StopWithRetry: timeout reached before successful teardown; last error: %v", lastErr)
}
return env.Stop() // Final attempt
Contributor

is it likely that the final attempt works? could we be calling out a false negative with the previous log line?

Contributor

maybe we should just return:

fmt.Errorf("StopWithRetry: timeout reached before successful teardown; last error: %v", lastErr)

?

Contributor Author

Yes, it works as it is. controller-runtime v0.23 introduced some timing changes in the test environment teardown process. Without using Eventually, testEnv.Stop() can fail intermittently with errors related to graceful shutdown timing.

So this code is equivalent to:

var _ = AfterSuite(func() {
    By("tearing down the test environment")
    cancel()
    Eventually(func() error {
        return testEnv.Stop()
    }, time.Minute, time.Second).Should(Succeed())
})

Contributor

My question is whether we can be in a situation where "StopWithRetry: timeout reached before successful teardown; last error: ..." is printed out but it actually stopped successfully in the final attempt (and whether we care).

Contributor
@perdasilva perdasilva Feb 6, 2026

Maybe just do:

err := env.Stop() // Final attempt
if err != nil {
    log.Printf("StopWithRetry: final teardown attempt failed: %v", err)
}
return err

then you also don't need to keep lastErr.

Contributor Author

Done.

Contributor
@pedjak pedjak left a comment

/lgtm

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2026
@openshift-ci

openshift-ci bot commented Feb 10, 2026

New changes are detected. LGTM label has been removed.

require.NoError(ct, err)
require.NotNil(ct, lease.Spec.HolderIdentity, "lease should have a holder")
require.NotEmpty(ct, *lease.Spec.HolderIdentity, "lease holder identity should not be empty")
}, 3*time.Minute, time.Second)
Contributor Author

I asked Claude to evaluate those changes; the analysis follows below.

The Problem (WHY the change was needed)

Old Implementation (main branch)

t.Log("Waiting for acquired leader election")
leaderCtx, leaderCancel := context.WithTimeout(ctx, 3*time.Minute)
defer leaderCancel()
leaderSubstrings := []string{"successfully acquired lease"}
leaderElected, err := watchPodLogsForSubstring(leaderCtx, &managerPod, leaderSubstrings...)
require.NoError(t, err)
require.True(t, leaderElected)

The watchPodLogsForSubstring function used:

podLogOpts := corev1.PodLogOptions{
    Follow:    true,  // <-- THIS IS THE PROBLEM
    Container: container,
}

The Race Condition

  1. controller-runtime v0.23.1+ has FASTER leader election startup
  2. The test starts AFTER the pod is created
  3. By the time the test calls watchPodLogsForSubstring, leader election has already completed
  4. Follow: true only streams NEW logs, missing messages already emitted
  5. The test waits forever for a message that already passed

Timeline Visualization

controller-runtime v0.22.x (SLOW startup):
├─ Pod starts
├─ Test starts watching logs (Follow=true)
├─ Leader election happens  ← Message logged and caught by test ✅
└─ Test passes

controller-runtime v0.23.1+ (FAST startup):
├─ Pod starts
├─ Leader election happens  ← Message logged
├─ Test starts watching logs (Follow=true)  ← Too late! Message missed ❌
└─ Test times out waiting for message

New Implementation (deps-up branch)

Approach 1: Direct Lease Checking (for leader election verification)

t.Logf("Waiting for acquired leader election by checking lease")
// Instead of watching logs (which has timing issues in controller-runtime v0.23.1+),
// directly check the lease object to confirm leader election occurred
require.EventuallyWithT(t, func(ct *assert.CollectT) {
    var lease coordinationv1.Lease
    err := c.Get(ctx, types.NamespacedName{Name: "catalogd-operator-lock", Namespace: "olmv1-system"}, &lease)
    require.NoError(ct, err)
    require.NotNil(ct, lease.Spec.HolderIdentity, "lease should have a holder")
    require.NotEmpty(ct, *lease.Spec.HolderIdentity, "lease holder identity should not be empty")
}, 3*time.Minute, time.Second)

Benefits:

  • ✅ Timing-independent - checks the actual state, not log messages
  • ✅ More reliable - verifies the desired outcome (lease acquired)
  • ✅ No race conditions - lease object persists
  • ✅ Tests the right thing - we care about the lease, not the log message

Approach 2: Polling with TailLines (for other log messages)

For messages that MUST be verified from logs (e.g., "reconcile ending"), the implementation was changed to use polling:

func watchPodLogsForSubstring(ctx context.Context, pod *corev1.Pod, substrings ...string) (bool, error) {
    // Use a polling approach to periodically check pod logs for the substrings
    // This handles controller-runtime v0.23.1+ where leader election happens faster
    // and the message may be emitted before we start watching with Follow
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return false, ctx.Err()
        case <-ticker.C:
            // Get recent logs without Follow to check if message exists
            logOpts := corev1.PodLogOptions{
                Container: container,
                TailLines: ptr.To(int64(1000)),  // <-- Check EXISTING logs
            }
            // ... check logs for substring ...
        }
    }
}

Benefits:

  • ✅ Checks existing logs, not just new ones
  • ✅ Handles messages emitted before watching starts
  • ✅ Still timing-independent

Behavior Verification

Old Behavior

  • ❌ Waited for "successfully acquired lease" log message
  • ❌ Could miss fast leader elections
  • ❌ Relied on timing

New Behavior

  • ✅ Checks that lease.Spec.HolderIdentity is set
  • ✅ Verifies leader election occurred (same goal, better method)
  • ✅ Independent of timing

Are they equivalent?

YES - Both verify that leader election completed successfully:

  • Old: "If log says 'successfully acquired lease', then leader election worked"
  • New: "If lease has a HolderIdentity, then leader election worked"

The new approach is actually MORE accurate because:

  1. It checks the actual state object, not a log message
  2. It can't be affected by log rotation, buffering, or truncation
  3. It's the single source of truth for leader election status

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2026
…0.23.0

controller-runtime v0.23.0 introduced type-safe webhook builders using generics.
Webhook methods now receive typed objects directly instead of runtime.Object,
eliminating the need for type assertions and providing compile-time type safety.

Changes:
- Update ClusterCatalog webhook Default method to use *ocv1.ClusterCatalog
- Update SetupWebhookWithManager to pass type to NewWebhookManagedBy
- Simplify tests to leverage type safety (no type assertion tests needed)


Assisted-by: Cursor/Claude
controller-runtime v0.23.0 changed test environment teardown timing behavior.
Direct calls to testEnv.Stop() can fail intermittently due to graceful shutdown timing,
requiring retry logic to ensure proper cleanup.

Changes:
- Add StopWithRetry helper function to test/utils.go with retry logic
- Wrap testEnv.Stop() calls with retry helper (1-minute timeout, 1-second interval)
- Add error logging for each retry attempt to aid debugging
- Update api/v1/suite_test.go to use test.StopWithRetry
- Update internal/operator-controller/controllers/suite_test.go to use test.StopWithRetry

This shared utility follows DRY principle and ensures consistent teardown behavior
across all test suites, preventing flaky test failures during teardown.

Assisted-by: Cursor/Claude
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	api/v1/suite_test.go
#	internal/operator-controller/controllers/suite_test.go
…v0.23.0

controller-runtime v0.23.0 added Apply method to the StatusWriter interface
for server-side apply support on status subresources.

Changes:
- Add Apply method to statusWriterMock to satisfy StatusWriter interface
- Method signature: Apply(context.Context, runtime.ApplyConfiguration, ...SubResourceApplyOption)


Assisted-by: Cursor/Claude
The client.Apply patch type constant is deprecated in controller-runtime v0.23.0+
in favor of typed Apply methods that require generated apply configurations.

However, server-side apply with field ownership management is essential for this
controller to properly manage ClusterExtensionRevision lifecycle. The deprecated
client.Apply provides:
- Field ownership via client.FieldOwner()
- Force ownership via client.ForceOwnership
- Automatic create-or-update semantics

Alternative approaches (MergeFrom, StrategicMerge) lack field management support,
causing upgrade test failures where ownership conflicts prevent proper updates during
controller upgrades.

Changes:
- Keep client.Apply usage with comprehensive nolint explanation
- Document why server-side apply semantics are required
- Note migration path: generate apply configurations project-wide

This ensures controllers work correctly during upgrades while documenting the
technical debt for future migration.

Assisted-by: Cursor/Claude
The fake.NewSimpleClientset is deprecated in client-go in favor of
fake.NewClientset which provides better support for field management
and server-side apply testing.

The migration is straightforward for tests that don't require apply
configurations - simply replace NewSimpleClientset() with NewClientset().
The new implementation works seamlessly with existing reactor-based mocking.

Changes:
- Replace fake.NewSimpleClientset() with fake.NewClientset()
- Remove nolint suppression comment
- All tests pass without modifications

Assisted-by: Cursor/Claude
Address code review feedback by making the Apply method implementation
more defensive:

- Add applyCalled tracking field to statusWriterMock
- Return error if Apply is unexpectedly called during tests
- This ensures tests fail immediately if the code starts using Apply,
  rather than silently passing with incorrect assumptions

The error message clearly indicates this method is not expected to be
used, making debugging easier if the interface usage changes in the future.

Assisted-by: Cursor/Claude
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2026
camilamacedo86 and others added 2 commits February 10, 2026 09:18
Controller-runtime v0.23.1+ has faster leader election startup, causing timing
issues with the upgrade e2e tests that watch pod logs for leader election messages.

The log watching approach fails because it uses Follow=true which only streams NEW
logs, missing messages emitted before the test starts watching. With faster startup
in v0.23.1+, leader election often completes before the test begins watching.

Changes:
- Replace log watching with direct lease object checking for leader election
- Poll lease.Spec.HolderIdentity to confirm leader election occurred
- Refactor watchPodLogsForSubstring to use polling with TailLines instead of Follow
- Add per-poll context timeout (5s) to prevent hanging on slow log streams
- Add scanner error checking to detect and handle stream errors
- Fix context cancellation timing - pollCancel() called after stream operations complete
- Add k8s.io/utils/ptr import for log options

This approach is timing-independent and directly verifies the desired state
rather than relying on log message observation. The per-poll timeout prevents
tests from hanging indefinitely while still respecting the outer context timeout.

Fixes address Copilot review comments about context management and error handling.

Assisted-by: Cursor/Claude
Co-authored-by: Cursor <cursoragent@cursor.com>
…ernetes v1.35

Kubernetes v1.35 changed CEL validation error message format to include
the actual validated value instead of just the type ("string" or "object").
This provides more helpful error messages for debugging validation failures.

This commit was accidentally dropped during interactive rebase.

Changes:
- Update test expectations to include actual values in error messages
- Change from 'Invalid value: "string"' to 'Invalid value: "actual-value"'
- Update 20+ test cases in TestImageSourceCELValidationRules and related tests
- Fix indentation consistency across all test cases

Assisted-by: Cursor/Claude
Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.

Address code review feedback from @perdasilva: the "final attempt" after
timeout creates a confusing false negative. If the final attempt succeeds
after we've logged "timeout reached", we return success despite logging
a timeout message.

Changes:
- Remove final attempt after timeout
- Return error immediately when timeout is reached
- Clearer error message with timeout duration
- Prevents false negatives where timeout is logged but success is returned

This makes the function behavior match its log messages - if we say we
timed out, we actually return an error.

Assisted-by: Cursor/Claude
Co-authored-by: Cursor <cursoragent@cursor.com>
5 participants