-
Notifications
You must be signed in to change notification settings - Fork 71
🌱 chore(deps): upgrade k8s to 1.35 and Go dependencies (edge) #2486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c288cf6 to
41e7cc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
83cde26 to
a16409e
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2486 +/- ##
==========================================
- Coverage 73.74% 69.44% -4.30%
==========================================
Files 102 102
Lines 8496 8504 +8
==========================================
- Hits 6265 5906 -359
- Misses 1747 2129 +382
+ Partials 484 469 -15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f3bb698 to
f14866b
Compare
f14866b to
4a09f4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4a09f4b to
e7a5d34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Track in a future issue: "Generate apply configurations and migrate to typed Apply methods" | ||
| // | ||
| // nolint:staticcheck // SA1019: server-side apply required, needs generated apply configurations | ||
| return bc.Client.Patch(ctx, rev, client.Apply, client.FieldOwner(bc.FieldOwner), client.ForceOwnership) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Lets looking on that in a follow up shows very complex to go in this PR
test/utils.go
Outdated
| if lastErr != nil { | ||
| log.Printf("StopWithRetry: timeout reached before successful teardown; last error: %v", lastErr) | ||
| } | ||
| return env.Stop() // Final attempt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it likely that the final attempt works? could we be calling out a false negative with the previous log line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should just return:
fmt.Errorf("StopWithRetry: timeout reached before successful teardown; last error: %v", lastErr)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works as it is. Within controller-runtime v0.23. has some timing changes in the test environment teardown process. Without using Eventually, testEnv.Stop() can fail intermittently with errors related to graceful shutdown timing.
So this code does the same of:
var _ = AfterSuite(func() {
By("tearing down the test environment")
cancel()
Eventually(func() error {
return testEnv.Stop()
}, time.Minute, time.Second).Should(Succeed())
})There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is whether we can be in a situation where
StopWithRetry: timeout reached before successful teardown; last error: ... is printed out but actually it stopped successfully in the final attempt (and whether we care)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just do
err := env.Stop() // Final attempt
if err != nil {
log ...
}
return err
then you also don't need to keep the lastErr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pedjak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
4ce7ef4 to
f03b3e9
Compare
|
New changes are detected. LGTM label has been removed. |
f03b3e9 to
11681ab
Compare
| require.NoError(ct, err) | ||
| require.NotNil(ct, lease.Spec.HolderIdentity, "lease should have a holder") | ||
| require.NotEmpty(ct, *lease.Spec.HolderIdentity, "lease holder identity should not be empty") | ||
| }, 3*time.Minute, time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked for Claude evaluate those changes
Following bellow.
The Problem (WHY the change was needed)
Old Implementation (main branch)
t.Log("Waiting for acquired leader election")
leaderCtx, leaderCancel := context.WithTimeout(ctx, 3*time.Minute)
defer leaderCancel()
leaderSubstrings := []string{"successfully acquired lease"}
leaderElected, err := watchPodLogsForSubstring(leaderCtx, &managerPod, leaderSubstrings...)
require.NoError(t, err)
require.True(t, leaderElected)The watchPodLogsForSubstring function used:
podLogOpts := corev1.PodLogOptions{
Follow: true, // <-- THIS IS THE PROBLEM
Container: container,
}The Race Condition
- controller-runtime v0.23.1+ has FASTER leader election startup
- The test starts AFTER the pod is created
- By the time the test calls
watchPodLogsForSubstring, leader election has already completed Follow: trueonly streams NEW logs, missing messages already emitted- The test waits forever for a message that already passed
Timeline Visualization
controller-runtime v0.22.x (SLOW startup):
├─ Pod starts
├─ Test starts watching logs (Follow=true)
├─ Leader election happens ← Message logged and caught by test ✅
└─ Test passes
controller-runtime v0.23.1+ (FAST startup):
├─ Pod starts
├─ Leader election happens ← Message logged
├─ Test starts watching logs (Follow=true) ← Too late! Message missed ❌
└─ Test times out waiting for message
New Implementation (deps-up branch)
Approach 1: Direct Lease Checking (for leader election verification)
t.Logf("Waiting for acquired leader election by checking lease")
// Instead of watching logs (which has timing issues in controller-runtime v0.23.1+),
// directly check the lease object to confirm leader election occurred
require.EventuallyWithT(t, func(ct *assert.CollectT) {
var lease coordinationv1.Lease
err := c.Get(ctx, types.NamespacedName{Name: "catalogd-operator-lock", Namespace: "olmv1-system"}, &lease)
require.NoError(ct, err)
require.NotNil(ct, lease.Spec.HolderIdentity, "lease should have a holder")
require.NotEmpty(ct, *lease.Spec.HolderIdentity, "lease holder identity should not be empty")
}, 3*time.Minute, time.Second)Benefits:
- ✅ Timing-independent - checks the actual state, not log messages
- ✅ More reliable - verifies the desired outcome (lease acquired)
- ✅ No race conditions - lease object persists
- ✅ Tests the right thing - we care about the lease, not the log message
Approach 2: Polling with TailLines (for other log messages)
For messages that MUST be verified from logs (e.g., "reconcile ending"), the implementation was changed to use polling:
func watchPodLogsForSubstring(ctx context.Context, pod *corev1.Pod, substrings ...string) (bool, error) {
// Use a polling approach to periodically check pod logs for the substrings
// This handles controller-runtime v0.23.1+ where leader election happens faster
// and the message may be emitted before we start watching with Follow
ticker := time.NewTicker(2 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return false, ctx.Err()
case <-ticker.C:
// Get recent logs without Follow to check if message exists
logOpts := corev1.PodLogOptions{
Container: container,
TailLines: ptr.To(int64(1000)), // <-- Check EXISTING logs
}
// ... check logs for substring ...
}
}
}Benefits:
- ✅ Checks existing logs, not just new ones
- ✅ Handles messages emitted before watching starts
- ✅ Still timing-independent
Behavior Verification
Old Behavior
- ❌ Waited for "successfully acquired lease" log message
- ❌ Could miss fast leader elections
- ❌ Relied on timing
New Behavior
- ✅ Checks that lease.Spec.HolderIdentity is set
- ✅ Verifies leader election occurred (same goal, better method)
- ✅ Independent of timing
Are they equivalent?
YES - Both verify that leader election completed successfully:
- Old: "If log says 'successfully acquired lease', then leader election worked"
- New: "If lease has a HolderIdentity, then leader election worked"
The new approach is actually MORE accurate because:
- It checks the actual state object, not a log message
- It can't be affected by log rotation, buffering, or truncation
- It's the single source of truth for leader election status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
34f8634 to
eef3ba9
Compare
Assisted-by: Cursor/Claude
…0.23.0 controller-runtime v0.23.0 introduced type-safe webhook builders using generics. Webhook methods now receive typed objects directly instead of runtime.Object, eliminating the need for type assertions and providing compile-time type safety. Changes: - Update ClusterCatalog webhook Default method to use *ocv1.ClusterCatalog - Update SetupWebhookWithManager to pass type to NewWebhookManagedBy - Simplify tests to leverage type safety (no type assertion tests needed) Assisted-by: Cursor/Claude
controller-runtime v0.23.0 changed test environment teardown timing behavior. Direct calls to testEnv.Stop() can fail intermittently due to graceful shutdown timing, requiring retry logic to ensure proper cleanup. Changes: - Add StopWithRetry helper function to test/utils.go with retry logic - Wrap testEnv.Stop() calls with retry helper (1-minute timeout, 1-second interval) - Add error logging for each retry attempt to aid debugging - Update api/v1/suite_test.go to use test.StopWithRetry - Update internal/operator-controller/controllers/suite_test.go to use test.StopWithRetry This shared utility follows DRY principle and ensures consistent teardown behavior across all test suites, preventing flaky test failures during teardown. Assisted-by: Cursor/Claude Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # api/v1/suite_test.go # internal/operator-controller/controllers/suite_test.go
…v0.23.0 controller-runtime v0.23.0 added Apply method to the StatusWriter interface for server-side apply support on status subresources. Changes: - Add Apply method to statusWriterMock to satisfy StatusWriter interface - Method signature: Apply(context.Context, runtime.ApplyConfiguration, ...SubResourceApplyOption) Assisted-by: Cursor/Claude
The client.Apply patch type constant is deprecated in controller-runtime v0.23.0+ in favor of typed Apply methods that require generated apply configurations. However, server-side apply with field ownership management is essential for this controller to properly manage ClusterExtensionRevision lifecycle. The deprecated client.Apply provides: - Field ownership via client.FieldOwner() - Force ownership via client.ForceOwnership - Automatic create-or-update semantics Alternative approaches (MergeFrom, StrategicMerge) lack field management support, causing upgrade test failures where ownership conflicts prevent proper updates during controller upgrades. Changes: - Keep client.Apply usage with comprehensive nolint explanation - Document why server-side apply semantics are required - Note migration path: generate apply configurations project-wide This ensures controllers work correctly during upgrades while documenting the technical debt for future migration. Assisted-by: Cursor/Claude
The fake.NewSimpleClientset is deprecated in client-go in favor of fake.NewClientset which provides better support for field management and server-side apply testing. The migration is straightforward for tests that don't require apply configurations - simply replace NewSimpleClientset() with NewClientset(). The new implementation works seamlessly with existing reactor-based mocking. Changes: - Replace fake.NewSimpleClientset() with fake.NewClientset() - Remove nolint suppression comment - All tests pass without modifications Assisted-by: Cursor/Claude
Address code review feedback by making the Apply method implementation more defensive: - Add applyCalled tracking field to statusWriterMock - Return error if Apply is unexpectedly called during tests - This ensures tests fail immediately if the code starts using Apply, rather than silently passing with incorrect assumptions The error message clearly indicates this method is not expected to be used, making debugging easier if the interface usage changes in the future. Assisted-by: Cursor/Claude
eef3ba9 to
bfc9355
Compare
Controller-runtime v0.23.1+ has faster leader election startup, causing timing issues with the upgrade e2e tests that watch pod logs for leader election messages. The log watching approach fails because it uses Follow=true which only streams NEW logs, missing messages emitted before the test starts watching. With faster startup in v0.23.1+, leader election often completes before the test begins watching. Changes: - Replace log watching with direct lease object checking for leader election - Poll lease.Spec.HolderIdentity to confirm leader election occurred - Refactor watchPodLogsForSubstring to use polling with TailLines instead of Follow - Add per-poll context timeout (5s) to prevent hanging on slow log streams - Add scanner error checking to detect and handle stream errors - Fix context cancellation timing - pollCancel() called after stream operations complete - Add k8s.io/utils/ptr import for log options This approach is timing-independent and directly verifies the desired state rather than relying on log message observation. The per-poll timeout prevents tests from hanging indefinitely while still respecting the outer context timeout. Fixes address Copilot review comments about context management and error handling. Assisted-by: Cursor/Claude Co-authored-by: Cursor <cursoragent@cursor.com>
…ernetes v1.35
Kubernetes v1.35 changed CEL validation error message format to include
the actual validated value instead of just the type ("string" or "object").
This provides more helpful error messages for debugging validation failures.
This commit was accidentally dropped during interactive rebase.
Changes:
- Update test expectations to include actual values in error messages
- Change from 'Invalid value: "string"' to 'Invalid value: "actual-value"'
- Update 20+ test cases in TestImageSourceCELValidationRules and related tests
- Fix indentation consistency across all test cases
Assisted-by: Cursor/Claude
Co-authored-by: Cursor <cursoragent@cursor.com>
bfc9355 to
e0b1619
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address code review feedback from @perdasilva: the "final attempt" after timeout creates a confusing false negative. If the final attempt succeeds after we've logged "timeout reached", we return success despite logging a timeout message. Changes: - Remove final attempt after timeout - Return error immediately when timeout is reached - Clearer error message with timeout duration - Prevents false negatives where timeout is logged but success is returned This makes the function behavior match its log messages - if we say we timed out, we actually return an error. Assisted-by: Cursor/Claude Co-authored-by: Cursor <cursoragent@cursor.com>
e0b1619 to
efbe1f6
Compare
Description
Reviewer Checklist