Skip to content

improve: reconciliation counts as retry attempt only if close to retry deadline#3380

Open
csviri wants to merge 5 commits into
operator-framework:nextfrom
csviri:retry-finetune
Open

improve: reconciliation counts as retry attempt only if close to retry deadline#3380
csviri wants to merge 5 commits into
operator-framework:nextfrom
csviri:retry-finetune

Conversation

@csviri
Copy link
Copy Markdown
Collaborator

@csviri csviri commented May 26, 2026

Currently there is already a retry happening, and we receive an event that triggers the reconciliation counts as a retry. But this way we can fairly easily exhaust retries. So algorithm should is changed a reconciliation counts as a retry only if it is close enough to a scheduled retry.

Signed-off-by: Attila Mészáros a_meszaros@apple.com

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026
@csviri csviri changed the title improve: retry is exhausted by non-retry events [WIP] improve: retry is exhausted by non-retry events May 26, 2026
@csviri csviri marked this pull request as ready for review May 27, 2026 10:59
Copilot AI review requested due to automatic review settings May 27, 2026 10:59
@openshift-ci openshift-ci Bot requested review from metacosm and xstefank May 27, 2026 10:59
@csviri csviri changed the title [WIP] improve: retry is exhausted by non-retry events improve: retry is exhausted by non-retry events May 27, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts retry behavior so that a failed reconciliation triggered by an external event while inside an already-armed retry window does not consume a retry attempt; instead the original retry deadline is preserved (when more than 5s remains). Previously every failure during a retry window advanced the retry counter, which could exhaust retries quickly under frequent events.

Changes:

  • Added RetryExecution#remainingDurationUntilNextRetry() (default empty) and implemented it in GenericRetryExecution by tracking the wall-clock time of the last nextDelay() call.
  • In EventProcessor#handleRetryOnException, when no superseding event is present and the remaining retry window exceeds a 5s threshold, reschedule on the existing deadline without invoking nextDelay().
  • Added unit tests for the three branches and an integration test verifying the retry counter does not advance under frequent updates; docs updated accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
operator-framework-core/.../retry/RetryExecution.java Adds remainingDurationUntilNextRetry() default method to the interface.
operator-framework-core/.../retry/GenericRetryExecution.java Records timestamp of last nextDelay() call; computes remaining time until deadline.
operator-framework-core/.../event/EventProcessor.java Preserves existing retry deadline when remaining > 5s threshold, skipping retry counter advance.
operator-framework-core/.../event/EventProcessorTest.java Adds tests covering preserve-deadline, consume-attempt, and first-failure branches.
operator-framework/.../retry/RetryTestCustomReconciler.java Tracks max observed RetryInfo attempt count for assertions.
operator-framework/.../retry/RetryIntervalHonoredOnFrequentEventsIT.java New IT verifying frequent updates within retry window do not exhaust retry counter.
docs/.../error-handling-retries.md Documents the new preserve-deadline behavior.

* window is allowed to consume a retry attempt (i.e. advance the retry counter). Above this
* threshold the existing retry deadline is preserved instead.
*/
private static final long RETRY_DEADLINE_PRESERVE_THRESHOLD_MILLIS = 5_000;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is up to debate if this should be configurable, but maybe we can for now stick KISS principle

@csviri csviri changed the title improve: retry is exhausted by non-retry events improve: reconciliation counts as retry attempt only if close to retry deadline May 27, 2026
csviri added 5 commits May 27, 2026 13:19
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants