fix: Retry transient connection errors on all HTTP requests#2148
Open
iMicknl wants to merge 4 commits into
Open
fix: Retry transient connection errors on all HTTP requests#2148iMicknl wants to merge 4 commits into
iMicknl wants to merge 4 commits into
Conversation
Previously only fetch_events was decorated with @retry_on_connection_failure, so a transient TimeoutError or ClientConnectorError raised from any other method — including the command path (execute_action_group) and all setup/state/refresh calls — propagated raw on the first occurrence. Centralize the retry on the _get/_post/_put/_delete helpers so every request gets uniform transient-connection retry, and drop the now redundant decorator from fetch_events. Fixes #2147
Tighten retry_on_connection_failure from 5 tries / 120s to 3 tries / 30s so a flaky connection gives up faster (~3s worst-case sleep) instead of blocking a command or poll for up to ~15s. Add a test covering the give-up-after-max-tries path.
The give-up-after-max-tries test already exercises _get through the decorator, so the GET retry-once test added no coverage beyond the command-path (_post) regression test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2147.
Previously only
fetch_eventswas decorated with@retry_on_connection_failure, so a transientTimeoutError/aiohttp.ClientConnectorErrorraised from any other method — including the command path (execute_action_group→_execute_action_group_direct) and all setup/state/refresh calls — propagated raw on the very first occurrence instead of being retried.This surfaced in Home Assistant (home-assistant/core#173155): a
ConnectionTimeoutError(subclass of the builtinTimeoutError) raised from a coverclosecommand escaped as an unhandled traceback.Change
Centralize the retry on the
_get/_post/_put/_deleterequest helpers (the approach suggested in the issue), so every request gets uniform transient-connection retry. This removes the need to remember the decorator per-method.The now-redundant
@retry_on_connection_failureis dropped fromfetch_eventsto avoid nested double-retry — the outerretry_on_concurrent_requests/retry_on_auth_error/retry_on_listener_errordecorators still wrap the request helpers correctly (connection retry now sits innermost, closest to the request).Tests
Added two regression tests in
tests/test_client.pythat inject the failure at the session level so it flows through the decorated helpers:test_backoff_retries_command_on_connection_failure— the motivating command-path case (TimeoutError)test_backoff_retries_get_on_connection_failure— GET path (ClientConnectorError)Both were confirmed to fail before the fix and pass after (TDD).
Verification