Summary
Only fetch_events is decorated with @retry_on_connection_failure. Every other HTTP-making method on OverkizClient, including the command path (_execute_action_group_direct) and all setup/state/refresh calls, lacks it, so a transient TimeoutError / aiohttp.ClientConnectorError propagates to the caller on its first occurrence instead of being retried.
This surfaced in Home Assistant: a ConnectionTimeoutError (subclass of the builtin TimeoutError) raised from a cover close command escaped as an unhandled traceback, because execute_action_group → _execute_action_group_direct carries only @retry_on_too_many_executions + @retry_on_auth_error.
Decorator definition
retry_on_connection_failure = backoff.on_exception(
    backoff.expo,
    (TimeoutError, ClientConnectorError),
    max_tries=5,
    max_time=120,
    ...
)
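To make the difference between a decorated and an undecorated call concrete, here is a minimal stdlib-only sketch. It is an approximation of what the decorator above does, not the backoff library's implementation: retry_on_connection_failure_sketch, FakeConnectionTimeoutError, FakeClientConnectorError and FlakyEndpoint are all hypothetical names introduced for illustration, the delays are shortened, and max_time is not modelled.

```python
import asyncio
import functools


# Hypothetical stand-ins for the aiohttp exceptions named in the issue.
# The first subclasses the builtin TimeoutError, as described above.
class FakeConnectionTimeoutError(TimeoutError):
    pass


class FakeClientConnectorError(OSError):
    pass


def retry_on_connection_failure_sketch(max_tries=5, base_delay=0.01):
    """Approximate retry-with-exponential-delay for async callables."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return await func(*args, **kwargs)
                except (TimeoutError, FakeClientConnectorError):
                    if attempt == max_tries:
                        raise  # retries exhausted: surface the original error
                    # Delay doubles on each attempt: base, 2*base, 4*base, ...
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


class FlakyEndpoint:
    """Raises a simulated timeout a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def request(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise FakeConnectionTimeoutError("simulated timeout")
        return "ok"


async def demo():
    # Undecorated: the first timeout reaches the caller immediately.
    bare = FlakyEndpoint(failures=2)
    try:
        bare_result = await bare.request()
    except TimeoutError:
        bare_result = "raised"

    # Decorated: two timeouts are absorbed and the third attempt succeeds.
    flaky = FlakyEndpoint(failures=2)
    retried = retry_on_connection_failure_sketch()(flaky.request)
    retried_result = await retried()
    return bare_result, bare.calls, retried_result, flaky.calls


bare_result, bare_calls, retried_result, retried_calls = asyncio.run(demo())
```

Because the except clause matches on TimeoutError, a subclass such as the simulated connection timeout is retried as well, which is the behaviour the undecorated methods currently miss.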
Audit of HTTP methods missing @retry_on_connection_failure
fetch_events is the only method that has it. Everything below makes an HTTP request (_get/_post/_put/_delete) and does not:
- Command / execution: _execute_action_group_direct, execute_persisted_action_group, schedule_persisted_action_group, cancel_execution
- Core polling/setup: get_setup, get_devices, get_gateways, get_state, refresh_states, refresh_device_states, get_current_execution, get_current_executions, get_execution_history, register_event_listener, unregister_event_listener, get_api_version
- Everything else: get_diagnostic_data, get_device_definition, get_action_groups, get_places, all get_reference_*, all firmware methods, all local-token methods, all developer-mode methods, search_reference_devices, etc.
Note: ServerDisconnectedError is retried (once) only as a side effect of retry_on_auth_error's relogin path; plain TimeoutError / ClientConnectorError are retried nowhere except fetch_events.
Suggested fix
Apply @retry_on_connection_failure consistently to the request-issuing methods — or, more robustly, centralize it in the _get/_post/_put/_delete helpers so every call gets uniform transient-connection retry, removing the need to remember it per-method. The command path in particular should retry transient timeouts before giving up.
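The centralized variant could look roughly like the sketch below. It is a toy model under stated assumptions, not the library's code: SketchClient, its _request helper, the injected transport callable, FakeClientConnectorError and the endpoint paths are all hypothetical, and the retry loop is a stdlib stand-in for the backoff decorator.

```python
import asyncio


# Hypothetical stand-in for aiohttp.ClientConnectorError (not the real class).
class FakeClientConnectorError(OSError):
    pass


TRANSIENT_ERRORS = (TimeoutError, FakeClientConnectorError)


class SketchClient:
    """Toy client in which every request funnels through one retrying helper."""

    def __init__(self, transport, max_tries=5, base_delay=0.01):
        self._transport = transport  # hypothetical async callable(method, path)
        self._max_tries = max_tries
        self._base_delay = base_delay

    async def _request(self, method, path):
        # Single place where transient connection errors are retried.
        for attempt in range(1, self._max_tries + 1):
            try:
                return await self._transport(method, path)
            except TRANSIENT_ERRORS:
                if attempt == self._max_tries:
                    raise  # give up and let the caller see the error
                await asyncio.sleep(self._base_delay * 2 ** (attempt - 1))

    async def _get(self, path):
        return await self._request("GET", path)

    async def _post(self, path):
        return await self._request("POST", path)

    # Public methods need no per-method retry decorator.
    async def get_setup(self):
        return await self._get("setup")  # illustrative path

    async def execute_action_group(self):
        return await self._post("exec/apply")  # illustrative path


def make_flaky_transport(failures):
    """Build a transport that times out `failures` times, then echoes the call."""
    state = {"calls": 0}

    async def transport(method, path):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated timeout")
        return f"{method} {path}"

    return transport, state


async def demo():
    transport, state = make_flaky_transport(failures=2)
    client = SketchClient(transport)
    result = await client.execute_action_group()
    return result, state["calls"]


result, calls = asyncio.run(demo())
```

One design point to weigh with this approach: a timeout on a POST does not prove the server never received the request, so blanket retries on the command path may need a guard against re-issuing a command that was already accepted.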
Context
Found while reviewing home-assistant/core#173155. On the HA side we added a catch that converts these to HomeAssistantError so the user sees a clean error instead of a traceback, but the underlying retry gap belongs here in the library.