refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319
Conversation
3651f23 to
108b1ca
Compare
0c183b7 to
63526fd
Compare
thodson-usgs
left a comment
There was a problem hiding this comment.
Please address my suggestions.
| # ``RuntimeError`` base retained deliberately: the request-error taxonomy in | ||
| # ``exceptions.py`` shed its ``ValueError`` / ``RuntimeError`` mixins, but | ||
| # ``ChunkInterrupted`` predates that and long-standing callers catch it as a | ||
| # ``RuntimeError``; keeping the base preserves their ``except RuntimeError`` | ||
| # handlers. It is *also* a ``DataRetrievalError``, so the unified | ||
| # ``except DataRetrievalError`` clause covers it too. |
There was a problem hiding this comment.
don't worry about long-standing callers here. We only recently introduced chunking. You can probably omit historical explanation. We want to jump straight to the longterm solution here.
| def _pickle_state(self) -> dict[str, Any]: | ||
| # ``.call`` is a live ChunkedCall whose ``.fetch`` is a decorated module | ||
| # function stdlib pickle can't serialize by name, so a real | ||
| # ChunkInterrupted can't otherwise cross a multiprocessing boundary. Ship | ||
| # the documented degraded ``call=None`` state instead: ``.resume()`` is | ||
| # gone (un-resumable cross-process regardless), but the counts, retry | ||
| # hint, and partial_frame / partial_response survive. |
There was a problem hiding this comment.
Similarly, the documentation should focus on the current implementation. Not what was done previously
| On an initial-page parse failure (wrapped via | ||
| :func:`_paginated_failure_message` with the original exception on | ||
| ``__cause__``), or any failure on a subsequent page (same wrapping). |
There was a problem hiding this comment.
Double check, but we might omit RuntimeError, since this PR should implement the longterm solution.
| :class:`dataretrieval.DataRetrievalError` (the taxonomy lives in | ||
| ``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still |
There was a problem hiding this comment.
can we simplify the wording here. Please copy edit.
| Connection-level failures (timeouts, DNS, refused connections) still surface as | ||
| ``httpx`` exceptions on the single-shot request paths. |
There was a problem hiding this comment.
Is it cleaner or simpler to handle these as DataRetrievalError also?
| def __reduce__(self) -> tuple[Any, ...]: | ||
| # The status subclasses take keyword-only fields (status_code / | ||
| # retry_after) that the default ``BaseException.__reduce__`` can't | ||
| # round-trip: it rebuilds via ``cls(*self.args)``, which raises TypeError | ||
| # before the saved state is applied. Rebuild from args + state via | ||
| # ``__new__`` instead so every subclass survives a pickle / deepcopy back | ||
| # from a multiprocessing / lithops worker. A subclass holding | ||
| # un-picklable live state (e.g. the chunker's ``ChunkInterrupted.call``) | ||
| # overrides ``_pickle_state`` to shed it. | ||
| return (_rebuild_error, (self.__class__, self.args, self._pickle_state())) | ||
|
|
||
| def _pickle_state(self) -> dict[str, Any]: | ||
| """Instance state to serialize in :meth:`__reduce__`. | ||
|
|
||
| Defaults to ``__dict__``; override to drop attributes that can't pickle. | ||
| """ |
There was a problem hiding this comment.
be sure this is absolutely necessary for the implementation. It smells fish.
| #: Deprecated alias for :class:`NoDataError`. The original name leaked NWIS-era | ||
| #: "sites" terminology; it is retained so existing ``except NoSitesError`` | ||
| #: handlers keep working, and will be removed in a future release. | ||
| NoSitesError = NoDataError |
There was a problem hiding this comment.
Go ahead and add a deprecation worning on NoSitesError
| :class:`~dataretrieval.exceptions.NoDataError` when a 200 response | ||
| reports no data matched. Connection-level failures (timeouts, DNS) |
There was a problem hiding this comment.
In general, are you sure we treat nodata as an error? I thought the correct pattern was to return an empty data frame.
d348c12 to
38f6d58
Compare
Every request failure raises a subclass of DataRetrievalError, so a caller can handle any of them with a single `except dataretrieval.DataRetrievalError`. The taxonomy stays small -- it adds only what the underlying httpx exceptions can't express: DataRetrievalError(Exception) # .status_code / .retry_after / .retryable |- HTTPError # .status_code -- the server returned an error status | '- TransientError # .retry_after -- retryable (429 / 5xx) | |- RateLimited # 429 | '- ServiceUnavailable # 5xx |- RequestTooLarge # the request can't fit | |- URLTooLong # 414 / client-side over-long URL | '- Unchunkable # the Water Data chunker can't split the call |- NetworkError # no response: timeout / DNS / refused connection '- NoSitesError # no-data on the legacy nwis path (see below) One factory -- error_for_status(status, message, *, retry_after) -- maps a status to its type, and every request path routes through it (the legacy `query` path, the Water Data chunker, nldi, nadp, streamstats), so a given status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError carrying .status_code (inspect the code rather than a class per code). The chunker keys retry/resume on TransientError. Every DataRetrievalError exposes three read-anywhere fields -- .status_code (None when there is no HTTP status), .retry_after, and .retryable -- so a single `except DataRetrievalError as e` clause can branch on the status or drive a backoff loop without importing or isinstance-checking the concrete subclass. Connection-level failures (no HTTP response: timeout, DNS, refused connection) are wrapped as NetworkError, with the underlying httpx exception on __cause__, so one `except DataRetrievalError` truly spans every failure. The single-shot paths route their GETs through a thin `utils._get` wrapper that does the translation; the chunker keeps its own client and wraps transport failures as resumable interruptions instead. NetworkError carries no .status_code but is .retryable; with TransientError it forms the retryable set. A no-data result is not an error: the modern getters (waterdata, wqp, nldi) return an empty DataFrame when nothing matches. Only the deprecated nwis (waterservices) path still raises NoSitesError on no data. The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they survive a pickle / deepcopy back from a multiprocessing / lithops worker. A chunk-interruption error sheds its live resume handle (.call) on that trip -- keeping the diagnostic counts and partial frame/response -- while in-process callers still get full `exc.call.resume()`. A too-long-URL status (413 / 414) on the legacy `query` path keeps the actionable "split your query" remediation message (the same one the client-side over-long-URL case raises), rather than degrading to a bare HTTP-status line. BREAKING CHANGES - Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at DataRetrievalError(Exception) and no longer also inherit ValueError / RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins. This now also covers ChunkInterrupted (previously a RuntimeError) and mid-pagination failures (previously a bare RuntimeError). - Connection-level failures are wrapped as NetworkError instead of surfacing as raw httpx exceptions on the single-shot paths -- catch NetworkError (or DataRetrievalError); the httpx exception is preserved on __cause__. - A fatal 4xx raises HTTPError (read .status_code); there are no per-code types. Also adds a dataretrieval.exceptions API docs page, a "Handling errors" user guide, and a NEWS.md changelog entry. ruff clean (pre-commit hooks); mypy --strict and the full pytest suite are re-verified by CI on push. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
38f6d58 to
d778e38
Compare
What
A small, idiomatic exception taxonomy: every request failure raises a subclass of
DataRetrievalError, so oneexcept dataretrieval.DataRetrievalErrorhandles any of them. It adds only what the underlyinghttpxexceptions can't express.A single factory,
dataretrieval.exceptions.error_for_status(...), maps a status to its type, and every request path routes through it — the legacyquerypath (nwis/wqp/nldi), the Water Data chunker, andnadp/streamstats— so a given status surfaces as the same type everywhere.HTTPErrorcarrying.status_code(inspect the code rather than a class per code).RateLimited/ServiceUnavailable(bothTransientError), carrying.retry_after.Unchunkable,URLTooLong(keeps the actionable "split your query" message on the legacy path).NetworkError, with the underlyinghttpxexception on__cause__.Branch or retry without knowing the concrete type
Every
DataRetrievalErrorexposes three read-anywhere fields — noisinstance, nohttpximport:.status_code— the HTTP status, orNonewhen the failure carried no response..retry_after— seconds the server hinted (itsRetry-After), orNone..retryable—Truefor the transient statuses (429 / 5xx) and connection failures;Falseotherwise.A new "Handling errors" user-guide page documents these recipes (catch-all, retry-with-backoff, branch-on-status, resume a chunked call).
No-data is not an error
The modern getters (
waterdata,wqp,nldi) return an emptyDataFramewhen nothing matches — checkdf.empty, don't catch. Only the deprecatednwis(waterservices) path still raisesNoSitesErroron no data.Cross-process
The typed errors are picklable via the standard
__getstate__/__setstate__protocol, so they propagate cleanly back from multiprocessing / lithops workers. AChunkInterruptedsheds its live.callresume handle across a process boundary (keeping the partial frame/response and counts), while in-process callers still get fullexc.call.resume().Breaking changes
DataRetrievalErrorsubclasses instead of bareValueError/RuntimeError/httpx.HTTPStatusError. The tree roots only atDataRetrievalError(Exception)— no more builtin mixins, soexcept ValueError/except RuntimeErrorno longer catch them. This now also coversChunkInterrupted(was aRuntimeError) and mid-pagination failures (were a bareRuntimeError). CatchDataRetrievalError(or a subclass).NetworkErrorinstead of rawhttpxexceptions on the single-shot paths — catchNetworkError(orDataRetrievalError); the httpx exception is on__cause__.HTTPError(read.status_code); there are no per-code exception types.Verification
ruffclean (pre-commit hooks).mypy --strictand the fullpytestsuite are re-run by CI on push. The taxonomy was previously exercised green locally: typed errors round-trip through pickle / deepcopy (includingChunkInterruptedwith real partial data andNetworkError); every type exposes.status_code/.retry_after/.retryableuniformly; async chunked requests resume after a transient interruption (from both a sync caller and inside a running event loop). Adds adataretrieval.exceptionsAPI reference page, a "Handling errors" user guide, and aNEWS.mdentry.🤖 Generated with Claude Code