refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy by thodson-usgs · Pull Request #319 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-06-03T01:16:32Z

What

A small, idiomatic exception taxonomy: every request failure raises a subclass of DataRetrievalError, so one except dataretrieval.DataRetrievalError handles any of them. It adds only what the underlying httpx exceptions can't express.

DataRetrievalError(Exception)         # .status_code / .retry_after / .retryable
├─ HTTPError                  # .status_code — the server returned an error status
│   └─ TransientError         # .retry_after — retryable (429 / 5xx)
│       ├─ RateLimited        #   429
│       └─ ServiceUnavailable #   5xx
├─ RequestTooLarge            # the request can't fit
│   ├─ URLTooLong             #   414 / client-side over-long URL
│   └─ Unchunkable            #   the Water Data chunker can't split the call
├─ NetworkError               # no response: timeout / DNS / refused connection
└─ NoSitesError               # no-data on the legacy nwis path (see "no-data" below)

A single factory, dataretrieval.exceptions.error_for_status(...), maps a status to its type, and every request path routes through it — the legacy query path (nwis/wqp/nldi), the Water Data chunker, and nadp/streamstats — so a given status surfaces as the same type everywhere.

Fatal 4xx → a generic HTTPError carrying .status_code (inspect the code rather than a class per code).
Retryable (429 / 5xx) → RateLimited / ServiceUnavailable (both TransientError), carrying .retry_after.
Domain failures → Unchunkable, URLTooLong (keeps the actionable "split your query" message on the legacy path).
Connection-level failures (timeout, DNS, refused) → NetworkError, with the underlying httpx exception on __cause__.

Branch or retry without knowing the concrete type

Every DataRetrievalError exposes three read-anywhere fields — no isinstance, no httpx import:

.status_code — the HTTP status, or None when the failure carried no response.
.retry_after — seconds the server hinted (its Retry-After), or None.
.retryable — True for the transient statuses (429 / 5xx) and connection failures; False otherwise.

for attempt in range(5):
    try:
        df, md = dataretrieval.waterdata.get_daily(...)
        break
    except dataretrieval.DataRetrievalError as e:
        if not e.retryable or attempt == 4:
            raise
        time.sleep(e.retry_after or 2 ** attempt)

A new "Handling errors" user-guide page documents these recipes (catch-all, retry-with-backoff, branch-on-status, resume a chunked call).

No-data is not an error

The modern getters (waterdata, wqp, nldi) return an empty DataFrame when nothing matches — check df.empty, don't catch. Only the deprecated nwis (waterservices) path still raises NoSitesError on no data.

Cross-process

The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they propagate cleanly back from multiprocessing / lithops workers. A ChunkInterrupted sheds its live .call resume handle across a process boundary (keeping the partial frame/response and counts), while in-process callers still get full exc.call.resume().

Breaking changes

Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The tree roots only at DataRetrievalError(Exception) — no more builtin mixins, so except ValueError / except RuntimeError no longer catch them. This now also covers ChunkInterrupted (was a RuntimeError) and mid-pagination failures (were a bare RuntimeError). Catch DataRetrievalError (or a subclass).
Connection failures are wrapped as NetworkError instead of raw httpx exceptions on the single-shot paths — catch NetworkError (or DataRetrievalError); the httpx exception is on __cause__.
A fatal 4xx is an HTTPError (read .status_code); there are no per-code exception types.

Verification

ruff clean (pre-commit hooks). mypy --strict and the full pytest suite are re-run by CI on push. The taxonomy was previously exercised green locally: typed errors round-trip through pickle / deepcopy (including ChunkInterrupted with real partial data and NetworkError); every type exposes .status_code/.retry_after/.retryable uniformly; async chunked requests resume after a transient interruption (from both a sync caller and inside a running event loop). Adds a dataretrieval.exceptions API reference page, a "Handling errors" user guide, and a NEWS.md entry.

🤖 Generated with Claude Code

thodson-usgs

Please address my suggestions.

thodson-usgs · 2026-06-04T19:24:16Z

+# ``RuntimeError`` base retained deliberately: the request-error taxonomy in
+# ``exceptions.py`` shed its ``ValueError`` / ``RuntimeError`` mixins, but
+# ``ChunkInterrupted`` predates that and long-standing callers catch it as a
+# ``RuntimeError``; keeping the base preserves their ``except RuntimeError``
+# handlers. It is *also* a ``DataRetrievalError``, so the unified
+# ``except DataRetrievalError`` clause covers it too.


don't worry about long-standing callers here. We only recently introduced chunking. You can probably omit historical explanation. We want to jump straight to the longterm solution here.

thodson-usgs · 2026-06-04T19:25:58Z

+    def _pickle_state(self) -> dict[str, Any]:
+        # ``.call`` is a live ChunkedCall whose ``.fetch`` is a decorated module
+        # function stdlib pickle can't serialize by name, so a real
+        # ChunkInterrupted can't otherwise cross a multiprocessing boundary. Ship
+        # the documented degraded ``call=None`` state instead: ``.resume()`` is
+        # gone (un-resumable cross-process regardless), but the counts, retry
+        # hint, and partial_frame / partial_response survive.


Similarly, the documentation should focus on the current implementation. Not what was done previously

thodson-usgs · 2026-06-04T19:27:03Z

+        On an initial-page parse failure (wrapped via
+        :func:`_paginated_failure_message` with the original exception on
+        ``__cause__``), or any failure on a subsequent page (same wrapping).


Double check, but we might omit RuntimeError, since this PR should implement the longterm solution.

thodson-usgs · 2026-06-04T19:27:47Z

+:class:`dataretrieval.DataRetrievalError` (the taxonomy lives in
+``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still


can we simplify the wording here. Please copy edit.

thodson-usgs · 2026-06-04T19:28:35Z

+Connection-level failures (timeouts, DNS, refused connections) still surface as
+``httpx`` exceptions on the single-shot request paths.


Is it cleaner or simpler to handle these as DataRetrievalError also?

thodson-usgs · 2026-06-04T19:29:37Z

+    def __reduce__(self) -> tuple[Any, ...]:
+        # The status subclasses take keyword-only fields (status_code /
+        # retry_after) that the default ``BaseException.__reduce__`` can't
+        # round-trip: it rebuilds via ``cls(*self.args)``, which raises TypeError
+        # before the saved state is applied. Rebuild from args + state via
+        # ``__new__`` instead so every subclass survives a pickle / deepcopy back
+        # from a multiprocessing / lithops worker. A subclass holding
+        # un-picklable live state (e.g. the chunker's ``ChunkInterrupted.call``)
+        # overrides ``_pickle_state`` to shed it.
+        return (_rebuild_error, (self.__class__, self.args, self._pickle_state()))
+
+    def _pickle_state(self) -> dict[str, Any]:
+        """Instance state to serialize in :meth:`__reduce__`.
+
+        Defaults to ``__dict__``; override to drop attributes that can't pickle.
+        """


be sure this is absolutely necessary for the implementation. It smells fish.

thodson-usgs · 2026-06-04T19:30:14Z

+#: Deprecated alias for :class:`NoDataError`. The original name leaked NWIS-era
+#: "sites" terminology; it is retained so existing ``except NoSitesError``
+#: handlers keep working, and will be removed in a future release.
+NoSitesError = NoDataError


Go ahead and add a deprecation worning on NoSitesError

thodson-usgs · 2026-06-04T19:31:26Z

+        :class:`~dataretrieval.exceptions.NoDataError` when a 200 response
+        reports no data matched. Connection-level failures (timeouts, DNS)


In general, are you sure we treat nodata as an error? I thought the correct pattern was to return an empty data frame.

Every request failure raises a subclass of DataRetrievalError, so a caller can handle any of them with a single `except dataretrieval.DataRetrievalError`. The taxonomy stays small -- it adds only what the underlying httpx exceptions can't express: DataRetrievalError(Exception) # .status_code / .retry_after / .retryable |- HTTPError # .status_code -- the server returned an error status | '- TransientError # .retry_after -- retryable (429 / 5xx) | |- RateLimited # 429 | '- ServiceUnavailable # 5xx |- RequestTooLarge # the request can't fit | |- URLTooLong # 414 / client-side over-long URL | '- Unchunkable # the Water Data chunker can't split the call |- NetworkError # no response: timeout / DNS / refused connection '- NoSitesError # no-data on the legacy nwis path (see below) One factory -- error_for_status(status, message, *, retry_after) -- maps a status to its type, and every request path routes through it (the legacy `query` path, the Water Data chunker, nldi, nadp, streamstats), so a given status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError carrying .status_code (inspect the code rather than a class per code). The chunker keys retry/resume on TransientError. Every DataRetrievalError exposes three read-anywhere fields -- .status_code (None when there is no HTTP status), .retry_after, and .retryable -- so a single `except DataRetrievalError as e` clause can branch on the status or drive a backoff loop without importing or isinstance-checking the concrete subclass. Connection-level failures (no HTTP response: timeout, DNS, refused connection) are wrapped as NetworkError, with the underlying httpx exception on __cause__, so one `except DataRetrievalError` truly spans every failure. The single-shot paths route their GETs through a thin `utils._get` wrapper that does the translation; the chunker keeps its own client and wraps transport failures as resumable interruptions instead. NetworkError carries no .status_code but is .retryable; with TransientError it forms the retryable set. A no-data result is not an error: the modern getters (waterdata, wqp, nldi) return an empty DataFrame when nothing matches. Only the deprecated nwis (waterservices) path still raises NoSitesError on no data. The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they survive a pickle / deepcopy back from a multiprocessing / lithops worker. A chunk-interruption error sheds its live resume handle (.call) on that trip -- keeping the diagnostic counts and partial frame/response -- while in-process callers still get full `exc.call.resume()`. A too-long-URL status (413 / 414) on the legacy `query` path keeps the actionable "split your query" remediation message (the same one the client-side over-long-URL case raises), rather than degrading to a bare HTTP-status line. BREAKING CHANGES - Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at DataRetrievalError(Exception) and no longer also inherit ValueError / RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins. This now also covers ChunkInterrupted (previously a RuntimeError) and mid-pagination failures (previously a bare RuntimeError). - Connection-level failures are wrapped as NetworkError instead of surfacing as raw httpx exceptions on the single-shot paths -- catch NetworkError (or DataRetrievalError); the httpx exception is preserved on __cause__. - A fatal 4xx raises HTTPError (read .status_code); there are no per-code types. Also adds a dataretrieval.exceptions API docs page, a "Handling errors" user guide, and a NEWS.md changelog entry. ruff clean (pre-commit hooks); mypy --strict and the full pytest suite are re-verified by CI on push. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

thodson-usgs changed the title ~~fix(errors): correct DataRetrievalError taxonomy follow-ups from code review~~ fix(errors): unify HTTP status→exception across all request paths Jun 3, 2026

thodson-usgs force-pushed the fix/error-taxonomy-followups branch from 3651f23 to 108b1ca Compare June 3, 2026 16:20

thodson-usgs changed the title ~~fix(errors): unify HTTP status→exception across all request paths~~ refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy Jun 3, 2026

thodson-usgs force-pushed the fix/error-taxonomy-followups branch 6 times, most recently from 0c183b7 to 63526fd Compare June 4, 2026 18:47

thodson-usgs commented Jun 4, 2026

View reviewed changes

thodson-usgs force-pushed the fix/error-taxonomy-followups branch 8 times, most recently from d348c12 to 38f6d58 Compare June 5, 2026 15:59

thodson-usgs force-pushed the fix/error-taxonomy-followups branch from 38f6d58 to d778e38 Compare June 5, 2026 16:05

thodson-usgs marked this pull request as ready for review June 5, 2026 17:14

thodson-usgs merged commit ff8f535 into DOI-USGS:main Jun 5, 2026
9 checks passed

thodson-usgs deleted the fix/error-taxonomy-followups branch June 5, 2026 17:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319

refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319
thodson-usgs merged 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/error-taxonomy-followups

thodson-usgs commented Jun 3, 2026 •

edited

Loading

Uh oh!

thodson-usgs left a comment

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

thodson-usgs Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		:class:`dataretrieval.DataRetrievalError` (the taxonomy lives in
		``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still

		Connection-level failures (timeouts, DNS, refused connections) still surface as
		``httpx`` exceptions on the single-shot request paths.

		:class:`~dataretrieval.exceptions.NoDataError` when a 200 response
		reports no data matched. Connection-level failures (timeouts, DNS)

Conversation

thodson-usgs commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Branch or retry without knowing the concrete type

No-data is not an error

Cross-process

Breaking changes

Verification

Uh oh!

thodson-usgs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thodson-usgs commented Jun 3, 2026 •

edited

Loading