Skip to content

refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319

Merged
thodson-usgs merged 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/error-taxonomy-followups
Jun 5, 2026
Merged

refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319
thodson-usgs merged 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/error-taxonomy-followups

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented Jun 3, 2026

What

A small, idiomatic exception taxonomy: every request failure raises a subclass of DataRetrievalError, so one except dataretrieval.DataRetrievalError handles any of them. It adds only what the underlying httpx exceptions can't express.

DataRetrievalError(Exception)         # .status_code / .retry_after / .retryable
├─ HTTPError                  # .status_code — the server returned an error status
│   └─ TransientError         # .retry_after — retryable (429 / 5xx)
│       ├─ RateLimited        #   429
│       └─ ServiceUnavailable #   5xx
├─ RequestTooLarge            # the request can't fit
│   ├─ URLTooLong             #   414 / client-side over-long URL
│   └─ Unchunkable            #   the Water Data chunker can't split the call
├─ NetworkError               # no response: timeout / DNS / refused connection
└─ NoSitesError               # no-data on the legacy nwis path (see "no-data" below)

A single factory, dataretrieval.exceptions.error_for_status(...), maps a status to its type, and every request path routes through it — the legacy query path (nwis/wqp/nldi), the Water Data chunker, and nadp/streamstats — so a given status surfaces as the same type everywhere.

  • Fatal 4xx → a generic HTTPError carrying .status_code (inspect the code rather than a class per code).
  • Retryable (429 / 5xx) → RateLimited / ServiceUnavailable (both TransientError), carrying .retry_after.
  • Domain failuresUnchunkable, URLTooLong (keeps the actionable "split your query" message on the legacy path).
  • Connection-level failures (timeout, DNS, refused) → NetworkError, with the underlying httpx exception on __cause__.

Branch or retry without knowing the concrete type

Every DataRetrievalError exposes three read-anywhere fields — no isinstance, no httpx import:

  • .status_code — the HTTP status, or None when the failure carried no response.
  • .retry_after — seconds the server hinted (its Retry-After), or None.
  • .retryableTrue for the transient statuses (429 / 5xx) and connection failures; False otherwise.
for attempt in range(5):
    try:
        df, md = dataretrieval.waterdata.get_daily(...)
        break
    except dataretrieval.DataRetrievalError as e:
        if not e.retryable or attempt == 4:
            raise
        time.sleep(e.retry_after or 2 ** attempt)

A new "Handling errors" user-guide page documents these recipes (catch-all, retry-with-backoff, branch-on-status, resume a chunked call).

No-data is not an error

The modern getters (waterdata, wqp, nldi) return an empty DataFrame when nothing matches — check df.empty, don't catch. Only the deprecated nwis (waterservices) path still raises NoSitesError on no data.

Cross-process

The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they propagate cleanly back from multiprocessing / lithops workers. A ChunkInterrupted sheds its live .call resume handle across a process boundary (keeping the partial frame/response and counts), while in-process callers still get full exc.call.resume().

Breaking changes

  • Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The tree roots only at DataRetrievalError(Exception) — no more builtin mixins, so except ValueError / except RuntimeError no longer catch them. This now also covers ChunkInterrupted (was a RuntimeError) and mid-pagination failures (were a bare RuntimeError). Catch DataRetrievalError (or a subclass).
  • Connection failures are wrapped as NetworkError instead of raw httpx exceptions on the single-shot paths — catch NetworkError (or DataRetrievalError); the httpx exception is on __cause__.
  • A fatal 4xx is an HTTPError (read .status_code); there are no per-code exception types.

Verification

ruff clean (pre-commit hooks). mypy --strict and the full pytest suite are re-run by CI on push. The taxonomy was previously exercised green locally: typed errors round-trip through pickle / deepcopy (including ChunkInterrupted with real partial data and NetworkError); every type exposes .status_code/.retry_after/.retryable uniformly; async chunked requests resume after a transient interruption (from both a sync caller and inside a running event loop). Adds a dataretrieval.exceptions API reference page, a "Handling errors" user guide, and a NEWS.md entry.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs changed the title fix(errors): correct DataRetrievalError taxonomy follow-ups from code review fix(errors): unify HTTP status→exception across all request paths Jun 3, 2026
@thodson-usgs thodson-usgs force-pushed the fix/error-taxonomy-followups branch from 3651f23 to 108b1ca Compare June 3, 2026 16:20
@thodson-usgs thodson-usgs changed the title fix(errors): unify HTTP status→exception across all request paths refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy Jun 3, 2026
@thodson-usgs thodson-usgs force-pushed the fix/error-taxonomy-followups branch 6 times, most recently from 0c183b7 to 63526fd Compare June 4, 2026 18:47
Copy link
Copy Markdown
Collaborator Author

@thodson-usgs thodson-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address my suggestions.

Comment thread dataretrieval/waterdata/chunking.py Outdated
Comment on lines +393 to +398
# ``RuntimeError`` base retained deliberately: the request-error taxonomy in
# ``exceptions.py`` shed its ``ValueError`` / ``RuntimeError`` mixins, but
# ``ChunkInterrupted`` predates that and long-standing callers catch it as a
# ``RuntimeError``; keeping the base preserves their ``except RuntimeError``
# handlers. It is *also* a ``DataRetrievalError``, so the unified
# ``except DataRetrievalError`` clause covers it too.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't worry about long-standing callers here. We only recently introduced chunking. You can probably omit historical explanation. We want to jump straight to the longterm solution here.

Comment thread dataretrieval/waterdata/chunking.py Outdated
Comment on lines +505 to +511
def _pickle_state(self) -> dict[str, Any]:
# ``.call`` is a live ChunkedCall whose ``.fetch`` is a decorated module
# function stdlib pickle can't serialize by name, so a real
# ChunkInterrupted can't otherwise cross a multiprocessing boundary. Ship
# the documented degraded ``call=None`` state instead: ``.resume()`` is
# gone (un-resumable cross-process regardless), but the counts, retry
# hint, and partial_frame / partial_response survive.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, the documentation should focus on the current implementation. Not what was done previously

Comment thread dataretrieval/waterdata/utils.py Outdated
Comment on lines +1108 to +1110
On an initial-page parse failure (wrapped via
:func:`_paginated_failure_message` with the original exception on
``__cause__``), or any failure on a subsequent page (same wrapping).
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check, but we might omit RuntimeError, since this PR should implement the longterm solution.

Comment thread dataretrieval/__init__.py Outdated
Comment on lines +21 to +22
:class:`dataretrieval.DataRetrievalError` (the taxonomy lives in
``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simplify the wording here. Please copy edit.

Comment thread dataretrieval/exceptions.py Outdated
Comment on lines +6 to +7
Connection-level failures (timeouts, DNS, refused connections) still surface as
``httpx`` exceptions on the single-shot request paths.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it cleaner or simpler to handle these as DataRetrievalError also?

Comment thread dataretrieval/exceptions.py Outdated
Comment on lines +56 to +71
def __reduce__(self) -> tuple[Any, ...]:
# The status subclasses take keyword-only fields (status_code /
# retry_after) that the default ``BaseException.__reduce__`` can't
# round-trip: it rebuilds via ``cls(*self.args)``, which raises TypeError
# before the saved state is applied. Rebuild from args + state via
# ``__new__`` instead so every subclass survives a pickle / deepcopy back
# from a multiprocessing / lithops worker. A subclass holding
# un-picklable live state (e.g. the chunker's ``ChunkInterrupted.call``)
# overrides ``_pickle_state`` to shed it.
return (_rebuild_error, (self.__class__, self.args, self._pickle_state()))

def _pickle_state(self) -> dict[str, Any]:
"""Instance state to serialize in :meth:`__reduce__`.

Defaults to ``__dict__``; override to drop attributes that can't pickle.
"""
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be sure this is absolutely necessary for the implementation. It smells fish.

Comment thread dataretrieval/exceptions.py Outdated
#: Deprecated alias for :class:`NoDataError`. The original name leaked NWIS-era
#: "sites" terminology; it is retained so existing ``except NoSitesError``
#: handlers keep working, and will be removed in a future release.
NoSitesError = NoDataError
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and add a deprecation worning on NoSitesError

Comment thread dataretrieval/utils.py Outdated
Comment on lines +349 to +350
:class:`~dataretrieval.exceptions.NoDataError` when a 200 response
reports no data matched. Connection-level failures (timeouts, DNS)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, are you sure we treat nodata as an error? I thought the correct pattern was to return an empty data frame.

@thodson-usgs thodson-usgs force-pushed the fix/error-taxonomy-followups branch 8 times, most recently from d348c12 to 38f6d58 Compare June 5, 2026 15:59
Every request failure raises a subclass of DataRetrievalError, so a caller can
handle any of them with a single `except dataretrieval.DataRetrievalError`. The
taxonomy stays small -- it adds only what the underlying httpx exceptions can't
express:

  DataRetrievalError(Exception)         # .status_code / .retry_after / .retryable
  |- HTTPError                   # .status_code -- the server returned an error status
  |   '- TransientError          # .retry_after -- retryable (429 / 5xx)
  |       |- RateLimited         #   429
  |       '- ServiceUnavailable  #   5xx
  |- RequestTooLarge             # the request can't fit
  |   |- URLTooLong              #   414 / client-side over-long URL
  |   '- Unchunkable             #   the Water Data chunker can't split the call
  |- NetworkError                # no response: timeout / DNS / refused connection
  '- NoSitesError                # no-data on the legacy nwis path (see below)

One factory -- error_for_status(status, message, *, retry_after) -- maps a
status to its type, and every request path routes through it (the legacy
`query` path, the Water Data chunker, nldi, nadp, streamstats), so a given
status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError
carrying .status_code (inspect the code rather than a class per code). The
chunker keys retry/resume on TransientError.

Every DataRetrievalError exposes three read-anywhere fields -- .status_code
(None when there is no HTTP status), .retry_after, and .retryable -- so a single
`except DataRetrievalError as e` clause can branch on the status or drive a
backoff loop without importing or isinstance-checking the concrete subclass.

Connection-level failures (no HTTP response: timeout, DNS, refused connection)
are wrapped as NetworkError, with the underlying httpx exception on __cause__,
so one `except DataRetrievalError` truly spans every failure. The single-shot
paths route their GETs through a thin `utils._get` wrapper that does the
translation; the chunker keeps its own client and wraps transport failures as
resumable interruptions instead. NetworkError carries no .status_code but is
.retryable; with TransientError it forms the retryable set.

A no-data result is not an error: the modern getters (waterdata, wqp, nldi)
return an empty DataFrame when nothing matches. Only the deprecated nwis
(waterservices) path still raises NoSitesError on no data.

The typed errors are picklable via the standard __getstate__/__setstate__
protocol, so they survive a pickle / deepcopy back from a multiprocessing /
lithops worker. A chunk-interruption error sheds its live resume handle (.call)
on that trip -- keeping the diagnostic counts and partial frame/response --
while in-process callers still get full `exc.call.resume()`.

A too-long-URL status (413 / 414) on the legacy `query` path keeps the
actionable "split your query" remediation message (the same one the client-side
over-long-URL case raises), rather than degrading to a bare HTTP-status line.

BREAKING CHANGES
- Request failures raise typed DataRetrievalError subclasses instead of bare
  ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at
  DataRetrievalError(Exception) and no longer also inherit ValueError /
  RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins.
  This now also covers ChunkInterrupted (previously a RuntimeError) and
  mid-pagination failures (previously a bare RuntimeError).
- Connection-level failures are wrapped as NetworkError instead of surfacing as
  raw httpx exceptions on the single-shot paths -- catch NetworkError (or
  DataRetrievalError); the httpx exception is preserved on __cause__.
- A fatal 4xx raises HTTPError (read .status_code); there are no per-code types.

Also adds a dataretrieval.exceptions API docs page, a "Handling errors" user
guide, and a NEWS.md changelog entry.

ruff clean (pre-commit hooks); mypy --strict and the full pytest suite are
re-verified by CI on push.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the fix/error-taxonomy-followups branch from 38f6d58 to d778e38 Compare June 5, 2026 16:05
@thodson-usgs thodson-usgs marked this pull request as ready for review June 5, 2026 17:14
@thodson-usgs thodson-usgs merged commit ff8f535 into DOI-USGS:main Jun 5, 2026
9 checks passed
@thodson-usgs thodson-usgs deleted the fix/error-taxonomy-followups branch June 5, 2026 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant