Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -304,3 +304,4 @@ Errors are classified by origin (user vs infra) and retryability. The framework
3. **Extensions return plain errors** — extension interfaces (`MergeChecker`, `Storage`, `Publisher`) return standard `error` values with their own domain sentinels (e.g. `storage.ErrNotFound`). They do NOT classify errors as user or infra.
4. **Controllers classify errors** — the service controller that calls an extension decides whether the failure is user-caused or infrastructure-caused. The same extension error may be classified differently depending on context.
5. **Error chain works end-to-end** — extensions wrap custom errors, controllers wrap with `errs.New*Error`, and `errors.Is`/`errors.As` walks the full chain.
6. **Default classifiers** — primary pipeline consumers compose one or more classifiers (each owning a focused concern such as transport-level signals or a specific driver/library's errors) into `errs.NewClassifierProcessor(...)`. Pick classifiers that match the failure surfaces the consumer actually touches; add a new classifier package when a backend introduces error shapes that no existing one understands, rather than teaching an unrelated classifier about them. DLQ reconciliation consumers use `errs.AlwaysRetryableProcessor` instead so any failure is redelivered rather than dropped.
10 changes: 8 additions & 2 deletions core/errs/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,20 @@ load("@rules_go//go:def.bzl", "go_library", "go_test")

go_library(
name = "errs",
srcs = ["errs.go"],
srcs = [
"errs.go",
"processor.go",
],
importpath = "github.com/uber/submitqueue/core/errs",
visibility = ["//visibility:public"],
)

go_test(
name = "errs_test",
srcs = ["errs_test.go"],
srcs = [
"errs_test.go",
"processor_test.go",
],
embed = [":errs"],
deps = [
"@com_github_stretchr_testify//assert",
Expand Down
42 changes: 27 additions & 15 deletions core/errs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@ Errors are classified along two axes:
A returned `error` reaches `IsUserError` / `IsRetryable` / `IsDependencyError` carrying one of the framework types (`*userError` / `*infraError`). It gets there one of two ways:

1. **Explicit wrap by the controller** — the controller knows the meaning of the failure and wraps the cause with `NewUserError`, `NewRetryableError`, `NewDependencyError`, or `NewRetryableDependencyError` before returning.
2. **Automatic wrap by `Classify`** — the controller returns a raw driver/library/sentinel error, and a per-backend `Classifier` recognises it later in the pipeline (typically inside the consumer) and adds the appropriate framework wrap.
2. **Automatic wrap by the classifier-based `ErrorProcessor`** — the controller returns a raw driver/library/sentinel error, and a per-backend `Classifier` recognises it later in the pipeline (typically inside the consumer, after `ErrorProcessor.Process` runs) and adds the appropriate framework wrap.

Both routes feed the same downstream helpers; the chain that reaches `IsRetryable` looks identical regardless of who wrapped it.

## `Classify` and the `Classifier` Interface
## `ErrorProcessor`, `Classifier`, and the Processing Pass

`Classifier` inspects a **single error node** and returns a `Verdict`:

Expand All @@ -39,14 +39,22 @@ type Classifier interface {

Verdicts: `Unknown` (this node carries no signal), `User`, `Infra`, `InfraRetryable`, `InfraDependency`, `InfraDependencyRetryable`.

`Classify(err, classifiers...)` is the single, explicit pass that turns a raw chain into a wrapped one. It is called **exactly once per chain** — typically by the consumer immediately after the controller returns. After that point, callers use only the `IsXxx` helpers, which are pure type checks.
An `ErrorProcessor` runs the per-chain pass that turns a raw chain into a wrapped one. It is called **exactly once per chain** — typically by the consumer immediately after the controller returns. After that point, callers use only the `IsXxx` helpers, which are pure type checks.

`Classify` walks the chain twice:
Two implementations ship in this package:

1. **Pass 1 — framework-wrap check.** A cheap type switch looks for an existing `*userError` / `*infraError` anywhere in the chain. If found, the chain is already interpretable and `Classify` returns `err` unchanged. **No classifier is invoked.**
2. **Pass 2 — classifier walk.** From outermost to innermost node, each registered classifier is asked for a verdict. The first non-`Unknown` verdict wins and `err` is wrapped with the matching framework constructor.
- **`NewClassifierProcessor(classifiers...)`** — the standard pass for primary pipeline consumers. Walks the chain twice:
1. **Pass 1 — framework-wrap check.** A cheap type switch looks for an existing `*userError` / `*infraError` anywhere in the chain. If found, the chain is already interpretable and the processor returns `err` unchanged. **No classifier is invoked.**
2. **Pass 2 — classifier walk.** From outermost to innermost node, each registered classifier is asked for a verdict. The first non-`Unknown` verdict wins and `err` is wrapped with the matching framework constructor.

If no classifier recognises anything, `err` is returned unchanged — and behaves as non-retryable infra at the helper layer.
If no classifier recognises anything, `err` is returned unchanged — and behaves as non-retryable infra at the helper layer.

- **`AlwaysRetryableProcessor`** — unconditionally wraps every non-nil error with `NewRetryableError`, overriding any inner framework wrap. Use it for narrowly-scoped consumers — typically DLQ reconciliation — that must redeliver on any failure because there is no further dead-letter destination. Side-effect: an inner `*infraError(dependency=true)` is masked by the outer `retryable=true` wrap, since `errors.As` matches the outermost `*infraError` first. This is acceptable for the intended DLQ use case where only `IsRetryable` drives transport behaviour; do not pair this processor with a primary pipeline consumer or genuine user errors will retry forever instead of reaching their DLQ.

### Choosing a processor

- **Primary pipeline consumer** → `NewClassifierProcessor(...)`. Controllers' explicit `NewUserError` / `NewDependencyError` wraps must survive so user errors don't get retried, and unclassified backend errors must be inspected by the registered classifiers.
- **DLQ reconciliation consumer** → `AlwaysRetryableProcessor`. The DLQ is the last stop; any unprocessable message must come back for another attempt rather than silently drop. The DLQ subscription itself runs with a very high `Retry.MaxAttempts` and with its own DLQ disabled, so "always retryable + bounded-but-effectively-infinite attempts" is the convergence guarantee.

## Adding a Backend-Specific Classifier

Expand Down Expand Up @@ -77,21 +85,24 @@ func (classifier) Classify(err error) errs.Verdict {
}
```

Servers wire each classifier into the consumer as a vararg. Order matters only when two classifiers might both match a node — earlier classifiers win:
Servers wire each classifier into the consumer's `ErrorProcessor`. Order matters only when two classifiers might both match a node — earlier classifiers win:

```go
import (
"github.com/uber/submitqueue/core/errs"
genericerrs "github.com/uber/submitqueue/core/errs/generic"
mysqlerrs "github.com/uber/submitqueue/core/errs/mysql"
)

c := consumer.New(logger, scope, registry,
genericerrs.Classifier,
mysqlerrs.Classifier,
errs.NewClassifierProcessor(
genericerrs.Classifier,
mysqlerrs.Classifier,
),
)
```

Tests follow the same shape: assert per-node behaviour against `Classifier.Classify(node)` directly, and assert end-to-end behaviour by running `errs.Classify(err, Classifier)` and checking the helpers (`IsRetryable`, `IsUserError`, …) on the result. See `core/errs/mysql/mysql_test.go` and `core/errs/generic/generic_test.go`.
Tests follow the same shape: assert per-node behaviour against `Classifier.Classify(node)` directly, and assert end-to-end behaviour by running `errs.NewClassifierProcessor(Classifier).Process(err)` and checking the helpers (`IsRetryable`, `IsUserError`, …) on the result. See `core/errs/mysql/mysql_test.go` and `core/errs/generic/generic_test.go`.

## Overriding Classification from a Controller

Expand All @@ -106,8 +117,9 @@ if errors.Is(err, storage.ErrNotFound) {
return errs.NewUserError(fmt.Errorf("request %s: %w", id, err))
}
if err != nil {
// Hand the raw error to Classify — the mysql classifier will recognise
// deadlocks, lock-wait timeouts, etc. and wrap them as retryable infra.
// Hand the raw error to the consumer's ErrorProcessor — the mysql
// classifier will recognise deadlocks, lock-wait timeouts, etc. and wrap
// them as retryable infra.
return fmt.Errorf("get %s: %w", id, err)
}
```
Expand All @@ -119,7 +131,7 @@ Two practical rules fall out of the short-circuit semantics:

## Extensions Return Plain Go Errors

Extension interfaces (`MergeChecker`, `Storage`, `Publisher`) return standard `error` values. They may define their own domain-specific sentinel errors (e.g. `storage.ErrNotFound`, `storage.ErrVersionMismatch`) but they do **not** classify errors as user or infra — that is the controller's (and `Classify`'s) job.
Extension interfaces (`MergeChecker`, `Storage`, `Publisher`) return standard `error` values. They may define their own domain-specific sentinel errors (e.g. `storage.ErrNotFound`, `storage.ErrVersionMismatch`) but they do **not** classify errors as user or infra — that is the controller's (and the consumer's `ErrorProcessor`'s) job.

This separation keeps extensions reusable across contexts. The same `storage.ErrNotFound` might be a user error in one controller (user requested a non-existent resource) and an infra error in another (expected record is missing).

Expand Down Expand Up @@ -151,4 +163,4 @@ errors.Is(err, ErrNotFound) // true — cause is in the chain
| `IsRetryable(err)` | `err` is or wraps an infra error with the retryable flag set |
| `IsDependencyError(err)` | `err` is or wraps an infra error marked as dependency |

All three are type-only checks. They do not invoke classifiers — pair them with a preceding `Classify` call when the controller's error may not carry an explicit wrap.
All three are type-only checks. They do not invoke classifiers — pair them with a preceding `ErrorProcessor.Process` call when the controller's error may not carry an explicit wrap.
78 changes: 2 additions & 76 deletions core/errs/errs.go
Original file line number Diff line number Diff line change
Expand Up @@ -143,8 +143,8 @@ const (
//
// Classifiers must not call errors.As / errors.Is themselves, which would walk
// the chain and could shadow a classification carried by an outer node (such
// as a controller's explicit NewUserError wrap). The package-level Classify
// function owns the walk.
// as a controller's explicit NewUserError wrap). The classifier-based
// ErrorProcessor (see NewClassifierProcessor) owns the walk.
//
// Classifiers are typically stateless; the canonical convention is to expose a
// package-level singleton value (e.g. mysqlerrs.Classifier) rather than a
Expand All @@ -153,80 +153,6 @@ type Classifier interface {
Classify(err error) Verdict
}

// Classify is the single, explicit classification pass. It is intended to be
// called exactly once per error chain — typically by the consumer immediately
// after a controller returns — and produces a chain that subsequent IsUserError
// / IsRetryable / IsDependencyError calls can interpret with simple type
// checks (no further classifier walks).
//
// Semantics:
//
// - nil in, nil out.
// - If err's chain already carries a framework classification (*userError or
// *infraError anywhere in the chain), returns err unchanged — the chain is
// already interpretable by IsUserError / IsRetryable / IsDependencyError.
// - Otherwise, walks the chain from outermost to innermost, asking each
// classifier per node. The FIRST non-Unknown verdict wins; the outermost
// such node determines the wrap. err is wrapped with the framework
// constructor matching that verdict (User -> NewUserError, InfraRetryable
// -> NewRetryableError, etc.) and the wrapped error is returned.
// - Verdict Infra means "non-retryable infra" — which is already the default
// behavior for an unwrapped chain, so no wrap is added.
// - If no classifier recognises anything, err is returned unchanged.
//
// Implementation: two passes over the chain. Pass 1 is a cheap type check
// looking for an existing framework wrap and short-circuits if one is found —
// no classifier is invoked. Pass 2 runs the configured classifiers per node.
// Walking the chain is cheap relative to a classifier call, so this avoids
// running classifiers whenever the chain is already classified deeper down.
//
// NOTE: this central classifier model cannot disambiguate errors of the same
// underlying type produced by different extensions (e.g. a net.OpError from a
// mysql connection vs the same type from an HTTP caller would both match the
// mysql classifier here). Resolving that requires per-extension provenance
// tagging; intentionally deferred.
func Classify(err error, classifiers ...Classifier) error {
if err == nil {
return nil
}

// Pass 1 — cheap framework-wrap check. If any node already carries a
// framework type, the chain is interpretable as-is and classifiers are
// never invoked.
for cur := err; cur != nil; cur = errors.Unwrap(cur) {
switch cur.(type) {
case *userError, *infraError:
return err
}
}

// Pass 2 — run classifiers per node from outermost to innermost. Stop at
// the first non-Unknown verdict.
var verdict Verdict
for cur := err; cur != nil && verdict == Unknown; cur = errors.Unwrap(cur) {
for _, c := range classifiers {
if v := c.Classify(cur); v != Unknown {
verdict = v
break
}
}
}

switch verdict {
case User:
return NewUserError(err)
case InfraRetryable:
return NewRetryableError(err)
case InfraDependency:
return NewDependencyError(err)
case InfraDependencyRetryable:
return NewRetryableDependencyError(err)
}
// Unknown or Infra — no wrap needed; the existing chain already behaves as
// non-retryable infra at the IsRetryable / IsUserError layer.
return err
}

// IsUserError reports whether err is or wraps a user error, i.e. an error
// produced by NewUserError. Inspects only the framework types in the chain.
func IsUserError(err error) bool {
Expand Down
13 changes: 8 additions & 5 deletions core/errs/generic/generic.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,16 +25,19 @@ import (

// Classifier recognises generic, non-backend-specific errors and returns
// errs.Unknown for anything it does not recognise so the surrounding
// errs.Classify walker can keep looking down the unwrap chain.
// classifier-processor walk can keep looking down the unwrap chain.
//
// The classifier is stateless; this package-level singleton is the canonical
// handle. Pass it into consumer.New as a vararg.
// handle. Pass it as one of the variadic classifiers to
// errs.NewClassifierProcessor; the resulting processor is what gets handed to
// consumer.New.
var Classifier errs.Classifier = classifier{}

type classifier struct{}

// Classify inspects a single node. Per the errs.Classifier contract, this
// must not call errors.Is / errors.As — errs.Classify owns the chain walk.
// must not call errors.Is / errors.As — the classifier-processor owns the
// chain walk.
func (classifier) Classify(err error) errs.Verdict {
// Cancellation signals that the caller aborted the work in flight
// (process shutdown, deadline on the inbound RPC, parent operation gone) —
Expand All @@ -44,8 +47,8 @@ func (classifier) Classify(err error) errs.Verdict {
// Cases where cancellation truly means "do not run this again" are
// caller-specific and should be expressed by wrapping with an explicit
// NewUserError / NewDependencyError before returning; the pass-1
// framework-wrap check in errs.Classify will then short-circuit before
// this classifier is consulted.
// framework-wrap check in the classifier-processor will then short-circuit
// before this classifier is consulted.
if err == context.Canceled {
return errs.InfraRetryable
}
Expand Down
14 changes: 8 additions & 6 deletions core/errs/generic/generic_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,8 @@ func TestClassifier_Unknown(t *testing.T) {
err error
}{
// Per-node contract — Classifier should NOT match a wrapped
// context.Canceled; the surrounding errs.Classify walk will reach the
// inner node and ask Classifier again there.
// context.Canceled; the surrounding classifier-processor walk will
// reach the inner node and ask Classifier again there.
{"wrapped context.Canceled", fmt.Errorf("op: %w", context.Canceled)},
{"deadline exceeded", context.DeadlineExceeded},
{"plain error", errors.New("anything")},
Expand All @@ -49,17 +49,19 @@ func TestClassifier_Unknown(t *testing.T) {
}
}

func TestClassifier_AppliedViaClassify(t *testing.T) {
func TestClassifier_AppliedViaProcessor(t *testing.T) {
p := errs.NewClassifierProcessor(Classifier)

t.Run("bare context.Canceled becomes retryable infra", func(t *testing.T) {
out := errs.Classify(context.Canceled, Classifier)
out := p.Process(context.Canceled)
assert.True(t, errs.IsRetryable(out))
})

t.Run("wrapped context.Canceled becomes retryable infra", func(t *testing.T) {
// The chain walker reaches the inner context.Canceled node and the
// classifier matches there.
wrapped := fmt.Errorf("process: %w", context.Canceled)
out := errs.Classify(wrapped, Classifier)
out := p.Process(wrapped)
assert.True(t, errs.IsRetryable(out))
})

Expand All @@ -68,7 +70,7 @@ func TestClassifier_AppliedViaClassify(t *testing.T) {
// The pass-1 framework-wrap check short-circuits before Classifier
// runs.
err := errs.NewUserError(context.Canceled)
out := errs.Classify(err, Classifier)
out := p.Process(err)
assert.Same(t, err, out)
assert.False(t, errs.IsRetryable(out))
assert.True(t, errs.IsUserError(out))
Expand Down
Loading
Loading