[fm] Guard support bundle creation with SitrepGuardedInsert.#10535
[fm] Guard support bundle creation with SitrepGuardedInsert.#10535mergeconflict wants to merge 21 commits into
Conversation
e15c2c1 to
e5d67e2
Compare
1bc2f60 to
62cc3dd
Compare
e5d67e2 to
809d7d3
Compare
62cc3dd to
3d5af68
Compare
809d7d3 to
7128ff2
Compare
a1beef9 to
e97748d
Compare
7128ff2 to
1574845
Compare
e97748d to
f3832d4
Compare
1574845 to
4d0f1dd
Compare
f3832d4 to
cbc506b
Compare
85edf6b to
a3ab6c2
Compare
cbc506b to
59680b0
Compare
Add the `SitrepGuardedInsert` Diesel combinator and the
`SitrepGuardedResource` trait: a generic primitive for FM rendezvous to
insert a resource row idempotently and guarded against stale-sitrep
execution.
The combinator wraps a caller-supplied resource INSERT in a single CTE
statement that:
- aborts (StaleSitrep) unless the executor's expected generation still
equals the latest sitrep's generation column;
- short-circuits (AlreadyExists) if a creation marker already exists for
the resource id;
- on a successful insert, atomically writes a creation marker, gated by
`WHERE EXISTS (SELECT 1 FROM new_resource)` so a marker is never
fabricated for a pre-existing row.
All spliced SQL identifiers come from the trait's `&'static str` consts, so
the query is injection-safe. The result is surfaced as a
`SitrepGuardedInsertOutcome` of Created / AlreadyExists / StaleSitrep.
59680b0 to
b6f5bfe
Compare
a3ab6c2 to
26cbbe1
Compare
b6f5bfe to
a7d6f65
Compare
26cbbe1 to
da05cf5
Compare
a7d6f65 to
aee00ec
Compare
da05cf5 to
064090e
Compare
Wire the alert resource through `SitrepGuardedInsert`:
- `impl SitrepGuardedResource for Alert`;
- schema: `alert_generation` on `fm_sitrep` and the
`rendezvous_alert_created` marker table (migration
fm-alert-resource-deletion);
- `alert_create`'s FM path routes through the combinator, surfacing a
stale sitrep as a monomorphic `Error::Conflict`;
- `SitrepBuilder` tracks `alert_generation`, bumping it when the
outstanding alert-request set changes; the closed-case carry-forward
filter drops fully-satisfied closed cases;
- fm_rendezvous reads the expected generation and aborts the alert loop
on a stale mismatch; fm_analysis loads existing markers to drive
carry-forward; omdb displays the new status fields and generation.
The support-bundle mirror of the preceding alert change: `impl SitrepGuardedResource for SupportBundle`, the `support_bundle_generation` column and `rendezvous_support_bundle_created` marker table (migration fm-bundle-resource-deletion), `support_bundle_create`'s FM path routed through the combinator inside its transaction, support-bundle generation tracking and carry-forward, the fm_rendezvous bundle loop, the fm_analysis bundle-marker lookup, and the omdb display. `support_bundle_generation` and `alert_generation` are tracked and guarded independently, so a stale generation for one resource aborts only that resource's rendezvous loop.
Add the `SitrepGuardedInsert` Diesel combinator and the `SitrepGuardedResource` trait: a generic primitive for FM rendezvous to insert a resource row idempotently and guarded against stale-sitrep execution. The combinator wraps a caller-supplied resource INSERT in a single CTE statement that: - aborts (StaleSitrep) unless the executor's expected generation still equals the latest sitrep's generation column; - short-circuits (AlreadyExists) if a creation marker already exists for the resource id; - on a successful insert, atomically writes a creation marker. The result is surfaced as a `SitrepGuardedInsertOutcome` of `Created` / `AlreadyExists` / `StaleSitrep`. Context: #10248. This PR was previously #10532 but I made a mess of it. This is used in #10533 and #10535 which are split out in hopes of making the review somewhat less miserable.
aee00ec to
aab93e8
Compare
064090e to
5f46981
Compare
Rename first_sitrep_with_alert_request_bumps_generation -> first_sitrep_with_alert_request_bumps_alert_generation so the alert test reads clearly alongside the support bundle generation tests added later.
| /// For [`SupportBundleProvenance::Fm`] the bundle INSERT is gated by | ||
| /// [`SitrepGuardedInsert`], which also writes the | ||
| /// `rendezvous_support_bundle_created` marker atomically. The | ||
| /// transaction encloses the guarded insert and the data-selection | ||
| /// inserts, so a marker is never durable without its data-selection | ||
| /// rows. If a bundle with the given id already exists, returns | ||
| /// [`Error::ObjectAlreadyExists`]. If the latest sitrep's | ||
| /// `support_bundle_generation` has advanced past | ||
| /// `expected_support_bundle_generation`, returns [`Error::Conflict`]; the | ||
| /// caller should skip the request rather than retrying. | ||
| /// | ||
| /// The free-dataset lookup runs before the generation guard, so a stale | ||
| /// rendezvous activation that also finds no free dataset reports | ||
| /// [`Error::insufficient_capacity`] rather than [`Error::Conflict`]. | ||
| /// This only affects status reporting and log noise: the guard still | ||
| /// prevents a stale activation from inserting anything. | ||
| pub async fn support_bundle_create( |
There was a problem hiding this comment.
hmm. similarly to the suggestion I left on #10533, I wonder if we might want to refactor this as separate support_bundle_create and fm_rendezvous_support_bundle_create methods, rather than passing in a "provenance" type that tells the method which behavior to do --- I can't really imagine having code that wants to be able to call support_bundle_create with either provenance dynamically; we're only doing one code path or the other, so it feels a bit nicer to me for them to just be separate methods.
with that said, I realize that (unlike alerts) the provenance enum was already here, so 🤷♀️
There was a problem hiding this comment.
I had a reason for keeping the switch-on-provenance thing here, but now I honestly can't remember what it was or whether it was a good reason. I'll think about this.
There was a problem hiding this comment.
Ok, I messed with this a bit and decided to stop messing with it. The direction it was going, I had three separate SupportBundleCreateParams structs: one for support_bundle_create, another for fm_rendezvous_support_bundle_create, and a third for their shared helper function (which had some super fun trait bounds for the insert query closure)... It was getting ugly, and then my efforts to de-uglify all pointed in the direction of "hey wouldn't it be nice to just switch on provenance."
There was a problem hiding this comment.
The direction it was going, I had three separate SupportBundleCreateParams structs: one for support_bundle_create, another for fm_rendezvous_support_bundle_create, and a third for their shared helper function (which had some super fun trait bounds for the insert query closure)
Maybe there's an obvious reason this won't work that I'm missing, but why can't we just have something like this:
pub async fn support_bundle_create(&self, params: SupportBundleCreateParams) -> ...
pub async fn fm_support_bundle_create(
&self,
params: SupportBundleCreateParams,
case_id: CaseUuid,
id: SupportBundleUuid,
expected_generation: Generation,
) -> ...rather than having three params types? I wouldn't necessarily expect that there has to be a shared helper function if the code is different enough (but there can be if and only if it makes sense).
With that all said, if you don't feel like this is better, it's fine to do the provenance thing.
There was a problem hiding this comment.
With that all said, if you don't feel like this is better, it's fine to do the provenance thing.
I mean, I don't know, It's definitely bit better in some respects, but I don't know how much better. I've had Claude help me throw together a strawman draft PR, #10665, to show what it might look like. The basic issue I keep beating my head against it is that support_bundle_create and fm_rendezvous_support_bundle_create are structurally very similar -- in #10665 they're basically copy-pastes -- but my various attempts to refactor for sharing all ended up being gross and overbuilt. For example, my most recent attempt involved a policy trait comprising a function for "do the fucking insert" and an associated type for error returns; there was a lot of boilerplate there.
What's your feeling?
There was a problem hiding this comment.
@mergeconflict what say you about #10667? (note: if you merge this there will be some additional cleanup to actually make rendezvous use the new thing, remove all the Provenances, et cetera)
| /// Verifies that the bump is relative to the parent sitrep's generation, | ||
| /// not restarted from a fixed initial value. | ||
| #[test] | ||
| fn child_sitrep_with_new_alert_bumps_alert_generation() { | ||
| fn child_sitrep_with_both_new_requests_bumps_both_generations() { | ||
| // Verifies that the bump is relative to the parent sitrep's generation, | ||
| // not restarted from a fixed initial value. |
There was a problem hiding this comment.
it seems like the comment has still moved?
There was a problem hiding this comment.
you're looking at an older commit for some reason, this was fixed in 06f71f5
| /// A closed case with an outstanding alert request becomes "satisfied" via | ||
| /// marker presence, carry-forward drops it, and alert_generation bumps even | ||
| /// though no open-case builder mutations happened. | ||
| #[test] | ||
| fn carry_forward_drop_bumps_alert_generation() { | ||
| // A closed case with an outstanding alert request becomes "satisfied" | ||
| // via marker presence, carry-forward drops it, alert_generation bumps | ||
| // even though no open-case builder mutations happened. |
There was a problem hiding this comment.
nit: comment has still moved, can we move it back please?
| let query = bundle_marker_dsl::rendezvous_support_bundle_created | ||
| .filter(bundle_marker_dsl::support_bundle_id.eq_any(candidates)) | ||
| .select(bundle_marker_dsl::support_bundle_id); | ||
|
|
||
| let explanation = query | ||
| .explain_async(&conn) |
There was a problem hiding this comment.
this test feels a bit weird to me: we are constructing a query in the test and calling explain_async on it. This is intended to be an EXPLAIN test for the query used in the fm_rendezvous_existing_support_bundle_markers method, but it's not EXPLAINing the actual query used by that method. At present, the test is constructing the same query, but it could drift. In general, when we write these EXPLAIN tests, we do it by having a fn foo_query(args) -> impl RunnableQuery<Whatever> that both the actual query method and the EXPLAIN test both call to get the query. That way, if the query changes, the EXPLAIN test is still EXPLAINing the actual query we're using in production. Can we do that here?
There was a problem hiding this comment.
I decided to delete this test (in 8b556e9) rather than fix it, actually, along with the alerts equivalent. I don't think we need an EXPLAIN test for this just to demonstrate that there's no full table scan, which would be caught by pretty much any other test that runs the query. We already have a good expectorate test for the query shape (see sitrep_guarded_insert.sql).
| /// A closed case with an outstanding alert request becomes "satisfied" via | ||
| /// marker presence, carry-forward drops it, and alert_generation bumps even | ||
| /// though no open-case builder mutations happened. | ||
| #[test] | ||
| fn carry_forward_drop_bumps_alert_generation() { | ||
| // A closed case with an outstanding alert request becomes "satisfied" | ||
| // via marker presence, carry-forward drops it, alert_generation bumps | ||
| // even though no open-case builder mutations happened. |
There was a problem hiding this comment.
nit: comment has still moved, can we move it back please?
| use nexus_types::fm::case::SupportBundleRequest; | ||
| use nexus_types::support_bundle::BundleDataSelection; | ||
| use omicron_uuid_kinds::SupportBundleUuid; | ||
|
|
There was a problem hiding this comment.
nit: some of these imports could probably be factored out?
hawkw
left a comment
There was a problem hiding this comment.
cool i feel pretty good about this
| >, | ||
| >, | ||
| ) | ||
| -> BoxFuture<'a, Result<SupportBundle, DbError>>, |
There was a problem hiding this comment.
oh i'm glad this works that's alot less gross
| Err( | ||
| e @ (FmSupportBundleCreateError::TooManyBundles | ||
| | FmSupportBundleCreateError::Other(_)), | ||
| ) => { |
There was a problem hiding this comment.
personally i feel like the exhaustive matching here feels unnecessary, but 🤷♀️
Wire the support bundle resource through
SitrepGuardedInsert:impl SitrepGuardedResource for SupportBundle;support_bundle_generationonfm_sitrepand therendezvous_support_bundle_createdmarker table (migration fm-bundle-resource-deletion);support_bundle_create's FM path routes through the combinator, surfacing a stale sitrep asError::Conflict;SitrepBuildertrackssupport_bundle_generation, bumping it when the outstanding support-bundle-request set changes; the closed-case carry-forward filter drops fully-satisfied closed cases and keeps those with unsatisfied support bundle requests;support_bundle_generationandalert_generationare tracked and guarded independently, so a stale generation for one resource aborts only that resource's rendezvous loop.Context: #10248, builds on #10564 and #10533.