
Server: restore monitored-item data/event queues on failover (#3939) #6

Merged: marcschier merged 1 commit into nodestatestorage from copilot/failover-restore-monitored-item-queues on Jul 4, 2026

@marcschier
Owner

Implements OPCFoundation#3939, the remaining failover work tracked from OPCFoundation#3918.

Stacked on OPCFoundation#3918 (base branch nodestatestorage); this PR adds only the OPCFoundation#3939 changes on top and should merge after / into OPCFoundation#3918.

Problem

SharedKeyValueSubscriptionStore mirrors subscription definitions and retransmission state, but the per-monitored-item data/event queues were not restored: RestoreDataChangeMonitoredItemQueue/RestoreEventMonitoredItemQueue run on the synchronous monitored-item creation path, so a networked/async store cannot re-hydrate them. After a failover, values that were queued-but-not-yet-published on the failed replica were lost.

Changes

Async restore plumbing (Opc.Ua.Server)

  • ISubscriptionStore: add RestoreDataChangeMonitoredItemQueueAsync / RestoreEventMonitoredItemQueueAsync (existing sync hooks kept as the fallback for local/durable stores).
  • IStoredMonitoredItem / StoredMonitoredItem: transient (never-serialized) RestoredDataChangeQueue / RestoredEventQueue used to carry a pre-hydrated queue.
  • MasterNodeManager.RestoreMonitoredItemsAsync pre-fetches each queue asynchronously and hands it to the still-synchronous MonitoredItem constructor, so a networked store never blocks the creation path.
  • MonitoredItem.RestoreQueue prefers the pre-hydrated queue, else falls back to the synchronous store method.
  • Core queue mutators made virtual so a mirror can subclass them.
  • StandardServer + OpcUaServerHostedService: DI seams for ISubscriptionStore and IMonitoredItemQueueFactory.
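The pre-hydration flow described in the bullets above can be illustrated with a small language-neutral sketch. It is written in Python rather than C# and is not the Opc.Ua.Server API: `InMemoryStore`, `StoredItem`, `MonitoredItem`, and `restore_monitored_items` are hypothetical stand-ins, and queues are plain lists. The point is only the shape: fetch every queue asynchronously first, attach it to the stored item as a transient field, then let the still-synchronous constructor prefer it and fall back to the sync hook.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class InMemoryStore:
    """Hypothetical store exposing both a sync and an async restore hook."""

    def __init__(self, queues: dict) -> None:
        self._queues = queues
        self.sync_calls = 0
        self.async_calls = 0

    def restore_queue(self, item_id: int) -> Optional[list]:
        # Sync fallback, suitable for local/durable stores.
        self.sync_calls += 1
        queue = self._queues.get(item_id)
        return list(queue) if queue is not None else None

    async def restore_queue_async(self, item_id: int) -> Optional[list]:
        # Async hook; the sleep stands in for a network round-trip.
        self.async_calls += 1
        await asyncio.sleep(0)
        queue = self._queues.get(item_id)
        return list(queue) if queue is not None else None


@dataclass
class StoredItem:
    item_id: int
    # Transient carrier for a pre-hydrated queue (assumed never serialized).
    restored_queue: Optional[list] = None


class MonitoredItem:
    def __init__(self, stored: StoredItem, store: InMemoryStore) -> None:
        # The constructor stays synchronous: prefer the pre-hydrated queue,
        # otherwise fall back to the synchronous store method.
        if stored.restored_queue is not None:
            self.queue = stored.restored_queue
        else:
            self.queue = store.restore_queue(stored.item_id) or []


async def restore_monitored_items(stored_items, store):
    # Pre-fetch all queues concurrently before any item is constructed.
    queues = await asyncio.gather(
        *(store.restore_queue_async(s.item_id) for s in stored_items)
    )
    for stored, queue in zip(stored_items, queues):
        stored.restored_queue = queue
    return [MonitoredItem(s, store) for s in stored_items]
```

In this sketch an item whose async restore returns nothing still goes through the sync fallback, which mirrors the "prefers the pre-hydrated queue, else falls back" wording above; whether the real implementation retries the sync path in that case is not stated in this PR.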

Continuous mirror (Opc.Ua.Redundancy.Server)

  • New SharedKeyValueMonitoredItemQueueFactory + Mirroring{DataChange,Event}MonitoredItemQueue: snapshot queue contents on each mutation, coalesce, and persist via a non-blocking background drain (encrypted at rest via the configured IRecordProtector); on promotion the restore rebuilds a still-mirroring queue.
  • SharedKeyValueSubscriptionStore delegates its async restore to the factory and cleans stale queue keys; UseDistributedSubscriptionMirroring registers the factory + store together.
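As a rough illustration of the snapshot, coalesce, and background-drain idea above, here is a minimal Python sketch. It is not the Opc.Ua.Redundancy.Server code: `MirroringQueue` and `demo` are hypothetical, the shared store is a plain dict, and encryption via a record protector and key cleanup are left out. Each mutation overwrites a single pending snapshot, so several mutations between drain cycles collapse into one write.

```python
import asyncio
from collections import deque


class MirroringQueue:
    """Hypothetical queue that mirrors its contents to a shared store."""

    def __init__(self, key, store, initial=()):
        self._key = key
        self._store = store           # dict standing in for a shared key/value store
        self._items = deque(initial)
        self._pending = None          # latest snapshot not yet persisted
        self._dirty = asyncio.Event()
        self.writes = 0

    def enqueue(self, value):
        self._items.append(value)
        self._snapshot()

    def dequeue(self):
        value = self._items.popleft()
        self._snapshot()
        return value

    def items(self):
        return list(self._items)

    def _snapshot(self):
        # Coalescing: a newer snapshot replaces any unpersisted older one.
        self._pending = list(self._items)
        self._dirty.set()

    async def drain(self):
        # Background loop; mutators never wait on the store write.
        while True:
            await self._dirty.wait()
            self._dirty.clear()
            snapshot, self._pending = self._pending, None
            self._store[self._key] = snapshot
            self.writes += 1


async def demo():
    store = {}
    queue = MirroringQueue("item-1", store)
    task = asyncio.create_task(queue.drain())
    # Four mutations happen before the drain task gets a chance to run.
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    queue.dequeue()
    await asyncio.sleep(0.01)  # yield so the drain can persist the snapshot
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # "Promotion": rebuild a queue from the mirrored snapshot; it mirrors again.
    restored = MirroringQueue("item-1", store, initial=store["item-1"])
    return store["item-1"], queue.writes, restored.items()
```

Persisting whole snapshots rather than individual operations keeps restore trivial (read one value, rebuild the queue) at the cost of rewriting the full contents on each drain; that trade-off is an assumption of this sketch, not a statement about the merged code.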

Tests & docs

  • 9 new redundancy tests (mirror/restore round-trip for data + event, error preservation, dequeue-shrink, cleanup, dispose-removes, store delegation), 2 new core tests (pre-hydration used + sync fallback), and 1 new AOT round-trip. All green on net10.0 and net48 (AOT test passes in SourceGenerated/AOT mode); existing durable-queue tests remain green.
  • Updated Docs/HighAvailability.md (the two "queues not restored" notes) and Docs/migrate/2.0.x/sessions-subscriptions.md (new async ISubscriptionStore members).

Conventions honored: async TAP only (no sync-over-async), System.Threading.Lock, ByteString/ArrayOf, sealed + DI-injectable with direct-construct fallback, NativeAOT-safe, additive/non-breaking (new 2.0 surface), MIT headers, no regions.

…dation#3939)

Add an async queue-restore path so a networked ISubscriptionStore can re-hydrate per-monitored-item data/event queues without blocking the synchronous MonitoredItem creation path, plus a continuous shared-store mirror so queued-but-unpublished values survive an HA failover.

- ISubscriptionStore: add Restore{DataChange,Event}MonitoredItemQueueAsync (sync hooks kept as fallback)
- IStoredMonitoredItem/StoredMonitoredItem: transient Restored* queue properties
- MasterNodeManager: pre-hydrate queues before MonitoredItem construction
- MonitoredItem.RestoreQueue: prefer pre-hydrated queue, fall back to sync
- Make core queue mutators virtual for mirroring subclasses
- StandardServer + hosted service: DI seams for ISubscriptionStore and IMonitoredItemQueueFactory
- Opc.Ua.Redundancy.Server: SharedKeyValueMonitoredItemQueueFactory + mirroring queues; wire into UseDistributedSubscriptionMirroring and SharedKeyValueSubscriptionStore async restore
- Tests (core, redundancy, AOT) and docs (HighAvailability, migration guide)
@marcschier marcschier marked this pull request as ready for review July 4, 2026 11:00
@marcschier marcschier merged commit a2c3b44 into nodestatestorage Jul 4, 2026
@marcschier marcschier deleted the copilot/failover-restore-monitored-item-queues branch July 4, 2026 11:00