Server: decouple held long-poll Publishes from the request-queue worker budget#3950
Server: decouple held long-poll Publishes from the request-queue worker budget#3950marcschier wants to merge 6 commits into
Conversation
…r budget A parked Publish (waiting for notifications) now releases its processing worker at the park point instead of occupying one of MaxRequestThreadCount worker slots for the whole wait, so the worker/thread budget no longer scales with session count. - Opc.Ua.Types: new IRequestParkSink; RequestLifetime.ParkSink carrier. - Opc.Ua.Core: internal RequestParkSink + IParkableIncomingRequest; EndpointIncomingRequest creates the sink and flows it onto the RequestLifetime; ServerBase.RequestQueue awaits park-or-complete (Task.WhenAny) and releases the active-worker slot at park, observing the detached parked task fault-safely. New ServerConfiguration.DecoupleHeldPublishRequests (default true) escape hatch, plumbed via InitializeRequestQueue. - Opc.Ua.Server: StandardServer.PublishAsync flows the sink; SubscriptionManager.PublishAsync and SessionPublishQueue.PublishAsync gain IRequestParkSink? overloads; NotifyParked() is raised at the single park site (queuing the incomplete Tcs.Task). Response delivery, ordering, timeout and cancellation are unchanged. net10 + net48 clean.
- Opc.Ua.Core.Tests RequestQueueTests: a parked request releases its worker so a single worker services the next request while the first is still parked; the DecoupleHeldPublishRequests=false path keeps the worker blocked; 200 parked requests are served by a 2-worker pool. - Opc.Ua.Server.Tests SessionPublishQueueTests: NotifyParked is raised exactly once when a Publish parks and never when it returns/faults immediately.
…h hot path free Only requests that can actually park (Publish long-polls) allocate a park sink and take the park-or-complete worker path; every other request (Read/Write/Browse/...) keeps the legacy inline path with no extra per-request allocation or Task.WhenAny. This makes the dominant non-Publish request path byte-for-byte equivalent to before the change, so there is no fast-path regression, while held Publishes still release their worker at the park point. ParkSink is now nullable (null = cannot park).
|
agent seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3950 +/- ##
==========================================
- Coverage 73.52% 73.49% -0.03%
==========================================
Files 1170 1171 +1
Lines 170163 170202 +39
Branches 29361 29366 +5
==========================================
- Hits 125117 125096 -21
- Misses 34043 34101 +58
- Partials 11003 11005 +2
🚀 New features to boost your workflow:
|
- ServerScalability.md: rewrite as user-facing documentation — drop tracking IDs (B1-B5/S1-S4) and status/temporal wording (delivered/now/roadmap/options), and describe the server's scalability characteristics, admission controls, and the held-Publish decoupling as current functionality. - Remove the delegating PublishAsync compat overloads (2.0-only API, no migration needed) in favor of the IRequestParkSink signature throughout: ISubscriptionManager, SubscriptionManager, SessionPublishQueue now expose only the park-sink method; test/benchmark callers updated to pass the sink (null when none).
…anch Integrates PR #3950 "Server: decouple held long-poll Publishes from the request-queue worker budget" (the S1 scalability item) into this branch. Conflict resolution: Docs/ServerScalability.md - took #3950's concise rewrite (its "Admission control and rate limiting" section already summarizes the rate-limiting + retry-after work, and it references RateLimiting.md for the full surface) and added a See-also link to RetryAfterSignaling.md. All code files auto-merged cleanly (StandardServer.cs regions are disjoint).
…y numbers With DecoupleHeldPublishRequests default-on (S1), a held Publish releases its request-processing worker at the park point, so the worker pool no longer has to scale with the session count. An A/B load-test campaign (ServerManySessionsLoadTest, Xeon W-2235 6c/12t, quiet machine) shows a ~200-worker pool cleanly establishes and serves ~4000 sessions (8-17x the ~140-350 the coupled-off path manages at the same budget), while a session-count-sized pool (10500) is ~2x slower to establish and reaches a LOWER ceiling (thread oversubscription during the RSA-handshake burst). - LoadTest fixture: MaxRequestThreadCount 10500->200, MinRequestThreadCount 200->50, with the rationale comment rewritten for the S1 small-pool model. - Benchmarks.md: refresh the server-session scalability table (2000/2500/4000 now establish cleanly; 10000 remains the establishment wall) and the MaxRequestThreadCount guidance (size to active concurrency, not session count). ServerScalability.md prose was rewritten upstream (8f7a95d); measured numbers now live in Benchmarks.md, which that doc references.
|
Merged into #3946. The held long-poll Publish decoupling (S1) commits from this branch — including the latest |
Description
Decouples held long-poll
Publishrequests from the server's request-queue worker/thread budget (a follow-up to the scalability analysis added by #3941). A parkedPublish— one waiting for the next subscription notification — no longer occupies one ofMaxRequestThreadCountworker slots for the whole wait, so a small, fixed worker pool can hold many thousands of outstanding Publishes andMaxRequestThreadCountno longer has to scale with session count.The problem
The OS thread is already released while a
Publishis parked (SessionPublishQueue.PublishAsyncreturns aRunContinuationsAsynchronouslyTaskCompletionSourcetask). ButServerBase.RequestQueue.WorkerLoopAsyncawaited each request to completion inline and held the active-worker counter for the entire long-poll. Each heldPublishtherefore occupied a worker-loop slot;ScheduleIncomingRequestspawns workers up toMaxRequestThreadCountand the queue raises the processThreadPoolceiling to match. Consequence: to serve N sessions you had to sizeMaxRequestThreadCount ≥ N.The change — park-or-complete, via an explicit park sink
IRequestParkSink(Opc.Ua.Types) + a nullableRequestLifetime.ParkSinkcarrier. One-shot, idempotent.Opc.Ua.Core: internalRequestParkSink(TaskCompletionSource<bool>, cross-TFM) and a compat-safe internalIParkableIncomingRequest : IEndpointIncomingRequest.EndpointIncomingRequestcreates a sink only forPublishrequests and flows it onto theRequestLifetime.ServerBase.RequestQueueawaitsTask.WhenAny(processing, ParkSink.ParkedTask)and releases the active-worker slot at the park point rather than at completion, observing the detached parked task fault-safely (ProcessRequestSafeAsyncnever throws and always faults the request to the client, exactly as the legacy path did).Opc.Ua.Server:StandardServer.PublishAsyncflows the sink;SubscriptionManager.PublishAsyncandSessionPublishQueue.PublishAsynctake anIRequestParkSinkparameter (used throughout — the pre-existing 2.0-only signatures are replaced, not duplicated);NotifyParked()is raised at the single park site (queuing the incompleteTcs.Task).ServerConfiguration.DecoupleHeldPublishRequests(defaulttrue) restores the legacy inline-await worker path when set tofalse.Response delivery, publish ordering, timeout and cancellation are unchanged — the response still flows through the request's value-task source, and the per-request timeout/cancel still fault the
TaskCompletionSource.Performance — no fast-path regression by construction
Only
Publishrequests carry a sink and take the park-or-complete path. Every other request (Read/Write/Browse/Call/…) keeps the legacy inline path with no extra per-request allocation and noTask.WhenAny— it is byte-for-byte the prior behavior.Publishadds a singleWhenAny+ one smallTaskCompletionSource, which is negligible against a multi-second park and is dwarfed by the worker slot it releases. Worker-pool footprint stays flat as the number of held Publishes grows (see the 200-parked-on-2-workers test).Tests
Opc.Ua.Core.Tests/RequestQueueTests: a parked request releases its worker so a single worker services the next request while the first is still parked; withDecoupleHeldPublishRequests = falsethe single worker stays blocked until completion; 200 parked requests are served by a 2-worker pool.Opc.Ua.Server.Tests/SessionPublishQueueTests:NotifyParkedis raised exactly once when aPublishparks and never when it returns/faults immediately.Opc.Ua.Core.Tests3592 passed,Opc.Ua.Server.Tests1904 passed,Opc.Ua.Subscriptions.Tests505 passed (real Publish long-polls, failover, transfer, durable).Opc.Ua.Server/Opc.Ua.Corebuild clean on net48 and net10.0.Docs
Docs/ServerScalability.mdrewritten as user-facing documentation covering the server's scalability characteristics, admission controls, the held-Publish decoupling, andDecoupleHeldPublishRequests/MaxRequestThreadCountsizing guidance.Related Issues
Follow-up to the scalability analysis added in #3941 (
Docs/ServerScalability.md).Checklist