Skip to content

Server: decouple held long-poll Publishes from the request-queue worker budget#3950

Closed
marcschier wants to merge 6 commits into
masterfrom
copilot/decouple-held-publishes
Closed

Server: decouple held long-poll Publishes from the request-queue worker budget#3950
marcschier wants to merge 6 commits into
masterfrom
copilot/decouple-held-publishes

Conversation

@marcschier

@marcschier marcschier commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Description

Decouples held long-poll Publish requests from the server's request-queue worker/thread budget (a follow-up to the scalability analysis added by #3941). A parked Publish — one waiting for the next subscription notification — no longer occupies one of MaxRequestThreadCount worker slots for the whole wait, so a small, fixed worker pool can hold many thousands of outstanding Publishes and MaxRequestThreadCount no longer has to scale with session count.

The problem

The OS thread is already released while a Publish is parked (SessionPublishQueue.PublishAsync returns a RunContinuationsAsynchronously TaskCompletionSource task). But ServerBase.RequestQueue.WorkerLoopAsync awaited each request to completion inline and held the active-worker counter for the entire long-poll. Each held Publish therefore occupied a worker-loop slot; ScheduleIncomingRequest spawns workers up to MaxRequestThreadCount and the queue raises the process ThreadPool ceiling to match. Consequence: to serve N sessions you had to size MaxRequestThreadCount ≥ N.

The change — park-or-complete, via an explicit park sink

  • IRequestParkSink (Opc.Ua.Types) + a nullable RequestLifetime.ParkSink carrier. One-shot, idempotent.
  • Opc.Ua.Core: internal RequestParkSink (TaskCompletionSource<bool>, cross-TFM) and a compat-safe internal IParkableIncomingRequest : IEndpointIncomingRequest. EndpointIncomingRequest creates a sink only for Publish requests and flows it onto the RequestLifetime. ServerBase.RequestQueue awaits Task.WhenAny(processing, ParkSink.ParkedTask) and releases the active-worker slot at the park point rather than at completion, observing the detached parked task fault-safely (ProcessRequestSafeAsync never throws and always faults the request to the client, exactly as the legacy path did).
  • Opc.Ua.Server: StandardServer.PublishAsync flows the sink; SubscriptionManager.PublishAsync and SessionPublishQueue.PublishAsync take an IRequestParkSink parameter (used throughout — the pre-existing 2.0-only signatures are replaced, not duplicated); NotifyParked() is raised at the single park site (queuing the incomplete Tcs.Task).
  • Escape hatch: ServerConfiguration.DecoupleHeldPublishRequests (default true) restores the legacy inline-await worker path when set to false.

Response delivery, publish ordering, timeout and cancellation are unchanged — the response still flows through the request's value-task source, and the per-request timeout/cancel still fault the TaskCompletionSource.

Performance — no fast-path regression by construction

Only Publish requests carry a sink and take the park-or-complete path. Every other request (Read/Write/Browse/Call/…) keeps the legacy inline path with no extra per-request allocation and no Task.WhenAny — it is byte-for-byte the prior behavior. Publish adds a single WhenAny + one small TaskCompletionSource, which is negligible against a multi-second park and is dwarfed by the worker slot it releases. Worker-pool footprint stays flat as the number of held Publishes grows (see the 200-parked-on-2-workers test).

Tests

  • Opc.Ua.Core.Tests/RequestQueueTests: a parked request releases its worker so a single worker services the next request while the first is still parked; with DecoupleHeldPublishRequests = false the single worker stays blocked until completion; 200 parked requests are served by a 2-worker pool.
  • Opc.Ua.Server.Tests/SessionPublishQueueTests: NotifyParked is raised exactly once when a Publish parks and never when it returns/faults immediately.
  • Regression (net10.0): Opc.Ua.Core.Tests 3592 passed, Opc.Ua.Server.Tests 1904 passed, Opc.Ua.Subscriptions.Tests 505 passed (real Publish long-polls, failover, transfer, durable). Opc.Ua.Server/Opc.Ua.Core build clean on net48 and net10.0.

Docs

Docs/ServerScalability.md rewritten as user-facing documentation covering the server's scalability characteristics, admission controls, the held-Publish decoupling, and DecoupleHeldPublishRequests / MaxRequestThreadCount sizing guidance.

Related Issues

Follow-up to the scalability analysis added in #3941 (Docs/ServerScalability.md).

Checklist

  • I have signed the CLA and read the CONTRIBUTING doc.
  • I have added tests that prove my fix is effective or that my feature works and increased code coverage.
  • I have added all necessary documentation.
  • I have verified that my changes do not introduce (new) build or analyzer warnings.
  • I ran all tests locally using the UA.slnx solution against at least .net framework and .net 10, and all passed.
  • I fixed all failing and flaky tests in the CI pipelines and all CodeQL warnings.
  • I have addressed all PR feedback received.

Draft — do not merge until reviewed.

agent added 4 commits July 3, 2026 15:31
…r budget

A parked Publish (waiting for notifications) now releases its processing worker at
the park point instead of occupying one of MaxRequestThreadCount worker slots for
the whole wait, so the worker/thread budget no longer scales with session count.

- Opc.Ua.Types: new IRequestParkSink; RequestLifetime.ParkSink carrier.
- Opc.Ua.Core: internal RequestParkSink + IParkableIncomingRequest; EndpointIncomingRequest
  creates the sink and flows it onto the RequestLifetime; ServerBase.RequestQueue awaits
  park-or-complete (Task.WhenAny) and releases the active-worker slot at park, observing
  the detached parked task fault-safely. New ServerConfiguration.DecoupleHeldPublishRequests
  (default true) escape hatch, plumbed via InitializeRequestQueue.
- Opc.Ua.Server: StandardServer.PublishAsync flows the sink; SubscriptionManager.PublishAsync
  and SessionPublishQueue.PublishAsync gain IRequestParkSink? overloads; NotifyParked() is
  raised at the single park site (queuing the incomplete Tcs.Task).

Response delivery, ordering, timeout and cancellation are unchanged. net10 + net48 clean.
- Opc.Ua.Core.Tests RequestQueueTests: a parked request releases its worker so a
  single worker services the next request while the first is still parked; the
  DecoupleHeldPublishRequests=false path keeps the worker blocked; 200 parked
  requests are served by a 2-worker pool.
- Opc.Ua.Server.Tests SessionPublishQueueTests: NotifyParked is raised exactly once
  when a Publish parks and never when it returns/faults immediately.
…h hot path free

Only requests that can actually park (Publish long-polls) allocate a park sink and take
the park-or-complete worker path; every other request (Read/Write/Browse/...) keeps the
legacy inline path with no extra per-request allocation or Task.WhenAny. This makes the
dominant non-Publish request path byte-for-byte equivalent to before the change, so there
is no fast-path regression, while held Publishes still release their worker at the park
point. ParkSink is now nullable (null = cannot park).
@CLAassistant

CLAassistant commented Jul 3, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ marcschier
❌ agent


agent seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread Docs/ServerScalability.md Outdated
Comment thread Docs/ServerScalability.md Outdated
Comment thread Libraries/Opc.Ua.Server/Subscription/ISubscriptionManager.cs Outdated
Comment thread Libraries/Opc.Ua.Server/Subscription/SessionPublishQueue.cs Outdated
Comment thread Libraries/Opc.Ua.Server/Subscription/SubscriptionManager.cs Outdated
@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.49%. Comparing base (26ef554) to head (dc8ac3d).

Files with missing lines Patch % Lines
...pc.Ua.Core/Stack/Server/ServerBase.RequestQueue.cs 65.51% 10 Missing ⚠️
Stack/Opc.Ua.Core/Stack/Server/ServerBase.cs 71.42% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3950      +/-   ##
==========================================
- Coverage   73.52%   73.49%   -0.03%     
==========================================
  Files        1170     1171       +1     
  Lines      170163   170202      +39     
  Branches    29361    29366       +5     
==========================================
- Hits       125117   125096      -21     
- Misses      34043    34101      +58     
- Partials    11003    11005       +2     
Files with missing lines Coverage Δ
Libraries/Opc.Ua.Server/Server/StandardServer.cs 75.64% <100.00%> (+0.01%) ⬆️
.../Opc.Ua.Server/Subscription/SessionPublishQueue.cs 85.15% <100.00%> (+0.05%) ⬆️
.../Opc.Ua.Server/Subscription/SubscriptionManager.cs 82.02% <100.00%> (-0.68%) ⬇️
...ack/Server/EndpointBase.EndpointIncomingRequest.cs 81.25% <100.00%> (+1.25%) ⬆️
Stack/Opc.Ua.Core/Stack/Server/RequestParkSink.cs 100.00% <100.00%> (ø)
Stack/Opc.Ua.Types/BuiltIn/RequestLifetime.cs 90.32% <ø> (ø)
Stack/Opc.Ua.Core/Stack/Server/ServerBase.cs 63.22% <71.42%> (+0.03%) ⬆️
...pc.Ua.Core/Stack/Server/ServerBase.RequestQueue.cs 83.49% <65.51%> (-3.10%) ⬇️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- ServerScalability.md: rewrite as user-facing documentation — drop tracking IDs
  (B1-B5/S1-S4) and status/temporal wording (delivered/now/roadmap/options), and
  describe the server's scalability characteristics, admission controls, and the
  held-Publish decoupling as current functionality.
- Remove the delegating PublishAsync compat overloads (2.0-only API, no migration
  needed) in favor of the IRequestParkSink signature throughout:
  ISubscriptionManager, SubscriptionManager, SessionPublishQueue now expose only the
  park-sink method; test/benchmark callers updated to pass the sink (null when none).
@marcschier marcschier changed the title Server: decouple held long-poll Publishes from the request-queue worker budget (S1) Server: decouple held long-poll Publishes from the request-queue worker budget Jul 3, 2026
marcschier added a commit that referenced this pull request Jul 4, 2026
…anch

Integrates PR #3950 "Server: decouple held long-poll Publishes from the
request-queue worker budget" (the S1 scalability item) into this branch.

Conflict resolution: Docs/ServerScalability.md - took #3950's concise rewrite
(its "Admission control and rate limiting" section already summarizes the
rate-limiting + retry-after work, and it references RateLimiting.md for the full
surface) and added a See-also link to RetryAfterSignaling.md. All code files
auto-merged cleanly (StandardServer.cs regions are disjoint).
…y numbers

With DecoupleHeldPublishRequests default-on (S1), a held Publish releases its
request-processing worker at the park point, so the worker pool no longer has to
scale with the session count. An A/B load-test campaign (ServerManySessionsLoadTest,
Xeon W-2235 6c/12t, quiet machine) shows a ~200-worker pool cleanly establishes and
serves ~4000 sessions (8-17x the ~140-350 the coupled-off path manages at the same
budget), while a session-count-sized pool (10500) is ~2x slower to establish and
reaches a LOWER ceiling (thread oversubscription during the RSA-handshake burst).

- LoadTest fixture: MaxRequestThreadCount 10500->200, MinRequestThreadCount 200->50,
  with the rationale comment rewritten for the S1 small-pool model.
- Benchmarks.md: refresh the server-session scalability table (2000/2500/4000 now
  establish cleanly; 10000 remains the establishment wall) and the
  MaxRequestThreadCount guidance (size to active concurrency, not session count).

ServerScalability.md prose was rewritten upstream (8f7a95d); measured numbers now
live in Benchmarks.md, which that doc references.
@marcschier

Copy link
Copy Markdown
Collaborator Author

Merged into #3946. The held long-poll Publish decoupling (S1) commits from this branch — including the latest dc8ac3dc3 (right-size load-test worker pool / refresh benchmark numbers) — are now part of #3946, which combines the admission-control / rate-limiting, retry-after signaling, and held-Publish decoupling work into one server-session-scalability PR. Closing as its changes are carried by #3946.

@marcschier marcschier closed this Jul 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants