Implement pool shutdown for ChannelDbConnectionPool and harden WaitHandleDbConnectionPool shutdown#4302
Conversation
…ndleDbConnectionPool shutdown (AB#44828) ChannelDbConnectionPool: - Shutdown() now transitions State to ShuttingDown (FR-001), completes the idle channel writer to unblock async waiters (FR-002, FR-007), and drains and destroys buffered idle connections (FR-003). Implementation is idempotent via Interlocked.CompareExchange (FR-006). - Startup() emits a trace and is a structural counterpart to Shutdown. - ReturnInternalConnection no longer asserts on TryWrite; if the channel was completed by a concurrent shutdown, the returning connection is destroyed (FR-004 race-window guard). - IdleConnectionChannel.Complete() exposes channel completion to the pool. WaitHandleDbConnectionPool: - Shutdown() is now idempotent (FR-006), disposes both _cleanupTimer and _errorTimer (FR-005), wakes threads parked in WaitHandle.WaitAny by releasing the pool semaphore (FR-007), and drains _stackNew/_stackOld (FR-003). - CleanupCallback and ErrorCallback short-circuit when State == ShuttingDown so an in-flight callback racing with Shutdown does not re-arm work. - The private TryGetConnection wait loop re-checks State after WaitHandle.WaitAny and bails out, so waiters cannot spin against a drained pool. Tests: - ChannelDbConnectionPoolShutdownTest (7 tests) covers state transition, drain, return-after-shutdown, idempotency, async waiter unblock, post-shutdown return, and Startup-after-Shutdown. - WaitHandleDbConnectionPoolShutdownTest (9 tests) covers state transition, cleanup/error timer disposal, drain, idempotency, callback no-op after shutdown, sync TryGetConnection short-circuit, and sync waiter unblock. Verified by running targeted suite: 340 tests passing across net8.0, net9.0, net10.0, and net462. Spec: specs/004-pool-shutdown.
There was a problem hiding this comment.
Pull request overview
This PR implements the “pool shutdown” contract (spec 004-pool-shutdown) across both connection pool implementations so that retiring a pool (via SqlConnectionFactory.QueuePoolForRelease) reliably stops background work, unblocks waiters, and destroys idle/returning connections.
Changes:
- Implement
ChannelDbConnectionPool.Shutdown()by transitioning toShuttingDown, completing/draining the idle channel, and making shutdown idempotent. - Harden
WaitHandleDbConnectionPool.Shutdown()by making it idempotent, disposing timers, draining idle stacks, and attempting to wake blocked synchronous waiters; guard timer callbacks during shutdown. - Add unit tests covering shutdown behavior for both pool types.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/ChannelDbConnectionPool.cs | Implements real shutdown behavior (state transition, channel completion/drain, safer return-on-shutdown) and traces Startup(). |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/IdleConnectionChannel.cs | Adds Complete() to allow the pool to terminate the channel writer on shutdown. |
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs | Adds shutdown idempotency, disposes both timers, drains idle stacks, attempts to unblock sync waiters, and guards timer callbacks during shutdown. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/ChannelDbConnectionPoolShutdownTest.cs | Adds unit coverage for channel-pool shutdown semantics (drain, idempotency, unblock waiters, destroy-on-return). |
| src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/WaitHandleDbConnectionPoolShutdownTest.cs | Adds unit coverage for wait-handle-pool shutdown semantics (timer disposal, drain, idempotency, unblock waiter, callback no-op). |
Code review fixes (WaitHandleDbConnectionPool):
- TryGetConnection shutdown short-circuit now releases the PoolSemaphore
slot when WaitAny was woken by SEMAPHORE_HANDLE (or
WAIT_ABANDONED+SEMAPHORE_HANDLE), preventing a semaphore-slot leak that
would starve future waiters.
- Shutdown() releases Volatile.Read(ref _waitCount) slots instead of
MaxPoolSize, fixing ArgumentOutOfRangeException when MaxPoolSize == 0
(unlimited pool) and ensuring every parked waiter is woken under high
contention. Kept SemaphoreFullException guard.
Test fix:
- Shutdown_UnblocksSyncWaiter replaces Thread.Sleep(200) with a bounded
poll on the private _waitCount field (via the existing reflection
helper) so the test is deterministic on slow CI agents.
Documentation cleanup:
- Removed inline 'FR-00x' spec-traceability labels from comments and test
section banners across both pool implementations and both shutdown
test files. Explanatory text is preserved; the spec coverage will live
in the PR description instead of the source.
| // Dispose all background timers so they no longer schedule new work. | ||
| // Note that any timer callback already in flight may still observe State == ShuttingDown | ||
| // and short-circuit (see CleanupCallback / ErrorCallback). | ||
| Timer cleanup = Interlocked.Exchange(ref _cleanupTimer, null); |
There was a problem hiding this comment.
Let's make sure we've addressed this bug: #1881
There was a problem hiding this comment.
This PR adds the per-pool Shutdown() machinery that #1881 ultimately needs, but it doesn't actually touch SqlConnectionFactory._pruningTimer — so the symptom (the PruneConnectionPoolGroups trace firing every 30 s after ClearAllPools()) will still reproduce. I'd prefer to land #4302 as the foundation and address #1881 in a small follow-up that adds a "park the timer when there's no work" guard to PruneConnectionPoolGroups and a re-arm in GetConnectionPoolGroup. Happy to file that follow-up issue and link it.
…tionPool shutdown tests Replace reflection-based access of WaitHandleDbConnectionPool internals from the shutdown unit tests with direct field/method access. The pool's _waitCount, _errorTimer, _cleanupTimer, CleanupCallback, and ErrorCallback are now declared internal so the UnitTests assembly (covered by InternalsVisibleTo) can reach them without GetField/Invoke. The GetPrivateField<T> helper is removed. Per @mdaigle review feedback on dotnet#4302.
…pools Both pool implementations now delegate the connection drain in Shutdown() to the existing Clear() routine instead of duplicating the drain inline. ChannelDbConnectionPool.Shutdown(): keeps the CompareExchange election, State transition and _idleChannel.Complete() call, then calls Clear() (which bumps _clearGeneration and best-effort drains the idle channel). A final unbounded drain loop is retained as a mop-up because Clear() may short-circuit when another Clear() is concurrently in flight; the final loop is safe because the channel is already completed and no new writes can succeed. WaitHandleDbConnectionPool.Shutdown(): keeps the trace, idempotent State check, timer disposal and waiter wake-up, then replaces the inline _stackNew / _stackOld drain loops with a single Clear() call. Clear() additionally dooms every entry in _objectList, decrements the ExitFreeConnection metric per drained connection, and runs ReclaimEmancipatedObjects() — strictly more cleanup than the previous inline drain, with no behaviour change for normal Running-state operations. Per @mdaigle review feedback on dotnet#4302.
…ryGetConnection in shutdown tests PR dotnet#4295 (merged to main) added a required TimeoutTimer parameter to ChannelDbConnectionPool.TryGetConnection. The shutdown tests added in this PR still call the pre-dotnet#4295 3-arg overload, which compiles locally against PoolShutdown's older base but fails in the CI pipeline because the pipeline merges this PR's head into the post-dotnet#4295 main. Pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) at all 5 call sites to match the new signature.
…ryGetConnection in shutdown tests PR 4295 (merged to main) added a required TimeoutTimer parameter to ChannelDbConnectionPool.TryGetConnection. The shutdown tests added in this PR still call the pre-4295 3-arg overload, which compiles locally against PoolShutdown's older base but fails in the CI pipeline because the pipeline merges this PR's head into the post-4295 main. Pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) at all 5 call sites to match the new signature.
db3d616 to
9a45a66
Compare
…hutdown tests The TryGetConnection method on main now requires a TimeoutTimer argument. Update all 5 call sites in WaitHandleDbConnectionPoolShutdownTest.cs to pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) so the test project compiles against the current 4-arg signature.
aec0b28 to
954679d
Compare
Resolve conflict in ChannelDbConnectionPool.Shutdown(): integrate PoolPruner.Dispose() from upstream dotnet#4304 with the branch's idempotent shutdown sequence (CAS guard, State transition, channel completion, Clear() reuse, final drain). Pruner is disposed before the channel is completed and drained so a pruning tick cannot race with the final drain. PoolPruner.Dispose is idempotent and non-throwing.
…atile.Read for _waitCount - WaitHandleDbConnectionPoolShutdownTest.cs: remove unused 'using System.Threading.Tasks;' and read pool._waitCount via Volatile.Read in both the poll-loop condition and the assertion so the test no longer relies on plain int reads observing another thread's Interlocked.Increment.\n- ChannelDbConnectionPoolShutdownTest.cs: remove unused 'using System.Data.Common;'.
…Connection short-circuits when State != Running Mirror WaitHandleDbConnectionPool.TryGetConnection: when the pool has been shut down (or is not yet Running), return (true, null) immediately instead of entering the channel-reader path. The previous behavior could reserve a connection slot via _connectionSlots.Add, open a fresh physical connection, hand it to the caller, and then immediately destroy it on return because ReturnInternalConnection checks State == ShuttingDown. The short-circuit avoids the unnecessary open/close round-trip for both sync and async paths. Adds two regression tests.
…l.Startup() a no-op when ShuttingDown Guard Startup() against being called after Shutdown(). Without this, Startup() would create a fresh _cleanupTimer and queue a PoolCreateRequest against a pool that will never accept connections back, leaking the timer and scheduling background work that immediately short-circuits on State == ShuttingDown. Adds Startup_AfterShutdown_DoesNotResurrectPool regression test.
…nostics;' from ChannelDbConnectionPool Commit 6431209 replaced the only Debug.Assert in ChannelDbConnectionPool.cs with a race-window-guarded RemoveConnection call but left the 'using System.Diagnostics;' directive in place. Drop the now-unused directive so CS8019 cannot surface where it is promoted to an error.
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
| if (State != Running) | ||
| { | ||
| SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Pool is shutting down; abandoning wait.", Id); | ||
| if (waitResult == SEMAPHORE_HANDLE || waitResult == WAIT_ABANDONED + SEMAPHORE_HANDLE) | ||
| { |
paulmedynski
left a comment
There was a problem hiding this comment.
I will look at the tests once we settle the state discussion - most of my comments relate to the State enum and not understanding its intended use.
| // Returning (true, null) matches WaitHandleDbConnectionPool.TryGetConnection and tells | ||
| // the caller "completed; no connection available" without entering the channel path, | ||
| // which would otherwise reserve a slot, attempt to open a fresh physical connection, | ||
| // and then immediately destroy it on return because State == ShuttingDown. |
There was a problem hiding this comment.
Let's not assume that ShuttingDown is the only possible other state - this leads to misleading docs as time goes on.
"... destroy it on return because the pool isn't running."
That is (mostly) future proof.
| } | ||
| // null sentinels are wake-up signals only; nothing to destroy. | ||
| } | ||
| } |
There was a problem hiding this comment.
Have we finished the shutdown process here? Should we transition to State Shutdown (which doesn't exixst yet)?
What happens if anything between setting State = ShuttingDown and here throws? Should we allow a subsequent Shutdown() to try again? Should Shutdown() be idempotent?
| /// </summary> | ||
| /// <returns><see langword="true"/> if this call completed the channel; otherwise <see langword="false"/> | ||
| /// (channel was already completed).</returns> | ||
| internal bool Complete() => _writer.TryComplete(); |
There was a problem hiding this comment.
Missing tests for this new method.
| // If the pool is shutting down, skip work. Shutdown disposes the timer, but | ||
| // a callback may already be in-flight when Shutdown runs; this guard ensures it does | ||
| // not perform pruning or re-arm pool create requests. | ||
| if (State == ShuttingDown) |
There was a problem hiding this comment.
I think State != Running would be more future proof. And update the comment to "If the pool isn't running, ..."
Same for ErrorCallback() below.
| // Release loop) would starve. | ||
| if (State != Running) | ||
| { | ||
| SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Pool is shutting down; abandoning wait.", Id); |
There was a problem hiding this comment.
All we know is that the pool isn't running. We don't know that it is shutting down. Logs shouldn't make that assumption.
"Pool is not running; abandoning wait"
Look for other cases where the logic, docs, or logs assume that "not Running" means "Shutting Down", which isn't future proof.
I'm not sure there are many cases where we actually want to check if State == ShuttingDown. I think State != Running is the correct check. We chose an enum for State presumably so we can add states in the future. If it's just 2 values, then it should be a bool isRunning.
| // Shutdown() would create a fresh _cleanupTimer and queue a PoolCreateRequest | ||
| // against a pool that will never accept connections back, leaking the timer | ||
| // and scheduling background work that immediately short-circuits. | ||
| if (State is ShuttingDown) |
There was a problem hiding this comment.
Another case where State != Running is the correct check.
| public void Startup() | ||
| { | ||
| // No-op for now, warmup will be implemented later. | ||
| // This pool has no background timers today (idle timeout is enforced lazily in |
There was a problem hiding this comment.
But does it make sense to set Running in the constructor before Startup() is called?
Maybe we need a few more states, and they should be consistently set across both pools:
Created - set by constructors
Starting - set when Startup() is called, before it does any work.
Started - set when Startup() completes successfully.
Stopping - set when Shutdown() is called, before it does any work.
Stopped - set when Shutdown() completes successfully.
Then Startup() and Shutdown() can be retryable and idempotent.
@mdaigle - Thoughts? Is this overkill? Do we just need an isRunning bool instead?
Implements the Shutdown lifecycle method on both connection-pool implementations (ChannelDbConnectionPool and WaitHandleDbConnectionPool) so callers can deterministically stop a pool, drain its idle connections, and wake any parked waiters.
Compatibility and behavior:
• No public API surface change for end users —
Shutdown()is an internal pool lifecycle method exposed via IDbConnectionPool. Existing pooling behavior is unchanged untilShutdown()is invoked. •Shutdown()is idempotent and terminal: repeated calls are no-ops and the pool cannot be restarted (Startup()afterShutdown()does not resurrect the pool). • Active checked-out connections are not aborted; they are destroyed instead of pooled when their owners return them. • Parked waiters (syncWaitHandle.WaitAny` and async channel readers) are woken with a non-blocking failure signal rather than left to time out.Additional changes:
• ChannelDbConnectionPool: adds an
Interlocked.CompareExchange-guardedShutdown()that completes the idle channel writer, drains buffered idle connections, and routes in-flight returners throughRemoveConnectionvia arace-window guard in
ReturnInternalConnection.• WaitHandleDbConnectionPool: adds
Shutdown()that disposes the cleanup and error timers, releases the pool semaphore to wake parked waiters (usingVolatile.Read(ref _waitCount)so unlimited pools — MaxPoolSize == 0— work correctly), and drains
_stackNew/_stackOld. Cleanup-callback, error-callback, and theTryGetConnectionwait loop now short-circuit when the pool is shutting down. The wait-loop short-circuit also releases thesemaphore slot it consumed so accounting stays balanced.
• IdleConnectionChannel: adds a
Complete()helper wrappingChannelWriter.TryComplete()so the pool can close its idle channel without exposing the raw writer.• Adds 16 deterministic xUnit tests (7 channel + 9 wait-handle) covering state transition, idle drain, in-flight-returner destruction, timer disposal, callback no-op, idempotency, and waiter wake-up. All unit tests pass on net8.0/net9.0/net10.0/net462.
Out of scope (deferred to a follow-up):
• Transaction-root stasis on shutdown for the channel pool — connections enlisted in active distributed transactions still need to be staked to the transaction so the transaction-end callback can destroy them deterministically.