Skip to content

Implement pool shutdown for ChannelDbConnectionPool and harden WaitHandleDbConnectionPool shutdown#4302

Open
priyankatiwari08 wants to merge 11 commits into
dotnet:mainfrom
priyankatiwari08:feature/pool-shutdown-pr1
Open

Implement pool shutdown for ChannelDbConnectionPool and harden WaitHandleDbConnectionPool shutdown#4302
priyankatiwari08 wants to merge 11 commits into
dotnet:mainfrom
priyankatiwari08:feature/pool-shutdown-pr1

Conversation

@priyankatiwari08

@priyankatiwari08 priyankatiwari08 commented May 21, 2026

Copy link
Copy Markdown
Contributor

Implements the Shutdown lifecycle method on both connection-pool implementations (ChannelDbConnectionPool and WaitHandleDbConnectionPool) so callers can deterministically stop a pool, drain its idle connections, and wake any parked waiters.

Compatibility and behavior:

• No public API surface change for end users — Shutdown() is an internal pool lifecycle method exposed via IDbConnectionPool. Existing pooling behavior is unchanged until Shutdown()is invoked. •Shutdown() is idempotent and terminal: repeated calls are no-ops and the pool cannot be restarted (Startup()afterShutdown()does not resurrect the pool). • Active checked-out connections are not aborted; they are destroyed instead of pooled when their owners return them. • Parked waiters (syncWaitHandle.WaitAny` and async channel readers) are woken with a non-blocking failure signal rather than left to time out.

Additional changes:

• ChannelDbConnectionPool: adds an Interlocked.CompareExchange-guarded Shutdown() that completes the idle channel writer, drains buffered idle connections, and routes in-flight returners through RemoveConnection via a
race-window guard in ReturnInternalConnection.
• WaitHandleDbConnectionPool: adds Shutdown() that disposes the cleanup and error timers, releases the pool semaphore to wake parked waiters (using Volatile.Read(ref _waitCount) so unlimited pools — MaxPoolSize == 0
— work correctly), and drains _stackNew/_stackOld. Cleanup-callback, error-callback, and the TryGetConnection wait loop now short-circuit when the pool is shutting down. The wait-loop short-circuit also releases the
semaphore slot it consumed so accounting stays balanced.
• IdleConnectionChannel: adds a Complete() helper wrapping ChannelWriter.TryComplete() so the pool can close its idle channel without exposing the raw writer.
• Adds 16 deterministic xUnit tests (7 channel + 9 wait-handle) covering state transition, idle drain, in-flight-returner destruction, timer disposal, callback no-op, idempotency, and waiter wake-up. All unit tests pass on net8.0/net9.0/net10.0/net462.

Out of scope (deferred to a follow-up):

• Transaction-root stasis on shutdown for the channel pool — connections enlisted in active distributed transactions still need to be staked to the transaction so the transaction-end callback can destroy them deterministically.

…ndleDbConnectionPool shutdown (AB#44828)

ChannelDbConnectionPool:
- Shutdown() now transitions State to ShuttingDown (FR-001), completes the idle
  channel writer to unblock async waiters (FR-002, FR-007), and drains and
  destroys buffered idle connections (FR-003). Implementation is idempotent via
  Interlocked.CompareExchange (FR-006).
- Startup() emits a trace and is a structural counterpart to Shutdown.
- ReturnInternalConnection no longer asserts on TryWrite; if the channel was
  completed by a concurrent shutdown, the returning connection is destroyed
  (FR-004 race-window guard).
- IdleConnectionChannel.Complete() exposes channel completion to the pool.

WaitHandleDbConnectionPool:
- Shutdown() is now idempotent (FR-006), disposes both _cleanupTimer and
  _errorTimer (FR-005), wakes threads parked in WaitHandle.WaitAny by releasing
  the pool semaphore (FR-007), and drains _stackNew/_stackOld (FR-003).
- CleanupCallback and ErrorCallback short-circuit when State == ShuttingDown so
  an in-flight callback racing with Shutdown does not re-arm work.
- The private TryGetConnection wait loop re-checks State after WaitHandle.WaitAny
  and bails out, so waiters cannot spin against a drained pool.

Tests:
- ChannelDbConnectionPoolShutdownTest (7 tests) covers state transition, drain,
  return-after-shutdown, idempotency, async waiter unblock, post-shutdown return,
  and Startup-after-Shutdown.
- WaitHandleDbConnectionPoolShutdownTest (9 tests) covers state transition,
  cleanup/error timer disposal, drain, idempotency, callback no-op after
  shutdown, sync TryGetConnection short-circuit, and sync waiter unblock.

Verified by running targeted suite: 340 tests passing across net8.0, net9.0,
net10.0, and net462.

Spec: specs/004-pool-shutdown.
Copilot AI review requested due to automatic review settings May 21, 2026 13:41
@github-project-automation github-project-automation Bot moved this to To triage in SqlClient Board May 21, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the “pool shutdown” contract (spec 004-pool-shutdown) across both connection pool implementations so that retiring a pool (via SqlConnectionFactory.QueuePoolForRelease) reliably stops background work, unblocks waiters, and destroys idle/returning connections.

Changes:

  • Implement ChannelDbConnectionPool.Shutdown() by transitioning to ShuttingDown, completing/draining the idle channel, and making shutdown idempotent.
  • Harden WaitHandleDbConnectionPool.Shutdown() by making it idempotent, disposing timers, draining idle stacks, and attempting to wake blocked synchronous waiters; guard timer callbacks during shutdown.
  • Add unit tests covering shutdown behavior for both pool types.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/ChannelDbConnectionPool.cs Implements real shutdown behavior (state transition, channel completion/drain, safer return-on-shutdown) and traces Startup().
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/IdleConnectionChannel.cs Adds Complete() to allow the pool to terminate the channel writer on shutdown.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs Adds shutdown idempotency, disposes both timers, drains idle stacks, attempts to unblock sync waiters, and guards timer callbacks during shutdown.
src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/ChannelDbConnectionPoolShutdownTest.cs Adds unit coverage for channel-pool shutdown semantics (drain, idempotency, unblock waiters, destroy-on-return).
src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/WaitHandleDbConnectionPoolShutdownTest.cs Adds unit coverage for wait-handle-pool shutdown semantics (timer disposal, drain, idempotency, unblock waiter, callback no-op).

@priyankatiwari08 priyankatiwari08 added this to the 7.1.0-preview2 milestone May 21, 2026
@priyankatiwari08 priyankatiwari08 marked this pull request as ready for review May 29, 2026 10:12
@priyankatiwari08 priyankatiwari08 requested a review from a team as a code owner May 29, 2026 10:12
  Code review fixes (WaitHandleDbConnectionPool):
  - TryGetConnection shutdown short-circuit now releases the PoolSemaphore
    slot when WaitAny was woken by SEMAPHORE_HANDLE (or
    WAIT_ABANDONED+SEMAPHORE_HANDLE), preventing a semaphore-slot leak that
    would starve future waiters.
  - Shutdown() releases Volatile.Read(ref _waitCount) slots instead of
    MaxPoolSize, fixing ArgumentOutOfRangeException when MaxPoolSize == 0
    (unlimited pool) and ensuring every parked waiter is woken under high
    contention. Kept SemaphoreFullException guard.

  Test fix:
  - Shutdown_UnblocksSyncWaiter replaces Thread.Sleep(200) with a bounded
    poll on the private _waitCount field (via the existing reflection
    helper) so the test is deterministic on slow CI agents.

  Documentation cleanup:
  - Removed inline 'FR-00x' spec-traceability labels from comments and test
    section banners across both pool implementations and both shutdown
    test files. Explanatory text is preserved; the spec coverage will live
    in the PR description instead of the source.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@priyankatiwari08 priyankatiwari08 moved this from To triage to In review in SqlClient Board May 29, 2026
// Dispose all background timers so they no longer schedule new work.
// Note that any timer callback already in flight may still observe State == ShuttingDown
// and short-circuit (see CleanupCallback / ErrorCallback).
Timer cleanup = Interlocked.Exchange(ref _cleanupTimer, null);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we've addressed this bug: #1881

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds the per-pool Shutdown() machinery that #1881 ultimately needs, but it doesn't actually touch SqlConnectionFactory._pruningTimer — so the symptom (the PruneConnectionPoolGroups trace firing every 30 s after ClearAllPools()) will still reproduce. I'd prefer to land #4302 as the foundation and address #1881 in a small follow-up that adds a "park the timer when there's no work" guard to PruneConnectionPoolGroups and a re-arm in GetConnectionPoolGroup. Happy to file that follow-up issue and link it.

@github-project-automation github-project-automation Bot moved this from In review to Waiting for customer in SqlClient Board Jun 2, 2026
@priyankatiwari08 priyankatiwari08 marked this pull request as draft June 5, 2026 08:03
…tionPool shutdown tests

Replace reflection-based access of WaitHandleDbConnectionPool internals from the shutdown unit tests with direct field/method access. The pool's _waitCount, _errorTimer, _cleanupTimer, CleanupCallback, and ErrorCallback are now declared internal so the UnitTests assembly (covered by InternalsVisibleTo) can reach them without GetField/Invoke. The GetPrivateField<T> helper is removed.

Per @mdaigle review feedback on dotnet#4302.
…pools

Both pool implementations now delegate the connection drain in Shutdown() to the existing Clear() routine instead of duplicating the drain inline.

ChannelDbConnectionPool.Shutdown(): keeps the CompareExchange election, State transition and _idleChannel.Complete() call, then calls Clear() (which bumps _clearGeneration and best-effort drains the idle channel). A final unbounded drain loop is retained as a mop-up because Clear() may short-circuit when another Clear() is concurrently in flight; the final loop is safe because the channel is already completed and no new writes can succeed.

WaitHandleDbConnectionPool.Shutdown(): keeps the trace, idempotent State check, timer disposal and waiter wake-up, then replaces the inline _stackNew / _stackOld drain loops with a single Clear() call. Clear() additionally dooms every entry in _objectList, decrements the ExitFreeConnection metric per drained connection, and runs ReclaimEmancipatedObjects() — strictly more cleanup than the previous inline drain, with no behaviour change for normal Running-state operations.

Per @mdaigle review feedback on dotnet#4302.
@priyankatiwari08 priyankatiwari08 moved this from Waiting for customer to In progress in SqlClient Board Jun 5, 2026
priyankatiwari08 added a commit to priyankatiwari08/SqlClient that referenced this pull request Jun 5, 2026
…ryGetConnection in shutdown tests

PR dotnet#4295 (merged to main) added a required TimeoutTimer parameter to ChannelDbConnectionPool.TryGetConnection. The shutdown tests added in this PR still call the pre-dotnet#4295 3-arg overload, which compiles locally against PoolShutdown's older base but fails in the CI pipeline because the pipeline merges this PR's head into the post-dotnet#4295 main. Pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) at all 5 call sites to match the new signature.
Copilot AI review requested due to automatic review settings June 5, 2026 14:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

…ryGetConnection in shutdown tests

PR 4295 (merged to main) added a required TimeoutTimer parameter to ChannelDbConnectionPool.TryGetConnection. The shutdown tests added in this PR still call the pre-4295 3-arg overload, which compiles locally against PoolShutdown's older base but fails in the CI pipeline because the pipeline merges this PR's head into the post-4295 main. Pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) at all 5 call sites to match the new signature.
@priyankatiwari08 priyankatiwari08 force-pushed the feature/pool-shutdown-pr1 branch from db3d616 to 9a45a66 Compare June 5, 2026 15:08
Copilot AI review requested due to automatic review settings June 5, 2026 15:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

…hutdown tests

The TryGetConnection method on main now requires a TimeoutTimer argument. Update all 5 call sites in WaitHandleDbConnectionPoolShutdownTest.cs to pass TimeoutTimer.StartNew(TimeSpan.FromSeconds(15)) so the test project compiles against the current 4-arg signature.
@priyankatiwari08 priyankatiwari08 force-pushed the feature/pool-shutdown-pr1 branch from aec0b28 to 954679d Compare June 5, 2026 15:55
Resolve conflict in ChannelDbConnectionPool.Shutdown(): integrate PoolPruner.Dispose() from upstream dotnet#4304 with the branch's idempotent shutdown sequence (CAS guard, State transition, channel completion, Clear() reuse, final drain). Pruner is disposed before the channel is completed and drained so a pruning tick cannot race with the final drain. PoolPruner.Dispose is idempotent and non-throwing.
Copilot AI review requested due to automatic review settings June 11, 2026 08:41
…atile.Read for _waitCount

- WaitHandleDbConnectionPoolShutdownTest.cs: remove unused 'using System.Threading.Tasks;' and read pool._waitCount via Volatile.Read in both the poll-loop condition and the assertion so the test no longer relies on plain int reads observing another thread's Interlocked.Increment.\n- ChannelDbConnectionPoolShutdownTest.cs: remove unused 'using System.Data.Common;'.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

…Connection short-circuits when State != Running

Mirror WaitHandleDbConnectionPool.TryGetConnection: when the pool has been shut down (or is not yet Running), return (true, null) immediately instead of entering the channel-reader path. The previous behavior could reserve a connection slot via _connectionSlots.Add, open a fresh physical connection, hand it to the caller, and then immediately destroy it on return because ReturnInternalConnection checks State == ShuttingDown. The short-circuit avoids the unnecessary open/close round-trip for both sync and async paths. Adds two regression tests.
…l.Startup() a no-op when ShuttingDown

Guard Startup() against being called after Shutdown(). Without this, Startup() would create a fresh _cleanupTimer and queue a PoolCreateRequest against a pool that will never accept connections back, leaking the timer and scheduling background work that immediately short-circuits on State == ShuttingDown. Adds Startup_AfterShutdown_DoesNotResurrectPool regression test.
…nostics;' from ChannelDbConnectionPool

Commit 6431209 replaced the only Debug.Assert in ChannelDbConnectionPool.cs with a race-window-guarded RemoveConnection call but left the 'using System.Diagnostics;' directive in place. Drop the now-unused directive so CS8019 cannot surface where it is promoted to an error.
Copilot AI review requested due to automatic review settings June 11, 2026 12:37

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

@priyankatiwari08 priyankatiwari08 marked this pull request as ready for review June 18, 2026 13:37
@priyankatiwari08

Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +1013 to +1017
if (State != Running)
{
SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Pool is shutting down; abandoning wait.", Id);
if (waitResult == SEMAPHORE_HANDLE || waitResult == WAIT_ABANDONED + SEMAPHORE_HANDLE)
{

@paulmedynski paulmedynski left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look at the tests once we settle the state discussion - most of my comments relate to the State enum and not understanding its intended use.

// Returning (true, null) matches WaitHandleDbConnectionPool.TryGetConnection and tells
// the caller "completed; no connection available" without entering the channel path,
// which would otherwise reserve a slot, attempt to open a fresh physical connection,
// and then immediately destroy it on return because State == ShuttingDown.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not assume that ShuttingDown is the only possible other state - this leads to misleading docs as time goes on.

"... destroy it on return because the pool isn't running."

That is (mostly) future proof.

}
// null sentinels are wake-up signals only; nothing to destroy.
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we finished the shutdown process here? Should we transition to State Shutdown (which doesn't exixst yet)?

What happens if anything between setting State = ShuttingDown and here throws? Should we allow a subsequent Shutdown() to try again? Should Shutdown() be idempotent?

/// </summary>
/// <returns><see langword="true"/> if this call completed the channel; otherwise <see langword="false"/>
/// (channel was already completed).</returns>
internal bool Complete() => _writer.TryComplete();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tests for this new method.

// If the pool is shutting down, skip work. Shutdown disposes the timer, but
// a callback may already be in-flight when Shutdown runs; this guard ensures it does
// not perform pruning or re-arm pool create requests.
if (State == ShuttingDown)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think State != Running would be more future proof. And update the comment to "If the pool isn't running, ..."

Same for ErrorCallback() below.

// Release loop) would starve.
if (State != Running)
{
SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Pool is shutting down; abandoning wait.", Id);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All we know is that the pool isn't running. We don't know that it is shutting down. Logs shouldn't make that assumption.

"Pool is not running; abandoning wait"

Look for other cases where the logic, docs, or logs assume that "not Running" means "Shutting Down", which isn't future proof.

I'm not sure there are many cases where we actually want to check if State == ShuttingDown. I think State != Running is the correct check. We chose an enum for State presumably so we can add states in the future. If it's just 2 values, then it should be a bool isRunning.

// Shutdown() would create a fresh _cleanupTimer and queue a PoolCreateRequest
// against a pool that will never accept connections back, leaking the timer
// and scheduling background work that immediately short-circuits.
if (State is ShuttingDown)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another case where State != Running is the correct check.

public void Startup()
{
// No-op for now, warmup will be implemented later.
// This pool has no background timers today (idle timeout is enforced lazily in

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But does it make sense to set Running in the constructor before Startup() is called?

Maybe we need a few more states, and they should be consistently set across both pools:

Created - set by constructors
Starting - set when Startup() is called, before it does any work.
Started - set when Startup() completes successfully.
Stopping - set when Shutdown() is called, before it does any work.
Stopped - set when Shutdown() completes successfully.

Then Startup() and Shutdown() can be retryable and idempotent.

@mdaigle - Thoughts? Is this overkill? Do we just need an isRunning bool instead?

@github-project-automation github-project-automation Bot moved this from In progress to Waiting for customer in SqlClient Board Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Waiting for customer

Development

Successfully merging this pull request may close these issues.

4 participants