
fix(sched): break two-CPU off-CPU deadlock in wake() + reproduction test #127

Closed
FlareCoding wants to merge 2 commits into master from cursor/sched-wake-off-cpu-deadlock-test-7816

@FlareCoding

Owner

Summary

Adds a deterministic SMP unit test that reproduces a real two-CPU deadlock in sched::wake(), then fixes the deadlock in an interrupt-safe way.

The bug

A task's exec.on_cpu is set to 1 when it is switched in and is cleared only lazily at the owning CPU's next scheduler trap (the switched-out task is parked in the per-CPU pending_off_cpu_task and published by finalize_pending_off_cpu()). sched::wake() spin-waits on a remote task's on_cpu before enqueuing it.

If two CPUs each hold a just-switched-out task in their pending slot (on_cpu still 1), and each tries to wake the task parked on the other CPU while running with interrupts disabled (e.g. from an MSI/IRQ-context wake_one), neither CPU can reach a scheduler trap to clear the flag the other is spinning on, so both stall permanently.
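The cycle can be reproduced outside the kernel with a small user-space model. This sketch is hypothetical: two threads stand in for the two CPUs, and `Task`, `Cpu`, `wake_model` and `cross_wake` are illustrative names rather than the kernel's API. Because nothing else runs on a "CPU" whose interrupts are disabled, the only way its pending slot gets published is an explicit call. The real spin in wake() is unbounded; the timeout here exists only so the unfixed case terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical model: names mirror the PR text but are not the kernel's API.
struct Task {
    std::atomic<int> on_cpu{0};  // 1 while the task is still considered on a CPU
};

struct Cpu {
    Task* pending_off_cpu_task = nullptr;  // switched-out task not yet published
};

// Models the trap-time publish: clear the deferred task's on_cpu.
// Only the owning thread touches its Cpu, so the slot itself is a plain pointer.
void finalize_pending_off_cpu(Cpu& self) {
    Task* t = self.pending_off_cpu_task;
    self.pending_off_cpu_task = nullptr;
    if (t != nullptr) t->on_cpu.store(0, std::memory_order_release);
}

// Models wake()'s cross-CPU spin with interrupts disabled: nothing else will run
// finalize_pending_off_cpu() on this "CPU", so only an explicit publish can help.
// Returns false if target.on_cpu is still 1 when the (model-only) deadline passes.
bool wake_model(Cpu& self, Task& target, bool publish_own_first,
                std::chrono::milliseconds timeout) {
    if (publish_own_first) finalize_pending_off_cpu(self);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (target.on_cpu.load(std::memory_order_acquire) != 0) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::yield();
    }
    return true;
}

// Two "CPUs" each park a victim in their own pending slot, then cross-wake.
bool cross_wake(bool publish_own_first, std::chrono::milliseconds timeout) {
    Task victim0, victim1;
    victim0.on_cpu.store(1);
    victim1.on_cpu.store(1);
    Cpu cpu0, cpu1;
    cpu0.pending_off_cpu_task = &victim0;
    cpu1.pending_off_cpu_task = &victim1;
    bool ok0 = false, ok1 = false;  // each written by one thread, read after join()
    std::thread t0([&] { ok0 = wake_model(cpu0, victim1, publish_own_first, timeout); });
    std::thread t1([&] { ok1 = wake_model(cpu1, victim0, publish_own_first, timeout); });
    t0.join();
    t1.join();
    return ok0 && ok1;
}
```

With `publish_own_first = false` both threads time out, the user-space analogue of the mutual stall; with `true` each thread clears its own victim before spinning and both return promptly.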

The test

kernel/tests/sched/wake_off_cpu_deadlock.test.cpp fabricates the exact precondition deterministically: two controller tasks pin themselves to distinct non-BSP CPUs with interrupts disabled, each parks a victim in its own CPU's pending slot via sched::defer_off_cpu_finalize(), and after a barrier each cross-wakes the other CPU's victim. The BSP detects a stall via a bounded watchdog and, on failure, recovers the wedged CPUs so the rest of the suite still runs. The test fails on the unfixed scheduler and passes once wake() is fixed.
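The watchdog-and-recover structure of the test can be sketched the same way. This is an assumed shape, not the contents of wake_off_cpu_deadlock.test.cpp: the controllers spin without a bound as wake() does, the fix is reduced to each controller clearing its own victim's flag before spinning, and "recovery" simply clears the flags the wedged threads are waiting on so they can exit. `controller` and `run_with_watchdog` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical model of the test's shape; victims start "pending off-CPU".
struct Task { std::atomic<int> on_cpu{1}; };

// A controller "CPU": optionally publishes its own victim (the fix), then spins
// without a bound on the other CPU's victim, as the unfixed wake() would.
void controller(Task& own, Task& remote, bool publish_own_first,
                std::atomic<bool>& done) {
    if (publish_own_first) own.on_cpu.store(0, std::memory_order_release);
    while (remote.on_cpu.load(std::memory_order_acquire) != 0)
        std::this_thread::yield();
    done.store(true, std::memory_order_release);
}

// The "BSP": waits up to `budget` for both controllers. On a stall it reports
// failure, then releases the spinners so the process (the suite) can continue.
bool run_with_watchdog(bool publish_own_first, std::chrono::milliseconds budget) {
    Task victim0, victim1;
    std::atomic<bool> done0{false}, done1{false};
    std::thread c0(controller, std::ref(victim0), std::ref(victim1),
                   publish_own_first, std::ref(done0));
    std::thread c1(controller, std::ref(victim1), std::ref(victim0),
                   publish_own_first, std::ref(done1));

    const auto deadline = std::chrono::steady_clock::now() + budget;
    bool stalled = false;
    while (!(done0.load(std::memory_order_acquire) &&
             done1.load(std::memory_order_acquire))) {
        if (std::chrono::steady_clock::now() >= deadline) { stalled = true; break; }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    if (stalled) {
        // Recovery: clear the flags the controllers are spinning on.
        victim0.on_cpu.store(0, std::memory_order_release);
        victim1.on_cpu.store(0, std::memory_order_release);
    }
    c0.join();
    c1.join();
    return !stalled;
}
```

The design point is that the observer, not the code under test, owns the time bound: the controllers keep their real unbounded spin, and only the watchdog decides when a stall has occurred and how to unwedge it.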

The fix

sched::wake() now publishes this CPU's own deferred off-CPU task (finalize_pending_off_cpu()) before entering the cross-CPU on_cpu spin, which breaks the cycle. Because finalize_pending_off_cpu() performs a non-atomic read-modify-clear of the per-CPU pending_off_cpu_task and is otherwise only ever called from the IRQ-masked scheduler trap paths, the call is wrapped in cpu::irq_save()/cpu::irq_restore(). wake() is routinely called with interrupts enabled (futex / wait-queue / mutex wakeups), so without the guard a timer tick landing between the read and the clear could drop a different task's pending entry, a regression as severe as the deadlock itself. The guard preserves the IRQ-masked execution context that finalize_pending_off_cpu() requires.
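Why the guard matters can be shown deterministically by modeling the interrupt window as a hook. In this sketch `cpu::irq_save()` is assumed to return the previous interrupt state as a `bool` and `cpu::irq_restore()` to take it back; the real kernel signatures may differ. `timer_tick` and `deliver_timer_tick` are hypothetical names for a scheduler trap that, when interrupts are enabled, fires between the read and the clear of the pending slot and parks another switched-out task there.

```cpp
#include <cassert>
#include <functional>

// Hypothetical single-threaded model; the cpu:: functions here only flip a flag.
struct Task { int on_cpu = 0; };

namespace cpu {
bool irqs_enabled = true;  // models the CPU's interrupt-enable flag
// Mask interrupts and return the previous state (assumed signature).
bool irq_save() { bool was = irqs_enabled; irqs_enabled = false; return was; }
// Restore the state returned by irq_save().
void irq_restore(bool was) { irqs_enabled = was; }
}  // namespace cpu

struct Cpu {
    Task* pending_off_cpu_task = nullptr;
    std::function<void(Cpu&)> timer_tick;  // hypothetical scheduler-trap body
};

// Models taking the interrupt: the trap handler itself runs IRQ-masked.
void deliver_timer_tick(Cpu& c) {
    bool flags = cpu::irq_save();
    if (c.timer_tick) c.timer_tick(c);
    cpu::irq_restore(flags);
}

// Non-atomic read-modify-clear of the pending slot, as described in the PR.
void finalize_pending_off_cpu(Cpu& c) {
    Task* t = c.pending_off_cpu_task;             // read
    if (cpu::irqs_enabled) deliver_timer_tick(c); // an interrupt can land here
    c.pending_off_cpu_task = nullptr;             // clear (may overwrite a refilled slot)
    if (t != nullptr) t->on_cpu = 0;              // publish the task as off-CPU
}

// Without the guard: the window is open when wake() runs with IRQs enabled.
void publish_own_unguarded(Cpu& c) { finalize_pending_off_cpu(c); }

// With the guard: the shape of the fix described above.
void publish_own_guarded(Cpu& c) {
    bool flags = cpu::irq_save();
    finalize_pending_off_cpu(c);
    cpu::irq_restore(flags);
}
```

Without the guard, the trap refills the slot with a second task and the outer clear then overwrites it, leaving that task with `on_cpu == 1` and no pending entry to ever publish it. With the guard, the tick cannot land inside the window; when it is delivered afterward, the task it parks stays in the slot.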

Testing

  • make test ARCH=x86_64
  • make test ARCH=aarch64

cursoragent and others added 2 commits June 4, 2026 01:00
Adds a deterministic SMP unit test that fabricates the exact precondition
for the wake() off-CPU spin deadlock: two controller tasks pin themselves
to distinct CPUs with interrupts disabled, each parks a victim task in its
own CPU's pending_off_cpu slot (on_cpu=1), then cross-wakes the other CPU's
victim. With the current wake() implementation both controllers spin
forever on each other's on_cpu; the BSP detects the stall via a bounded
watchdog, fails the test, and recovers the wedged CPUs so the suite can
continue.

This test fails on the current code and is expected to pass once wake()
publishes its own deferred off-CPU task before the cross-CPU spin.

Co-authored-by: Albert Slepak <FlareCoding@users.noreply.github.com>

wake() spin-waits on a remote task's exec.on_cpu before enqueuing it, but a
switched-out task's on_cpu is cleared only lazily at the owning CPU's next
scheduler trap (finalize_pending_off_cpu()). Two CPUs that each wake a task
still pending-off-CPU on the other, while running with interrupts disabled,
spin forever: neither reaches a trap to clear the flag the other awaits.

Publish this CPU's own deferred off-CPU task before the spin so the cycle
cannot form. finalize_pending_off_cpu() does a non-atomic read-modify-clear
of the per-CPU pending_off_cpu_task and is otherwise only invoked from the
IRQ-masked scheduler trap paths; wake() runs with interrupts enabled on the
futex/wait-queue/mutex paths, so the call is wrapped in irq_save/irq_restore
to preserve that contract and prevent a timer tick from corrupting the
pending slot.

Verified by tests/sched/wake_off_cpu_deadlock.test.cpp, which deadlocks
without this change and passes with it, on both x86_64 and aarch64.

Co-authored-by: Albert Slepak <FlareCoding@users.noreply.github.com>
@cursor (bot) deleted the cursor/sched-wake-off-cpu-deadlock-test-7816 branch June 4, 2026 02:00
@FlareCoding closed this Jun 4, 2026