Revert "ACPI: OSL: Use a threaded interrupt handler for SCI" #580
ymd-arista wants to merge 1 commit into
This reverts commit 7a36b901a6eb0e9945341db71ed3c45c7721cfa9.
After upgrading from Debian bookworm to trixie on modular systems,
the kdump kernel started hitting a soft lockup while capturing a
crash dump. The issue is reproducible by triggering a panic in the
production kernel with:
echo c | sudo tee /proc/sysrq-trigger
Once the kdump kernel boots, CPU0 gets stuck in the ACPI SCI handling
path and the soft lockup watchdog eventually panics the kdump kernel,
so no vmcore is produced.
The trace below was obtained by adding the following to the kdump
command line: debug=1, loglevel=7, softlockup_all_cpu_backtrace=1 and
softlockup_panic=1:
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [irq/9-acpi:39]
CPU: 0 UID: 0 PID: 39 Comm: irq/9-acpi Not tainted
6.12.41+deb13-sonic-amd64 #1 Debian 6.12.41-1
Hardware name: Intel Camelback Mountain CRB, BIOS
Aboot-norcal7-7.1.6-generic-22971530 06/30/2021
RIP: 0010:acpi_os_read_port+0x30/0xa0
Call Trace:
<TASK>
acpi_hw_gpe_read+0x61/0x80
acpi_ev_detect_gpe+0x74/0x180
acpi_ev_gpe_detect+0xe1/0x130
acpi_ev_sci_xrupt_handler+0x1d/0x40
acpi_irq+0x1c/0x40
irq_thread_fn+0x23/0x60
irq_thread+0x1b3/0x2f0
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel panic - not syncing: softlockup: hung tasks
Comparing the bookworm and trixie kernels shows that the commit being
reverted moved the SCI handler from a hardirq handler to a threaded
handler. That change regressed kdump on this hardware; reverting it
restores the previous hardirq-based SCI handling, and the kdump kernel
then completes the crash dump without triggering the soft lockup
watchdog.
Signed-off-by: Mohan Yelugoti <ymd@arista.com>
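The trace itself shows the SCI being handled in thread context: `acpi_irq` is called from `irq_thread_fn`, and the stuck task is `irq/9-acpi`. The kernel names IRQ threads `irq/<N>-<name>`, so the presence of such a thread is a quick way to tell which SCI handling a given kernel uses. Below is a minimal sketch of a hypothetical helper, `sci_handler_mode` (not part of this PR or of any existing tool). It assumes the kernel was not booted with `threadirqs` (which forces handlers into threads and would also create `irq/9-acpi`) and that the SCI is registered under the name `acpi`, as in the trace above.

```shell
#!/bin/sh
# Hypothetical helper, not part of this PR: reads task names, one per
# line, on stdin and reports how the ACPI SCI appears to be serviced.
# Threaded IRQ handlers run in a kernel thread named "irq/<N>-<name>",
# which is why the soft lockup trace shows "Comm: irq/9-acpi".
# Assumption: the kernel was not booted with "threadirqs", which would
# force-thread the SCI handler and create the same thread name.
sci_handler_mode() {
    # Any IRQ number is accepted; IRQ 9 is specific to this hardware.
    if grep -Eq '^irq/[0-9]+-acpi$'; then
        echo "threaded"
    else
        echo "hardirq"
    fi
}
```

On a running system it could be fed every task's name, e.g. `cat /proc/[0-9]*/comm 2>/dev/null | sci_handler_mode`; under the assumptions above, a kernel with this revert applied would be expected to report `hardirq`.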
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@saiarcot895: This is another fallout from the move from bookworm to trixie. The soft lockup inside the kdump kernel reproduced every time, on both the supervisor and the linecard.
Please report this to the upstream list, with the people involved in the patch in Cc:, just to get their feedback. PS: This commit caused another regression, which was fixed in a follow-up.
As Paul said, please report this upstream so that a proper permanent fix can be made. See this for reporting issues. Based on the mail thread for the other regression, I'd recommend sending the email to linux-acpi@vger.kernel.org and Cc'ing the patch author.
Thanks @paulmenzel and @saiarcot895. I am currently working on the upstream email and will link the LKML thread once I have reported it upstream.
Upstream reports: