clk: qcom: Add DISPCC and GPUCC support for the Qualcomm Shikra SoC#655

Open
imrashai wants to merge 14 commits into qualcomm-linux:qcom-6.18.y from imrashai:shikra-dispcc-gpucc-6.18.y

@imrashai imrashai commented Jun 4, 2026

This series adds support for the Display Clock Controller (DISPCC) and GPU Clock Controller (GPUCC) on the Qualcomm Shikra SoC by reusing the respective QCM2290 SoC drivers.

The Shikra DISPCC/GPUCC v1 series changes are already merged in the qcom-6.18.y branch. However, per the upstream review comments, we are now reusing the Agatti (QCM2290) drivers, so the v1 series changes no longer apply. Hence, I have reverted all the v1-specific changes in a single commit and applied the latest v4 series changes on top of it.

Link to v4: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com

CRs-Fixed: 4560023

Drop the older series Shikra GPUCC/DISPCC code changes.

Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
@imrashai imrashai requested review from a team, sgaud-quic, shashim-quic and yijiyang June 4, 2026 10:16
imrashai added 13 commits June 4, 2026 16:21
…from probe

Some GCC branch clocks are required to be kept always-on due to the
hardware requirements. Drop the modelling of those always-on QCM2290 GCC
clocks and use the latest .clk_cbcr convention to keep them enabled from
probe.

Change-Id: Ie9349d320d3a50ff1386b6b49a849d2f2074e2e3
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-1-8204f1029311@oss.qualcomm.com
…leep clocks

Update the QCM2290 DISPCC binding to document additional clock inputs
supported by the hardware, including the DSI1 PHY byte/pixel clocks and
the sleep clock, alongside the existing clock list. This is an ABI
extension, and the ordering of the existing clock inputs is unchanged.

Change-Id: I42efe00f96721a833eccc4c43d389c4b1fc6becb
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-2-8204f1029311@oss.qualcomm.com
… controller

The Qualcomm Shikra Display clock controller is similar to QCM2290
DISPCC hardware block. Hence, reuse the QCM2290 DISPCC bindings for
Qualcomm Shikra SoC.

Change-Id: Ia142d9c3e5af3d4e861685c4659fe03e51d04842
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-3-8204f1029311@oss.qualcomm.com
…troller

The Qualcomm Shikra GPU clock controller is similar to QCM2290 GPUCC
hardware block, with minor differences. Hence, reuse the QCM2290 GPUCC
bindings for Qualcomm Shikra SoC.

Change-Id: Ic20afaf9abf0d9b5773ed5e7308c4a660a0c70ae
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-4-8204f1029311@oss.qualcomm.com
…c_probe() model

Update the QCM2290 DISPCC driver to use the qcom_cc_probe() model by moving
the critical clocks handling and PLL configurations from probe to the
driver_data to align with the latest convention.

Change-Id: I6fb15fe6c923cf009b160414fe0edf82bd90d5aa
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-5-8204f1029311@oss.qualcomm.com
Update the QCM2290 DISPCC driver to use the DT index based parent clock
lookup to align with the latest convention. While at it, fix the parent
data of mdss ahb/mdp clocks to use GPLL0 main output as per HW clock plan,
and update frequency table accordingly. Also, add the DSI1 PHY PLL input
clocks support.

Change-Id: I275f3514ccfddd9a14b0143f2ef89321544dd7ed
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-6-8204f1029311@oss.qualcomm.com
… flags

Update the QCM2290 DISPCC GDSC wait_val fields to match the hardware
default values. Incorrect settings can cause the GDSC FSM to get stuck,
leading to power on/off failures. Also update the GDSC flags to retain
the registers, poll the CFG GDSCR, and switch between HW/SW mode
dynamically, as per the latest convention.

Change-Id: I53fad22ab038f2080506669b4ab06e2288b6156f
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-7-8204f1029311@oss.qualcomm.com
…_probe() model

Update the QCM2290 GPUCC driver to use the qcom_cc_probe() model by moving
the critical clocks handling and PLL configurations from probe to the
driver_data to align with the latest convention. While at it, drop the
modelling of gpu_cc_ahb_clk and gpu_cc_cxo_aon_clk clocks and keep them
enabled from probe as per the hardware requirements, and drop pm_clk
handling as the required GCC clocks are kept always-on from GCC probe.

Change-Id: Ia7c0e40c53c09be947c84fedb9a440ae21bd2402
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-8-8204f1029311@oss.qualcomm.com
…g disable

The RCG's clk src has to be parked at XO while disabling as per the
HW recommendation, hence use clk_rcg2_shared_ops to achieve the same.

Change-Id: Id33903b94da3189458ff0120342283d4f97618c8
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-9-8204f1029311@oss.qualcomm.com
…flags

Update the QCM2290 GPUCC GDSC wait_val fields to match the hardware default
values. Incorrect settings can cause the GDSC FSM to get stuck, leading to
power on/off failures. Also update the GPUCC GDSC flags to retain the
registers and poll the CFG GDSCR, as applicable.

Change-Id: I3da22ff0372fc41f1910319e4ad9c2b38a30d8a5
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-10-8204f1029311@oss.qualcomm.com
… Shikra

The Qualcomm Shikra GPU clock controller is similar to QCM2290 GPUCC
hardware block, with minor differences. Hence add support for Shikra
GPUCC by extending the QCM2290 GPUCC driver.

Change-Id: Ife9c448f17e631b8c8b9e971c03bb3972412920c
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-11-8204f1029311@oss.qualcomm.com
…DISPCC node

Update the DISPCC node on QCM2290 (Agatti) to align with the latest DT
binding changes, which add support for the DSI1 PHY and sleep clocks.

Change-Id: I7d5ce132ef3d6bd9181717aab7722895edc46dce
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-12-8204f1029311@oss.qualcomm.com
Add support for Display clock controller and GPU clock controller nodes
on Qualcomm Shikra SoCs.

Change-Id: I807a79eb01eb152136e178b7b35ed2d5fa9edc8f
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-13-8204f1029311@oss.qualcomm.com
@imrashai imrashai force-pushed the shikra-dispcc-gpucc-6.18.y branch from 9ccf738 to 49fc600 Compare June 4, 2026 11:00
@qlijarvis

PR #655 — validate-patch

PR: #655

| Verdict | Issues | Detailed Report |
|---|---|---|
| ⚠️ | 3 | Full report |

Final Summary

  1. Lore link present: Yes — all FROMLIST commits (02-14) include Link: tag pointing to the v4 series cover letter
  2. Lore link matches PR commits: Cannot verify — the lore URL contains a future date (2026-06-04) and returns no content when fetched
  3. Upstream patch status: Cannot determine — lore link is inaccessible; likely these are patches prepared for upstream submission but not yet posted
  4. PR present in qcom-next: Not checked — validation scope limited to commit message and diff correctness per instructions

Additional observations:

  • Commit 01/14 is vendor-only (no FROMLIST prefix, no lore link) — this is a revert of previous Shikra DISPCC/GPUCC code
  • Commits 02-14 are marked FROMLIST and include proper Link: tags
  • The series appears to be a v4 revision that refactors QCM2290 clock drivers and adds Shikra support by extending the QCM2290 drivers
  • All commits follow kernel commit message format with proper Signed-off-by tags
  • The lore link format suggests this may be a pre-submission or internal review version with a placeholder/future date
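
The "future date" observation above rests on reading the leading digits of the message-ID as a date. A minimal sketch of that reading, assuming the message-ID starts with a `YYYYMMDD-` stamp as in the link quoted in this PR (the function name `message_id_date` is hypothetical and not part of any checker):

```python
import re
from datetime import date

def message_id_date(url):
    """Return the date encoded at the start of the message-ID, or None.

    Assumption: the message-ID in the URL begins with an eight-digit
    YYYYMMDD stamp, as in the link quoted in this PR.
    """
    match = re.search(r"/(\d{4})(\d{2})(\d{2})-[^/]*@", url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)

cover = ("https://lore.kernel.org/all/"
         "20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com")
stamp = message_id_date(cover)
```

Comparing `stamp` against `date.today()` tells a reviewer whether the stamp is genuinely in the future or the tool's clock assumption is off.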
Verdict: ⚠️

🔍 Patch Validation

PR: #655 - Shikra DISPCC/GPUCC v4 series
Upstream commit: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com
Verdict: ⚠️ PARTIAL

Commit Message

| Check | Status | Note |
|---|---|---|
| Subject matches upstream | ⚠️ | Patch 01/14 has typo: "chages" should be "changes" |
| Body preserves rationale | | FROMLIST commits preserve upstream rationale |
| Fixes tag present/correct | N/A | No Fixes tags required for new feature additions |
| Authorship preserved | | All commits authored by Imran Shaik |
| Backport note (if applicable) | N/A | FROMLIST patches, not backports |

Diff

| File | Status | Notes |
|---|---|---|
| Patch 01/14 (vendor revert) | | Vendor-only cleanup, no upstream equivalent |
| Patches 02-14 (FROMLIST) | ⚠️ | Cannot verify - lore link points to future date (2026-06-04) |

Issues

  • Commit 01/14 typo: Subject line contains "chages" instead of "changes"
  • Lore link validation blocked: All FROMLIST commits reference https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com which has a future date (2026-06-04) and cannot be fetched for validation
  • Link format: The lore link points to the cover letter (v4-0) rather than individual patch messages, making per-patch validation difficult

Verdict

Merge with minor fix recommended: Fix the typo in commit 01/14 before merging. The FROMLIST commits appear structurally correct but cannot be validated against upstream due to the future-dated lore link.

@qlijarvis

PR #655 — checker-log-analyzer

PR: #655
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/26945709225

| Checker | Result | Summary |
|---|---|---|
| checkpatch | ❌ | 7 warnings on commit 1/14 (DT binding separation) |
| dt-binding-check | | All binding checks passed |
| dtb-check | | All DTB validation passed |
| sparse-check | | Sparse analysis passed |
| check-uapi-headers | | No UAPI header changes detected |
| check-patch-compliance | ❌ | 2 blockers: missing prefix on commit 1, invalid lore links |
| tag-check | N/A | Not applicable for qcom-6.18.y |
| qcom-next-check | ⏭️ | FROMLIST commits present (expected) |

Detailed report: Full report

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #655 - clk: qcom: Shikra DISPCC/GPUCC v4 series
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/26945709225

❌ checkpatch

Root cause: Commit 88b2209 ("clk: qcom: Revert older series Shikra GPUCC/DISPCC changes") mixes DT binding deletions with driver code deletions in a single commit.

Failure details:

WARNING: DT binding docs and includes should be a separate patch. See: Documentation/devicetree/bindings/submitting-patches.rst

88b220943a26a64d50f3693b7a2eaa33dc148af6 total: 0 errors, 7 warnings, 0 checks, 63 lines checked

The commit deletes:

  • Documentation/devicetree/bindings/clock/qcom,shikra-dispcc.yaml
  • include/dt-bindings/clock/qcom,shikra-dispcc.h
  • include/dt-bindings/clock/qcom,shikra-gpucc.h
  • drivers/clk/qcom/dispcc-shikra.c
  • drivers/clk/qcom/gpucc-shikra.c

Fix: Split commit 1 into two commits:

  1. First commit: Delete DT bindings and includes only
  2. Second commit: Delete driver code and Kconfig/Makefile changes
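
The proposed split can be sketched as a simple path classification. This is only an illustration under the assumption that "DT binding docs and includes" means paths under the two prefixes below; the helper name `split_revert_files` is hypothetical, and the file list is the one quoted in this report:

```python
# Path prefixes treated as "DT binding docs and includes" (an assumption
# made for this sketch, based on the checkpatch warning text).
BINDING_PREFIXES = (
    "Documentation/devicetree/bindings/",
    "include/dt-bindings/",
)

def split_revert_files(paths):
    """Partition paths into (binding files, everything else)."""
    bindings, drivers = [], []
    for path in paths:
        target = bindings if path.startswith(BINDING_PREFIXES) else drivers
        target.append(path)
    return bindings, drivers

deleted = [
    "Documentation/devicetree/bindings/clock/qcom,shikra-dispcc.yaml",
    "include/dt-bindings/clock/qcom,shikra-dispcc.h",
    "include/dt-bindings/clock/qcom,shikra-gpucc.h",
    "drivers/clk/qcom/dispcc-shikra.c",
    "drivers/clk/qcom/gpucc-shikra.c",
]
binding_files, driver_files = split_revert_files(deleted)
```

The first list would go into the bindings-only commit and the second into the driver commit, together with any Kconfig/Makefile changes.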

Reproduce locally:

./scripts/checkpatch.pl --strict --git 32eb26c08e43..88b220943a26

❌ check-patch-compliance

Root cause: Two distinct issues prevent compliance:

  1. Commit 1 missing required prefix: The first commit "clk: qcom: Revert older series Shikra GPUCC/DISPCC changes" does not start with a required prefix (UPSTREAM:, FROMLIST:, BACKPORT:, or FROMGIT:).

  2. Invalid lore.kernel.org links: All 13 FROMLIST commits (2-14) reference the same cover letter URL instead of individual patch URLs:

    Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com
    

    The checker validates each link with b4 am --single-message and reports: "Something seems wrong with the provided link."
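
The prefix rule in item 1 amounts to a starts-with test on the commit subject. A minimal sketch, assuming the four prefixes listed above are the complete set (the function name `has_required_prefix` is hypothetical and is not the checker's actual implementation):

```python
# Prefixes quoted in the compliance report above; assumed to be the full set.
REQUIRED_PREFIXES = ("UPSTREAM:", "FROMLIST:", "BACKPORT:", "FROMGIT:")

def has_required_prefix(subject):
    """Return True if the commit subject starts with a required prefix."""
    return subject.startswith(REQUIRED_PREFIXES)

revert_subject = "clk: qcom: Revert older series Shikra GPUCC/DISPCC changes"
fromlist_subject = ("FROMLIST: clk: qcom: gcc-qcm2290: "
                    "Keep the critical clocks always-on from probe")
results = (has_required_prefix(revert_subject),
           has_required_prefix(fromlist_subject))
```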

Failure details:

Checking commit: clk: qcom: Revert older series Shikra GPUCC/DISPCC changes
Commit summary does not start with a required prefix

Checking commit: FROMLIST: clk: qcom: gcc-qcm2290: Keep the critical clocks always-on from probe
Something seems wrong with the provided link. Please verify it
Try below command to run locally-
b4 am --single-message -C -l -3 https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com

Fix:

For commit 1: Add a prefix to indicate this is a local revert/cleanup:

REVERT: clk: qcom: Revert older series Shikra GPUCC/DISPCC changes

or

CLEANUP: clk: qcom: Revert older series Shikra GPUCC/DISPCC changes

For commits 2-14: Replace the cover letter link with individual patch message-IDs. Each commit should have its own Link: tag pointing to the specific patch in the series:

Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-1-8204f1029311@oss.qualcomm.com  # for patch 1/13
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-2-8204f1029311@oss.qualcomm.com  # for patch 2/13
...
Link: https://lore.kernel.org/all/20260604-shikra-dispcc-gpucc-v4-13-8204f1029311@oss.qualcomm.com # for patch 13/13

The pattern is: change -v4-0- (cover letter) to -v4-N- where N is the patch number (1-13).
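
The substitution described above can be sketched as follows. This assumes the cover-letter link carries a `-v<revision>-0-` marker as in the URL quoted in this report; the function name `per_patch_links` is hypothetical:

```python
import re

# Matches the "-v<revision>-0-" marker that identifies the cover letter.
COVER_MARKER = re.compile(r"-v(\d+)-0-")

def per_patch_links(cover_url, patch_count):
    """Derive per-patch links by replacing the cover-letter index 0 with 1..N."""
    if COVER_MARKER.search(cover_url) is None:
        raise ValueError("URL does not look like a cover-letter link")
    return [
        COVER_MARKER.sub(lambda m, n=n: f"-v{m.group(1)}-{n}-", cover_url, count=1)
        for n in range(1, patch_count + 1)
    ]

cover = ("https://lore.kernel.org/all/"
         "20260604-shikra-dispcc-gpucc-v4-0-8204f1029311@oss.qualcomm.com")
links = per_patch_links(cover, 13)
```

Each entry in `links` would then be pasted into the `Link:` tag of the matching commit; whether the derived URL actually resolves still has to be confirmed against lore.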

Reproduce locally:

cd kernel
bash ../kernel-checkers/check-patch-compliance.sh --kernel-src . \
  --base 32eb26c08e43 --head 9ccf73887096

Verdict

2 blockers must be fixed before merge:

  1. checkpatch: Split commit 1 to separate DT bindings from driver code
  2. check-patch-compliance:
    • Add required prefix to commit 1 (REVERT: or CLEANUP:)
    • Fix Link: tags in commits 2-14 to point to individual patches, not the cover letter

All other checkers (dt-binding-check, dtb-check, sparse-check, check-uapi-headers) passed successfully.

@qcomlnxci

Test Matrix

| Test Case | lemans-evk | monaco-evk | qcs615-ride | qcs6490-rb3gen2 | qcs8300-ride | qcs9100-ride-r3 | x1e80100-crd |
|---|---|---|---|---|---|---|---|
| BT_FW_KMD_Service | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_ON_OFF | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_SCAN | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPUFreq_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPU_affinity | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| DSP_AudioPD | ✅ Pass | ✅ Pass | ⚠️ skip | ✅ Pass | ✅ Pass | ⚠️ skip | ◻️ |
| Ethernet | ⚠️ skip | ✅ Pass | ⚠️ skip | ⚠️ skip | ⚠️ skip | ⚠️ skip | ◻️ |
| Freq_Scaling | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| GIC | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| IPA | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Interrupts | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| OpenCV | ✅ Pass | ⚠️ skip | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| PCIe | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Probe_Failure_Check | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| RMNET | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| UFS_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| USBHost | ❌ Fail | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| WiFi_Firmware_Driver | ❌ Fail | ⚠️ skip | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| WiFi_OnOff | ✅ Pass | ❌ Fail | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| adsp_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| cdsp_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| gpdsp_remoteproc | ✅ Pass | ✅ Pass | ⚠️ skip | ⚠️ skip | ✅ Pass | ❌ Fail | ◻️ |
| hotplug | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| irq | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| kaslr | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| pinctrl | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| qcom_hwrng | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| rngtest | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| shmbridge | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| smmu | ❌ Fail | ✅ Pass | ❌ Fail | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| watchdog | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| wpss_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |

shivrawa pushed a commit to shivrawa/kernel that referenced this pull request Jun 5, 2026
commit ce0123c upstream.

During async unlink, we drop the `i_nlink` counter before we receive
the completion (that will eventually update the `i_nlink`) because "we
assume that the unlink will succeed".  That is not a bad idea, but it
races against deletions by other clients (or against the completion of
our own unlink) and can lead to an underrun which emits a WARNING like
this one:

 WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68
 Modules linked in:
 CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655
 Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : drop_nlink+0x50/0x68
 lr : ceph_unlink+0x6c4/0x720
 sp : ffff80012173bc90
 x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680
 x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647
 x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203
 x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365
 x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec
 x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74
 x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94
 x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002
 x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8
 Call trace:
  drop_nlink+0x50/0x68 (P)
  vfs_unlink+0xb0/0x2e8
  do_unlinkat+0x204/0x288
  __arm64_sys_unlinkat+0x3c/0x80
  invoke_syscall.constprop.0+0x54/0xe8
  do_el0_svc+0xa4/0xc8
  el0_svc+0x18/0x58
  el0t_64_sync_handler+0x104/0x130
  el0t_64_sync+0x154/0x158

In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the
CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion.

Meanwhile, between this call and the following drop_nlink() call, a
worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or
just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own
completion).  These will lead to a set_nlink() call, updating the
`i_nlink` counter to the value received from the MDS.  If that new
`i_nlink` value happens to be zero, it is illegal to decrement it
further.  But that is exactly what ceph_unlink() will do then.

The WARNING can be reproduced this way:

1. Force async unlink; only the async code path is affected.  Having
   no real clue about Ceph internals, I was unable to find out why the
   MDS wouldn't give me the "Fxr" capabilities, so I patched
   get_caps_for_async_unlink() to always succeed.

   (Note that the WARNING dump above was found on an unpatched kernel,
   without this kludge - this is not a theoretical bug.)

2. Add a sleep call after ceph_mdsc_submit_request() so the unlink
   completion gets handled by a worker thread before drop_nlink() is
   called.  This guarantees that the `i_nlink` is already zero before
   drop_nlink() runs.

The solution is to skip the counter decrement when it is already zero,
but doing so without a lock is still racy (TOCTOU).  Since
ceph_fill_inode() and handle_cap_grant() both hold the
`ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this
seems like the proper lock to protect the `i_nlink` updates.

I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using
`afs_vnode.cb_lock`).  All three have the zero check as well.
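
The check-under-lock idea described above can be modelled in userspace. This is a Python illustration of the locking pattern only, not the kernel change itself: the class `InodeModel` is hypothetical, and a `threading.Lock` stands in for the `i_ceph_lock` spinlock.

```python
import threading

class InodeModel:
    """Toy model of an inode link counter guarded by one lock."""

    def __init__(self, nlink):
        self.nlink = nlink
        self.lock = threading.Lock()  # stands in for i_ceph_lock

    def set_nlink(self, value):
        # Models an update arriving from the MDS (reply, grant or import).
        with self.lock:
            self.nlink = value

    def drop_nlink_checked(self):
        # The zero check and the decrement happen under the same lock,
        # so no update can slip in between them (no TOCTOU window).
        with self.lock:
            if self.nlink == 0:
                return False
            self.nlink -= 1
            return True

inode = InodeModel(nlink=1)
inode.set_nlink(0)                    # completion handled before the decrement
dropped = inode.drop_nlink_checked()  # decrement is skipped, no underrun
```

Doing the zero check outside the lock would reopen the race: the counter could reach zero between the check and the decrement.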

Cc: stable@vger.kernel.org
Fixes: 2ccb454 ("ceph: perform asynchronous unlink if we have sufficient caps")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@qswat-orbit-external

Merge Check Failed: CR Not Eligible for Merge

CR 4560023 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0
CR: 4560023
Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.
