uv run reframe -J=--account=csstaff -J=--reservation=uss140-shs131-nv590-staging -J=--gpus-per-node=4 -C /capstor/store/cscs/cscs/public/reframe/reframe-stable/starlex/cscs-reframe-tests.git/config/cscs.py --mode=maintenance --report-junit=report.xml --run --name CPUNodeBurnStreamCE --name CudaNodeBurnStreamCE --name DcgmRpmCheck --name PyTorchDdpCeNv
[==========] Running 6 check(s)
[==========] Started on Mon Jun 29 10:35:01 2026+0200
[----------] start processing checks
[ RUN ] DcgmRpmCheck /989abc80 @starlex:normal+builtin
[ RUN ] CudaNodeBurnStreamCE /af7164be @starlex:normal+builtin
[ RUN ] CPUNodeBurnStreamCE /4872bcfd @starlex:normal+builtin
[ RUN ] PyTorchDdpCeNv %num_nodes=1 %aws_ofi_nccl=True %image=nvcr.io#nvidia/pytorch:25.06-py3 /d1772459 @starlex:normal+builtin
[ RUN ] PyTorchDdpCeNvlarge %num_nodes=3 %aws_ofi_nccl=True %image=nvcr.io#nvidia/pytorch:25.06-py3 /367e5166 @starlex:normal+builtin
[ RUN ] PyTorchDdpCeNvlarge %num_nodes=8 %aws_ofi_nccl=True %image=nvcr.io#nvidia/pytorch:25.06-py3 /d5dbe538 @starlex:normal+builtin
[ PASSED ] Ran 0/6 test case(s) from 6 check(s) (0 failure(s), 0 expected failure(s), 0 skipped, 0 aborted)
[==========] Finished on Mon Jun 29 10:35:08 2026+0200
ERROR: run session stopped: key error: <reframe.core.schedulers.slurm._SlurmJob object at 0xffffae3ba850>
ERROR: Traceback (most recent call last):
  File "/users/antonk/reframe/reframe/frontend/cli.py", line 1780, in main
    runner.runall(testcases, restored_cases)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/antonk/reframe/reframe/core/logging.py", line 1138, in _fn
    return fn(*args, **kwargs)
  File "/users/antonk/reframe/reframe/frontend/executors/__init__.py", line 731, in runall
    self._runall(testcases)
    ~~~~~~~~~~~~^^^^^^^^^^^
  File "/users/antonk/reframe/reframe/frontend/executors/__init__.py", line 824, in _runall
    self._policy.exit()
    ~~~~~~~~~~~~~~~~~^^
  File "/users/antonk/reframe/reframe/frontend/executors/policies.py", line 430, in exit
    self._poll_tasks()
    ~~~~~~~~~~~~~~~~^^
  File "/users/antonk/reframe/reframe/frontend/executors/policies.py", line 480, in _poll_tasks
    sched.poll(*jobs)
    ~~~~~~~~~~^^^^^^^
  File "/users/antonk/reframe/reframe/core/schedulers/slurm.py", line 561, in poll
    self._cancel_if_blocked(jobs)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/users/antonk/reframe/reframe/core/schedulers/slurm.py", line 606, in _cancel_if_blocked
    pending_reasons[pending_job].setdefault([])
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: <reframe.core.schedulers.slurm._SlurmJob object at 0xffffae3ba850>
Log file(s) saved in '/users/antonk/reframe/reframe.log', '/users/antonk/reframe/reframe.out'
Reported by @toxa81:
I think the bug was introduced in #3690, because we don't initialize
pending_reasons[pending_job] correctly for the SlurmJobScheduler. It is initialized fine for the SqueueJobScheduler. I will make a PR to fix it.
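
For illustration, the failure mode can be reproduced outside ReFrame. The traceback shows pending_reasons[pending_job] being indexed before anything else happens, so a job without an entry raises KeyError. The sketch below is not the ReFrame code and not the planned fix: FakeJob, record_reason_unsafe and record_reason_safe are hypothetical names, and storing a list of reason strings per job is an assumption made only for this example.

```python
# Hypothetical stand-in for a scheduler job object; NOT ReFrame's _SlurmJob.
class FakeJob:
    def __init__(self, jobid):
        self.jobid = jobid


def record_reason_unsafe(pending_reasons, job, reason):
    # Indexing the dict first raises KeyError if `job` has no entry yet,
    # which mirrors the error in the traceback.
    pending_reasons[job].append(reason)


def record_reason_safe(pending_reasons, job, reason):
    # Calling setdefault on the dict itself creates the empty list
    # the first time `job` is seen, then appends to it.
    pending_reasons.setdefault(job, []).append(reason)


job = FakeJob(42)
reasons = {}  # assumed layout: job -> list of pending-reason strings

try:
    record_reason_unsafe(reasons, job, "Resources")
    raised = False
except KeyError:
    raised = True

record_reason_safe(reasons, job, "Resources")
record_reason_safe(reasons, job, "Priority")
```

A collections.defaultdict(list) would behave the same way as the safe variant. Which of the two fits the scheduler code better depends on how pending_reasons is set up for the SqueueJobScheduler, which is not shown here.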