Compensated summation for cpu_statevec_anyCtrlAnyTargDenseMatr_sub (#598) - [unitaryHACK] by mk0dz · Pull Request #791 · QuEST-Kit/QuEST

mk0dz · 2026-06-12T22:39:54Z

Summary

Adds an optional compensated-summation path to improve numerical accuracy in CPU dense-matrix target evolution.

The implementation introduces a Kahan-compensated reduction in cpu_statevec_anyCtrlAnyTargDenseMatr_sub(), controlled by the compile-time flag:

-DQUEST_COMPENSATE_DENSEMATR_SUM=ON

The feature is disabled by default to preserve existing performance characteristics.

Implementation Notes

Accumulates the running sum and its Kahan compensation term in local cpu_qcomp variables and writes the result back once after the reduction.
Applies only to the CPU dense-matrix path.
This source file is not built with fast-math, so the compensation term is preserved by the compiler.
Replaces the previous TODO comment with documentation describing the behaviour and trade-offs.
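
For readers unfamiliar with the technique, the reduction described in the notes above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: `amp_t`, `naiveDot` and `kahanDot` are hypothetical names, and `std::complex<float>` stands in for the kernel's complex type. It assumes a build without fast-math, since reassociation would let the compiler cancel the compensation term.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel's complex amplitude type.
using amp_t = std::complex<float>;

// Naive reduction: sum over j of row[j] * amps[j].
amp_t naiveDot(const std::vector<amp_t>& row, const std::vector<amp_t>& amps) {
    assert(row.size() == amps.size());
    amp_t sum(0.0f, 0.0f);
    for (std::size_t j = 0; j < row.size(); ++j)
        sum += row[j] * amps[j];
    return sum;
}

// Kahan-compensated reduction: 'comp' tracks the low-order bits that each
// addition rounds away and feeds them back into the next term.
// Only the additions are compensated; each product is still rounded once.
amp_t kahanDot(const std::vector<amp_t>& row, const std::vector<amp_t>& amps) {
    assert(row.size() == amps.size());
    amp_t sum(0.0f, 0.0f);
    amp_t comp(0.0f, 0.0f);  // running compensation, kept in a local variable
    for (std::size_t j = 0; j < row.size(); ++j) {
        const amp_t term = row[j] * amps[j] - comp;  // re-inject lost bits
        const amp_t next = sum + term;               // rounded addition
        comp = (next - sum) - term;                  // what the rounding lost
        sum = next;
    }
    return sum;  // written back once, after the loop
}
```

Because `comp` depends on the previous iteration's `sum`, the loop carries a serial dependency, which is the source of the vectorization cost noted under Observations.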

Files Changed

quest/src/cpu/cpu_subroutines.cpp
quest/src/cpu/CMakeLists.txt

Accuracy & Performance

Both the naive and the Kahan-compensated implementations were benchmarked against a __float128 reference across all three floating-point precisions.

Precision	Accuracy @ 12 Targets	Runtime Cost
fp32	~58× lower error	~3.3×
fp64	~29× lower error	~2.2× (~1.76× for 10 targets on a 24-qubit state)
fp80	~25× lower error	~2.1×
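
The comparison against a higher-precision reference can be sketched along these lines. This is only an illustration of the measurement, not the benchmark used for the numbers above: `referenceDot` and `relativeError` are hypothetical helpers, and `long double` is used here as a portable stand-in for `__float128` (on platforms where `long double` is the same as `double`, the reference is correspondingly weaker).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Reference reduction accumulated in long double (an assumption of this
// sketch; the PR's benchmark uses a __float128 reference instead).
template <typename Real>
std::complex<long double> referenceDot(const std::vector<std::complex<Real>>& row,
                                       const std::vector<std::complex<Real>>& amps) {
    assert(row.size() == amps.size());
    std::complex<long double> sum(0.0L, 0.0L);
    for (std::size_t j = 0; j < row.size(); ++j)
        sum += std::complex<long double>(row[j]) * std::complex<long double>(amps[j]);
    return sum;
}

// Relative error |computed - reference| / |reference|, evaluated in long double.
// The caller must ensure the reference is non-zero.
template <typename Real>
long double relativeError(std::complex<Real> computed,
                          std::complex<long double> reference) {
    const std::complex<long double> diff =
        std::complex<long double>(computed) - reference;
    return std::abs(diff) / std::abs(reference);
}
```

Running the naive and compensated kernels on the same inputs and dividing their relative errors gives an "N× lower error" figure of the kind reported in the table.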

Observations

Accuracy improvements grow with target count.
Benefits become negligible below roughly 7 targets.
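Runtime overhead stays well above 1× (roughly 2–3.3× in the table above) because the kernel is compute-bound on the matrix multiplication rather than memory-bandwidth limited, so the extra arithmetic is not hidden behind memory traffic.
The Kahan recurrence carries a dependency from one iteration to the next, which prevents efficient vectorization.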

Validation

All existing dense CompMatr tests pass for both compensated and uncompensated builds, including:

applyCompMatr
Controlled variants

Test results:

93,791 assertions
4 test cases

Rationale for Default-Off

The improvement is primarily beneficial for large or ill-conditioned matrices, while introducing a consistent ~2–3.3× runtime overhead.

Keeping the feature disabled by default preserves current performance expectations while allowing users to opt in when higher numerical accuracy is required.

Future Work

A possible alternative is pairwise summation, which offers:

Rounding-error growth of approximately O(log N) in the number of summed terms N, versus O(N) in the worst case for naive summation
Better vectorization opportunities
Potentially lower runtime overhead

This can be evaluated separately in a future PR.
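
As a rough idea of what the pairwise alternative could look like, here is a minimal sketch. It is not part of this PR and has not been benchmarked against the kernel: `amp_t`, `pairwiseSum` and the cutoff `kBaseCase` are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel's complex amplitude type.
using amp_t = std::complex<float>;

// Pairwise summation: split the range in half, sum each half recursively,
// then add the two partial sums. Short ranges fall back to a plain loop,
// which has no compensation recurrence and so remains easy to vectorize.
amp_t pairwiseSum(const amp_t* terms, std::size_t n) {
    assert(terms != nullptr || n == 0);
    constexpr std::size_t kBaseCase = 8;  // hypothetical cutoff, untuned
    if (n <= kBaseCase) {
        amp_t sum(0.0f, 0.0f);
        for (std::size_t j = 0; j < n; ++j)
            sum += terms[j];
        return sum;
    }
    const std::size_t half = n / 2;
    return pairwiseSum(terms, half) + pairwiseSum(terms + half, n - half);
}
```

In the dense-matrix kernel the terms would be the products of matrix elements and amplitudes; whether they are materialised in a buffer or generated on the fly is a design question left open here.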

I also acknowledge Claude for guiding me through it.

/claim #598

Closes #598.

…uEST-Kit#598) The dense-matrix subroutine's inner reduction is liable to catastrophic cancellation for many target qubits. This adds an opt-in compensated path (-DQUEST_COMPENSATE_DENSEMATR_SUM=ON); base_qcomp's operators are plain IEEE arithmetic so the compensation is honoured directly. Single-CPU benchmarks (fp32/fp64/fp80): relative error improves ~25-58x at 12 targets, the benefit growing with target count; runtime cost is ~2-3.3x (compute-bound) falling to ~1.8x in the large-statevector regime. Left opt-in (off by default) per that trade-off.

mk0dz wants to merge 1 commit into QuEST-Kit:devel from mk0dz:feat/598-compensated-densematr-sum
