Compensated summation for cpu_statevec_anyCtrlAnyTargDenseMatr_sub (#598) - [unitaryHACK] #791

Open
mk0dz wants to merge 1 commit into QuEST-Kit:devel from mk0dz:feat/598-compensated-densematr-sum

@mk0dz mk0dz commented Jun 12, 2026

Summary

Adds an optional compensated-summation path to improve numerical accuracy in CPU dense-matrix target evolution.

The implementation introduces a compensated (Kahan) reduction in cpu_statevec_anyCtrlAnyTargDenseMatr_sub(), controlled by the compile-time flag:

-DQUEST_COMPENSATE_DENSEMATR_SUM=ON

The feature is disabled by default to preserve existing performance characteristics.

Implementation Notes

  • Accumulates the sum, together with its Kahan compensation term, in local cpu_qcomp variables and writes the result back once after the reduction.
  • Applies only to the CPU dense-matrix path.
  • This source file is not built with fast-math, so the compensation term is preserved by the compiler.
  • Replaces the previous TODO comment with documentation describing the behaviour and trade-offs.
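
As a rough illustration of the notes above, here is a minimal sketch of a Kahan-compensated complex dot product next to its naive counterpart. It is not the code from this PR: the names naiveDot and kahanDot are hypothetical, and std::complex stands in for QuEST's own complex type. Complex addition and subtraction act component-wise, so the compensation applies to the real and imaginary parts independently.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Naive reduction: every addition can drop the low-order bits of the product.
template <typename Real>
std::complex<Real> naiveDot(const std::vector<std::complex<Real>>& row,
                            const std::vector<std::complex<Real>>& amps) {
    assert(row.size() == amps.size());
    std::complex<Real> sum(0, 0);
    for (std::size_t j = 0; j < row.size(); ++j)
        sum += row[j] * amps[j];
    return sum;
}

// Kahan reduction: 'comp' carries the rounding error of the previous addition
// and is fed back into the next term. Both 'sum' and 'comp' are locals; the
// caller stores the returned value once, after the loop.
template <typename Real>
std::complex<Real> kahanDot(const std::vector<std::complex<Real>>& row,
                            const std::vector<std::complex<Real>>& amps) {
    assert(row.size() == amps.size());
    std::complex<Real> sum(0, 0);
    std::complex<Real> comp(0, 0);
    for (std::size_t j = 0; j < row.size(); ++j) {
        const std::complex<Real> y = row[j] * amps[j] - comp;  // corrected term
        const std::complex<Real> t = sum + y;                  // low bits of y may be lost here
        comp = (t - sum) - y;                                  // recover what was lost
        sum = t;
    }
    return sum;
}
```

The expression (t - sum) - y is algebraically zero, which is why the sketch assumes a build without fast-math style reassociation: an optimiser permitted to simplify it would silently turn kahanDot back into naiveDot. Each iteration also depends on the comp value from the previous one, which is the loop-carried dependency that makes the recurrence hard to vectorize.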

Files Changed

  • quest/src/cpu/cpu_subroutines.cpp
  • quest/src/cpu/CMakeLists.txt

Accuracy & Performance

Both the naive and the Kahan-compensated implementations were benchmarked against a __float128 reference in all three floating-point precisions.

| Precision | Accuracy @ 12 targets | Runtime cost |
|-----------|-----------------------|--------------|
| fp32      | ~58× lower error      | ~3.3×        |
| fp64      | ~29× lower error      | ~2.2× (~1.76× for 10 targets on a 24-qubit state) |
| fp80      | ~25× lower error      | ~2.1×        |
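
The kind of comparison behind these numbers can be sketched as follows. This is only an illustration of measuring relative error against a wider reference, not the benchmark used in this PR: it sums real fp32 values, uses long double as a portable stand-in for the __float128 reference, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference sum in a wider type (assumption: long double instead of __float128).
inline long double referenceSum(const std::vector<float>& xs) {
    long double s = 0.0L;
    for (float x : xs) s += x;
    return s;
}

// Plain fp32 accumulation.
inline float naiveSum(const std::vector<float>& xs) {
    float s = 0.0f;
    for (float x : xs) s += x;
    return s;
}

// Kahan-compensated fp32 accumulation.
inline float kahanSum(const std::vector<float>& xs) {
    float s = 0.0f, comp = 0.0f;
    for (float x : xs) {
        const float y = x - comp;
        const float t = s + y;
        comp = (t - s) - y;
        s = t;
    }
    return s;
}

// Relative error of 'value' with respect to a non-zero reference.
inline long double relativeError(long double value, long double reference) {
    assert(reference != 0.0L);
    return std::fabs(value - reference) / std::fabs(reference);
}
```

Dividing relativeError for the naive sum by that of the compensated sum gives an "N× lower error" figure of the kind reported in the table, for whatever input is chosen.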

Observations

  • Accuracy improvements grow with target count.
  • Benefits become negligible below roughly 7 targets.
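  • Runtime overhead stays well above 1× even for large statevectors because the kernel is compute-bound on the matrix multiplication rather than memory-bandwidth-limited, so the extra floating-point operations are not hidden behind memory traffic.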
  • The Kahan recurrence prevents efficient vectorization.

[Figures: accuracy, bandwidth_regime, runtime, tradeoff]

Validation

All existing dense CompMatr tests pass for both compensated and uncompensated builds, including:

  • applyCompMatr
  • Controlled variants

Test results:

  • 93,791 assertions
  • 4 test cases

Rationale for Default-Off

The improvement mainly benefits large or ill-conditioned matrices, while introducing a consistent ~2–3.3× runtime overhead.

Keeping the feature disabled by default preserves current performance expectations while allowing users to opt in when higher numerical accuracy is required.

Future Work

A possible alternative is pairwise summation, which offers:

  • Error scaling of approximately O(log N)
  • Better vectorization opportunities
  • Potentially lower runtime overhead

This can be evaluated separately in a future PR.
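
For reference, pairwise summation can be sketched as below. This is a generic illustration rather than a proposed QuEST implementation: the function name pairwiseSum and the block size of 8 are assumptions. The range is split recursively, and short blocks are summed with a plain loop that carries no compensation term, so a compiler is free to vectorize it.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Sums v[lo, hi) by recursive halving. Rounding error grows with the depth of
// the recursion tree (about log2 of the element count) instead of the count.
template <typename Real>
std::complex<Real> pairwiseSum(const std::vector<std::complex<Real>>& v,
                               std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= v.size());
    constexpr std::size_t kBlock = 8;  // hypothetical cutoff for the plain loop
    if (hi - lo <= kBlock) {
        std::complex<Real> s(0, 0);
        for (std::size_t i = lo; i < hi; ++i) s += v[i];
        return s;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    return pairwiseSum(v, lo, mid) + pairwiseSum(v, mid, hi);
}
```

A call such as pairwiseSum(products, 0, products.size()) would replace the inner accumulation loop, where products is a hypothetical buffer holding the row-times-amplitude terms.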

I also acknowledge Claude for guiding me through it.

/claim #598

Closes #598.

…uEST-Kit#598)

The dense-matrix subroutine's inner reduction is liable to catastrophic
cancellation for many target qubits. This adds an opt-in compensated path
(-DQUEST_COMPENSATE_DENSEMATR_SUM=ON); base_qcomp's operators are plain IEEE
arithmetic so the compensation is honoured directly.

Single-CPU benchmarks (fp32/fp64/fp80): relative error improves ~25-58x at 12
targets, the benefit growing with target count; runtime cost is ~2-3.3x
(compute-bound) falling to ~1.8x in the large-statevector regime. Left opt-in
(off by default) per that trade-off.