Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
93 commits
Select commit Hold shift + click to select a range
a7c8d31
Add sparse.linalg iterative solvers
abagusetty Apr 2, 2026
e3e9052
Add sparse.linalg iterative solvers
abagusetty Apr 2, 2026
c0e9fb5
Add sparse.linalg iterative solvers
abagusetty Apr 2, 2026
943c52f
Add sparse.linalg iterative solvers
abagusetty Apr 2, 2026
1191f7e
Add sparse.linalg iterative solvers
abagusetty Apr 2, 2026
2384185
Fix deprecated tol kwarg in SciPy host fallback
abagusetty Apr 3, 2026
4720291
sparse/linalg: complete LinearOperator algebra, CG/GMRES/MINRES with …
abagusetty Apr 3, 2026
58cc44b
Add tests for scipy.sparse.linalg: LinearOperator, cg, gmres, minres
abagusetty Apr 6, 2026
3a50062
Fix implicit numpy conversion; use .asnumpy() for dpnp arrays
abagusetty Apr 6, 2026
62cf7a4
tests: add comprehensive sparse linalg tests for LinearOperator, cg, …
abagusetty Apr 6, 2026
d924816
tests: rewrite sparse linalg tests to match dpnp test_linalg.py style
abagusetty Apr 6, 2026
f295bc1
Fix dtype inference: use int8 trial vector so matvec preserves operat…
abagusetty Apr 6, 2026
8c68d98
add onemkl sparse gemv pybind logic
abagusetty Apr 6, 2026
0c4a888
sparse: add pybind11 module, CMakeLists, and hook _sparse_impl into _…
abagusetty Apr 6, 2026
4993120
Remove redundant sparse_gemv.hpp passthrough header
abagusetty Apr 6, 2026
a7ddc1c
sparse: gemv.cpp includes gemv.hpp directly
abagusetty Apr 6, 2026
14cb5c4
sparse: capture exec_q from CSR data at closure construction
abagusetty Apr 6, 2026
890238c
sparse/gemv: add missing headers, input validation, and MKL/SYCL exce…
abagusetty Apr 6, 2026
838dfd8
sparse/gemv: replace explicit if/else type dispatch with 2-D dispatch…
abagusetty Apr 6, 2026
7bc86c9
sparse/gemv.hpp: rename init_sparse_gemv_dispatch_vector -> init_spar…
abagusetty Apr 7, 2026
6136da2
sparse/gemv: fix deprecated set_csr_data and unused nnz warning
abagusetty Apr 7, 2026
ed58333
minor cleanup for sparse extensions
abagusetty Apr 7, 2026
0a32b57
Fix SyntaxError: remove stray backslash in aslinearoperator hasattr s…
abagusetty Apr 7, 2026
6910332
Fix tests: replace numpy.asarray(dpnp_arr) with dpnp_arr.asnumpy()
abagusetty Apr 7, 2026
2a4566f
Fix test failures: dtype guards and preconditioner/callback_type vali…
abagusetty Apr 7, 2026
4292518
sparse/linalg: pure-GPU CG/GMRES/MINRES, drop all CPU fallback paths,…
abagusetty Apr 7, 2026
125dab5
Fix dtype.char AttributeError on dpnp dtype objects in CG/GMRES/MINRES
abagusetty Apr 7, 2026
2d753cf
Fix M guard, MINRES betacheck decay and gbar init in Paige-Saunders r…
abagusetty Apr 7, 2026
cb2a5b8
fix: correct 6 runtime bugs in sparse linalg iterative solvers
abagusetty Apr 7, 2026
b70ecfd
Fix MINRES Paige-Saunders QR recurrence (fixes TestSolverConsistency …
abagusetty Apr 7, 2026
969b1e9
Fix MINRES stagnation false-positive on float32: reorder convergence/…
abagusetty Apr 7, 2026
18bd2c3
fix: 3 bugs in _iterative.py (asnumpy, GMRES V alloc, MINRES atol)
abagusetty Apr 7, 2026
cd4907a
Fix deprecated tol kwarg in SciPy host fallback (cg, gmres use rtol=)
abagusetty Apr 7, 2026
c6d109d
sparse/linalg: fix cg/gmres/minres -- rtol alias, M support, dead SPD…
abagusetty Apr 7, 2026
ea4989b
sparse/linalg: fix SpMV handle lifecycle, complex dtypes, tol→rtol, M…
abagusetty Apr 7, 2026
c223ce2
update WIP
abagusetty Apr 7, 2026
e1a41b3
minor fixes
abagusetty Apr 8, 2026
4442530
Add testing
abagusetty Apr 9, 2026
ac3bed5
black formatting
abagusetty Apr 9, 2026
3e47f4a
Merge branch 'IntelPython:master' into feature-sparse-linalg-solvers
abagusetty Apr 9, 2026
a4ee24f
Update CHANGELOG.md
abagusetty Apr 9, 2026
7e81346
Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…
abagusetty Apr 9, 2026
c330a04
remove stale testing
abagusetty Apr 9, 2026
0badee4
Add the missing onemkl-sycl-sparse dep to conda-recipe
abagusetty Apr 9, 2026
4be3d9a
fix pre-commit issues
abagusetty Apr 9, 2026
651aaeb
Try again with formatting fix
abagusetty Apr 13, 2026
eecf9d6
Merge branch 'master' into feature-sparse-linalg-solvers
abagusetty Apr 13, 2026
555e547
Merge remote-tracking branch 'origin/master' into feature-sparse-lina…
abagusetty Apr 22, 2026
8d0149e
Remove __bool__ fallback from _is_boolean in _slicing.pxi
vlad-perevezentsev Apr 24, 2026
672a45c
Support range/list as advanced index keys in dpnp_array
vlad-perevezentsev Apr 24, 2026
7ef44b7
Add tests for range/list advanced indexing
vlad-perevezentsev Apr 24, 2026
5f5b0b1
Update changelog
vlad-perevezentsev Apr 24, 2026
830ef88
Handle list/empty list advanced indexing correctly
vlad-perevezentsev Apr 24, 2026
18bb358
Apply remark
vlad-perevezentsev Apr 28, 2026
e64845c
Merge master into fix_inplace_indexind_4d
vlad-perevezentsev Apr 28, 2026
87388b4
Merge master into fix_inplace_indexind_4d
vlad-perevezentsev Apr 29, 2026
9605678
Merge remote-tracking branch 'upstream/fix_inplace_indexind_4d' into …
abagusetty Apr 29, 2026
f66f7eb
Merge branch 'master' into feature-sparse-linalg-solvers
abagusetty May 6, 2026
d04ca68
Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…
abagusetty May 6, 2026
3ab930c
fix clang-formagt
abagusetty May 6, 2026
181eb59
fix the cyclic importing issue
abagusetty May 7, 2026
f38ae7b
Merge branch 'master' into feature-sparse-linalg-solvers
abagusetty May 8, 2026
a524637
One more try to fix the git actions
abagusetty May 9, 2026
83f6a3d
Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…
abagusetty May 9, 2026
a375760
Fix pylint failures
abagusetty May 20, 2026
36042c6
Merge branch 'master' into feature-sparse-linalg-solvers
abagusetty May 20, 2026
902eeff
Update dpnp/backend/extensions/sparse/CMakeLists.txt
abagusetty May 26, 2026
2b2ac0d
Update dpnp/backend/extensions/sparse/gemv.cpp
abagusetty May 26, 2026
04ad7be
Update dpnp/backend/extensions/sparse/gemv.cpp
abagusetty May 26, 2026
d77e516
Update dpnp/backend/extensions/sparse/types_matrix.hpp
abagusetty May 26, 2026
d9734d8
Update dpnp/backend/extensions/sparse/gemv.cpp
abagusetty May 26, 2026
8cb5656
Merge branch 'IntelPython:master' into feature-sparse-linalg-solvers
abagusetty May 26, 2026
efe896c
Merge branch 'IntelPython:master' into feature-sparse-linalg-solvers
abagusetty May 27, 2026
c3aae1d
add left out files
abagusetty May 27, 2026
ec72540
Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…
abagusetty May 27, 2026
2c12c20
sparse: route csr_matrix.dot through oneMKL sparse::gemv
May 27, 2026
deea67b
sparse/linalg: fix dtype inference, info contract, implicit-sync bugs…
May 27, 2026
50e9df4
sparse/linalg: fuse GMRES Arnoldi via alpha/beta gemv; one-sync CG; h…
May 27, 2026
b395046
sparse/linalg: opt LinearOperator out of NumPy ufunc dispatch; harden…
May 27, 2026
70ab579
test: dtype-aware rtol in gmres convergence tests
May 27, 2026
fb11515
gmres: use oneMKL conjugate-transpose directly; fix complex Arnoldi m…
May 27, 2026
e0cb7de
_interface: enforce dpnp's no-implicit-host-transfer rule
May 28, 2026
16f7450
_interface: drop module-level numpy import; use .conjugate() and .dty…
May 28, 2026
f910092
_interface: drop redundant numpy mentions from comments
May 28, 2026
cfabd12
_iterative: use math.* for host scalar arithmetic; numpy stays for fi…
May 29, 2026
6b2b645
_iterative: drop dead stores and xtype alias
May 29, 2026
7f49318
sparse: address pylint findings from pre-commit
May 29, 2026
70e3c30
sparse: apply isort fixes
May 29, 2026
ec1797a
Fix formatting
abagusetty May 29, 2026
7b05eef
test_linalg: align style with sibling cupyx tests
May 29, 2026
2b43800
Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…
May 29, 2026
0182d6a
Merge branch 'master' into feature-sparse-linalg-solvers
abagusetty May 29, 2026
04e4876
Move msg from v20 to v21
May 29, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ This release is compatible with NumPy 2.4.5.
* Added C API functions for `dpnp.tensor.usm_ndarray` setters and getters to avoid ABI breakage if `dpnp.tensor.usm_ndarray` is modified [gh-2866](https://github.com/IntelPython/dpnp/pull/2866)
* Added support for buffer protocol objects as advanced index keys in `dpnp.ndarray` [#2889](https://github.com/IntelPython/dpnp/pull/2889)
* Added `--includes` and `--include-dir` options to the `dpnp` CLI [#2916](https://github.com/IntelPython/dpnp/pull/2916)
* Added implementation of `dpnp.scipy.sparse.linalg import LinearOperator, cg, gmres, minres` [#2841](https://github.com/IntelPython/dpnp/pull/2841)

### Changed

Expand Down
1 change: 1 addition & 0 deletions conda-recipe/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ requirements:
- {{ pin_compatible('onemkl-sycl-lapack', min_pin='x.x', max_pin='x') }}
- {{ pin_compatible('onemkl-sycl-rng', min_pin='x.x', max_pin='x') }}
- {{ pin_compatible('onemkl-sycl-vm', min_pin='x.x', max_pin='x') }}
- {{ pin_compatible('onemkl-sycl-sparse', min_pin='x.x', max_pin='x') }}
- numpy
- intel-gpu-ocl-icd-system

Expand Down
1 change: 1 addition & 0 deletions dpnp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,7 @@ add_subdirectory(backend/extensions/statistics)
add_subdirectory(backend/extensions/ufunc)
add_subdirectory(backend/extensions/vm)
add_subdirectory(backend/extensions/window)
add_subdirectory(backend/extensions/sparse)

add_subdirectory(dpnp_algo)
add_subdirectory(dpnp_utils)
Expand Down
26 changes: 26 additions & 0 deletions dpnp/backend/extensions/blas/blas_py.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,32 @@ PYBIND11_MODULE(_blas_impl, m)
py::arg("depends") = py::list());
}

{
// y = alpha * op(A) * x + beta * y, the full BLAS gemv form.
// Used by dpnp.scipy.sparse.linalg.gmres to fuse the Arnoldi
// orthogonalisation (u -= V @ h) into a single oneMKL call
// (trans_op=0, alpha=-1, beta=1) and to write the Hessenberg
// column h = V^H @ u directly into the Hessenberg matrix slice
// (trans_op=2, alpha=1, beta=0).
//
// trans_op selects the op applied to A:
// 0 = N y = alpha * A @ x + beta * y
// 1 = T y = alpha * A^T @ x + beta * y
// 2 = C y = alpha * A^H @ x + beta * y (F-contig only)
//
// For complex matrices the scalars must be exactly
// representable as their real form: callers pass {-1, 0, 1};
// fractional or complex scalars would lose the imaginary
// component on the C++ cast.
m.def("_gemv_alpha_beta", &blas_ns::gemv_alpha_beta,
"Call `gemv` from oneMKL BLAS library with explicit "
"alpha, beta, and tri-state trans_op (0=N, 1=T, 2=C). "
"Computes y = alpha * op(A) * x + beta * y.",
py::arg("sycl_queue"), py::arg("matrixA"), py::arg("vectorX"),
py::arg("vectorY"), py::arg("trans_op"), py::arg("alpha"),
py::arg("beta"), py::arg("depends") = py::list());
}

{
m.def("_syrk", &blas_ns::syrk,
"Call `syrk` from oneMKL BLAS library to compute "
Expand Down
137 changes: 121 additions & 16 deletions dpnp/backend/extensions/blas/gemv.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,14 +49,20 @@ namespace type_utils = dpnp::tensor::type_utils;

using ext::common::init_dispatch_vector;

// Impl signature now carries alpha and beta as double. Each per-T impl
// casts them to T at the very end (T(alpha_d), T(beta_d)) so the same
// dispatch vector serves both the legacy alpha=1/beta=0 wrapper and
// the new alpha_beta entry point used by the GMRES Arnoldi fast path.
typedef sycl::event (*gemv_impl_fn_ptr_t)(sycl::queue &,
oneapi::mkl::transpose,
const std::int64_t,
const std::int64_t,
const double, // alpha
const char *,
const std::int64_t,
const char *,
const std::int64_t,
const double, // beta
char *,
const std::int64_t,
const bool,
Expand All @@ -69,10 +75,12 @@ static sycl::event gemv_impl(sycl::queue &exec_q,
oneapi::mkl::transpose transA,
const std::int64_t m,
const std::int64_t n,
const double alpha_d,
const char *matrixA,
const std::int64_t lda,
const char *vectorX,
const std::int64_t incx,
const double beta_d,
char *vectorY,
const std::int64_t incy,
const bool is_row_major,
Expand All @@ -84,6 +92,15 @@ static sycl::event gemv_impl(sycl::queue &exec_q,
const T *x = reinterpret_cast<const T *>(vectorX);
T *y = reinterpret_cast<T *>(vectorY);

// Cast alpha/beta into the matrix value type. For complex T the
// single-argument constructor sets the imaginary component to
// zero, which is exact for the GMRES use case (alpha and beta are
// always one of {-1, 0, 1}) and for the dpnp.dot wrapper (alpha=1,
// beta=0). Callers passing fractional or complex scalars through
// this path would lose the imaginary component silently.
const T alpha = static_cast<T>(alpha_d);
const T beta = static_cast<T>(beta_d);

std::stringstream error_msg;
bool is_exception_caught = false;

Expand Down Expand Up @@ -112,13 +129,13 @@ static sycl::event gemv_impl(sycl::queue &exec_q,
// or 'C' for a conjugate transpose.
m, // Number of rows in matrix A.
n, // Number of columns in matrix A.
T(1), // Scaling factor for the matrix-vector product.
alpha, // Scaling factor for the matrix-vector product.
a, // Pointer to the input matrix A.
lda, // Leading dimension of matrix A, which is the
// stride between successive rows (for row major layout).
x, // Pointer to the input vector x.
incx, // The stride of vector x.
T(0), // Scaling factor for vector y.
beta, // Scaling factor for vector y.
y, // Pointer to output vector y, where the result is stored.
incy, // The stride of vector y.
depends);
Expand All @@ -141,14 +158,38 @@ static sycl::event gemv_impl(sycl::queue &exec_q,
return gemv_event;
}

std::pair<sycl::event, sycl::event>
gemv(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
const dpnp::tensor::usm_ndarray &vectorX,
const dpnp::tensor::usm_ndarray &vectorY,
const bool transpose,
const std::vector<sycl::event> &depends)
// Shared validation + dispatch. Both gemv() (alpha=1, beta=0) and
// gemv_alpha_beta() funnel through here so behaviour stays identical
// across the two entry points.
//
// ``trans_op`` is a tri-state matching oneapi::mkl::transpose:
// 0 = N (no transpose),
// 1 = T (plain transpose),
// 2 = C (conjugate-transpose, complex only).
//
// The legacy gemv() entry-point only ever needs N/T (real or complex,
// no conjugate semantics) and forwards trans_op = 0 or 1. The
// gemv_alpha_beta() entry-point exposes the full tri-state so the
// GMRES Arnoldi inner step can request V^H directly instead of
// post-conjugating the result of a T-mode gemv -- which is
// mathematically wrong for a complex right-hand vector (the identity
// conj(V^T @ u) == V^H @ u holds only when u is real-valued).
static std::pair<sycl::event, sycl::event>
gemv_dispatch(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
const dpnp::tensor::usm_ndarray &vectorX,
const dpnp::tensor::usm_ndarray &vectorY,
const int trans_op,
const double alpha,
const double beta,
const std::vector<sycl::event> &depends)
{
if (trans_op < 0 || trans_op > 2) {
throw py::value_error("gemv: trans_op must be 0 (N), 1 (T), or 2 (C).");
}
const bool is_transposed = (trans_op != 0);
const bool is_conj_trans = (trans_op == 2);

const int matrixA_nd = matrixA.get_ndim();
const int vectorX_nd = vectorX.get_ndim();
const int vectorY_nd = vectorY.get_ndim();
Expand Down Expand Up @@ -182,10 +223,22 @@ std::pair<sycl::event, sycl::event>
"Input matrix is not c-contiguous nor f-contiguous.");
}

// Conjugate-transpose is only wired up for column-major (F-contig)
// matrices. The row-major remap (treating a C-contig matrix as its
// column-major transpose) does not extend cleanly to the C op
// because (A^T)^H == conj(A), which oneMKL does not expose as a
// gemv mode. Callers needing C-mode on row-major input must
// F-contigify first (e.g. via dpnp.asarray(A, order="F")).
if (is_conj_trans && !is_matrixA_f_contig) {
throw py::value_error(
"gemv: trans_op = 2 (conjugate-transpose) requires an "
"F-contiguous matrix; pass dpnp.asarray(A, order='F') first.");
}

const py::ssize_t *a_shape = matrixA.get_shape_raw();
const py::ssize_t *x_shape = vectorX.get_shape_raw();
const py::ssize_t *y_shape = vectorY.get_shape_raw();
if (transpose) {
if (is_transposed) {
if (a_shape[0] != x_shape[0]) {
throw py::value_error("The number of rows in A must be equal to "
"the number of elements in X.");
Expand All @@ -209,6 +262,9 @@ std::pair<sycl::event, sycl::event>
oneapi::mkl::transpose transA;
std::size_t src_nelems;

// Resolve the storage layout into the oneMKL transpose mode.
// Conjugate-transpose is constrained to F-contig above; the
// row-major branch therefore only sees N/T here.
// cuBLAS supports only column-major storage
#if defined(USE_ONEMATH_CUBLAS)
constexpr bool is_row_major = false;
Expand All @@ -218,7 +274,11 @@ std::pair<sycl::event, sycl::event>
if (is_matrixA_f_contig) {
m = a_shape[0];
n = a_shape[1];
if (transpose) {
if (is_conj_trans) {
transA = oneapi::mkl::transpose::C;
src_nelems = n;
}
else if (is_transposed) {
transA = oneapi::mkl::transpose::T;
src_nelems = n;
}
Expand All @@ -228,9 +288,11 @@ std::pair<sycl::event, sycl::event>
}
}
else {
// Row-major-as-column-major swap. is_conj_trans is rejected
// above, so only N/T need handling.
m = a_shape[1];
n = a_shape[0];
if (transpose) {
if (is_transposed) {
transA = oneapi::mkl::transpose::N;
src_nelems = m;
}
Expand All @@ -248,7 +310,11 @@ std::pair<sycl::event, sycl::event>
const std::int64_t m = a_shape[0];
const std::int64_t n = a_shape[1];

if (transpose) {
if (is_conj_trans) {
transA = oneapi::mkl::transpose::C;
src_nelems = n;
}
else if (is_transposed) {
transA = oneapi::mkl::transpose::T;
src_nelems = n;
}
Expand Down Expand Up @@ -299,16 +365,55 @@ std::pair<sycl::event, sycl::event>
y_typeless_ptr -= (y_shape[0] - 1) * std::abs(incy) * y_elemsize;
}

sycl::event gemv_ev =
gemv_fn(exec_q, transA, m, n, a_typeless_ptr, lda, x_typeless_ptr, incx,
y_typeless_ptr, incy, is_row_major, depends);
sycl::event gemv_ev = gemv_fn(exec_q, transA, m, n, alpha, a_typeless_ptr,
lda, x_typeless_ptr, incx, beta,
y_typeless_ptr, incy, is_row_major, depends);

sycl::event args_ev = dpnp::utils::keep_args_alive(
exec_q, {matrixA, vectorX, vectorY}, {gemv_ev});

return std::make_pair(args_ev, gemv_ev);
}

std::pair<sycl::event, sycl::event>
gemv(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
const dpnp::tensor::usm_ndarray &vectorX,
const dpnp::tensor::usm_ndarray &vectorY,
const bool transpose,
const std::vector<sycl::event> &depends)
{
// Legacy alpha=1, beta=0 wrapper. Existing dpnp.dot callers expect
// this exact behaviour (N or plain T only, never conjugate-
// transpose), so we forward through the shared dispatch mapping
// the bool argument to the {0=N, 1=T} subset of the tri-state.
const int trans_op = transpose ? 1 : 0;
return gemv_dispatch(exec_q, matrixA, vectorX, vectorY, trans_op,
/*alpha=*/1.0, /*beta=*/0.0, depends);
}

std::pair<sycl::event, sycl::event>
gemv_alpha_beta(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
const dpnp::tensor::usm_ndarray &vectorX,
const dpnp::tensor::usm_ndarray &vectorY,
const int trans_op,
const double alpha,
const double beta,
const std::vector<sycl::event> &depends)
{
// Caller-supplied alpha / beta and full tri-state transpose.
// Used by the GMRES Arnoldi step to fuse u -= V @ h
// (trans_op=0, alpha=-1, beta=1) into a single gemv kernel, and
// to write h = V^H @ u directly into a Hessenberg column slice
// (trans_op=2, alpha=1, beta=0). For complex matrices the scalars
// must be exactly representable as their real form -- callers
// pass 1/0/-1 only, see the impl-level comment for the imag-loss
// caveat.
return gemv_dispatch(exec_q, matrixA, vectorX, vectorY, trans_op, alpha,
beta, depends);
}

template <typename fnT, typename varT>
struct GemvContigFactory
{
Expand Down
33 changes: 33 additions & 0 deletions dpnp/backend/extensions/blas/gemv.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@

namespace dpnp::extensions::blas
{
// Convenience wrapper: alpha = 1, beta = 0. Preserved for the existing
// dpnp call sites (dpnp.dot etc.) that do not care about scaling.
extern std::pair<sycl::event, sycl::event>
gemv(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
Expand All @@ -43,5 +45,36 @@ extern std::pair<sycl::event, sycl::event>
const bool transpose,
const std::vector<sycl::event> &depends);

// Full y = alpha * op(A) * x + beta * y form. Required by the GMRES
// Arnoldi step where we fuse u -= V @ h into a single gemv call
// (alpha = -1, beta = 1), and where we write h = V^H @ u directly into
// a Hessenberg column slice (alpha = 1, beta = 0). Both alpha and
// beta arrive as double on the Python side and are cast to the matrix
// value type inside the impl -- complex callers should use 1 / 0 / -1
// (representable exactly) to avoid silent imaginary loss.
//
// ``trans_op`` selects the operation applied to A:
// 0 = N (no transpose) y = alpha * A @ x + beta * y
// 1 = T (transpose) y = alpha * A^T @ x + beta * y
// 2 = C (conjugate-transpose) y = alpha * A^H @ x + beta * y
//
// For real-valued A, T and C are equivalent. For complex A they
// differ, and C is required for any algorithm that performs a
// Hermitian inner product through gemv -- the GMRES Arnoldi step
// (Gram-Schmidt over a complex Krylov basis) being the canonical
// example. ``trans_op = 2`` is currently only supported for
// F-contiguous (column-major) matrices; the row-major code path
// for conjugate-transpose would require an explicit element-wise
// conjugate pass and is not wired up here.
extern std::pair<sycl::event, sycl::event>
gemv_alpha_beta(sycl::queue &exec_q,
const dpnp::tensor::usm_ndarray &matrixA,
const dpnp::tensor::usm_ndarray &vectorX,
const dpnp::tensor::usm_ndarray &vectorY,
const int trans_op,
const double alpha,
const double beta,
const std::vector<sycl::event> &depends);

extern void init_gemv_dispatch_vector(void);
} // namespace dpnp::extensions::blas
Loading