Pull upstream PySCF updates #13

Open
nemeott wants to merge 36 commits into SamikBose:master from pyscf:master
Conversation

nemeott (Collaborator) commented Jun 28, 2026

Tested Alanine and SN2 after the updates; no regressions were spotted. The merge conflict is limited to a few lines in the README.

fchapoton and others added 30 commits May 8, 2026 17:09
a few minor code simplifications and formatting in pyscf/tools
`load_ecp` previously raised RuntimeError as soon as the input was a bare name not present in pyscf's local ALIAS table, making the trailing BSE lookup at the end of the function unreachable. Names like "def2-ecp" that exist in BSE but not in ALIAS therefore failed even with basis-set-exchange installed.

Mirror the structure of `load`: when the input has no newline and is not an alias, consult BSE before giving up. Remove the unreachable BSE block at the tail of the function.
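
The revised lookup order can be sketched as follows. This is a minimal illustration rather than PySCF's code: `ALIAS`, `MOCK_BSE`, `fetch_from_bse`, and `load_ecp_sketch` are hypothetical stand-ins, and the basis-set-exchange lookup is mocked with a dictionary.

```python
# Hypothetical stand-ins: a local alias table and a mocked BSE catalogue.
ALIAS = {"lanl2dz": "lanl2dz.dat"}
MOCK_BSE = {"def2-ecp": "<ECP data from BSE>"}


def fetch_from_bse(name):
    """Mocked remote lookup; returns None when the name is unknown."""
    return MOCK_BSE.get(name.lower())


def load_ecp_sketch(name_or_text):
    # Input containing a newline is treated as raw ECP text, not a name.
    if "\n" in name_or_text:
        return ("parsed-text", name_or_text)
    key = name_or_text.lower()
    if key in ALIAS:
        return ("local", ALIAS[key])
    # Not an alias: consult BSE before giving up (the previously unreachable step).
    data = fetch_from_bse(key)
    if data is not None:
        return ("bse", data)
    raise RuntimeError(f"ECP {name_or_text!r} not found")
```

In this sketch a name such as "def2-ecp", absent from the alias table but present in the mocked catalogue, resolves through the fallback instead of raising.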
* update dispersion

* add tests

* fix threshold

* solve comments
…cf, ao2mo, mp, agf2 (#3214)

* Fix three wrong-result bugs in pbc/

* nr_direct.c: PBCVHF_direct_drv_nodddd unconditionally clobbered the
  (k, l) shell indices right after the s2kl symmetry branch had just
  computed them, silently disabling the s2kl path and producing
  out-of-range indices for kl >= nksh.

* inner_dot.c: PBC_zdot_CNC_s1 and PBC_zdot_CNN_s1 wrote the per-block
  dgemms to outR/outI (the head of the output buffer) instead of the
  pointers poutR/poutI = outR/outI + i0*nbc that were set up earlier
  in the parallel loop. Both gave wrong results for na > BLKSIZE and a
  write race under schedule(static). The companion k-point variants
  PBC_kzdot_CNC_s1/_CNN_s1 already use the per-block pointers.

* optimizer.c: PBCdel_optimizer used "if (!opt0->rcut) free(opt0->rcut)",
  i.e. only freed rcut when it was NULL (no-op) and leaked it whenever
  it was actually allocated.

* Fix loop and dgemm stride bugs in gto/ and pdft/

* gto/fill_grids_int2c.c: in both GTOgrids_int2c and the spinor
  variant, the shell-index shift "ish += ish0; jsh += jsh0" was
  performed inside the per-grid-block loop, so on every iteration past
  the first the indices were shifted by an extra ish0/jsh0. Hoist the
  shift (and the derived i0/j0/shls[0..1]) out of the grid loop. Only
  reachable when ngrids > BLKSIZE.

* gto/ft_ao.c: GTO_ft_zfill_s1hermi's "ioff != joff" branch used
  ij = j*nj+i for the (i,j) slot, but the matching s1 path at line
  1082 uses j*ni+i. Currently masked because hermi implies ni == nj,
  but the layout was inconsistent; align with the s1 convention.

* pdft/nr_numint.c: VOTdot_ao_mo's blocked dgemm wrote the output
  block starting at vv + b0i*nao + b0j with leading dimension nmo.
  The matrix is shaped [nao, nmo], so the row stride must be nmo, not
  nao. With nao != nmo (the normal PDFT case) the block went to the
  wrong row and could overrun the buffer.
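
The row-stride issue in the last item can be illustrated with plain index arithmetic. The sizes below are made up for illustration; only the relationship between a row-major [nao, nmo] layout and its row stride is taken from the message above.

```python
def block_offset(row, col, row_stride):
    """Flat offset of element (row, col) in a row-major matrix."""
    return row * row_stride + col


nao, nmo = 6, 4          # illustrative sizes with nao != nmo
b0i, b0j = 3, 1          # start of a block inside the [nao, nmo] matrix

wrong = block_offset(b0i, b0j, nao)   # stride nao: not the layout's stride
right = block_offset(b0i, b0j, nmo)   # stride nmo: matches the layout

# Decode which (row, col) each flat offset really addresses.
wrong_cell = divmod(wrong, nmo)
right_cell = divmod(right, nmo)

# A block starting in the last row shows the potential overrun.
last_row_wrong = block_offset(nao - 1, b0j, nao)
overruns = last_row_wrong >= nao * nmo
```

Only the offset built with stride `nmo` decodes back to the intended `(b0i, b0j)` cell.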

* Fix stack and table overflows for high angular momentum

* gto/deriv2.c: the fx{0,1,2,3,4} / fy* / fz* stack buffers were
  sized double[SIMDD*16], but the recurrence writes
  fx0[lx*SIMDD+n] for lx up to l+2 (deriv2), l+3 (deriv3), l+4
  (deriv4). With ANG_MAX=15 the writes overrun the 16-slot array for
  l >= 14 (deriv2), l >= 13 (deriv3), l >= 12 (deriv4), stomping the
  adjacent fy0/fz0 buffers. Resize to SIMDD*(LMAX+5) for all three
  kernels, matching the autocode template at
  gto/autocode/gen-code.cl.

* gto/nr_ecp.c:
  - ECPsph_ine_opt's default branch (order > 7) reads/writes
    k0[order + K_TAYLOR_MAX] into a buffer sized K_TAB_COL=24. With
    K_TAYLOR_MAX=7 this overruns once order > 16. Reachable via
    type2_facs_rad (li + lc with li up to ANG_MAX and lc up to
    ECP_LMAX=5). Fall back to ECPsph_ine for the unsupported range.
  - ECPtype_so_cart had MALLOC_INSTACK(buf, ...) called twice in
    a row with the same target pointer, silently discarding the
    first allocation and inflating cache use. Remove the duplicate.
  - ECPdel_optimizer assigned NULL to its parameter copy ("opt =
    NULL") instead of the caller's pointer ("*opt = NULL"), leaving
    the caller with a dangling pointer.
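
The thresholds quoted above follow from simple slot counting. The sketch below is only a check of that arithmetic, assuming (as the message states) 16 slots per lane, a highest written index of l plus the derivative order, and a table of K_TAB_COL = 24 columns indexed at order + K_TAYLOR_MAX; the helper name is hypothetical.

```python
SLOTS = 16          # slots per SIMD lane in the old double[SIMDD*16] buffers
ANG_MAX = 15


def first_overrun_l(extra, slots=SLOTS, l_max=ANG_MAX):
    """Smallest l whose highest written index l + extra falls outside the buffer."""
    for l in range(l_max + 1):
        if l + extra >= slots:
            return l
    return None


overruns = {name: first_overrun_l(extra)
            for name, extra in (("deriv2", 2), ("deriv3", 3), ("deriv4", 4))}

# Same counting for the ECP table: index order + K_TAYLOR_MAX must stay < K_TAB_COL.
K_TAB_COL, K_TAYLOR_MAX = 24, 7
first_bad_order = next(o for o in range(64) if o + K_TAYLOR_MAX >= K_TAB_COL)
```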

* Fix integer overflow and uint8 wrap-around in screening / index math

* vhf/fill_nr_s8.c: ij0 = i0*(i0+1)/2 + j0 was computed in int and
  overflows once i0 reaches ~65000. Promote with a (size_t) cast on
  the first factor.

* vhf/nr_sr_vhf.c: nblock and the derived nblock2/nblock3 were
  uint32_t, and blk_id was a plain int. nblock*nblock*nblock
  overflows uint32_t past nblock ~ 1626, and "int blk_id < nblock3"
  overflows int past nblock ~ 1290. Promote nblock/nblock2/nblock3
  and blk_id/r to size_t.

* mcscf/fci_rdm.c: tril_particle_symm computed
  blk = MIN(((int)(48/norb))*norb, nnorb). For norb > 48 this
  collapses to blk = 0, after which "for (m=0; m<nnorb-blk; m+=blk)"
  is an infinite loop. Use MAX(blk_units, 1) so blk stays at norb at
  minimum.

* gto/grid_ao_drv.c: screen_index used (uint8_t)(si + 1) without an
  upper bound. For si >= 255 (very tight AOs, large -arr) the cast
  wraps mod 256 and silently demotes a maximally-significant grid
  point to "screened out" (0) or to small significance. Add a
  saturating clamp at 255. Behavior for si <= 254 is unchanged.
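
The limits in this group can be checked with arbitrary-precision integers. The sketch below is not the C code: it only finds where each expression stops fitting in a 32-bit type, models the `(uint8_t)` cast as a mask, and assumes `nnorb` is a caller-supplied upper bound in the `blk` helpers (all function names are hypothetical).

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def first_exceeding(expr, limit):
    """Smallest n >= 0 for which expr(n) is larger than limit."""
    n = 0
    while expr(n) <= limit:
        n += 1
    return n


# Intermediate product i0*(i0+1) versus the final triangular index.
prod_limit = first_exceeding(lambda n: n * (n + 1), INT32_MAX)
tri_limit = first_exceeding(lambda n: n * (n + 1) // 2, INT32_MAX)
cube_u32 = first_exceeding(lambda n: n**3, UINT32_MAX)
cube_i32 = first_exceeding(lambda n: n**3, INT32_MAX)


def to_uint8_wrapping(value):
    return value & 0xFF          # what a bare narrowing cast does for value >= 0


def to_uint8_saturating(value):
    return min(value, 255)       # clamp before narrowing


def blk_old(norb, nnorb):
    return min((48 // norb) * norb, nnorb)          # 0 once norb > 48


def blk_new(norb, nnorb):
    return min(max(48 // norb, 1) * norb, nnorb)    # never below norb
```

Under these definitions the intermediate product first exceeds the signed 32-bit range at 46341, whereas the halved result would only do so at 65536; since C evaluates the product first, the lower figure is the one that matters for an `int` expression.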

* Fix undefined behaviour, dead branches, and off-by-one asserts

* pbc/fill_ints.c: _nr2c_fill declared "int empty" and only assigned
  it 0 inside the conditional, then returned !empty. When no shell
  pair contributed (nimgs == 0 or all intor calls returned 0) it
  returned a garbage value. Initialize empty = 1.

* pbc/fill_ints_screened.c: two assertions "assert(dk < dkmax)"
  fired in debug builds whenever a single ksh exactly filled the
  buffer. The correct invariant is "dk <= dkmax".

* dft/libxc_itrf.c: LIBXC_max_deriv_order's inner loop iterated
  "o > 0" so it never tested order 0, and the
  "if (o == -1) return -1" check was unreachable. EXC-only
  functionals therefore returned a stale "ord" value (often 4)
  instead of 0, and a functional with no derivative flags at all
  was never reported. Iterate "o >= 0" and use an explicit "found"
  flag to drive the -1 return.

* vhf/nr_sgx_direct.c: SGXdiagonal_ints allocated three buffers
  (buf, cache, dists) that the function never reads or writes,
  including one with sizeof(int) for a double*. Remove all three
  along with the now-unused di and cache_size locals.
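
The `LIBXC_max_deriv_order` loop fix can be pictured with a simplified model. This is a sketch under stated assumptions, not the libxc interface: a functional is represented by the set of derivative orders it supports, the starting order of 4 is illustrative, and both function names are hypothetical.

```python
def max_order_old(available, start=4):
    """Old shape of the loop: order 0 is never tested."""
    order = start                      # stale value survives if nothing matches
    for o in range(start, 0, -1):      # o = start .. 1
        if o in available:
            order = o
            break
    return order


def max_order_new(available, start=4):
    """Fixed shape: test order 0 too and track whether anything matched."""
    found = False
    order = -1
    for o in range(start, -1, -1):     # o = start .. 0
        if o in available:
            order = o
            found = True
            break
    return order if found else -1
```

In this model an EXC-only functional (`{0}`) is reported as order 4 by the old loop and as order 0 by the new one, and an empty set yields -1 only in the new version.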

* Fix missing _vhf import in test_rhf

test_get_vj referenced scf._vhf._fpointer(...) but pyscf.scf does not
re-export _vhf, so the test failed at collection-time with
AttributeError. Import _vhf explicitly and reference it directly.

* Fix ECP refinement double-halving when convergence is partial

ECPtype2_cart and ECPtype_so_cart never set converged[ijl] = 1 before
the CLOSE_ENOUGH check that may zero it back to 0. The sister function
ECPtype1_cart (line ~5936) correctly sets it. Consequence: once ANY
(ic, jc, lab) block fails to converge at a refinement level, every
block — including those that converged at previous levels — is
re-entered at the next level and "prad[i] *= .5" is applied a second
time, silently corrupting the radial accumulator.

The all-zero initialization of "converged" plus the fact that the bug
only matters when some blocks converge before others is why this
escaped notice: simple test ECPs all converge at the same level.

* Honor envs->expcutoff in GTO_Gv_orth/nonorth and fix nbins clamp ordering

* gto/ft_ao.c: GTO_Gv_orth (line 552) and GTO_Gv_nonorth (line 639)
  computed "double cutoff = EXPCUTOFF * aij * 4" with the literal
  macro EXPCUTOFF = 60, ignoring env[PTR_EXPCUTOFF] that the user
  may have tightened. The companion GTO_Gv_general at line 482
  correctly uses envs->expcutoff. The two _orth/_nonorth paths
  silently dropped per-mol cutoff overrides.

* gto/grid_ao_drv.c: GTO_screen_index computed "scale = -nbins /
  log(MIN(cutoff, .1))" before clamping "nbins = MIN(127, nbins)".
  The screening formula "si = nbins - arr * scale" then mixed the
  clamped offset with the unclamped slope, so callers passing
  nbins > 127 got a different screening map than callers passing
  nbins <= 127 with the same cutoff. Move the clamp ahead of the
  scale computation so both quantities reference the same effective
  nbins.

* Initialize empty in _nr2c_screened_fill

Same pattern as the earlier fix for _nr2c_fill in pbc/fill_ints.c:
"int empty" was declared at function entry and only assigned 0 inside
the conditional intor branch, then returned via "return !empty". When
no shell pair contributed (no matching jL or all intor calls returned
0), the function returned a garbage value.

* np_helper: guard size_t multiplications and zero-size reductions

* transpose.c: NPdtranspose_021 and NPztranspose_021 computed
  "size_t nm = shape[1] * shape[2]" in int and only widened on
  assignment. Cast first operand to size_t so matrices larger than
  46340 x 46340 don't silently overflow.

* pack_tril.c: NP{d,z}{unpack,pack}_tril_2d had the same int-times-int
  pattern for "size_t nn = n * n" and "size_t n2 = n*(n+1)/2".

* condense.c: NP_Bmax, NP_imax, NP_fmax read a[0] unconditionally and
  could read past the legitimate slice when called with di == 0 or
  dj == 0 (which NPbcondense / NPicondense / NPfcondense will do for
  loc_x[i] == loc_x[i+1] groups). Add the same di == 0 || dj == 0
  guard that NP_max / NP_min / NP_absmax / NP_absmin / NP_norm
  already have.

* Guard ECP angular tables and screening init

* gto/nr_ecp.c: _angular_moment_matrix only has entries for lc 0..4
  (s..g). ECPtype_so_cart can request lc up to ECP_LMAX=5 for normal
  projectors, or lc = ecp_lmax[n] + 1 (up to 6) via the Ul fallback.
  Reading _angular_moment_matrix[5] returned a stale pointer from the
  rodata section, and the companion angi/angj/jmm_angj buffers sized
  on ECP_LMAX were too small. Skip the iloc iteration with a stderr
  warning when lc > 4 rather than crash silently.

* pbc/nr_ecp.c: per-atom-ECP-group screening init was eta = 1.f, which
  silently clamps the MIN-reduction below to 1.0 for atoms whose ECP
  primitive exponents are all > 1 (typical of tight cores). Initialize
  to FLT_MAX so the reduction returns the true smallest exponent.

* Fix multigrid pgfpair_radius same-atom shortcut

pgfpair_radius compared the raw signed components of the displacement
rab against RZERO instead of the magnitude. Any all-negative
displacement (around 1/8 of periodic-image shifted pairs) wrongly
took the same-atom branch and returned a too-large radius from
pgf_rcut(1.0, ...), inflating the task list with bogus far-field pairs
and biasing collocation results. Compare SQUARE(rab) instead.
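
A minimal sketch of the two tests, assuming the old code compared each signed component against the threshold and the new code compares the squared length. The value of `RZERO` and whether the real threshold is squared are assumptions, and the function names are hypothetical.

```python
from itertools import product

RZERO = 1e-3   # hypothetical "same position" threshold


def same_atom_old(rab):
    # Signed comparison: any all-negative displacement passes.
    return all(x < RZERO for x in rab)


def same_atom_new(rab):
    # Compare the squared length of the displacement instead.
    return sum(x * x for x in rab) < RZERO


# One representative displacement per sign octant.
octants = list(product((-1.0, 1.0), repeat=3))
misclassified = [r for r in octants if same_atom_old(r) and not same_atom_new(r)]
```

Exactly one of the eight octants (the all-negative one) is misclassified by the signed test, which is consistent with the "around 1/8" estimate above.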

* vhf: size_t casts and zero-initialised screening tables

The earlier fix in nr_sr_vhf.c addressed the uint32_t nblock^3 / int
blk_id overflow only in the SR driver; the main NR drivers had the
same pattern.

* nr_direct.c CVHFnr_direct_drv + CVHFnr_direct_ex_drv: promote
  nblock_*/nblock_kl/nblock_jkl to size_t and blk_id/r to size_t.
  Also compute size_limit in ssize_t and floor it so that
  di^4*ncomp > 2e8 visibly clamps instead of casting a huge unsigned
  result into a junk int (causing the v_priv flush check to misfire
  forever).

* nr_incore.c: five sites of "size_t npair = nao*(nao+1)/2;" computed
  the RHS in int (overflowing for nao > ~65535) and similarly for
  "size_t nn = nao * nao;". Cast first factor to size_t.

* hessian_screen.c: malloc(sizeof(double) * nbas*nbas) and the *2 /
  256*di^4 / 9*di^4 sizings were all int-multiplied before being
  widened to size_t. The same function even computed Nbas2 = Nbas*Nbas
  correctly one line above the broken malloc. Use the size_t form.

* fill_nr_s8.c GTO2e_cart_or_sph: loop bound nbas*(nbas+1)/2 in int
  overflows above ~46340; the same buffer size di*di*nao*nao is also
  computed in int. Promote.

* rkb_screen.c CVHFrkbssll_direct_scf_dm: dm_cond was malloc'd but
  never zeroed, unlike the sibling _ll/_sl setters which NPdset0
  immediately. The strict upper triangle of the master LL/SS/SL slots
  was then read by CVHFrkbssll_prescreen. Add the matching NPdset0.

* Fix pbc/hf_grad neighbor-list index and per-component stride

* hf_grad.c:70 indexed nl0->pairs using nbas as the j-side dim, but a
  neighbor list built from a narrower shls_slice has nl0->njsh < nbas.
  Use nl0->njsh.

* hf_grad.c:83 hardcoded "iatm*3+ic" for the per-atom output stride,
  which works for the documented comp=3 gradient case but silently
  corrupts the output for any other comp (the buffer is allocated as
  comp*natm doubles). Use iatm*comp+ic.
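
The stride problem in the second item reduces to index arithmetic. The sketch below uses made-up sizes and a hypothetical helper; it only assumes, as stated above, that the output buffer holds comp*natm values.

```python
def out_index(iatm, ic, stride):
    """Flat position of component ic of atom iatm."""
    return iatm * stride + ic


def indices(natm, comp, stride):
    return [out_index(a, c, stride) for a in range(natm) for c in range(comp)]


natm = 4
# comp = 1: the hardcoded stride of 3 walks past a buffer of comp*natm entries.
hard_1 = indices(natm, 1, 3)
general_1 = indices(natm, 1, 1)
# comp = 3: both forms agree, which is why the gradient case worked.
hard_3 = indices(natm, 3, 3)
general_3 = indices(natm, 3, 3)
```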

* Assert square layout for hermi mode in gto/fill_int2c and fill_grids_int2c

GTOint2c / GTOint2c_spinor / GTOgrids_int2c{,_spinor} hermitize the
output via NPdsymm_triu / a manual (i,j)<->(j,i) swap. Both code paths
assume the matrix is square (naoi == naoj) and the bra/ket slices
start at the same shell (ish0 == jsh0). With a rectangular or offset
slice, the per-component stride and the (i,j)<->(j,i) decode address
cells outside the intended block and silently corrupt the matrix.

Add explicit asserts so a future caller passing hermi != PLAIN on a
non-square slice fails loudly instead of returning a wrong matrix.

* mcscf: 1UL → 1ULL portability and SCI bbaa accumulation comment

* fci_string.c FCIaddrs2str: three uses of "1UL << nelec" / "1UL <<
  norb_left". On LLP64 (Windows) "unsigned long" is 32 bits, so these
  truncate for nelec >= 32 or norb_left >= 32. Use 1ULL to match the
  rest of the file (which uses 1ULL consistently from line 146 on).

* select_ci.c SCIcontract_2e_bbaa{,_symm}: document that ci1 is
  intentionally not zeroed. The Python wrappers selected_ci.py and
  selected_ci_symm.py call this after the (aa|aa) and (bb|bb)
  SCIcontract_2e_aaaa kernels and rely on the (bb|aa) contribution
  accumulating on top. A naive NPdset0 here would wipe out the alpha
  and beta same-spin contributions.

* Fix agf2/uagf2 OOB on spin-asymmetric inputs

AGF2udf_vv_vev_islice_lowmem decomposes the (i,j) work as
"j = ij % max(noa,nob)" and then unconditionally built qx_j and qa_j
by slicing the alpha-side arrays qxi[naux, nmo, noa] and
qja[naux, noa, nva] at index j. When nob > noa, j can reach values
>= noa for the cross-spin (do_os) part, and the slice readers walk
past the noa-dim into adjacent memory.

The same-spin (do_ss) and opposite-spin (do_os) flags were already
computed before the slice builds, but only consulted afterwards.
Gate the alpha-side j slice builds behind do_ss; the opposite-spin
path uses the beta-side qx_j_b / qa_j_b which are built separately.
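
The index range behind this bug can be sketched as below. The decomposition `j = ij % max(noa, nob)` comes from the message above; how `do_ss` is actually computed is not stated there, so the sketch ASSUMES same-spin work exists only for `j < noa`. The function name is hypothetical.

```python
def alpha_slice_indices(noa, nob, gate_on_ss):
    """Indices j used to slice the noa-sized alpha axis."""
    n = max(noa, nob)
    used = []
    for j in range(n):                 # values taken by ij % max(noa, nob)
        do_ss = j < noa                # ASSUMPTION: stand-in for the same-spin flag
        if gate_on_ss and not do_ss:
            continue                   # skip building the alpha-side slices
        used.append(j)
    return used


ungated = alpha_slice_indices(noa=2, nob=5, gate_on_ss=False)
gated = alpha_slice_indices(noa=2, nob=5, gate_on_ss=True)
```

With nob > noa the ungated version indexes the alpha axis at j >= noa, while the gated version stays in range; when noa >= nob the two coincide.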

* Gate fill_nr_3c / fill_r_3c output copy on intor return value

Sister code in gto/fill_int2e.c only copies the intor buffer into the
output when the libcint call returns non-zero (its convention for
"primitives were fully screened, output buffer not touched"). The
three-center fillers GTOnr3c_fill_s2ij and GTOr3c_fill_s2ij dropped
this gate, so when the integrator returned 0 the dcopy_s2_* /
zcopy_s2_* copies fed uninitialised buf contents into the packed
output. Wrap each in if ((*intor)(...)).

* Sister-pattern size_t casts in multigrid, vhf, np_helper, mcscf

Pattern fixes that match overflow-prone sites we already corrected
elsewhere but missed in their siblings:

* dft/multigrid.c::init_rs_grid: ngrid = mesh[0]*mesh[1]*mesh[2] was
  computed in int and silently under-sized the FFT-grid allocation
  for very fine meshes (~1024 on a side). Cast first factor to size_t.

* vhf/optimizer.c::CVHFset_q_cond / CVHFset_dm_cond: int len parameter
  truncated for nbas >= 46341 (callers pass nbas*nbas), mis-sizing the
  malloc + memcpy. Same class of bug fixed earlier in nr_sr_vhf.c,
  nr_direct.c, fill_nr_s8.c. Promote to size_t.

* np_helper/transpose.c::NPdsymm_021_sum and NPzhermi_021_sum: same
  (size_t)shape[1]*shape[1] cast we applied to NPdtranspose_021 /
  NPztranspose_021, missing from these two siblings.

* mcscf/fci_rdm.c: three pointer-arithmetic sites (ket+stra_id*nb+
  strb_id and bra+stra_id*nb+strb_id, plus the *na variant) missing
  the (size_t) cast already applied at line 83 / 111 in the same file.

* Revert CVHFset_{q,dm}_cond signature change

Per review (#3214): these functions are called via ctypes from Python,
and changing the parameter type would require auditing every caller
(some go through function pointers) to keep type-consistency. Restore
the original int len signature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Revert intor return-value gate in fill_{nr,r}_3c

Per review (#3214): libcint zeroes buf inside intor(), so the original
unconditional dcopy/zcopy is correct. The would-be alternative — skip
the copy and instead explicitly zero the output region (as fill_int2e.c
does) — is more code for no behavior change. Revert to the original
form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Revert eta init in PBCECP_loop back to 1.0f

Per review (#3214): eta = 1.0f is intentional — it caps the screening
estimator at a safe lower bound that always includes enough images in
the lattice sum, even when the screening estimator is inaccurate for
tight-core ECPs. Initializing to FLT_MAX and taking the true MIN can
miss required images. Trade a few extra image evaluations for
correctness.

Drop the now-unused <float.h> include.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* GTO_screen_index: assert nbins<120 instead of reordering clamp

Per review (#3214): scale must be computed using the caller's nbins so
the (nbins - arr*scale) mapping the caller expects is preserved.
Replace the moved clamp with an assert(nbins < 120), which keeps the
final uint8_t encoding (si+1, capped at 255) safe without altering
scale.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Intermediate cache size issue

* Missing header file: stdio.h

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Qiming Sun <osirpt.sun@gmail.com>
* Release 2.13.1

* Update README.md

Co-authored-by: Shirong Wang <srwang20@fudan.edu.cn>

---------

Co-authored-by: Shirong Wang <srwang20@fudan.edu.cn>
* update PBC GW module

* KRGWAC and KUGWAC: improve CPU and memory efficiency, add more features
* GWAC: Gamma-point G0W0
* add test cases

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>

* PBC GW: fix code style

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>

* add KRGWAC and KUGWAC examples

* pbc GW: update Gamma-point GWAC docstring for the finite-size correction in exchange

---------

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>
* Add fallback to basis-set-exchange in load function for basis sets that are found in the ALIAS dictionary but missing elements in the .dat file

* Update __init__.py

---------

Co-authored-by: Qiming Sun <osirpt.sun@gmail.com>
…ut arguments

SCF.init_guess_by_mod_huckel had a required positional argument
`updated_rule` that the method body never used (it always calls
_init_guess_huckel_orbitals with updated_rule=True). Because of that,
scf.RHF(mol).init_guess_by_mod_huckel() raised TypeError when called
without arguments, unlike the sibling init_guess_by_huckel() and the
UHF/ROHF/GHF/DHF versions, which use (self, mol=None).

Drop the unused parameter so the signature matches the other classes and
the method's own docstring (which documents only mol). The string-dispatch
path using self.mol gives the same result as before; the direct no-argument
call is fixed; an explicit mol now follows the (mol=None) convention used by
the sibling methods. A regression test for the no-argument call is added.
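
The signature change can be illustrated with two toy classes. These are hypothetical stand-ins, not PySCF classes; in particular the exact position of `updated_rule` in the old signature is an assumption.

```python
class OldSCF:
    def init_guess_by_mod_huckel(self, updated_rule, mol=None):
        # updated_rule is required by the signature but ignored by the body.
        return ("mod_huckel", True, mol)


class NewSCF:
    def init_guess_by_mod_huckel(self, mol=None):
        return ("mod_huckel", True, mol)


try:
    OldSCF().init_guess_by_mod_huckel()
    old_call_failed = False
except TypeError:
    old_call_failed = True

new_result = NewSCF().init_guess_by_mod_huckel()
```

The no-argument call raises `TypeError` only for the old signature, which is the behaviour the regression test mentioned above is meant to pin down.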

Also sync a few nearby docstrings with the code: list 'mod_huckel' in the
init_guess option lists (scf/__init__.py also omitted 'huckel'/'sap'), and
correct the generic conv_tol default in scf/__init__.py (1e-10 -> 1e-9).
* wrap dll

* update cmake options

* loosen one test

* wrap dll inside load_library

* force minimal change

* add psutil as dep

* cleaner load_library
…3225)

* Support complex orbitals in GCCSD; enable SOC Hamiltonian for GCCSD.
* handle NamedTemporaryFile on windows

* revert mistakes

* atexit

* fix tests

* close-then-unlink
* loosen tests in gw and tdscf

* add pytest durations

* fix more; mute a few high cost tests

* fix

* mute more

* Adjust precision in RDM tests for N2
…nents (#3250)

* gto/ecp: require two consecutive converged levels in adaptive radial quadrature

The adaptive Gauss-Chebyshev radial quadrature in ECPtype1_cart,
ECPtype2_cart, and ECPtype_so_cart declared per-primitive-pair
convergence the first time CLOSE_ENOUGH(plast, prad) held between two
successive doubled grids. For sharply-peaked integrands (combined AO+ECP
exponent (2a+g) large, n >= 2, especially at higher AO angular momentum)
two coarse rules can agree to 1e-12 relative while both still
under-sample the peak at r ~ (2a+g)^-1/2, freezing in a wrong answer.
The n=1 channel happened to stay exact because r * exp(-c r^2) is far
smoother under the log-quadratic change of variables used.

Promote the per-pair flag to a counter and require two consecutive
CLOSE_ENOUGH matches before the pair is removed from refinement.

Also rewrite CLOSE_ENOUGH as a clean relative test (max of |x|, |y|
instead of |y| in the denominator, plus the absolute fallback). The
previous 1e-12 absolute floor falsely matched high-l / high-exponent
integrals whose true magnitude was below it; the new form catches the
0 == 0 case naturally and has no magic absolute threshold.
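
The convergence rule described above can be sketched in isolation from the quadrature itself. This is an illustrative model, not the C macro: `close_enough` and `first_stable_level` are hypothetical names, and the input is simply a list of successive estimates from ever finer grids.

```python
def close_enough(x, y, rtol=1e-12):
    """Relative test scaled by the larger magnitude; 0 == 0 passes trivially."""
    return abs(x - y) <= rtol * max(abs(x), abs(y))


def first_stable_level(estimates, needed=2, rtol=1e-12):
    """Index of the level at which `needed` consecutive matches are reached."""
    streak = 0
    previous = None
    for level, value in enumerate(estimates):
        if previous is not None and close_enough(previous, value, rtol):
            streak += 1
            if streak >= needed:
                return level
        else:
            streak = 0                 # a mismatch resets the counter
        previous = value
    return None


# Two coarse levels agree on a wrong value before the peak is resolved.
history = [0.5, 0.5, 0.9, 1.0, 1.0, 1.0]
accepted_single = first_stable_level(history, needed=1)
accepted_double = first_stable_level(history, needed=2)
```

With a single match the made-up history is accepted while still at the wrong value; requiring two consecutive matches defers acceptance until the estimates have settled. Because the test has no absolute floor, two tiny but different values such as 1e-20 and 2e-20 are not treated as equal.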

For a Kr atom with a 48-term large-exponent local ECP this removes a
constant ~4e-5 Ha absolute-energy error. Worst-case relative error on a
full alpha x g x n x l sweep (local + semilocal channels, alpha in
{1, 1e2, 1e4, 3e5}, g in {1e1, 1e3, 1e5, 1e7}, n in {1, 2}, l in
{0, 1, 2}) drops from 5.5e-5 to 4.7e-13. LANL2DZ benchmark (Cu + 6H)
shows no measurable slowdown.

Closes #3249.

* gto/ecp: regression test for large-exponent local and semilocal ECP integrals

For a same-center single-primitive AO of angular momentum l with a
one-term local or semilocal ECP channel c * r^(n-2) * exp(-g r^2), the
matrix element factorises so that the ratio I(g1) / I(g2) at fixed
alpha is

    ((2 alpha + g2) / (2 alpha + g1)) ** ((n + 2 l + 1) / 2)

independent of the AO normalisation. This is a stringent and cheap
closed-form check on the radial quadrature.
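
The ratio can be cross-checked against the standard Gaussian moment integral. The sketch below ASSUMES the radial integrand is r^(2l+n) * exp(-(2 alpha + g) r^2), that is, two AO factors r^l exp(-alpha r^2), the r^2 volume element, and the ECP term r^(n-2) exp(-g r^2); constants and angular factors are taken to cancel in the ratio. Function names are hypothetical.

```python
import math


def radial_integral(alpha, g, n, l):
    """Integral of r^(2l+n) * exp(-(2*alpha + g) * r^2) over r in [0, inf)."""
    m = 2 * l + n
    p = 2.0 * alpha + g
    return math.gamma((m + 1) / 2) / (2.0 * p ** ((m + 1) / 2))


def expected_ratio(alpha, g1, g2, n, l):
    """Closed-form ratio I(g1) / I(g2) quoted in the commit message."""
    return ((2 * alpha + g2) / (2 * alpha + g1)) ** ((n + 2 * l + 1) / 2)


def max_relative_deviation():
    worst = 0.0
    for n in (1, 2):
        for l in (0, 1, 2):
            for alpha in (1.0, 1e2, 1e4, 3e5):
                for g1, g2 in ((1e1, 1e3), (1e3, 1e5), (1e5, 1e7)):
                    ratio = radial_integral(alpha, g1, n, l) / radial_integral(alpha, g2, n, l)
                    ref = expected_ratio(alpha, g1, g2, n, l)
                    worst = max(worst, abs(ratio - ref) / ref)
    return worst
```

A numerical quadrature under test would replace `radial_integral`; the closed-form ratio then serves as the reference.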

Sweep n in {1, 2}, l in {0, 1, 2}, alpha in {1, 1e2, 1e4, 3e5},
g in {1e1, 1e3, 1e5, 1e7}, local and semilocal channels.

Before the convergence-check fix worst-case rel. err was ~5.5e-5
(local) and ~1e-1 (semilocal at l = 2, large g); now 4.7e-13.

Refs #3249.
* fix win test

* fix more

* should not close on unix

* loosen tests for windows
* Fixed KeyError for MM ECPs in project_to_atomic_orbitals

* Added test for pre_orth_ao with coreless ECPs. Note that there are erroneous warnings from creating coreless ECPs (see #3172), but these can be safely ignored.

* Refactor ECP handling in orth.py

Updated exception handling for ECPs to check '_ecp' attribute instead of 'ecp'. Added comments for clarity.

* Fix formatting in test_orth.py

---------

Co-authored-by: Qiming Sun <osirpt.sun@gmail.com>
In the MC-PDFT unit tests, which perform a variety of calculations on LiH,
SA-MCSCF calculations sometimes converge to different solutions.
Try to control this by initializing all calculations with converged
orbitals from the high-symmetry version in setup.
* mcpdft: add MC26 and COF26 on-top functionals

MC26 shares the MC23/MC25 ansatz (reparametrized M06-L combined with a
CAS wave function contribution): E_ot = a0*E_CAS + E_xc[rep-M06L], with
a0 = 0.2781. COF26 extends this with reparametrized MN15-L exchange and
correlation components (libxc 260/261), a0 = 0.3096. The linear
parameters enter the LibXC components directly (facs = 1); a0 is the CAS
mixing coefficient supplied via hyb.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* dft: normalize CF22D xc code and enable its built-in D3 dispersion

- libxc: register 'CF22D' in XC_ALIAS so xc='CF22D' resolves to
  HYB_MGGA_X_CF22D,MGGA_C_CF22D (previously required 'CF22D,CF22D').
- scf/dispersion: whitelist CF22D so it automatically applies its D3
  zero-damping correction. CF22D is parameterized under zero damping
  (s6=1.0, rs6=1.53, s8=0.0), shipped with simple-dftd3 as 'cf22d'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Yuhao Chen <chenyuhao@shu.edu.cn>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* Add CABS support

* Update mp2f12_slow

* Add frozen core model, add ROHF reference support

* Update CABS tests

* Promote info to note for simple runs

* Have consistent auxbasis name

* Register CABS in mp module

* Provide simple example of CABS run

* Return helper back
* add Bethe-Salpeter equation to mol GW module
* support both spin-restricted and spin-unrestricted
* implementation for Davidson algorithm, full diagonalization and Lanczos algorithm
* energy-specific BSE

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>

* GW improvements: Bethe-Salpeter equation
* save BSE eigenvalues and eigenvectors in the object
* fix dimension errors in pre-conditioning step

* Update 04-bse.py with reference results for H2O

Added reference results for H2O using PBE and def2-SVP.

---------

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>
Co-authored-by: Qiming Sun <osirpt.sun@gmail.com>
* RPA improvements: PBC RPA routines

* KRPA/KURPA: new routine for spin-restricted/unrestricted k-point RPA
* support low-memory mode
* support smeared systems
* support ACFDT exchange for smeared systems
* add test files and example

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>
Co-authored-by: Chaoqun Zhang <bbzhchq@gmail.com>
Co-authored-by: Jincheng Yu <pimetamon@gmail.com>

* RPA improvements: PBC RPA routines
* add finite-size corrections for outcore routines

---------

Co-authored-by: Tianyu Zhu <zhutianyu1991@gmail.com>
Co-authored-by: Christopher Hillenbrand <chillenbrand15@gmail.com>
Co-authored-by: Chaoqun Zhang <bbzhchq@gmail.com>
Co-authored-by: Jincheng Yu <pimetamon@gmail.com>
* use newer pyberny in ci

* fix f-contig problem

* remove unnecessary verbose

* rewrite qmmm contractions

* fix pytblis test

* fix pytblis comments; update einsums
sunqm and others added 6 commits June 22, 2026 21:57
* add symmetry breaking via HOMO-LUMO mixing

* UHF/UKS: add breaksym='mix' for HOMO-LUMO spin symmetry breaking

* add tests and update examples/scf/32

* removed delocalization requirement

* Apply suggestion from @jeanwsr

Co-authored-by: Shirong Wang <srwang20@fudan.edu.cn>

* Update test_uhf.py

Also removed lines 168-170 which depended on these variables.

* fix lint and add tests for breaksym='mix'

* remove delocalization check from breaksym mix test

* fix trailing whitespace
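
For readers unfamiliar with HOMO-LUMO mixing, the general idea can be sketched as follows. This is a textbook-style illustration, not the `breaksym='mix'` implementation: the 45-degree angle, the sign convention, and the assumption of an orthonormal basis are all choices made here, and `mix_homo_lumo` is a hypothetical helper.

```python
import math


def mix_homo_lumo(homo, lumo, theta=math.pi / 4):
    """Rotate HOMO and LUMO by +theta for alpha and -theta for beta."""
    c, s = math.cos(theta), math.sin(theta)
    alpha = [c * h + s * l for h, l in zip(homo, lumo)]
    beta = [c * h - s * l for h, l in zip(homo, lumo)]
    return alpha, beta


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Coefficients in an orthonormal two-function basis (illustrative).
homo, lumo = [1.0, 0.0], [0.0, 1.0]
alpha_homo, beta_homo = mix_homo_lumo(homo, lumo)
```

The alpha and beta orbitals stay normalized but are no longer identical, which is what gives an unrestricted calculation a spin-symmetry-broken starting point.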

---------

Co-authored-by: Shirong Wang <srwang20@fudan.edu.cn>
…3257)

* pbc/df: chunk kpt-pairs in _CCGDFBuilder.outcore_auxe2

The CCDF int3c kernel was invoked for all nkpts^2 kpt-pairs at once,
producing an output buffer of (nkpts_ij, max_buflen, nauxc) doubles
(R + I). For a 6x6x6 k-mesh with j_only=False this is 46656 pairs and
the buffer reaches hundreds of GB, causing OOM even when the formula
at line 295 honestly returns buflen ~ 1 (the AO-shell granularity
prevents shrinking further, and the "memory usage may be N times over
max_memory" warning then prints but the loop allocates anyway).

Split kikj_idx into chunks sized so each chunk's int3c output fits in
max_memory, and rebuild gen_int3c_kernel per chunk with reindex_k
restricted to that chunk. The shell-block granularity (sh_ranges) is
kept fixed across chunks so fswap row writes remain consistent, and
the pre-allocated fswap layout (indexed by global kpt-pair index) is
unchanged so downstream readers see no difference.

The merge_dd path that pairs (ij_idx, ji_idx) within a kk_adapted
group requires both indices to live in the same chunk; that path now
groups whole kk_adapted groups together. A local pair_pos map
translates global kpt-pair indices to positions in the chunk's
outR/outI arrays.

When kpts_chunk >= nkpts_ij (small problems or ample memory) nchunks
== 1 and the execution path is identical to before.
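
The chunking scheme can be sketched independently of the integral kernel. The code below is an illustration under assumptions: groups stand in for kk_adapted groups, `per_chunk` for the number of kpt-pairs whose output fits in max_memory, and both helper names are hypothetical.

```python
def chunk_groups(groups, per_chunk):
    """Pack whole groups into chunks of at most per_chunk pairs.

    A single group larger than per_chunk still gets a chunk of its own.
    """
    chunks, current = [], []
    for group in groups:
        if current and len(current) + len(group) > per_chunk:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks


def pair_positions(chunk):
    """Map each global kpt-pair index to its position inside the chunk."""
    return {global_idx: pos for pos, global_idx in enumerate(chunk)}


nkpts = 6 * 6 * 6
n_pairs = nkpts ** 2                       # pair count for j_only=False

groups = [[0, 1], [2, 3, 4], [5], [6, 7]]  # made-up groups of pair indices
chunks = chunk_groups(groups, per_chunk=3)
single = chunk_groups(groups, per_chunk=100)
```

Keeping groups whole means paired indices always land in the same chunk, and with a large enough budget the result collapses to a single chunk, matching the unchanged-path case described above.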

* pbc/df: drop unused reindex_k assignments left over from chunking patch

The chunk loop computes reindex_k_chunk inline per chunk, so the
top-level reindex_k variable became dead. Removes ruff F841.
* Improve high-order CC implementations

- Add RCCSDT(Q) correction implementation, unit tests, and example
- Unify intermediate construction in RCCSDT, RCCSDTQ, and UCCSDT
- Reduce redundant tensor contractions
- Add orthogonalization for T4 amplitudes in RCCSDTQ
- Synchronize amplitude updates and update iteration-specific tests
- Fix typos and minor code/documentation issues

* Fix typos

* Fix typo in timing log

* Reduce intermediate memory usage by reusing slice buffers
* Fix atomic X approximation for GHF X2C

* Fix basis linear dependency treatments in X2C

* Enable zquatev eigen solver for x2c; Update DHF and X2C tests

* install dependency in CI

* Add to_gpu operation; Disable the atomic X approximation for pbc

* Disable atom-X approximation in pbc-x2c

* cleanup

* Update tests
@nemeott nemeott marked this pull request as ready for review June 28, 2026 17:37