compiler: Avoid int32 overflow in linearized host-device transfer size by gaoflow · Pull Request #2939 · devitocodes/devito

gaoflow · 2026-05-29T02:07:52Z

Description

When a host↔device data transfer is linearized, its array section size is emitted as a product of the Function's per-dimension sizes, for example:

#pragma acc enter data copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])

The u_vec->size[i] fields are 32-bit C ints, so the product size[0]*size[1]*size[2]*size[3] is evaluated in 32-bit arithmetic. For a Function with more than 2**31 - 1 elements (e.g. the reporter's 1295**3 ≈ 2.17e9 points, ~24.5 GB), the product overflows int before it is used as the transfer bound. The result is a bogus size (the reporter saw 18446744065653020036) and a corrupt or failed device transfer. This happens regardless of index-mode=int64 and linearize=True, because those options control the kernel index type, not the type of the transfer-clause arithmetic.

As @mloubout noted on the issue, the fix is to perform the size multiplication in 64-bit arithmetic. This PR casts each factor of a product section bound to a 64-bit integer:

#pragma acc enter data copyin(u[0:(long)(u_vec->size[0])*(long)(u_vec->size[1])*(long)(u_vec->size[2])*(long)(u_vec->size[3])])

Casting the whole product, (long)(a*b*c), would be too late: the overflow would already have happened in 32-bit arithmetic. Each factor is therefore cast individually, which forces every multiplication to be 64-bit regardless of operand ordering.
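The difference between a late cast and a per-factor cast can be illustrated with a small Python model. This is only a sketch: signed overflow is undefined behaviour in C, and the two's-complement wraparound modelled here is an assumption about what typically happens in practice. The helper names (wrap_int32, product_int32, product_int64) are hypothetical and not part of Devito.

```python
from functools import reduce


def wrap_int32(value):
    """Reduce an integer to the value a 32-bit two's-complement int would hold."""
    return (value + 2**31) % 2**32 - 2**31


def product_int32(sizes):
    """Multiply left to right, wrapping to 32 bits after every step.

    Models (long)(a*b*c): the product is formed in int, then widened.
    """
    return reduce(lambda acc, s: wrap_int32(acc * s), sizes, 1)


def product_int64(sizes):
    """Multiply after widening every factor.

    Models (long)(a)*(long)(b)*(long)(c); Python ints do not overflow,
    and the result here fits comfortably in 64 bits.
    """
    return reduce(lambda acc, s: acc * s, sizes, 1)


sizes = (1295, 1295, 1295)          # the reporter's grid shape
late_cast = product_int32(sizes)    # already wrapped before the cast
per_factor = product_int64(sizes)   # the true element count
print(late_cast, per_factor)
```

Under this model the late cast yields a negative value, which turns into a very large number once interpreted as an unsigned transfer size, while the per-factor cast yields the true element count.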

The change lives in PragmaTransfer._generate, so it is scoped to host-device transfer clauses only. Non-product bounds (a single dimension size, an offset, a constant) cannot overflow and are left untouched, addressing the concern that there is "no reason to use long for all of those". Non-transfer expressions (e.g. free-space guards, TMA descriptors) are unaffected.

Reproduction

from devito import Eq, Grid, Operator, TimeFunction

grid = Grid(shape=(4, 5, 6))
u = TimeFunction(name='u', grid=grid)
op = Operator(Eq(u.forward, u + 1), platform='nvidiaX', language='openacc',
              opt=('advanced', {'linearize': True}))
print(op.body.maps[0].ccode.value)

Before:

acc enter data copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])

After:

acc enter data copyin(u[0:(long)(u_vec->size[0])*(long)(u_vec->size[1])*(long)(u_vec->size[2])*(long)(u_vec->size[3])])

Verification

The fix applies to both backends (openacc copyin/copyout/delete and openmp map(to:/release:)) and to 2D/3D Functions.
The non-linearized transfer path (separate per-dimension sections [0:s0][0:s1]...) is unchanged — there is no product there, hence no overflow.
Added TestPassesOptional::test_linearize_transfer_no_overflow asserting each size[i] factor of a linearized transfer is cast to long and that no bare 32-bit product remains.
Updated the existing test_gpu_openmp.py expectations (test_basic, test_multiple_eqns) whose OpenMP transfers use the flattened product form.
Host (CPU) operators, including those built with linearize=True, build and run unchanged (no device transfers are emitted). flake8 is clean on the changed files.
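The "no bare 32-bit product remains" property can be checked on the generated pragma string. The following is a minimal sketch of such a check, not the test added in this PR; the regular expression and the function name uncast_factors are assumptions made for illustration, and they only recognise factors of the form name->size[i].

```python
import re

# Hypothetical pattern: a size factor such as "u_vec->size[0]" that is NOT
# immediately preceded by the cast prefix "(long)(".
UNCAST_FACTOR = re.compile(r"(?<!\(long\)\()\b\w+->size\[\d+\]")


def uncast_factors(pragma):
    """Return the size factors in `pragma` that are not wrapped in (long)(...)."""
    return UNCAST_FACTOR.findall(pragma)


before = ("acc enter data copyin(u[0:u_vec->size[0]*u_vec->size[1]"
          "*u_vec->size[2]*u_vec->size[3]])")
after = ("acc enter data copyin(u[0:(long)(u_vec->size[0])*(long)(u_vec->size[1])"
         "*(long)(u_vec->size[2])*(long)(u_vec->size[3])])")

print(uncast_factors(before))
print(uncast_factors(after))
```

On the "Before" string the check reports all four factors; on the "After" string it reports none.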

Note: the GPU test modules are skipif(['nodevice']), so the codegen assertions run on the GPU CI runners. They were validated locally by forcing platform='nvidiaX'.

When a host-device data transfer is linearized, its array section size is emitted as a product of the Function's per-dimension sizes, e.g. `copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])`. The `size[i]` fields are 32-bit C ints, so for a Function with more than ~2**31 elements the product overflows `int` before it is used as the transfer bound, yielding a bogus size and a corrupt/failed device transfer. Cast each factor of the product to a 64-bit integer so the multiplication is carried out in 64-bit arithmetic. Casting the whole product would be too late (the overflow would already have occurred), so each factor is cast individually. Non-product bounds (a single size, an offset, a constant) cannot overflow and are left untouched, as are non-transfer expressions. Fixes devitocodes#2777

Address review: replace the ad-hoc _avoid_overflow helper with the existing as_long. as_long only substituted plain Symbols (retrieve_symbols), so it was a no-op on the IndexedPointer size factors (vec->size[i]) of a linearized transfer bound; extend it to retrieve_terminals so Indexed/IndexedPointer leaves are cast too. Keep the cast scoped to Mul products in PragmaTransfer so non-linearized multi-dimensional sections are not needlessly upcast.

…nt directly

Per review: as_long already walks the expression args, so the cast() helper and its is_Mul check are unnecessary. Apply as_long to the section extent directly. Output is unchanged: the start bound is always 0/an offset (left as-is) and the extent is the size product that as_long promotes to 64-bit.

gaoflow · 2026-06-01T19:39:38Z

Good point — dropped the cast helper and the is_Mul check in 454434b, applying as_long to the section extent directly:

sections = ''.join([f'[{ccode(i)}:{ccode(as_long(j))}]'
                    for i, j in self.sections])

The generated code is unchanged: the start bound i is always 0/an offset (left as-is, can't overflow) and the extent j is the size product that as_long promotes to 64-bit, e.g. (long)(size[3])*(long)(size[2])*.... I left i uncast rather than wrapping both bounds, since the overflow site (#2777) is the extent product.
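The shape of the resulting section string can be sketched at the string level. This is a simplified stand-in that works on plain strings rather than on Devito expressions; cast_factors_to_long and linearized_section are hypothetical names, and the factor ordering is an assumption (the real output may order the factors differently).

```python
def cast_factors_to_long(factors):
    """Hypothetical string-level stand-in: cast each factor of a product to long."""
    return '*'.join(f'(long)({f})' for f in factors)


def linearized_section(name, ndim, start='0'):
    """Build a flattened array section `name[start:extent]`.

    The start bound is left uncast; only the extent, a product of the
    per-dimension sizes, has each of its factors promoted to long.
    """
    factors = [f'{name}_vec->size[{d}]' for d in range(ndim)]
    return f'{name}[{start}:{cast_factors_to_long(factors)}]'


section = linearized_section('u', 4)
print(section)
```

Leaving the start bound uncast mirrors the choice described above: only the extent is a product, so only the extent is promoted.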

gaoflow wants to merge 3 commits into devitocodes:main from gaoflow:fix-2777-transfer-size-overflow
