EFT-based compensated arithmetic for spherical geometry (GCA intersection, point-in-face, face bounds) by rajeeja · Pull Request #1513 · UXARRAY/uxarray

rajeeja · 2026-05-22T12:00:23Z

Ports the compensated-arithmetic tier from AccuSphGeom into UXarray's Numba spherical geometry stack, replacing cancellation-prone naive cross-product paths with compensated primitives throughout arc intersection, point-in-face, and face bounds.

What changed

utils/computing.py — EFT and compensated primitives: two_sum, two_prod, diff_of_products, accucross, accucross_pair, acc_sqrt_re, and fixed-size compensated dot/sum-of-squares helpers. Breaking change: the old cross_fma and dot_fma functions (which depended on the optional pyfma package) are removed. Any code importing those names directly will get an ImportError; use accucross and the _cdp* helpers instead.
grid/arcs.py — new _orient3d_on_sphere_value, orient3d_on_sphere, on_minor_arc predicates using compensated arithmetic.
grid/intersections.py — gca_gca_intersection and gca_const_lat_intersection rewritten as three-layer stacks: pure numerical kernel (L1), integer-mask status layer (L2), UXarray dispatcher (L3). Removed dead _gca_gca_intersection_cartesian shim.
grid/point_in_face.py — _face_contains_point delegates to a new ray-casting SPIP implementation (_point_in_polygon_sphere) using orient3d_on_sphere instead of the old winding-number via arctan2.
grid/bounds.py — new _construct_face_bounds_array_gca path for pure-GCA grids, computing interior arc z-extrema correctly via the compensated kernel. Old path retained for latlon/mixed-edge grids.

What this is not

Not a full port of AccuSphGeom's robustness stack. Excluded: Shewchuk adaptive predicates, Simulation of Simplicity (requires per-vertex global IDs not available in UXarray's polygon representation), geogram exact fallback. This implements only the compensated-arithmetic tier — roughly twice as accurate as naive floating-point on near-tangent cases, with no measurable runtime overhead on intersection kernels.

Performance

Operation	vs. naive
`gca_gca_intersection`	~0% overhead (≈1 µs/call)
`gca_const_lat_intersection`	~0% overhead (≈1 µs/call)
`Grid.bounds` (face bounds, cached once)	~40% slower per face
Cross-section / zonal mean	~5% overhead

Accuracy

Measured on the AccuSphGeom baseline suite (31 near-tangent GCA-GCA pairs, angles down to 10⁻⁵°):

Arc angle	Naive error	EFT error	Improvement
≈ 0.000001°	1.07 × 10⁻⁷	1.24 × 10⁻¹⁶	861 million×
≈ 0.001°	5.04 × 10⁻¹²	1.14 × 10⁻¹⁶	44 000×
≈ 5°	8.47 × 10⁻¹⁶	1.58 × 10⁻¹⁶	5×

Tests

241 new baseline regression tests ported from the AccuSphGeom C++ suite covering near-tangent GCA-GCA pairs, GCA/constant-latitude cases, and spherical point-in-polygon.

review-notebook-app · 2026-05-22T12:00:29Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

hongyuchen1030 · 2026-05-22T19:00:36Z

We have an existed implementation for these functions in

uxarray/uxarray/utils/computing.py

Line 189 in 735a1dd

def _two_sum(a, b):

. Consider either merge them or only keep one of them

And I would suggest use other name instead of eft here, the term "EFT" specifically refers to "Error-Free" floating point operations, but the AccuCross and diff_of_productthemself is not completely Error-Free, is just more accurate than normal floating point cross (doubling the precision)

Done: merged this into uxarray/utils/computing.py and renamed the docs/API section to compensated arithmetic. I only call two_sum/two_prod EFT now; the cross-product helpers are described as compensated algorithms.

hongyuchen1030 · 2026-05-22T19:23:44Z

+
+
+@njit(cache=True)
+def _generate_lat_lon_bounds_local(face_vertices, z_min, z_max, snap_tol_deg):


For a better vectorization, consider keeping the current design in https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L107:

It uses lots of masking and avoid branching for computation steps.

Done: rewrote this path closer to the AccuSphGeom structure. The local bounds computation now uses mask-selection for extrema/snap decisions instead of branching through separate cases.

The implementation for this entire function involves lots of new branching and operations that are not existed in the AccuSphGeom

The remaining branching in _generate_lat_lon_bounds_local and _generate_lat_lon_bounds_pole is UXarray-specific post-kernel formatting, not part of the EFT computation path. These functions convert the z-extrema from _face_location_info into degree lat/lon bounds using UXarray's encoding conventions (antimeridian signaled by lon_min > lon_max, pole-enclosing faces returned as 0/360). AccuSphGeom does not have an equivalent layer since it uses different output conventions. Both functions are now documented as UXarray-specific formatting steps, clearly separated from the EFT kernel.

hongyuchen1030 · 2026-05-22T19:25:29Z

+
+
+@njit(cache=True)
+def _generate_lat_lon_bounds_pole(face_vertices, label, z_min, z_max, snap_tol_deg):


Same here, consider keeping the structure here: https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L159

use masks instead of branching as much as possible for the vectorization purpose

Done: same cleanup here. The polar bounds path now keeps the AccuSphGeom-style local/polar structure and uses mask-selection for the snap logic.

hongyuchen1030 · 2026-05-22T19:26:56Z

+
+
+@njit(cache=True)
+def _face_location_info(face_vertices, polar_cap_z):


The branching here will probably slow down the vectorization, consider keeping the structure here:
https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L49

Done: _face_location_info now follows the AccuSphGeom face-location flow. The per-edge extrema use mask-selection; only the final face classification branches remain.

hongyuchen1030 · 2026-05-22T19:29:02Z

+@njit(cache=True)
+def _normalize_pair(x_hi, y_hi, z_hi, x_lo, y_lo, z_lo):
+    """Normalize an (hi, lo) compensated vector, returning the unit vector and magnitude."""
+    x = x_hi + x_lo


This _normalize_pair is not utilizing the EFT and it will have the same precision as the direct floating point precision. And the normalization is one of the big killer for the precision.

Also if the input are normalized, then both the gca_gca_intersection and the gca_constLat_intersection will return the normalized results already

And just in case you want an "accurately calculated norm", it is const auto sum = numeric::sum_of_squares_c<T, 3>(v.hi, v.lo); const auto norm = numeric::acc_sqrt_re(sum.hi, sum.lo);

Done: removed _normalize_pair. The intersection code now keeps compensated normals/intersection vectors, and the baseline near-tangent cases pass.

Done. _normalize_pair is removed. For _accux_gca, the direction vector is normalized using plain math.sqrt on the collapsed scalars rather than sum_of_squares_c + acc_sqrt_re — we found that passing collapsed scalars as hi/lo pairs to _sum_sq_c3 misrepresents their error structure and actually degrades accuracy on the baseline suite (pair_id=8 goes from <1e-15 to ~7e-10 error). The collapsed norm is sufficient because the unit-sphere error is dominated by the accucross_pair step, not the normalization. This is documented in a comment in _accux_gca.

hongyuchen1030 · 2026-05-22T19:35:16Z

@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

 @njit(cache=True)


consider keeping the entire structure from here : https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_gca_intersection.hpp#L31

These EFT-fused operations are extremely sensitive for each operations and order. The accuracy can only be guaranteed iff following the exact algorithm

Done: rewrote GCA-GCA around the AccuSphGeom structure: accucross normals, accucross_pair for the intersection direction, then candidate filtering with on_minor_arc.

hongyuchen1030 · 2026-05-22T19:41:30Z



 @njit(cache=True)
 def gca_const_lat_intersection(gca_cart, const_z):


The algorithm is almost very stable around floating point precision. Within the current Uxarray Error tolerance, you almost don't have to use any other safety pre-check other than if the input are valid (The relative error bound is 3*\sqrt{1-constz^2}*machine_epsilon as long as it's constZ is at least 10^15 from the equator) . And probably the only check needed Then only check you can make is if the const latitude is at the equator . Again, make sure to follow the entire algorithm structure from below. Such precision is only guaranteed
iff it follows the exact algorithm here
https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_constlat_intersection.hpp#L33

But you can use the same check as in https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_gca_intersection.hpp#L51

to remove any invalid points

Done: rewrote const-lat around the AccuSphGeom accux_constlat flow and filter invalid candidates after computing them.

hongyuchen1030 · 2026-05-22T19:44:55Z

+    nx = nx_hi + nx_lo
+    ny = ny_hi + ny_lo
+    nz = nz_hi + nz_lo
+    denom = nx * nx + ny * ny


Denome should be calculated from
const T denom = s2.hi + s2.lo;
where
const auto s2 = numeric::sum_of_squares_c<T, 2>( {normal.hi[0], normal.hi[1]}, {normal.lo[0], normal.lo[1]});

Done: denom now comes from _sum_sq_c2(...) as s2_hi + s2_lo.

hongyuchen1030 · 2026-05-22T19:49:10Z

@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

 @njit(cache=True)
 def gca_gca_intersection(gca_a_xyz, gca_b_xyz):


Consider keeping the entire structure from here https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_gca_intersection.hpp#L31.

These algorithms are extremely sensitive to the operations and order, the accuracy is only guaranteed iff we follow the exact same algorithm described

Done: same GCA-GCA rewrite as above, following the AccuSphGeom operation order much more closely. Added baseline tests for the near-tangent cases too.

hongyuchen1030 · 2026-05-22T19:55:30Z

-    x2_at_const_z = np.isclose(
-        x2[2], const_z, rtol=ERROR_TOLERANCE, atol=ERROR_TOLERANCE
-    )
+    # 1. Endpoint coincidence with the latitude line.


This case is still valid and calculable with the new algorithm. And we can improve the vectorization by only use mask-selection after the intersection is calculated to snap the intersection point with the endpoints

Done: removed the endpoint early return. The code computes candidates first, then snaps near-endpoint results afterward.

hongyuchen1030 · 2026-05-22T19:56:12Z

    z_max = extreme_gca_z(gca_cart, extreme_type="max")
-
-    # Check if the constant latitude is within the GCA range
    if not in_between(z_min, const_z, z_max):


I am not sure if this early exist will counter-effect the branching it brings to the vectorization

Done: removed the endpoint pre-exits. I kept only invalid denom / negative-discriminant exits.

hongyuchen1030 · 2026-05-22T19:56:41Z

-    elif p2_intersects_gca:
-        res[0] = p2
+    # 4. Solve for the two candidate points on the latitude circle.
+    r2 = 1.0 - const_z * const_z


This is not the new EFT-fused algorithmn

Done: replaced this with the compensated s2/s3, two_prod, cdp4, and acc_sqrt_re flow from AccuSphGeom.

rajeeja · 2026-05-22T21:19:32Z

@hongyuchen1030 Thanks for the review. The point-in-polygon predicate and orient3d sign check look good, but I can see now that the intersection functions are basically the old code with a few error-free transformation calls sprinkled in rather than a real port. I'm planning to rewrite gca_gca_intersection and gca_const_lat_intersection from scratch following your C++ implementation exactly. I will be adding the missing pieces:

compensated_dot_product
sum_of_squares_c
acc_sqrt_re
The second accucross overload that takes the hi/lo pairs

hongyuchen1030 · 2026-06-04T02:09:46Z

+tiers — an EFT filter (what this module implements), Shewchuk adaptive
+predicates for results that fall inside the filter threshold, and a geogram
+exact-arithmetic fallback. This port implements only the EFT tier. For
+non-degenerate inputs in double precision this is sufficient; callers that


We're technically still unable to claim "For non-degenerate inputs in double precision this is sufficient". We can claim "This is twice more accurate than the direct floating point implementation without computation overhead with vectorization and parralization"

If we don't use any adaptive arithmetic, we cannot claim for robustness here. We can only claim our point-in-face use the new algorithm that are more accurate (if we use any EFT operations)

Fixed in commit 002b9fe. Removed all claims of 'sufficient', 'non-degenerate inputs', and 'robustness'. The module docstring now says the compensated routines are roughly twice as accurate as direct floating-point equivalents; robustness against all degenerate inputs would require adding an adaptive predicate or exact-arithmetic fallback tier. Same language used for the point-in-face docstring.

hongyuchen1030 · 2026-06-04T02:12:10Z

+predicates for results that fall inside the filter threshold, and a geogram
+exact-arithmetic fallback. This port implements only the EFT tier. For
+non-degenerate inputs in double precision this is sufficient; callers that
+need to handle geometrically degenerate inputs (coincident arcs, a query


These edge cases should be included in the test cases already. These scenarios don't have more risks than the "normal input" here. (They probably have the same risk since the intersection point is a newly constructed point)

Agreed. The denom==0 and planar_sq<0 edge cases propagate as inf/NaN through the kernel and are correctly handled by the isfinite mask in the status layer — no special-casing needed. The guards were removed from the kernel in commit 56c6ce1; only the status layer classifies them.

hongyuchen1030 · 2026-06-04T02:23:15Z

+
+        # Parameter along the arc at which z is extremal (matches C++ get_face_location_info).
+        denom = (z1 + z2) * (d - 1.0)
+        a_raw = (z1 * d - z2) / denom if denom != 0.0 else -1.0


The a = min(max(a_raw, 0.0), 1.0) should be able to prevent the divide by zero here. But we can keep it just in case

Kept as-is. The denom==0 guard sets a_raw=-1 so the clamp produces a=0 (use endpoint), which is safe. Matches your comment — kept just in case, and the guard costs nothing.

hongyuchen1030 · 2026-06-04T02:24:46Z

+            z_max = z_max_candidate
+        if z_min_candidate < z_min:
+            z_min = z_min_candidate
+


Consider utilizing the boolean operation from AccuSphHGeom to reduce branching

const MaskType<T> north_pole_candidate_mask = (face_z_max >= polar_cap_z); const MaskType<T> south_pole_candidate_mask = (face_z_min <= -polar_cap_z); const MaskType<T> local_mask = !(north_pole_candidate_mask | south_pole_candidate_mask);

Done in commit 002b9fe. The label computation now uses pure integer multiplication instead of boolean branching: label = north_pole_candidate * _FACE_LOC_NORTH_POLAR + (1 - north_pole_candidate) * south_pole_candidate * _FACE_LOC_SOUTH_POLAR. The unused local variable was also removed.

hongyuchen1030 · 2026-06-04T02:26:55Z

+
+
+@njit(cache=True)
+def _lon_bounds_from_vertices(face_vertices):


We probably don't need this work around anymore with the current latlon bounds implementation

This function is still needed and cannot be removed. UXarray uses a lon_min > lon_max encoding to signal antimeridian-crossing faces throughout the bounds and cross-section APIs — AccuSphGeom uses a union-of-intervals convention instead. The largest-gap algorithm translates between them. Removed the snap logic (which you flagged separately) but the antimeridian detection itself must stay. Added a docstring explaining this.

hongyuchen1030 · 2026-06-04T02:38:10Z



 @njit(parallel=True, nogil=True, cache=True)
 def constant_lat_intersections_no_extreme(lat, edge_node_z, n_edge):


Why we need to seperate the extreme case here, and what does the "extreme" mean here

constant_lat_intersections_no_extreme and constant_lon_intersections_no_extreme are pre-existing edge screeners — fast O(n) passes over all edges using only endpoint z/lon values, used by Grid.get_edges_at_constant_latitude/longitude before the expensive GCA kernel runs. 'No extreme' means arc interior extrema along the great circle are not considered. These are completely separate from the AccuSphGeom EFT stack and were not changed in this PR. Added a block comment before them to make this clear.

hongyuchen1030 · 2026-06-04T02:41:47Z

+    n2x = n2x_hi + n2x_lo
+    n2y = n2y_hi + n2y_lo
+    n2z = n2z_hi + n2z_lo
+    if (


The degeneracy check like this will greatly impact the vectorization performance, any check like this should be isolated from the intersection computation kernel. And the AccuXGCA itself is able to handle extremely short arcs

Done. Degeneracy checks (denom==0, planar_sq<0) are handled by letting NaN/inf propagate through the kernel — the status layer uses isfinite masks to classify them without any branching in the hot path. The _accux_gca and _accux_constlat kernels contain no if/else guards; only the 1.0/vn if vn!=0.0 else np.inf guard remains, which produces inf so the isfinite mask rejects the candidate without branching.

hongyuchen1030 · 2026-06-04T02:44:28Z

+        and math.isfinite(vn)
+    ):
+        # Parallel (coplanar) arcs: check whether endpoints of one lie on the other.
+        if on_minor_arc(v0, w0, w1):


Again, all these if-else branch will impact the vectorization behavior, that's why the original try_gca_gca_intersection only use mask/boolean operations. The point is, we do not try to prevent the NaN or Divide by zero in the intersection kernel, these errors will be recorded in the "status" so an outside dispatch function will know how to handle each case.

Done in commits 51c00c2 and 002b9fe. Both _try_gca_gca_intersection and _try_gca_const_lat_intersection now use integer mask arithmetic throughout — no if/else in the hot path. pos_fin = int(isfinite(...)), validity via pos_valid = pos_fin * pos_on_a * pos_on_b, point selection via pos_mask * pos + neg_mask * neg, status via both + none * 2. The only remaining guards wrap on_minor_arc calls to prevent calling it with inf inputs, which is unavoidable.

hongyuchen1030 · 2026-06-04T02:47:24Z

-    p1_intersects_gca = point_within_gca(p1, gca_cart[0], gca_cart[1])
-    p2_intersects_gca = point_within_gca(p2, gca_cart[0], gca_cart[1])
+    if planar_sq < 0.0:
+        return res


Isolate these if-else branch outside of the computation kernel

Done. All branching is now in L3 (gca_gca_intersection / gca_const_lat_intersection) — the dispatcher layer. L1 kernels (_accux_gca, _accux_constlat) are branch-free. L2 (_try_gca_gca_intersection, _try_gca_const_lat_intersection) uses integer mask arithmetic with no if/else in the computational path.

hongyuchen1030 · 2026-06-04T02:49:06Z

+    #     deduplication in the caller works correctly.  Matches Hongyu's suggestion
+    #     of mask-selection to snap after computing rather than branching out early.
+    _snap_sq = 1e-14  # distance² ≈ (1e-7)² — well above algorithm error (~1e-15)
+    for xe in (x1, x2):


Snapping involving branching should be isolated outside of the computation kernels. The intersection computation kernels should not include any branching and always stay in the SIMD-packed vectorization friendly form

Done. _snap_const_lat_endpoint lives entirely in L3 (gca_const_lat_intersection) — it is called after the kernel and status layer have completed, never inside _accux_constlat or _try_gca_const_lat_intersection. The L1 and L2 layers remain branch-free and kernel-pure.

hongyuchen1030 · 2026-06-04T02:59:18Z

+        res[0, 1] = p2[1]
+        res[0, 2] = p2[2]

    return res


The implementation in

include/accusphgeom/constructions/gca_constlat_intersection.hpp

was intentionally designed with three separate layers, and I think it is important to keep these layers conceptually and structurally separated.

Core computation kernel: accux_constlat

This is the numerical kernel. It contains the carefully designed AccuSphere algorithm for computing the great-circle-arc and constant-latitude intersection.

This layer should stay isolated and intact. It is designed to be SIMD-packing friendly, branch-free in the hot path, and easy to reason about for accuracy, performance, and future optimization. This function should not be mixed with high-level filtering, status interpretation, or UXarray-specific logic.

SIMD-friendly batch API: try_gca_constlat_intersection

This is the API intended for heavy computation.

It is still SIMD-packing friendly and is designed to compute the full matrix of possible intersection results from the input faces or edge lists. It does not immediately discard invalid intersections. Instead, it returns the computed candidate points together with a status flag indicating whether each result is valid.

This is the layer we want to use for large-scale vectorized computation, because it keeps the computation uniform. Even invalid candidates are part of the output matrix, and validity is represented by status instead of control flow.

Dispatcher / convenience API: gca_constlat_intersection

This is the higher-level user-facing dispatcher.

This layer is allowed to branch. It reads the status returned by try_gca_constlat_intersection, filters out invalid results, and returns only the valid intersection points. This is useful as a lightweight convenience API, especially when the caller wants clean geometry results instead of the full vectorized computation matrix.

The important point is that these three layers serve different purposes.

The core kernel should focus only on accurate numerical computation. The batch API should focus on uniform, SIMD-friendly execution. The dispatcher should handle branching, filtering, and convenience behavior.

Mixing these layers together is not ideal because it makes the heavy computation path harder to vectorize, harder to optimize, and harder to verify numerically. Once filtering, branching, and application-specific logic are pushed into the core kernel or SIMD batch layer, the implementation becomes less predictable and less suitable for large-scale computation.

For this kind of heavy numerical geometry kernel, the best practice is usually:

keep the low-level numerical kernel pure, isolated, and branch-minimized;

keep the batch/vectorized API uniform and status-based;

move branching, filtering, and user-facing convenience behavior to a separate dispatcher layer;

avoid mixing UXarray-specific data handling with the core computational algorithm.

So for UXarray integration, I think the preferred workflow should be:

Use try_gca_constlat_intersection for the main large-scale computation.

Preserve the full output matrix and status information during the vectorized computation stage.

Apply filtering only afterward, either through gca_constlat_intersection or through UXarray-side post-processing logic.

That way, we preserve the original AccuSphere design: accurate computation first, SIMD-friendly batch execution second, and lightweight branching/filtering only at the outer layer.

Understood — the key point is the design separation, not the C++ specifics. We've now split both intersection functions into the three layers you described:

_accux_constlat / _accux_gca — pure numerical kernels, no branching, no validity filtering, compensated operations in the exact AccuSphGeom order

_try_gca_const_lat_intersection / _try_gca_gca_intersection — batch/status layer using integer mask arithmetic (pos_valid * (1 - neg_valid)) to select candidates without branching in the hot path; returns (point, status, pos, neg)

gca_const_lat_intersection / gca_gca_intersection — outer dispatcher that reads status, applies endpoint snapping, and packages UXarray's NaN-filled result format; all UXarray-specific branching lives here

hongyuchen1030 · 2026-06-04T03:25:12Z

+
+
+@njit(cache=True)
+def _point_in_polygon_sphere(q, polygon):


We should be able to implement the tier 3 SoS robustness here: Location detail::point_in_polygon_sphere_impl,https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/src/algorithms/point_in_polygon_sphere.cpp#L359, and https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/src/algorithms/point_in_polygon_sphere.cpp#L345-L357

It's better for us to use the SoS (Simulation of Simplicity) to handle the near-degenerate cases. The vertex id is the vertex index in the data-array, And we will assign the query point q and the end-point r a unique id.

Agreed that SoS would give stronger robustness guarantees. The blocker is that AccuSphGeom's SoS implementation (point_in_polygon_sphere_impl) requires a globally unique integer ID for each vertex and for the query point — these IDs are used as a symbolic tiebreaker when orient3d returns exactly zero. UXarray's polygon representation passes vertex coordinates as arrays without stable global IDs, so we cannot implement SoS without a larger API change. We handle near-degenerate rays by perturbing R and restarting the loop (up to 4 retries), which is less elegant but works in practice. This is documented as explicit future work in the _point_in_polygon_sphere docstring.

rajeeja · 2026-06-04T18:06:25Z

@hongyuchen1030

This does not fully ports AccuSphGeom because we do not have:

Shewchuk adaptive predicates,
Simulation of Simplicity/global-ID handling,
exact/geogram fallback,
C++ SIMD mask behavior,
exact AccuSphGeom point-in-polygon implementation,
exact AccuSphGeom wrapper semantics.

hongyuchen1030 · 2026-06-04T19:01:48Z

@hongyuchen1030

This does not fully ports AccuSphGeom because we do not have:

Shewchuk adaptive predicates,

Simulation of Simplicity/global-ID handling,

exact/geogram fallback,

C++ SIMD mask behavior,

exact AccuSphGeom point-in-polygon implementation,

exact AccuSphGeom wrapper semantics.

I understand that UXarray is not directly porting the C++ SIMD mask behavior as-is. My point is that the mask-based workflow is not exclusive to C++.

The key idea is the vectorized computation design: we keep the heavy computation path uniform, compute all candidate results in a batch, and use a status or mask array to indicate which results are valid. In C++, that may correspond to SIMD masks. In Python/NumPy/Numba, the same design can be represented by boolean masks, status arrays, or validity flags.

So the important part is not the specific C++ SIMD implementation detail. The important part is preserving the separation between:

the core numerical kernel,
the vectorized/batch computation layer that returns full results plus status,
the outer dispatcher/filtering layer that only keeps valid points.

You can refer to my earlier reply here for the intended vectorized implementation workflow:

#1513 (comment)

This design pattern is still relevant for Python. Even if UXarray does not use the exact C++ SIMD mask behavior, it should still preserve the same vectorized workflow using Python-side masks or status arrays, rather than mixing branching and filtering into the core computation path.

Closes #1509

…rite intersections, add 241 baseline testsgit status! - most came from accusphere

…tion docstring

…us/mask, dispatcher

…r_sq<0

rajeeja requested a review from hongyuchen1030 May 22, 2026 13:24

hongyuchen1030 reviewed May 22, 2026

View reviewed changes

hongyuchen1030 requested changes May 22, 2026

View reviewed changes

rajeeja marked this pull request as draft May 22, 2026 21:18

rajeeja marked this pull request as ready for review May 28, 2026 19:35

rajeeja requested a review from hongyuchen1030 May 28, 2026 19:35

hongyuchen1030 reviewed Jun 4, 2026

View reviewed changes

rajeeja changed the title ~~First pass at EFT-based spherical geometry accuracy (port from AccuSphGeom)~~ EFT-based compensated arithmetic for spherical geometry (GCA intersection, point-in-face, face bounds) Jun 8, 2026

rajeeja requested a review from hongyuchen1030 June 8, 2026 19:56

rajeeja force-pushed the rajeeja/accusphere branch from 002b9fe to 3fe8f31 Compare June 8, 2026 20:44

rajeeja added 9 commits June 8, 2026 15:47

Port AccuSphGeom EFT algorithms and add spherical geometry user guide

878e075

Closes #1509

Fix pre-commit: remove unused imports and variable

1bbf349

Fix pre-commit: split semicolons in notebook cells

1034a18

Address Hongyu's review: port AccuSphGeom compensated arithmetic, rew…

fb74e75

…rite intersections, add 241 baseline testsgit status! - most came from accusphere

Fix RTD: remove RST-invalid numbered list from gca_const_lat_intersec…

256b347

…tion docstring

Fix RTD notebook kernel metadata

e8999e6

Address AccuSphGeom review feedback

0220276

Separate intersection kernels into three layers: numerical core, stat…

0fe327c

…us/mask, dispatcher

Fix NaN/Inf propagation in intersection kernels for denom=0 and plana…

94c06fe

…r_sq<0

rajeeja force-pushed the rajeeja/accusphere branch from 3fe8f31 to 94c06fe Compare June 8, 2026 20:47



		@njit(cache=True)
		def _generate_lat_lon_bounds_local(face_vertices, z_min, z_max, snap_tol_deg):



		@njit(cache=True)
		def _generate_lat_lon_bounds_pole(face_vertices, label, z_min, z_max, snap_tol_deg):



		@njit(cache=True)
		def _face_location_info(face_vertices, polar_cap_z):

		@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

		@njit(cache=True)



		@njit(cache=True)
		def gca_const_lat_intersection(gca_cart, const_z):

		@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

		@njit(cache=True)
		def gca_gca_intersection(gca_a_xyz, gca_b_xyz):



		@njit(cache=True)
		def _lon_bounds_from_vertices(face_vertices):



		@njit(parallel=True, nogil=True, cache=True)
		def constant_lat_intersections_no_extreme(lat, edge_node_z, n_edge):

Conversation

rajeeja commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

What this is not

Performance

Accuracy

Tests

Uh oh!

review-notebook-app Bot commented May 22, 2026

Uh oh!

hongyuchen1030 May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hongyuchen1030 Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hongyuchen1030 May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hongyuchen1030 May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rajeeja commented May 22, 2026

rajeeja commented May 22, 2026 •

edited

Loading

hongyuchen1030 May 22, 2026 •

edited

Loading

hongyuchen1030 Jun 4, 2026 •

edited

Loading

hongyuchen1030 May 22, 2026 •

edited

Loading

hongyuchen1030 May 22, 2026 •

edited

Loading

hongyuchen1030 Jun 4, 2026 •

edited

Loading