Skip to content

EFT-based compensated arithmetic for spherical geometry (GCA intersection, point-in-face, face bounds)#1513

Open
rajeeja wants to merge 9 commits into
mainfrom
rajeeja/accusphere
Open

EFT-based compensated arithmetic for spherical geometry (GCA intersection, point-in-face, face bounds)#1513
rajeeja wants to merge 9 commits into
mainfrom
rajeeja/accusphere

Conversation

@rajeeja

@rajeeja rajeeja commented May 22, 2026

Copy link
Copy Markdown
Contributor

Closes #1509

Ports the compensated-arithmetic tier from AccuSphGeom into UXarray's Numba spherical geometry stack, replacing cancellation-prone naive cross-product paths with compensated primitives throughout arc intersection, point-in-face, and face bounds.

What changed

  • utils/computing.py — EFT and compensated primitives: two_sum, two_prod, diff_of_products, accucross, accucross_pair, acc_sqrt_re, and fixed-size compensated dot/sum-of-squares helpers. Breaking change: the old cross_fma and dot_fma functions (which depended on the optional pyfma package) are removed. Any code importing those names directly will get an ImportError; use accucross and the _cdp* helpers instead.
  • grid/arcs.py — new _orient3d_on_sphere_value, orient3d_on_sphere, on_minor_arc predicates using compensated arithmetic.
  • grid/intersections.pygca_gca_intersection and gca_const_lat_intersection rewritten as three-layer stacks: pure numerical kernel (L1), integer-mask status layer (L2), UXarray dispatcher (L3). Removed dead _gca_gca_intersection_cartesian shim.
  • grid/point_in_face.py_face_contains_point delegates to a new ray-casting SPIP implementation (_point_in_polygon_sphere) using orient3d_on_sphere instead of the old winding-number via arctan2.
  • grid/bounds.py — new _construct_face_bounds_array_gca path for pure-GCA grids, computing interior arc z-extrema correctly via the compensated kernel. Old path retained for latlon/mixed-edge grids.

What this is not

Not a full port of AccuSphGeom's robustness stack. Excluded: Shewchuk adaptive predicates, Simulation of Simplicity (requires per-vertex global IDs not available in UXarray's polygon representation), geogram exact fallback. This implements only the compensated-arithmetic tier — roughly twice as accurate as naive floating-point on near-tangent cases, with no measurable runtime overhead on intersection kernels.

Performance

Operation vs. naive
gca_gca_intersection ~0% overhead (≈1 µs/call)
gca_const_lat_intersection ~0% overhead (≈1 µs/call)
Grid.bounds (face bounds, cached once) ~40% slower per face
Cross-section / zonal mean ~5% overhead

Accuracy

Measured on the AccuSphGeom baseline suite (31 near-tangent GCA-GCA pairs, angles down to 10⁻⁵°):

Arc angle Naive error EFT error Improvement
≈ 0.000001° 1.07 × 10⁻⁷ 1.24 × 10⁻¹⁶ 861 million×
≈ 0.001° 5.04 × 10⁻¹² 1.14 × 10⁻¹⁶ 44 000×
≈ 5° 8.47 × 10⁻¹⁶ 1.58 × 10⁻¹⁶

Tests

241 new baseline regression tests ported from the AccuSphGeom C++ suite covering near-tangent GCA-GCA pairs, GCA/constant-latitude cases, and spherical point-in-polygon.

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rajeeja rajeeja requested a review from hongyuchen1030 May 22, 2026 13:24
Comment thread uxarray/grid/_eft.py Outdated

@hongyuchen1030 hongyuchen1030 May 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an existed implementation for these functions in

def _two_sum(a, b):
. Consider either merge them or only keep one of them

And I would suggest use other name instead of eft here, the term "EFT" specifically refers to "Error-Free" floating point operations, but the AccuCross and diff_of_productthemself is not completely Error-Free, is just more accurate than normal floating point cross (doubling the precision)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: merged this into uxarray/utils/computing.py and renamed the docs/API section to compensated arithmetic. I only call two_sum/two_prod EFT now; the cross-product helpers are described as compensated algorithms.

Comment thread uxarray/grid/bounds.py


@njit(cache=True)
def _generate_lat_lon_bounds_local(face_vertices, z_min, z_max, snap_tol_deg):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a better vectorization, consider keeping the current design in https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L107:

It uses lots of masking and avoid branching for computation steps.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: rewrote this path closer to the AccuSphGeom structure. The local bounds computation now uses mask-selection for extrema/snap decisions instead of branching through separate cases.

@hongyuchen1030 hongyuchen1030 Jun 4, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation for this entire function involves lots of new branching and operations that are not existed in the AccuSphGeom

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining branching in _generate_lat_lon_bounds_local and _generate_lat_lon_bounds_pole is UXarray-specific post-kernel formatting, not part of the EFT computation path. These functions convert the z-extrema from _face_location_info into degree lat/lon bounds using UXarray's encoding conventions (antimeridian signaled by lon_min > lon_max, pole-enclosing faces returned as 0/360). AccuSphGeom does not have an equivalent layer since it uses different output conventions. Both functions are now documented as UXarray-specific formatting steps, clearly separated from the EFT kernel.

Comment thread uxarray/grid/bounds.py


@njit(cache=True)
def _generate_lat_lon_bounds_pole(face_vertices, label, z_min, z_max, snap_tol_deg):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, consider keeping the structure here: https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L159

use masks instead of branching as much as possible for the vectorization purpose

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: same cleanup here. The polar bounds path now keeps the AccuSphGeom-style local/polar structure and uses mask-selection for the snap logic.

Comment thread uxarray/grid/bounds.py


@njit(cache=True)
def _face_location_info(face_vertices, polar_cap_z):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The branching here will probably slow down the vectorization, consider keeping the structure here:
https://github.com/hongyuchen1030/AccuSphGeom/blob/56cbd6e30270ec5845a853ec72ba5b19c9128017/include/accusphgeom/algorithms/lat_lon_bounds.hpp#L49

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: _face_location_info now follows the AccuSphGeom face-location flow. The per-edge extrema use mask-selection; only the final face classification branches remain.

Comment thread uxarray/grid/intersections.py Outdated
@njit(cache=True)
def _normalize_pair(x_hi, y_hi, z_hi, x_lo, y_lo, z_lo):
"""Normalize an (hi, lo) compensated vector, returning the unit vector and magnitude."""
x = x_hi + x_lo

@hongyuchen1030 hongyuchen1030 May 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This _normalize_pair is not utilizing the EFT and it will have the same precision as the direct floating point precision. And the normalization is one of the big killer for the precision.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if the input are normalized, then both the gca_gca_intersection and the gca_constLat_intersection will return the normalized results already

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just in case you want an "accurately calculated norm", it is const auto sum = numeric::sum_of_squares_c<T, 3>(v.hi, v.lo); const auto norm = numeric::acc_sqrt_re(sum.hi, sum.lo);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: removed _normalize_pair. The intersection code now keeps compensated normals/intersection vectors, and the baseline near-tangent cases pass.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. _normalize_pair is removed. For _accux_gca, the direction vector is normalized using plain math.sqrt on the collapsed scalars rather than sum_of_squares_c + acc_sqrt_re — we found that passing collapsed scalars as hi/lo pairs to _sum_sq_c3 misrepresents their error structure and actually degrades accuracy on the baseline suite (pair_id=8 goes from <1e-15 to ~7e-10 error). The collapsed norm is sufficient because the unit-sphere error is dominated by the accucross_pair step, not the normalization. This is documented in a comment in _accux_gca.

Comment thread uxarray/grid/intersections.py Outdated
@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

@njit(cache=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider keeping the entire structure from here : https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_gca_intersection.hpp#L31

These EFT-fused operations are extremely sensitive for each operations and order. The accuracy can only be guaranteed iff following the exact algorithm

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: rewrote GCA-GCA around the AccuSphGeom structure: accucross normals, accucross_pair for the intersection direction, then candidate filtering with on_minor_arc.

Comment thread uxarray/grid/intersections.py Outdated


@njit(cache=True)
def gca_const_lat_intersection(gca_cart, const_z):

@hongyuchen1030 hongyuchen1030 May 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm is almost very stable around floating point precision. Within the current Uxarray Error tolerance, you almost don't have to use any other safety pre-check other than if the input are valid (The relative error bound is 3*\sqrt{1-constz^2}*machine_epsilon as long as it's constZ is at least 10^15 from the equator) . And probably the only check needed Then only check you can make is if the const latitude is at the equator . Again, make sure to follow the entire algorithm structure from below. Such precision is only guaranteed
iff it follows the exact algorithm here
https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_constlat_intersection.hpp#L33

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: rewrote const-lat around the AccuSphGeom accux_constlat flow and filter invalid candidates after computing them.

Comment thread uxarray/grid/intersections.py Outdated
nx = nx_hi + nx_lo
ny = ny_hi + ny_lo
nz = nz_hi + nz_lo
denom = nx * nx + ny * ny

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Denome should be calculated from
const T denom = s2.hi + s2.lo;
where
const auto s2 = numeric::sum_of_squares_c<T, 2>( {normal.hi[0], normal.hi[1]}, {normal.lo[0], normal.lo[1]});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: denom now comes from _sum_sq_c2(...) as s2_hi + s2_lo.

Comment thread uxarray/grid/intersections.py Outdated
@@ -301,139 +314,198 @@ def _gca_gca_intersection_cartesian(gca_a_xyz, gca_b_xyz):

@njit(cache=True)
def gca_gca_intersection(gca_a_xyz, gca_b_xyz):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider keeping the entire structure from here https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/include/accusphgeom/constructions/gca_gca_intersection.hpp#L31.

These algorithms are extremely sensitive to the operations and order, the accuracy is only guaranteed iff we follow the exact same algorithm described

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: same GCA-GCA rewrite as above, following the AccuSphGeom operation order much more closely. Added baseline tests for the near-tangent cases too.

Comment thread uxarray/grid/intersections.py Outdated
x2_at_const_z = np.isclose(
x2[2], const_z, rtol=ERROR_TOLERANCE, atol=ERROR_TOLERANCE
)
# 1. Endpoint coincidence with the latitude line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is still valid and calculable with the new algorithm. And we can improve the vectorization by only use mask-selection after the intersection is calculated to snap the intersection point with the endpoints

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: removed the endpoint early return. The code computes candidates first, then snaps near-endpoint results afterward.

Comment thread uxarray/grid/intersections.py Outdated
z_max = extreme_gca_z(gca_cart, extreme_type="max")

# Check if the constant latitude is within the GCA range
if not in_between(z_min, const_z, z_max):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this early exist will counter-effect the branching it brings to the vectorization

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: removed the endpoint pre-exits. I kept only invalid denom / negative-discriminant exits.

Comment thread uxarray/grid/intersections.py Outdated
elif p2_intersects_gca:
res[0] = p2
# 4. Solve for the two candidate points on the latitude circle.
r2 = 1.0 - const_z * const_z

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the new EFT-fused algorithmn

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: replaced this with the compensated s2/s3, two_prod, cdp4, and acc_sqrt_re flow from AccuSphGeom.

@rajeeja rajeeja marked this pull request as draft May 22, 2026 21:18
@rajeeja

rajeeja commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

@hongyuchen1030 Thanks for the review. The point-in-polygon predicate and orient3d sign check look good, but I can see now that the intersection functions are basically the old code with a few error-free transformation calls sprinkled in rather than a real port. I'm planning to rewrite gca_gca_intersection and gca_const_lat_intersection from scratch following your C++ implementation exactly. I will be adding the missing pieces:

  • compensated_dot_product
  • sum_of_squares_c
  • acc_sqrt_re
  • The second accucross overload that takes the hi/lo pairs

@rajeeja rajeeja marked this pull request as ready for review May 28, 2026 19:35
@rajeeja rajeeja requested a review from hongyuchen1030 May 28, 2026 19:35
Comment thread uxarray/utils/computing.py Outdated
tiers — an EFT filter (what this module implements), Shewchuk adaptive
predicates for results that fall inside the filter threshold, and a geogram
exact-arithmetic fallback. This port implements only the EFT tier. For
non-degenerate inputs in double precision this is sufficient; callers that

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're technically still unable to claim "For non-degenerate inputs in double precision this is sufficient". We can claim "This is twice more accurate than the direct floating point implementation without computation overhead with vectorization and parralization"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use any adaptive arithmetic, we cannot claim for robustness here. We can only claim our point-in-face use the new algorithm that are more accurate (if we use any EFT operations)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 002b9fe. Removed all claims of 'sufficient', 'non-degenerate inputs', and 'robustness'. The module docstring now says the compensated routines are roughly twice as accurate as direct floating-point equivalents; robustness against all degenerate inputs would require adding an adaptive predicate or exact-arithmetic fallback tier. Same language used for the point-in-face docstring.

Comment thread uxarray/utils/computing.py Outdated
predicates for results that fall inside the filter threshold, and a geogram
exact-arithmetic fallback. This port implements only the EFT tier. For
non-degenerate inputs in double precision this is sufficient; callers that
need to handle geometrically degenerate inputs (coincident arcs, a query

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These edge cases should be included in the test cases already. These scenarios don't have more risks than the "normal input" here. (They probably have the same risk since the intersection point is a newly constructed point)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The denom==0 and planar_sq<0 edge cases propagate as inf/NaN through the kernel and are correctly handled by the isfinite mask in the status layer — no special-casing needed. The guards were removed from the kernel in commit 56c6ce1; only the status layer classifies them.

Comment thread uxarray/grid/bounds.py

# Parameter along the arc at which z is extremal (matches C++ get_face_location_info).
denom = (z1 + z2) * (d - 1.0)
a_raw = (z1 * d - z2) / denom if denom != 0.0 else -1.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The a = min(max(a_raw, 0.0), 1.0) should be able to prevent the divide by zero here. But we can keep it just in case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept as-is. The denom==0 guard sets a_raw=-1 so the clamp produces a=0 (use endpoint), which is safe. Matches your comment — kept just in case, and the guard costs nothing.

Comment thread uxarray/grid/bounds.py
z_max = z_max_candidate
if z_min_candidate < z_min:
z_min = z_min_candidate

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider utilizing the boolean operation from AccuSphHGeom to reduce branching

  const MaskType<T> north_pole_candidate_mask = (face_z_max >= polar_cap_z);
  const MaskType<T> south_pole_candidate_mask = (face_z_min <= -polar_cap_z);
  const MaskType<T> local_mask = !(north_pole_candidate_mask | south_pole_candidate_mask);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 002b9fe. The label computation now uses pure integer multiplication instead of boolean branching: label = north_pole_candidate * _FACE_LOC_NORTH_POLAR + (1 - north_pole_candidate) * south_pole_candidate * _FACE_LOC_SOUTH_POLAR. The unused local variable was also removed.

Comment thread uxarray/grid/bounds.py


@njit(cache=True)
def _lon_bounds_from_vertices(face_vertices):

@hongyuchen1030 hongyuchen1030 Jun 4, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need this work around anymore with the current latlon bounds implementation

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is still needed and cannot be removed. UXarray uses a lon_min > lon_max encoding to signal antimeridian-crossing faces throughout the bounds and cross-section APIs — AccuSphGeom uses a union-of-intervals convention instead. The largest-gap algorithm translates between them. Removed the snap logic (which you flagged separately) but the antimeridian detection itself must stay. Added a docstring explaining this.



@njit(parallel=True, nogil=True, cache=True)
def constant_lat_intersections_no_extreme(lat, edge_node_z, n_edge):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to seperate the extreme case here, and what does the "extreme" mean here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant_lat_intersections_no_extreme and constant_lon_intersections_no_extreme are pre-existing edge screeners — fast O(n) passes over all edges using only endpoint z/lon values, used by Grid.get_edges_at_constant_latitude/longitude before the expensive GCA kernel runs. 'No extreme' means arc interior extrema along the great circle are not considered. These are completely separate from the AccuSphGeom EFT stack and were not changed in this PR. Added a block comment before them to make this clear.

Comment thread uxarray/grid/intersections.py Outdated
n2x = n2x_hi + n2x_lo
n2y = n2y_hi + n2y_lo
n2z = n2z_hi + n2z_lo
if (

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The degeneracy check like this will greatly impact the vectorization performance, any check like this should be isolated from the intersection computation kernel. And the AccuXGCA itself is able to handle extremely short arcs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Degeneracy checks (denom==0, planar_sq<0) are handled by letting NaN/inf propagate through the kernel — the status layer uses isfinite masks to classify them without any branching in the hot path. The _accux_gca and _accux_constlat kernels contain no if/else guards; only the 1.0/vn if vn!=0.0 else np.inf guard remains, which produces inf so the isfinite mask rejects the candidate without branching.

Comment thread uxarray/grid/intersections.py Outdated
and math.isfinite(vn)
):
# Parallel (coplanar) arcs: check whether endpoints of one lie on the other.
if on_minor_arc(v0, w0, w1):

@hongyuchen1030 hongyuchen1030 Jun 4, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, all these if-else branch will impact the vectorization behavior, that's why the original try_gca_gca_intersection only use mask/boolean operations. The point is, we do not try to prevent the NaN or Divide by zero in the intersection kernel, these errors will be recorded in the "status" so an outside dispatch function will know how to handle each case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commits 51c00c2 and 002b9fe. Both _try_gca_gca_intersection and _try_gca_const_lat_intersection now use integer mask arithmetic throughout — no if/else in the hot path. pos_fin = int(isfinite(...)), validity via pos_valid = pos_fin * pos_on_a * pos_on_b, point selection via pos_mask * pos + neg_mask * neg, status via both + none * 2. The only remaining guards wrap on_minor_arc calls to prevent calling it with inf inputs, which is unavoidable.

Comment thread uxarray/grid/intersections.py Outdated
p1_intersects_gca = point_within_gca(p1, gca_cart[0], gca_cart[1])
p2_intersects_gca = point_within_gca(p2, gca_cart[0], gca_cart[1])
if planar_sq < 0.0:
return res

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isolate these if-else branch outside of the computation kernel

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. All branching is now in L3 (gca_gca_intersection / gca_const_lat_intersection) — the dispatcher layer. L1 kernels (_accux_gca, _accux_constlat) are branch-free. L2 (_try_gca_gca_intersection, _try_gca_const_lat_intersection) uses integer mask arithmetic with no if/else in the computational path.

# deduplication in the caller works correctly. Matches Hongyu's suggestion
# of mask-selection to snap after computing rather than branching out early.
_snap_sq = 1e-14 # distance² ≈ (1e-7)² — well above algorithm error (~1e-15)
for xe in (x1, x2):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapping involving branching should be isolated outside of the computation kernels. The intersection computation kernels should not include any branching and always stay in the SIMD-packed vectorization friendly form

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. _snap_const_lat_endpoint lives entirely in L3 (gca_const_lat_intersection) — it is called after the kernel and status layer have completed, never inside _accux_constlat or _try_gca_const_lat_intersection. The L1 and L2 layers remain branch-free and kernel-pure.

res[0, 1] = p2[1]
res[0, 2] = p2[2]

return res

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation in

include/accusphgeom/constructions/gca_constlat_intersection.hpp

was intentionally designed with three separate layers, and I think it is important to keep these layers conceptually and structurally separated.

  1. Core computation kernel: accux_constlat

    This is the numerical kernel. It contains the carefully designed AccuSphere algorithm for computing the great-circle-arc and constant-latitude intersection.

    This layer should stay isolated and intact. It is designed to be SIMD-packing friendly, branch-free in the hot path, and easy to reason about for accuracy, performance, and future optimization. This function should not be mixed with high-level filtering, status interpretation, or UXarray-specific logic.

  2. SIMD-friendly batch API: try_gca_constlat_intersection

    This is the API intended for heavy computation.

    It is still SIMD-packing friendly and is designed to compute the full matrix of possible intersection results from the input faces or edge lists. It does not immediately discard invalid intersections. Instead, it returns the computed candidate points together with a status flag indicating whether each result is valid.

    This is the layer we want to use for large-scale vectorized computation, because it keeps the computation uniform. Even invalid candidates are part of the output matrix, and validity is represented by status instead of control flow.

  3. Dispatcher / convenience API: gca_constlat_intersection

    This is the higher-level user-facing dispatcher.

    This layer is allowed to branch. It reads the status returned by try_gca_constlat_intersection, filters out invalid results, and returns only the valid intersection points. This is useful as a lightweight convenience API, especially when the caller wants clean geometry results instead of the full vectorized computation matrix.

The important point is that these three layers serve different purposes.

The core kernel should focus only on accurate numerical computation. The batch API should focus on uniform, SIMD-friendly execution. The dispatcher should handle branching, filtering, and convenience behavior.

Mixing these layers together is not ideal because it makes the heavy computation path harder to vectorize, harder to optimize, and harder to verify numerically. Once filtering, branching, and application-specific logic are pushed into the core kernel or SIMD batch layer, the implementation becomes less predictable and less suitable for large-scale computation.

For this kind of heavy numerical geometry kernel, the best practice is usually:

  • keep the low-level numerical kernel pure, isolated, and branch-minimized;
  • keep the batch/vectorized API uniform and status-based;
  • move branching, filtering, and user-facing convenience behavior to a separate dispatcher layer;
  • avoid mixing UXarray-specific data handling with the core computational algorithm.

So for UXarray integration, I think the preferred workflow should be:

  1. Use try_gca_constlat_intersection for the main large-scale computation.
  2. Preserve the full output matrix and status information during the vectorized computation stage.
  3. Apply filtering only afterward, either through gca_constlat_intersection or through UXarray-side post-processing logic.

That way, we preserve the original AccuSphere design: accurate computation first, SIMD-friendly batch execution second, and lightweight branching/filtering only at the outer layer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood — the key point is the design separation, not the C++ specifics. We've now split both intersection functions into the three layers you described:

  • _accux_constlat / _accux_gca — pure numerical kernels, no branching, no validity filtering, compensated operations in the exact AccuSphGeom order
  • _try_gca_const_lat_intersection / _try_gca_gca_intersection — batch/status layer using integer mask arithmetic (pos_valid * (1 - neg_valid)) to select candidates without branching in the hot path; returns (point, status, pos, neg)
  • gca_const_lat_intersection / gca_gca_intersection — outer dispatcher that reads status, applies endpoint snapping, and packages UXarray's NaN-filled result format; all UXarray-specific branching lives here



@njit(cache=True)
def _point_in_polygon_sphere(q, polygon):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to implement the tier 3 SoS robustness here: Location detail::point_in_polygon_sphere_impl,https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/src/algorithms/point_in_polygon_sphere.cpp#L359, and https://github.com/hongyuchen1030/AccuSphGeom/blob/e4e13215dd4771b7ef01b7edf81eaf58dd6e6995/src/algorithms/point_in_polygon_sphere.cpp#L345-L357

It's better for us to use the SoS (Simulation of Simplicity) to handle the near-degenerate cases. The vertex id is the vertex index in the data-array, And we will assign the query point q and the end-point r a unique id.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that SoS would give stronger robustness guarantees. The blocker is that AccuSphGeom's SoS implementation (point_in_polygon_sphere_impl) requires a globally unique integer ID for each vertex and for the query point — these IDs are used as a symbolic tiebreaker when orient3d returns exactly zero. UXarray's polygon representation passes vertex coordinates as arrays without stable global IDs, so we cannot implement SoS without a larger API change. We handle near-degenerate rays by perturbing R and restarting the loop (up to 4 retries), which is less elegant but works in practice. This is documented as explicit future work in the _point_in_polygon_sphere docstring.

@rajeeja

rajeeja commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@hongyuchen1030

This does not fully ports AccuSphGeom because we do not have:

  • Shewchuk adaptive predicates,
  • Simulation of Simplicity/global-ID handling,
  • exact/geogram fallback,
  • C++ SIMD mask behavior,
  • exact AccuSphGeom point-in-polygon implementation,
  • exact AccuSphGeom wrapper semantics.

@hongyuchen1030

Copy link
Copy Markdown
Contributor

@hongyuchen1030

This does not fully ports AccuSphGeom because we do not have:

  • Shewchuk adaptive predicates,
  • Simulation of Simplicity/global-ID handling,
  • exact/geogram fallback,
  • C++ SIMD mask behavior,
  • exact AccuSphGeom point-in-polygon implementation,
  • exact AccuSphGeom wrapper semantics.

I understand that UXarray is not directly porting the C++ SIMD mask behavior as-is. My point is that the mask-based workflow is not exclusive to C++.

The key idea is the vectorized computation design: we keep the heavy computation path uniform, compute all candidate results in a batch, and use a status or mask array to indicate which results are valid. In C++, that may correspond to SIMD masks. In Python/NumPy/Numba, the same design can be represented by boolean masks, status arrays, or validity flags.

So the important part is not the specific C++ SIMD implementation detail. The important part is preserving the separation between:

  1. the core numerical kernel,
  2. the vectorized/batch computation layer that returns full results plus status,
  3. the outer dispatcher/filtering layer that only keeps valid points.

You can refer to my earlier reply here for the intended vectorized implementation workflow:

#1513 (comment)

This design pattern is still relevant for Python. Even if UXarray does not use the exact C++ SIMD mask behavior, it should still preserve the same vectorized workflow using Python-side masks or status arrays, rather than mixing branching and filtering into the core computation path.

@rajeeja rajeeja changed the title First pass at EFT-based spherical geometry accuracy (port from AccuSphGeom) EFT-based compensated arithmetic for spherical geometry (GCA intersection, point-in-face, face bounds) Jun 8, 2026
@rajeeja rajeeja requested a review from hongyuchen1030 June 8, 2026 19:56
@rajeeja rajeeja force-pushed the rajeeja/accusphere branch from 002b9fe to 3fe8f31 Compare June 8, 2026 20:44
@rajeeja rajeeja force-pushed the rajeeja/accusphere branch from 3fe8f31 to 94c06fe Compare June 8, 2026 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port new algorithms to python from AccuSphGeom repo

2 participants