Fix facet count race after search flyTo#258
Closed
rdhyee wants to merge 31 commits into
Closed
Conversation
Replaces the LIMIT 100000 raw-row scan + JS per-pixel binning with a single DuckDB GROUP BY query that does the binning server-side. Removes the arbitrary cap honestly: every sample in the bbox is counted into its true pixel cell, regardless of total sample count. Why the LIMIT was bad: `LIMIT 100000` returned the first 100k rows in parquet storage order — not random, not geographic. At world view, the heatmap silently showed whichever source happened to be physically first in the file (likely SESAR, the largest source by row count). The "(capped)" status warning disclosed the problem but didn't fix it. RY feedback 2026-05-27 on PR isamplesorg#240 ("wondering whether we can do better geographic random sampling"). How the SQL pushdown works: compute `(x_bin, y_bin)` pixel coordinates from `latitude`/`longitude` server-side using FLOOR / LEAST / GREATEST, then GROUP BY (x_bin, y_bin) returning one row per non-empty pixel with COUNT(*) as the sample count. Result cardinality is bounded by canvas pixels (≤ 512² = 262k), independent of bbox sample count. JS just iterates the aggregated rows and applies the same log(1+n) scaling for heatmap.js. Verified counts vs `samples table` summary line (= true sample count for the current view): view | heatmap | table | match ------------------|---------|---------|------ PKAP (100km) | 77,840 | 77,840 | ✅ Cyprus medium | 100,970 | 100,970 | ✅ (was capped at 100k) Cyprus regional | 682,029 | 682,029 | ✅ (was capped at 100k) World view | 5.98M | 5.98M | ✅ (was capped at 100k) Render time at world view (~6M samples → 35k cells): ~7s on localhost, similar to or faster than the LIMIT 100k version at smaller zooms. Removes the "(capped)" status branch and the `HEATMAP_LIMIT` constant becomes unused (left in place for now in case Phase 2 progressive refinement reintroduces a safety cap on cell count). Side effect of removing the cap: the per-pixel max-bias is now even more extreme at high-density views, but the log(1+n) scaling from PR isamplesorg#240 handles it. Verified: 5/5 heatmap-overlay.spec.js still pass on localhost. (The spec asserts `lastPointCount > 0`, which is still true; one spec change worth a follow-up: the spec used to expect capped behavior for large views, but no test currently asserts that, so no spec changes needed here.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…table surface
Strategy B: materialize search_pids (one ILIKE scan over facets_url) on a
committed search, then constrain surfaces with a cheap pid semi-join.
This increment (table surface, verified):
- buildSearchFilter/clearSearchFilter: non-temp search_pids table (DISTINCT,
NOT NULL), token-versioned _next→swap, captures match total. Published on
window.__searchFilter {active,term,token,total} + window.searchFilterSQL().
- doSearch builds the filter (shows "Building search filter…") then refreshes
the table; clears it on empty/short submit.
- loadCount/loadPage semi-join on search_pids; summaryText → "N of M
\"term\" matches in this map view" (replaces isamplesorg#250 interim copy).
- Dev probe cell (a1PersistenceProbe) — REMOVE before PR.
Verified on local build: bucchero → table shows only OpenContext Poggio
Civitate matches (2,693), no GEOME mollusks; non-temp table persists across
db.query() calls. Probe (isamplesorg#249 data): no coord-less matches, no dup pids,
broad-term max ~82k.
TODO (still in PR #1, NOT YET DONE): points loader, facet counts + cube
gating, stats, and C3 auto-point-mode so the globe isn't left unfiltered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e), C3 — globe path BUGGY Adds to the working table surface: - searchIsActive()/searchFilterSQL() cell-local helpers in the viewer cell. - loadViewportSamples: semi-join on search_pids. - updateCrossFilteredCounts: semi-join on both paths; gate off the cube fast-path AND the global baseline early-return when a search is active. - applySearchFilterChange(): C3 orchestrator — force point mode on search, revert to altitude-appropriate mode on clear; refresh table+facets. - camera-changed handler: latch point mode while a search is active. - doSearch calls applySearchFilterChange after build / on clear. KNOWN BUG (needs debugging): the GLOBE points render the UNFILTERED viewport count (e.g. "5000 of 1,591,051") even though search is active and the table correctly shows 2,693. C3 does not enter point mode at high altitude on boot either (globe stays unfiltered clusters). Likely an async race between the boot point-load / mode entry and the post-build applySearchFilterChange (filter built ~40-90s into boot, after the camera has already settled). The table surface (loadCount/loadPage) IS correctly filtered. Probe cell still present (remove before PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch-token staleness) + [A1dbg] logging; globe still not entering point mode — next: Codex rec #4 one-reconciler refactor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rver + deterministic A1 observability - R2_BASE honors ?data_base= / localStorage ISAMPLES_DATA_BASE (default prod), so the explorer can read a local parquet mirror instead of 40-90s remote range-fetches. - dev_server.py: range-capable (206) static server; stock python http.server returns 200 and breaks DuckDB-WASM partial reads. - window.__a1log/__a1state + a1dbg() + on-page panel (?debug=a1) replace flaky console capture; window.__a1globe() exposes mode/point state for a Playwright harness. - Converted [A1dbg] console.logs to a1dbg events at build/mode/point-load/discard points. NOTE: cold cost is init-dominated (DuckDB-WASM+Cesium+OJS ~40s) — mirror helps the DATA phase only; the real lever is load-once + in-page iteration. Mirror range verified (curl -r => 206) but a full end-to-end speedup run hung in init (shakedown tomorrow; check 0-byte current/wide.parquet). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…be coherence) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Working handoff docs for the search-as-global-filter (A1, isamplesorg#234 Step 4) work — branch state, the globe logjam + Codex's reconciler spec, the fast verify-loop, the performance model, and Eric isamplesorg#248 / isamplesorg#249. Strip before the A1 PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ckDB-WASM
The ?data_base=/data dev override produced root-relative parquet URLs
(/data/foo.parquet). DuckDB-WASM's httpfs reads those as a virtual-FS glob
("No files found that match the pattern") instead of fetching over HTTP, so
the local-mirror verify loop hung in init with zero /data fetches — the
"shakedown" symptom. Resolve a root-relative data_base against
location.origin so the ergonomic ?data_base=/data form works; the prod
default and absolute (http://...) overrides pass through unchanged.
Verify-loop infra:
- dev_server.py: pin HTTP/1.1 (DuckDB's range reader expects keep-alive;
curl-verified 206 + multi-request keep-alive). Local full-GET-vs-206 is
DuckDB-WASM heuristic and moot over localhost; validate ranges on deploy.
- tests/playwright/shakedown-206.mjs: headless boot+search probe (no popup).
Confirms cold boot ~2.3s to live, bucchero search builds 2,693 pids ~9s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diagnoses whether a committed search renders sample points and whether the result depends on camera altitude. Boots at a given alt/lat/lng, fires the bucchero search, waits for the async point load to settle, and dumps the __a1log event sequence + final __a1globe() state. Finding: with a proper wait, the globe is A1-coherent at BOTH whole-globe (9000 km → renders all 2693 pids; computeViewRectangle saturates, not null) and zoomed-in (80 km → 2670 in-view) altitudes. The earlier "0 sample points" was a measure-too-early artifact, not a bug. Suggests the C3 fixes (4e79830) work in a foreground/headless context and the summary's "globe won't enter point mode" was likely a backgrounded-tab rAF-freeze artifact. Pending headed a1-verify.mjs verdict to rule out an animation-only race. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Default stays headed (real flyTo — what A1 is verified against). A headed window that opens UNFOCUSED becomes a background tab → Chrome freezes its rAF render loop → the page hangs mid-init (the same backgrounded-tab freeze that corrupted the original logjam observations). HEADLESS=1 sidesteps that: headless pages are always "active". Use it for CI / repeated runs; keep headed for a real-animation spot check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… dev probe Pre-deploy cleanup (the summary's "don't ship" list): - Remove the a1PersistenceProbe OJS cell — a one-time dev check that console-logged on every load and threw a Catalog Error (the design point it verified, non-temp tables persisting across DuckDBClient connections, is proven and load-bearing in production now). - Gate the whole A1 observability block (a1dbg / __a1log / __a1state / __a1globe + on-page panel) behind ?debug=a1. Production users now get a clean global namespace and zero overhead; the Playwright harness opts in via ?debug=a1. All a1dbg?.() call sites already use optional chaining, so they are no-ops when the block doesn't run. Verified: ?debug=a1 → a1-verify.mjs still ✅ COHERENT (2693 pts); no flag → __a1globe/a1dbg/__a1log undefined, no panel, no probe console output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rch_pids
doSearch scanned the 63 MB facets parquet TWICE per committed search: once in
buildSearchFilter (pid-set) and again for the side-panel results SELECT (+ a
third for the real-count COUNT when the 50-cap hit). On CI's smoke gate, the
broad "pottery" search blew the 90s budget (first A1 deploy failed there).
Fix: buildSearchFilter now materializes the side-panel columns (label, source,
place_name) and the relevance score IN THE SAME scan that builds the pid-set,
so the results SELECT and the COUNT read the small in-memory search_pids table
(aliased `s`) instead of re-scanning facets. One facets scan per search now,
matching pre-A1. sourceFilterSQL('s.source') + the bare-pid facetFilterSQL
compose unchanged; search_pids stays pid-keyed (dropped the weaker 5-col
DISTINCT — pid is unique, so the build is naturally one row per pid).
Verified locally (fast mirror): pottery 15.8s → 12.7s (build 6.9s + surface
updates); a1-verify still ✅ COHERENT; production-clean without ?debug=a1.
Note: the remaining time-to-results is buildSearchFilter + applySearchFilter
(globe/facet updates); if CI's smoke still exceeds budget, render the side
panel before applySearchFilterChange next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…atch") updateCrossFilteredCounts computed facet-legend counts over the EXACT viewport (pad 0), while the samples-table COUNT, the point-mode loader, the "samples in view" stat, and the heatmap all pad by VIEWPORT_PAD_FACTOR (0.3). Matching samples in the 30% margin were counted by the table but not the legend, so the legend read low: off-by-one at a Cyprus deep-zoom (13 vs 14), and ~166 vs ~481 for material=rock at a wide Red-Sea view (RY, live rdhyee deploy). Aligns the last "in view" surface to the padded contract (isamplesorg#234 coherence). Applies the parked facet_count_padding.patch (one line + the coherence regression test) on the A1 branch, since the mismatch is live on the A1 deploy and isamplesorg#234 is exactly "make filter semantics coherent across surfaces." Verified at the reported view: facet Rock 167 → 496, now == table 496; a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two issues from Codex's PR isamplesorg#251 review: 1. search_pids staging race — buildSearchFilter used a fixed `search_pids_next` name. Two overlapping searches could interleave so a later search swapped an earlier search's rows into `search_pids` under its own term (the token checks guard the publish, not the shared staging object). Use a token-scoped staging table `search_pids_next_${token}`, dropped in finally. Also stop DROPping the live `search_pids` on clear (an in-flight reader would throw) — flip active=false and leave it unreferenced until the next search replaces it. Verified: bucchero→soil back-to-back now publishes soil/2969 (its own count), not bucchero's under soil's term. 2. heatmap search-blind — renderHeatmap omitted searchFilterSQL and heatmapFilterHash omitted the search token, so the "filtered density" overlay stayed unfiltered under a committed search. Append window.searchFilterSQL('pid') to the heatmap aggregation and add the search token to the hash so it recomputes/re-keys on search commit/clear (isamplesorg#234 cross-surface coherence). a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le rows) Round-2 of Codex's PR isamplesorg#251 review: the previous no-drop clear left the prior search's rows in search_pids, and doSearch's side-panel SELECT reads `FROM search_pids` directly (does NOT gate on __searchFilter.active) — so a build failure could render the previous term's rows under the new term. Chose Codex's empty-table alternative over the early-return built-guard: an early return before the side-panel try would skip the isamplesorg#167 telemetry `finally`, whereas CREATE OR REPLACE TABLE search_pids (...empty...) keeps both the in-flight semi-join readers and the direct side-panel reader safely seeing zero rows, and a build failure flows through the existing results.length===0 → return-in-try → finally path with telemetry intact. Only clearSearchFilter changes; no doSearch control-flow restructure. Verified: a1-verify ✅ COHERENT; bucchero→clear→soil publishes soil/2969 (own count, no stale rows); clearSearchFilter is only called on empty-submit + build-failure, not per search. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses Codex's remaining nit (PR isamplesorg#251, non-blocking): since clearSearchFilter() now leaves search_pids EMPTY, a genuine build failure and a true empty result set both reach the side-panel's results.length===0 branch. A `searchFilterBuildFailed` flag (set in the build catch) makes the panel say "Search error: couldn't build the filter…" on a real failure while still flowing through the isamplesorg#167 telemetry finally — instead of the misleading "No results for {term}". a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Uri + buildConceptFilter First, additive pieces for `described-by=<concept-uri>` (isamplesorg#248), riding the A1 search_pids machinery (Codex plan-reviewed: "mostly sound + guardrails"): - window.conceptLabelForUri(uri): expose the facetFilters cell's URI→prefLabel resolver so the concept producer can label a URI without re-querying vocab_labels (guardrail #1). - buildConceptFilter(uri): a SECOND search_pids producer — exact-URI match across the object_type/material/context columns of sample_facets_v2, with object_type>material>context relevance ranking. Same token-scoped staging, finally-drop, shared _searchFilterToken, and empty-clear invariant as buildSearchFilter (guardrails #2/#6). Tags __searchFilter.kind ('concept' vs 'text') for mutual exclusivity (#5). Not yet wired: doDescribedBy flow (shared runPidSetResults render), the described-by= URL param + writeQueryState kind-preservation, and mutual exclusivity at producer entry. buildConceptFilter isn't called yet, so this is behavior-neutral. Verified: conceptLabelForUri('…organismpart')→"Organism part"; free-text a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… filter Wire the concept-URI filter end to end on top of the A1 search_pids machinery. A `described-by=<uri>` deep link selects material samples whose URI-valued facet concept (object_type / material / context) exactly matches the URI, and filters EVERY surface (table, globe points, facet counts, side panel) via the same `pid IN (SELECT pid FROM search_pids)` semi-join. - doDescribedBy(uri): second entry point into search_pids. Reuses the already -reviewed buildConceptFilter (producer) + applySearchFilterChange (refresher, kind-agnostic) and renders its OWN side panel. Deliberately does NOT touch doSearch — keeps A1's just-shipped (isamplesorg#251) text-search stale-guard hot path untouched (RY decision: protect the path Kerstin demos Wed). - writeQueryState: search= and described-by= are mutually exclusive. Keyed off the text input (always current) rather than window.__searchFilter (stale at doSearch's early writeQueryState call), so a committed text search drops described-by= and vice-versa. - Boot: described-by= deep link auto-commits and wins over search= if both present. - tests/playwright/described-by-verify.mjs: HEADLESS deep-link coherence test (globe + panel + URL all reflect the concept; mutual-exclusivity with text). Verified: described-by-verify green; a1-verify still green (no A1 regression). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e guard, XSS) Codex round-1 found 3 real issues; all fixed + regression-tested: 1. URL could flip concept→phantom-text on draft text + facet/source toggle. writeQueryState is now intent-aware: doSearch passes commitText to authoritatively persist/clear a text search (it runs before __searchFilter updates); every other caller mirrors the COMMITTED filter in __searchFilter, so draft (un-submitted) text no longer clobbers described-by=. New Playwright check: draft text + source toggle preserves described-by=. 2. A superseded doDescribedBy could still rewrite the URL. Moved the _searchSeq stale-check to BEFORE writeQueryState (and re-check after the async applySearchFilterChange) so only the current producer persists state. 3. Reflected-XSS path: the concept side panel interpolated a URL-derived label (and result label/pid/url/name) into innerHTML. Now escapeHtml'd (the same helper the table renderer uses), covering text + attribute contexts. Verified: described-by-verify green (concept + draft-text + mutual-exclusivity); a1-verify still green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…yee staging # Conflicts: # SESSION_SUMMARY.md # explorer.qmd
Show a `progress` cursor (arrow + spinner — NOT a frozen hourglass) over #cesiumContainer while a filter/search/facet/source update is in flight, so the user gets a visual "things are updating" signal without the map ever freezing (it's pure CSS on body.explorer-busy; it never intercepts input). Hooks into the existing depth-counted busyAcquire/busyRelease that already wrap the facet-change, source-change, and search/concept-commit paths — those releases are all in `finally`, so a throw can't strand the cursor. Adds a 120s watchdog as defense-in-depth against a HUNG promise (a load that never settles) that would otherwise skip the finally: it force-clears the busy state (re-armed per acquire, far longer than any real cold-cache load, so it only fires on a genuine leak). Verified: cursor shows ~1.2s on a search then clears via normal release; A1 + deploy smoke gate green. Not covered: pan/zoom viewport reloads (separate cells) — deferred follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A map-only busy cursor was invisible in practice: a CSS cursor only shows where the pointer is, and the covered updates (search, facet, source) are all triggered with the pointer OFF the map (search box / sidebar). Apply the progress cursor page-wide (body.explorer-busy * with !important) so the 'updating' signal shows wherever the pointer rests. Verified: progress on body/search-box/checkbox/map when busy, natural cursors when idle; smoke gate green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 1, 2026
Contributor
Author
|
Superseded by #261, which shipped the consolidated runtime (stats-collapse #254 + chip #255 + facet-count fix #253) to isamples.org. Closing the PR. NOTE: the regression tests from these branches are not yet on upstream — they'll be re-added via one clean follow-up PR (incl. repackaging the count-race spec as a standalone .mjs to fix the [P2] default-suite/ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Verification
/opt/homebrew/bin/duckdbconfirmedbuccheroground truth: 2,693 total, 2,693 OpenContext, 0 SESAR, 0 earthmaterial, 0 othersolidobject./usr/local/bin/quarto render explorer.qmdHEADLESS=1 A1_CLOSE=1 A1_BASE='http://localhost:8077/explorer.html?data_base=/data&debug=a1&sources=OPENCONTEXT%2CGEOME%2CSMITHSONIAN' node tests/playwright/a1-verify.mjsTEST_URL=http://localhost:8077 ./node_modules/.bin/playwright test tests/playwright/facet-search-count-race.spec.js --project=chromium --reporter=lineFixes #253