FEATURE: expose V8 ScriptCompiler::CachedData via Context#compile #413

Open
ursm wants to merge 1 commit into rubyjs:main from ursm:feature/cached-data-411

@ursm ursm commented May 18, 2026

Summary

Implements #411 — exposes V8's ScriptCompiler::CachedData so callers can persist per-script bytecode cache and skip re-parsing large bundles on subsequent processes.

# First process: produce the cache
ctx = MiniRacer::Context.new
script = ctx.compile(File.read("bundle.js"), filename: "bundle.js", produce_cache: true)
File.binwrite("bundle.js.cache", script.cached_data) if script.cached_data
script.run

# Later process: restore from blob, skip the parse step
cached = File.binread("bundle.js.cache")
ctx = MiniRacer::Context.new
script = ctx.compile(File.read("bundle.js"), filename: "bundle.js", cached_data: cached)
script.run
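
The two snippets above can be folded into one load-or-compile helper. This is a minimal sketch, not code from the PR: `compile_with_cache` and its parameters are hypothetical names, and it assumes `cached_data:` and `produce_cache: true` may be passed together, which the `Script#cached_data` contract described under "API surface" suggests but does not state outright.

```ruby
# Hypothetical helper, not part of the PR: compile `source_path` in `ctx`,
# reusing the cache blob at `cache_path` when one exists.
def compile_with_cache(ctx, source_path, cache_path)
  opts = { filename: File.basename(source_path), produce_cache: true }
  # Pass cached_data: only when a blob exists; whether nil is accepted
  # for this keyword is not stated in the PR.
  opts[:cached_data] = File.binread(cache_path) if File.exist?(cache_path)

  script = ctx.compile(File.read(source_path), **opts)

  # Per the PR description, cached_data is nil when the supplied blob was
  # accepted and populated on a first compile or after a rejection.
  blob = script.cached_data
  File.binwrite(cache_path, blob) if blob
  script
end
```

Because the helper always sets `produce_cache: true`, it would need to be called from the top level, not from inside a host-function callback (see the first safety constraint).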

API surface

  • MiniRacer::Context#compile(source, filename:, cached_data:, produce_cache:) → MiniRacer::Script
  • Script#run — executes the compiled script; safe to call multiple times
  • Script#cached_data — bytecode blob (nil when the supplied cached_data: was accepted; populated on initial compile or after rejection, only when produce_cache: true was set)
  • Script#cache_rejected? — boolean for cache-key invalidation telemetry
  • Script#dispose / Script#disposed? — eager handle release
  • MiniRacer::V8_CACHED_DATA_VERSION_TAG — module-level constant (populated on first Context.new) wrapping v8::ScriptCompiler::CachedDataVersionTag(); mix into cache keys so a libv8-node bump invalidates blobs automatically
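
One way to mix the version tag into a cache key is to hash it together with the script source. The sketch below is an assumption about how a caller might do this, not something the PR ships; `cache_key` is a hypothetical name, and the integer passed in stands in for `MiniRacer::V8_CACHED_DATA_VERSION_TAG`.

```ruby
require "digest"

# Hypothetical helper: derive a cache key from V8's cached-data version tag
# and the script source, so a change to either one yields a different key.
def cache_key(source, version_tag)
  # The NUL byte separates the two fields so they cannot run together.
  Digest::SHA256.hexdigest("#{version_tag}\0#{source}")
end

# With the PR applied, the tag would come from the new constant, e.g.
#   cache_key(File.read("bundle.js"), MiniRacer::V8_CACHED_DATA_VERSION_TAG)
# 12345 below is a placeholder value, not a real V8 tag.
key = cache_key("console.log(1)", 12345)
puts "bundle-#{key}.cache"
```

Since the constant is populated on first `Context.new`, a caller would presumably create a context before reading it.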

Safety constraints

Found while wiring this into a real embedder (capybara-simulated driving Discourse). All three are documented in the README; the first two are also enforced at runtime.

  1. produce_cache: defaults to false. Passing true from inside a host-fn callback raises MiniRacer::RuntimeError. V8's CreateCodeCache walks live isolate state and corrupts the parser when re-entered from a JS → Ruby → JS frame; standalone repro at https://github.com/rubyjs/mini_racer/issues — TODO. Warm the cache from the top level instead.
  2. Cross-process reuse requires byte-identical snapshot data on both sides. MiniRacer::Snapshot.new(src).dump is non-deterministic across processes, so feeding the same source string to two Snapshot.new calls produces different blobs and V8 rejects every cached_data crossing that boundary. Use Snapshot#dump → persist → Snapshot.load(bytes) instead.
  3. Cross-process reuse is incompatible with Platform.set_flags!(:single_threaded). V8's single-threaded mode embeds process-local state in the cache blob, so cached_data is always rejected when consumed in a fresh process. Same-process reuse (e.g. a Context pool) still works. Embedders that need both will need to disable :single_threaded for the cache-producing / cache-consuming path.
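
The second constraint can be sketched as a small helper that always boots from persisted snapshot bytes. This is an illustration under assumptions: `load_or_create_snapshot` is a hypothetical name, `snapshot_class` is expected to be `MiniRacer::Snapshot` (injected so the sketch does not depend on the gem), and concurrent writers are not handled.

```ruby
# Hypothetical helper: make every process start from the same snapshot bytes.
# Only Snapshot.new, Snapshot#dump and Snapshot.load are used, all of which
# are named in the constraint above.
def load_or_create_snapshot(snapshot_class, source, path)
  # Build and persist the blob once; later processes reuse the file.
  File.binwrite(path, snapshot_class.new(source).dump) unless File.exist?(path)
  # Always load from the persisted bytes, even in the process that just
  # created them, so both sides see byte-identical snapshot data.
  snapshot_class.load(File.binread(path))
end
```

The returned snapshot would then be handed to the context as usual, for example via `MiniRacer::Context.new(snapshot: ...)`.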

Design notes

Context dispose ordering: State::~State() walks st.scripts and resets each v8::Persistent<v8::Script> under the existing Locker/Isolate::Scope before calling isolate->Dispose(). The handle table is owned per State.

Concurrency: compile/run/dispose RPCs go through the existing rendezvous mutex path; the handle table is only touched from the V8 thread. The new State::in_callback counter is incremented in v8_api_callback (also V8-thread-only) and read inside v8_compile; no cross-thread access, no atomic needed.
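
The in_callback guard can be modelled in a few lines of Ruby. This is only a model of the behaviour described above; the real check lives in the C++ extension, the class and method names are illustrative, and the decrement on callback exit is an assumption (the description only mentions the increment).

```ruby
# Ruby model of the in_callback guard; not the extension's actual code.
class CallbackGuard
  class ReentrantCacheError < StandardError; end

  def initialize
    @in_callback = 0 # plain integer: only ever touched from one thread
  end

  # Wraps a host-function callback, mirroring the increment in v8_api_callback.
  def during_callback
    @in_callback += 1
    yield
  ensure
    @in_callback -= 1 # assumed: the counter is restored when the callback returns
  end

  # Mirrors the read inside v8_compile.
  def check_compile!(produce_cache:)
    return unless produce_cache && @in_callback.positive?

    raise ReentrantCacheError,
          "produce_cache: true is not allowed inside a host-fn callback"
  end
end

guard = CallbackGuard.new
guard.check_compile!(produce_cache: true) # top level: allowed
guard.during_callback do
  guard.check_compile!(produce_cache: false) # no cache production: allowed
  begin
    guard.check_compile!(produce_cache: true)
  rescue CallbackGuard::ReentrantCacheError => e
    puts "rejected: #{e.message}"
  end
end
```

A counter, as opposed to a boolean, keeps the guard correct if callbacks nest.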

GC finalizer trade-off: script_free does NOT send a dispose RPC — taking rr_mtx from a Ruby finalizer thread risks deadlock. Handles freed via finalizer rely on State::~State() walking the table at isolate teardown. Long-lived Contexts with many short-lived Scripts will accumulate handles until Context#dispose. Documented in README; Script#dispose is available for eager release.
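
For callers that create many short-lived scripts on a long-lived Context, eager release could be wrapped in a block helper. A minimal sketch, assuming only the `Context#compile`, `Script#dispose` and `Script#disposed?` methods listed under "API surface"; `with_script` is a hypothetical name.

```ruby
# Hypothetical helper: compile, yield the script, and release its handle
# eagerly instead of waiting for Context#dispose or isolate teardown.
def with_script(ctx, source, **opts)
  script = ctx.compile(source, **opts)
  yield script
ensure
  # script is nil when compile itself raised; skip already-disposed handles.
  script.dispose if script && !script.disposed?
end
```

A call such as `with_script(ctx, src, filename: "job.js") { |s| s.run }` returns the block's value and disposes the handle even when the block raises.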

CachedData buffer policy: input blob uses BufferNotOwned pointing at the ValueDeserializer's ArrayBuffer backing store (valid for the v8_compile call), avoiding a copy of potentially MB-sized blobs.

Packet protocol: new tags 'K' (compile), 'R' (run), 'D' (dispose) added to dispatch1. 'C' was already taken by call, hence 'K' for compile.

Refs #411.


ursm commented May 18, 2026

For sequencing context: #412 (Module API) is the next planned PR but I'm holding it back until this one lands. The two share a lot of C++ surface (handle table, packet protocol, dispose ordering) so iterating patterns here once will be cheaper than rebasing #412 twice. Flagging in case it helps frame the review.

@ursm ursm force-pushed the feature/cached-data-411 branch 5 times, most recently from a66648a to f6eaa25 Compare May 30, 2026 07:58
@ursm ursm changed the title FEATURE: expose V8 ScriptCompiler::CachedData via Context#compile FEATURE: add Context#compile returning a Script handle May 30, 2026
Adds Context#compile returning a MiniRacer::Script handle that can be
re-run multiple times and exposes V8's per-script bytecode cache.

Callers pass `cached_data:` to skip re-parsing on subsequent processes
and opt in to `produce_cache: true` to read the freshly produced blob
back via `script.cached_data` for persistence.

The MiniRacer::V8_CACHED_DATA_VERSION_TAG constant exposes V8's
CachedDataVersionTag() so callers can invalidate their cache when
libv8-node is bumped.

Safety constraints (documented in README and CHANGELOG):

* produce_cache defaults to false; passing true from inside a host-fn
  callback raises MiniRacer::RuntimeError. V8's CreateCodeCache walks
  live isolate state and corrupts the parser when re-entered from a
  JS->Ruby->JS frame; warm the cache from the top level instead.
* Cross-process reuse requires both processes to load byte-identical
  snapshot data via Snapshot#dump / Snapshot.load. Snapshot.new(src)
  is non-deterministic across processes, so feeding the same source
  string to both sides is not enough — the cache will be rejected.
* Cross-process reuse is incompatible with
  Platform.set_flags!(:single_threaded). V8's single-threaded mode
  embeds process-local state in the cache blob; same-process reuse
  still works.

TruffleRuby ships a shim that falls back to source replay since
GraalJS has no equivalent per-script cache reachable from
Polyglot::InnerContext.

Refs rubyjs#411.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ursm ursm force-pushed the feature/cached-data-411 branch from f6eaa25 to 0329ecc Compare May 30, 2026 09:12
@ursm ursm changed the title FEATURE: add Context#compile returning a Script handle FEATURE: expose V8 ScriptCompiler::CachedData via Context#compile May 30, 2026