manifest/bazel: workspace-root walker for nested workspaces#1341

Closed
Simon (simonhj) wants to merge 1 commit into v1.x from simon/bazel/walker

@simonhj Simon (simonhj) commented May 28, 2026

Context

This is the first of a five-PR series rewriting the Maven extraction path
of socket manifest bazel to (a) discover nested workspaces and (b) replace
static Starlark parsing with Bazel-native discovery. The full series:
(1) workspace walker (this PR: pure helper, no callers yet);
(2) bazel mod show_extension parser, tri-state probe primitives, and
per-invocation --output_user_root plumbing in the runner;
(3) per-repo metadata cquery;
(4) orchestrator rewrite that wires (1)–(3) together and removes the
legacy Starlark-regex discovery and kind-only probe;
(5) customer --bazel-flag / --bazel-startup-flag / --bazel-maven-repo
passthrough for matrix builds.

Each PR is independently green (typecheck and tests). Follow-up PRs
depend on this branch, but this branch leaves the orchestrator untouched,
so the walker is unused code until (4) lands.

This greatly simplifies things, the [ast

Summary

Adds findWorkspaceRoots({ cwd, ignoreDirNames?, ignoreDirPrefixes?, verbose? })
— a pure-function file-tree walker that returns every directory containing
a Bazel workspace marker (MODULE.bazel, WORKSPACE, or WORKSPACE.bazel).

Real-world monorepos host multiple workspace roots. Today
socket manifest bazel only inspects the invocation cwd, so anything
under a nested workspace is silently invisible. This walker is the
building block for fixing that.

Behaviour

  • Walks the tree rooted at cwd; yields every directory whose
    immediate children include a workspace marker file.
  • Returns absolute, sorted paths for determinism.
  • Continues descending after finding a root — nested workspaces
    (root MODULE.bazel + examples/*/MODULE.bazel) are common; both
    are reported.
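
The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `findWorkspaceRootsSketch` and `listDir` are hypothetical names, and the sketch leaves out the `ignoreDirNames`, `ignoreDirPrefixes`, and `verbose` options as well as any depth or count caps. It assumes symlinked directories are not followed, which is what `Dirent.isDirectory()` gives when entries are read with `withFileTypes`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Marker file names taken from the PR description.
const WORKSPACE_MARKERS = new Set(["MODULE.bazel", "WORKSPACE", "WORKSPACE.bazel"]);

// Read a directory, treating unreadable directories as empty.
function listDir(dir: string) {
  try {
    return fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return [];
  }
}

// Hypothetical sketch of the walker: no pruning, no caps, no options.
function findWorkspaceRootsSketch(cwd: string): string[] {
  const roots: string[] = [];
  const pending: string[] = [path.resolve(cwd)];
  while (pending.length > 0) {
    const dir = pending.pop() as string;
    let isRoot = false;
    for (const entry of listDir(dir)) {
      if (entry.isDirectory()) {
        // Keep descending even if this directory turns out to be a root,
        // so nested workspaces are reported too. Symlinks are not followed.
        pending.push(path.join(dir, entry.name));
      } else if (entry.isFile() && WORKSPACE_MARKERS.has(entry.name)) {
        isRoot = true;
      }
    }
    if (isRoot) roots.push(dir);
  }
  // Absolute paths, sorted so the result does not depend on traversal order.
  return roots.sort();
}

// Usage: a throwaway tree with a root workspace and one nested workspace.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "ws-sketch-"));
fs.mkdirSync(path.join(tmp, "examples", "app"), { recursive: true });
fs.writeFileSync(path.join(tmp, "MODULE.bazel"), "");
fs.writeFileSync(path.join(tmp, "examples", "app", "MODULE.bazel"), "");
console.log(findWorkspaceRootsSketch(tmp)); // the root and examples/app
```

An explicit stack is used instead of recursion so that a deep tree cannot overflow the call stack; the final sort makes the traversal order irrelevant to callers.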

…very

`findWorkspaceRoots` walks the tree from cwd and returns every directory
containing MODULE.bazel / WORKSPACE / WORKSPACE.bazel. Monorepos host
multiple workspace roots (e.g. examples/<name>/MODULE.bazel, mobile/
MODULE.bazel under an otherwise non-Bazel root); the per-workspace
algorithm in the orchestrator runs once per discovered root.

Pruning matches the previous lockfile walker: skip the usual non-workspace
directories (.git, node_modules, .socket-auto-manifest, etc.), Bazel's
`bazel-*` output_base symlinks (so we never recurse into tens of GiB of
generated state), and `dist*` build-output directories. The `MAX_WALK_DEPTH`
and `MAX_WORKSPACE_ROOTS` caps guard against pathological inputs and symlink
loops.
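
As a rough illustration of the pruning rules described above, the check could look something like the sketch below. `shouldPruneDir` and `withinCaps` are hypothetical helper names, the ignore list contains only the directories named in this message (the real list is longer), and the two cap values are placeholders because the message does not state them.

```typescript
// Directory names mentioned in the commit message; the actual list is longer.
const IGNORE_DIR_NAMES = new Set([".git", "node_modules", ".socket-auto-manifest"]);

// Prefixes from the commit message: Bazel's bazel-* symlinks and dist* output.
const IGNORE_DIR_PREFIXES = ["bazel-", "dist"];

// Placeholder values: the caps are named in the message, their values are not.
const MAX_WALK_DEPTH = 16;
const MAX_WORKSPACE_ROOTS = 256;

// True when the walker should not descend into a directory with this name.
function shouldPruneDir(name: string): boolean {
  if (IGNORE_DIR_NAMES.has(name)) return true;
  return IGNORE_DIR_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// True while the walk is still inside both caps.
function withinCaps(depth: number, rootsFound: number): boolean {
  return depth <= MAX_WALK_DEPTH && rootsFound < MAX_WORKSPACE_ROOTS;
}

console.log(shouldPruneDir("bazel-out")); // true
console.log(shouldPruneDir("examples")); // false
```

Prefix matching on `dist` mirrors the `dist*` glob, so in this sketch a directory such as `distribution` would also be skipped.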

Pure-function module with no Bazel calls; unit tests use a tmpdir
fixture tree and cover the root-only, nested, prune, symlink, and
sort-determinism cases.
@simonhj
Author

Consolidating into a single PR.
