Skip to content

__aliasMarker refactoring to cover complex cases with ALIASes (antalya-26.3)#1844

Open
filimonov wants to merge 33 commits into
antalya-26.3from
feature/antalya-26.3/alias_marker_fixes
Open

__aliasMarker refactoring to cover complex cases with ALIASes (antalya-26.3)#1844
filimonov wants to merge 33 commits into
antalya-26.3from
feature/antalya-26.3/alias_marker_fixes

Conversation

@filimonov
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix wrong results / errors for ALIAS columns read through Distributed (and Merge-over-Distributed, distributed-over-distributed) by reconciling shard headers by name via a reworked __aliasMarker.

Documentation entry for user-facing changes

Forward-port of #1463 onto antalya-26.3.

What: __aliasMarker now works in two phases — inject a marker carrying a live column reference during distributed rewrite, then materialize it to the column identifier (__tableN.col) right before serializing to the shard — so an inlined ALIAS expression keeps a name-matchable identity across the initiator/shard boundary. Header reconciliation prefers matching by name (with a position fallback) in the planner and StorageMerge. See the design comment in identity.h.

Why: without it, distributed ALIAS-column queries could return swapped columns or fail with a type mismatch (CANNOT_PARSE_TEXT / THERE_IS_NO_COLUMN).

Notes:

  • New setting enable_alias_marker (default on); set to 0 on the initiator for mixed-version clusters whose shards don't understand __aliasMarker.
  • Adds stateless regression tests 03842/03843/03921/03923/03928/03930-03934 (plain Distributed, parallel replicas, Merge, lambda-misuse guard, setting-effect).

Related: #1424, #1335 (closed via #1463).

filimonov and others added 25 commits May 27, 2026 07:50
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
- switch PlannerJoinTree header conversion from position to name matching when column names are unique and sets match

- add stateless regression 03922 for alias-marker column swap across distributed reads

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
This reverts commit b6cbb20.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Add a planner utility that first tries header conversion by column name and falls back to position when name matching is not possible.\n\nUse it in PlannerJoinTree and parallel replicas query planning to handle duplicate alias-heavy schemas more robustly, while emitting trace diagnostics when position fallback is used.\n\nUpdate 03921 distributed-over-distributed alias tests and pin parallel-replica settings for deterministic results.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Nested ALIAS columns (a2 contains a1's subexpression) over a single-shard
Distributed table, matrix over prefer_localhost_replica x serialize_query_plan,
with the single-node result as oracle. Already green on this branch (the
PlannerJoinTree name-first conversion from b0ce0c9 covers it) - kept as a
regression lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
…licasQuery)

Nested ALIAS columns over a Distributed table read with parallel replicas, AST and
serialized-plan transport, single-node result as oracle. Green on this branch
(findParallelReplicasQuery name-first conversion covers it) - kept as a regression lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Rewrites 03928 as a black-box test: nested ALIAS columns read through a Merge
over a Distributed table, no explicit __aliasMarker, single-node result as oracle.
Currently FAILS - alias columns come back as 0 through StorageMerge reconciliation.
Probe .stderr/.stdout artifacts dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
…r names

When a Merge table reads an underlying Distributed table with ALIAS columns, the
alias expressions in convertAndFilterSourceStream were emitted under their plain
logical name (e.g. `a`), while the Merge reconciliation expects analyzer column
identifiers (e.g. `__table1.a`). The identifier columns were therefore treated as
missing and filled with defaults, yielding wrong results (zeros).

Map each alias's plain logical name to the unambiguous target-header identifier and
emit the alias output under that identifier, so the downstream header reconciliation
matches by name with no positional fallback. Makes the header correct by construction;
any extra expression columns from the child are simply dropped by name matching.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Reorders two nested ALIAS columns and adds a computed expression over them, read
through a Distributed table (AST and serialized-plan transport), with the single-node
result as oracle. Passes via name-first header reconciliation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Black-box regression trip-wires for the Hybrid unknown-table scenarios from
Altinity/ClickHouse issues #1208, #1209, #1422 (verified fixed on this branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Now that alias outputs are emitted under their target identifier, the child header
matches the Merge header by full name, so the short-name lookup helpers
(get_short_name, short_name_count, the short-name branch in find_header_column) and
the unreachable duplicate header.has() branch are dead. Revert convertAndFilterSourceStream
to the upstream name -> smallest -> position -> unneeded shape. The position fallback in
the converting step is kept (load-bearing for normal distributed aggregation).

Verified: 03928/03930/03931/03932/03921/03842/03843/03923 + the Merge regression set
(type cast, virtual columns, alias-merge, structure unification, sample factor) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
…3926)

These tests pinned enable_alias_marker=0 yet asserted non-swapped output. The
__aliasMarker is the mechanism that prevents the column swap; with it disabled the
swap is expected by design (a String alias routed to a UInt64 column -> CANNOT_PARSE_TEXT),
so the references were aspirational and the tests could never pass. Marker-on coverage
(03921/03928/03930/03931/03932) already validates the supported configuration. Copies
kept under _local_files_and_notes/dropped_tests/ for the record.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…changes

- Planner::makeConvertingActionsPreferNameThenPosition: only fall back to positional
  matching for THERE_IS_NO_COLUMN / NUMBER_OF_COLUMNS_DOESNT_MATCH (the cases name
  matching legitimately cannot handle, e.g. a remote aggregate-state column matched by
  ordinal); rethrow any other name-mode error instead of silently masking a real
  schema/type problem. Fallback stays at TRACE (it is a common, expected path).
- finalizeAliasMarkersForDistributedSerialization: do not descend into lambda bodies.
  A marker inside a lambda (e.g. arrayMap(x -> __aliasMarker(x, x), ...)) over a
  Distributed table resolved its column argument to the lambda parameter, which has no
  table source, and raised a user-triggerable LOGICAL_ERROR (server abort under
  abort_on_logical_error). Covered by 03933.
- StorageDistributed: clone the alias expression before removeAlias() in the marker
  branch; getExpression() may return a node shared elsewhere in the tree.
- Clarify enable_alias_marker setting description (it is a correctness toggle).
- Remove dead/unused code: unused includes (base/hex.h, ClientInfo.h), unused
  serialize_query_plan extern, a stray blank line, and make logPositionConversionMismatch
  file-local with a TRACE-level guard. Refresh a stale comment in findParallelReplicasQuery.

Tests: 03933 (direct/lambda __aliasMarker use must not crash) and 03934 (enable_alias_marker
on=correct vs off=reintroduces the column swap, proving the setting's effect).

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…traction

Replace the two hand-rolled rfind('.')-and-substr blocks in convertAndFilterSourceStream
with Nested::splitName(name, /*reverse=*/true), the canonical helper the analyzer itself
uses to derive a column's logical name from its `__tableN.col` identifier. Behavior is
unchanged (fall back to the full name when there is no dot).

Note: the short-name alias-injection loop is retained - it is load-bearing for resolving
alias-of-alias expressions (e.g. `b ALIAS a + 1`); removing it breaks 03928 with
NOT_FOUND_COLUMN_IN_BLOCK, so the reviewers' "possibly redundant" hypothesis does not hold.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous form (SELECT a, b FROM merge ORDER BY x over a two-shard cluster) relied on
the distributed merge ordering of duplicate rows, which is not guaranteed and differed on
arm_binary CI (correct values, wrong row order). Rewrite as
SELECT x, a, b ... GROUP BY x, a, b ORDER BY x: GROUP BY keeps x in the required columns
(the ALIAS expansion needs it) and deduplicates the per-shard duplicates, and ORDER BY over
the distinct x values is a total order independent of the merge. The test still fails if the
alias columns are swapped or zeroed.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both tests target AST-path behavior; the "distributed plan" CI flavor forces
serialize_query_plan=1 globally, which exposed two plan-path differences unrelated to
what these tests assert:
- 03934: on the serialized-plan path the header is reconciled by name without the marker,
  so the marker_off query does NOT swap/error there - the serverError expectation only
  holds on the AST path.
- 03923: hybrid + IN-subquery on the plan path hits a separate header-reconciliation gap
  (THERE_IS_NO_COLUMN __table1.string_col), orthogonal to the unknown-table issues this
  test covers.
Pin serialize_query_plan=0 so both run on the intended AST path under any CI flavor.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The marker travels to shards as the __aliasMarker function in distributed SQL. If a shard
does not understand it (older/forked build), set enable_alias_marker=0 on the initiator to
disable injection and fall back to previous behavior. No version negotiation is performed by
design; document this in the setting description.

Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

Workflow [PR], commit [2cb9bd9]

filimonov and others added 4 commits May 27, 2026 19:18
Parallel replicas over a small non-replicated table can read the same
rows on several replicas under randomized settings, duplicating output
(observed ×3 on arm CI). Switch to `SELECT x, a1, a2 ... GROUP BY x, a1, a2
ORDER BY x` to deduplicate while keeping `x` in the required columns for the
ALIAS expansion. The test still fails if `a1`/`a2` are swapped or wrong.
Mirrors the earlier 03928 fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Six alias-marker regression tests dropped from the upstream port branch
(alias_marker3) had their column-order scenarios already covered by upstream PR
ClickHouse#94644. Preserves the original SQL and reference files (under 26.3 slot
numbers) plus a dated rationale Markdown so future contributors don't reinvent
the same shapes.
makeConvertingActionsPreferNameThenPosition previously tried `MatchColumnsMode::Name`
inside a try/catch and fell back to `MatchColumnsMode::Position` on
`THERE_IS_NO_COLUMN` / `NUMBER_OF_COLUMNS_DOESNT_MATCH`. That works functionally but
throws a C++ exception on every call site where Name cannot match by construction --
most notably on every `SELECT count() FROM mt_table` because `Optimized trivial count`
emits its output column under the source-table column name (e.g. `x`) while the
analyzer-expected header uses the aggregate function name (`count()`). The exception
travelled the full stack-unwind path before Position rescued the conversion.

Restore the predicate-based selector originally introduced in `b6cbb20e7f6` (reverted
by `39a235d405c` for unclear reasons -- not a defect). `canMatchByNameWithoutAmbiguity`
checks set equality of source and result column names with no duplicates in O(n) hash
work and routes to `Name` only when the rename can be done by name; otherwise routes
straight to `Position` and emits a `LOG_TEST` diagnostic.

Behaviour is identical to the prior form on every distributed-ALIAS scenario the
suite covers (Name path), and identical to upstream Position-blind behaviour on the
non-alias paths the prior form was paying try/catch for. Verified locally on:

  - alias suite: 03921, 03928, 03930, 03931, 03932, 03933, 03228
  - non-alias load-bearing: 00059_shard_global_in_mergetree (count over remote),
    00754_distributed_optimize_skip_select_on_unused_shards,
    00028, 00111, 00124, 01099, 02184, 02402

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
filimonov and others added 4 commits May 29, 2026 15:08
- StorageDistributed::ReplaseAliasColumnsVisitor: capture
  function_node.getArguments().getNodes() in a local reference before the
  move-assign + index-access pair. Two separate accessor calls are safe
  but the original code read as indexing a moved-from container.

- StorageMerge::convertAndFilterSourceStream: document the suffix-vs-prefix
  concern in Nested::splitName(reverse=true) as a latent-only issue.
  Investigated 2026-05-29: the bug does not reproduce today because Nested
  subcolumns appear in the child stream with their dotted name verbatim,
  so the alias expression resolves directly without going through the
  suffix-stripping bridge. Comment in code points to the proper fix
  (stripAnalyzerTablePrefix) if reachability changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
(cherry picked from commit f323e68)
…ookup)

`ReadFromMerge::convertAndFilterSourceStream` mapped logical alias names
to analyzer identifiers via `Nested::splitName(name, /*reverse=*/true)`,
which splits on the LAST dot. For backtick-quoted dotted column names
(e.g. `\`n.a\`` -> analyzer identifier `__table1.\`n.a\``) this yields
the wrong logical name (`a` instead of `n.a`); the alias output was then
emitted under the wrong name, the downstream `addMissingDefaults` saw
the expected `__tableN.\`n.a\`` column as missing and filled it with
type defaults -- silent wrong data.

Replace the analyzer-naming-convention regex with a schema lookup: take
the candidate set from the actual declared Merge schema (via
`merge_storage_snapshot->metadata->getColumns()`) and suffix-match each
declared column name against the `header` / `pipe_columns` identifier
names. A header column named `<prefix>.<C>` or `<prefix>.\`<C>\`` is
known to correspond to Merge column `C` because `C` is in the candidate
set. This is robust to dotted column names (Nested-style and
backtick-quoted) and does not hardcode the `__tableN` prefix shape.

The same lookup also drives the input-side bridge that exposes child
inputs under their plain names so `buildQueryTree(alias.expression)`
resolves references like `a` or `n.a` against
`__table1.a` / `__table1.\`n.a\``-named inputs -- required for
alias-of-alias resolution such as `b ALIAS a + 1`.

Pass `merge_storage_snapshot->metadata->getColumns()` to the (static)
helper so it has access to the Merge schema.

Add a long TODO documenting the structural awkwardness of the two-step
design (rewrite the child query in `getModifiedQueryInfo`, recompute
alias values here) and a potential future unification via
`__aliasMarker` -- noting that the marker function call disappears at
plan-build time so MergeTree index analysis on predicates over ALIAS
columns is unaffected.

Tests:
- 04281_storage_merge_over_distributed_alias: Merge over Distributed
  with nested ALIAS columns (`b ALIAS a + 1` where `a ALIAS x + 1`).
- 04286_dotted_alias_merge_over_distributed: Merge with explicit
  backtick-quoted dotted column names declared as ALIAS in the
  underlying storage, over a two-shard Distributed cluster -- the shape
  that triggers `__tableN.\`n.a\`` identifiers and uncovers the
  suffix-vs-prefix splitName bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
…nostic

(cherry picked from commit bfbf06fa04a07daafc115801ef81ae7deb66ff57)
The previous commit assumed analyzer identifiers for dotted Merge
columns are backtick-quoted (e.g. `__tableN.\`n.a\``). The 26.3
analyzer emits them raw (`__tableN.n.a`). 04286_dotted_alias_merge_over_distributed
exposed this: header matching only the backtick-quoted form missed the
raw-dot identifier, the alias output was emitted under the wrong name
and `addMissingDefaults` filled the expected column with type defaults
(silent wrong data: `n.a` came back as 0).

Try both forms in the suffix match: `<prefix>.<C>` and (only for dotted
column names) `<prefix>.\`<C>\``. Either match accepts the analyzer
identifier as corresponding to the Merge column `C`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant