V2#15

Open
Spamercz wants to merge 97 commits into master from v2

Spamercz commented Jun 2, 2026

This pull request adds a comprehensive set of new aggregation classes to the Spameri\ElasticQuery\Aggregation namespace, significantly expanding support for various Elasticsearch aggregation types. Each class encapsulates the logic for a specific aggregation, allowing for easier construction and serialization of Elasticsearch queries.

The most important changes are:

New Metric and Bucket Aggregations:

  • Added BoxPlot, Cardinality, ExtendedStats, GeoBounds, and GeoCentroid classes for metric aggregations such as statistical summaries and geospatial metrics.
  • Added AdjacencyMatrix, Composite, DateHistogram, DateRange, DiversifiedSampler, GeoDistance, and GeoHashGrid classes for bucket aggregations, including range and geospatial bucketing.

New Pipeline Aggregations:

  • Implemented pipeline aggregation classes: AvgBucket, BucketScript, BucketSelector, BucketSort, CumulativeSum, and Derivative, which post-process the results of other aggregations.

Constructor and Serialization Logic:

  • Each class provides a constructor for setting aggregation parameters and a toArray() method that serializes the aggregation to the Elasticsearch query format.

These additions make it much easier to build complex and varied Elasticsearch queries using the library, covering a wide array of use cases.

Spamercz and others added 30 commits May 20, 2026 13:44
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spamercz and others added 30 commits May 20, 2026 14:53
Each .phpt currently duplicates ~60 lines of curl setUp/tearDown plus
the request-build-send-decode-map flow. The new base class provides
createIndex/indexDocument/search/deleteIndex helpers and default
setUp/tearDown, cutting a typical integration test to ~15 lines.

Term.phpt migrated as proof — round-trips against ES via the new
helpers, and asserts a real hit count instead of just type('int').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GeoDistance was emitting {pin: {location: ...}} which Elasticsearch
rejects — fixed to emit the proper geo_distance envelope with the
required distance argument, plus distance_type/validation_method/
ignore_unmapped/boost. The old test asserted the broken shape, so
the bug shipped silently.
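
For reference, the request body that Elasticsearch expects for a geo_distance query can be sketched as follows. This is a minimal illustration in Python of the JSON shape only, not the library's PHP API; the helper name, the field name "pin.location", and the sample values are assumptions made for the example.

```python
import json


def geo_distance_query(field, lat, lon, distance, **options):
    """Return a geo_distance query body as a plain dict.

    `options` carries the optional keys named in the commit message
    (distance_type, validation_method, ignore_unmapped, boost).
    """
    body = {"distance": distance, field: {"lat": lat, "lon": lon}}
    # Emit only the options that were actually supplied.
    body.update({key: value for key, value in options.items() if value is not None})
    return {"geo_distance": body}


# "pin.location", the coordinates, and "200km" are illustrative values.
query = geo_distance_query("pin.location", 40.0, -70.0, "200km", ignore_unmapped=True)
print(json.dumps(query, indent=2))
```

The field name is a single dotted key inside the geo_distance object, next to the required distance, rather than a nested {pin: {location: ...}} structure.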

Nested was wrapping query in an extra array level (query: [bool: ...]
instead of query: bool: ...) which Elasticsearch also rejects.
Added score_mode/ignore_unmapped/inner_hits. The empty-collection
case now emits an stdClass to keep ES happy.
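
The corrected nested shape can be sketched in the same way. This is an illustrative Python sketch of the JSON body, assuming a hypothetical "comments" path and term clause; it does not reproduce the library's PHP classes.

```python
import json


def nested_query(path, must_clauses, score_mode=None, ignore_unmapped=None):
    """Return a nested query body with the bool object directly under "query"."""
    body = {"path": path, "query": {"bool": {"must": list(must_clauses)}}}
    if score_mode is not None:
        body["score_mode"] = score_mode
    if ignore_unmapped is not None:
        body["ignore_unmapped"] = ignore_unmapped
    return {"nested": body}


# The path and the term clause are hypothetical sample values.
query = nested_query("comments", [{"term": {"comments.author": "alice"}}], score_mode="avg")
# "query" must be a JSON object, not a one-element array wrapping it.
assert isinstance(query["nested"]["query"], dict)
print(json.dumps(query))
```

The stdClass mentioned above presumably addresses a PHP-specific detail: json_encode() serializes an empty array as [] but an empty stdClass as {}, and Elasticsearch expects an object in that position.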

InnerHits added as a typed sub-object (name, from, size, sort,
_source, highlight, explain, script_fields, docvalue_fields, etc.)
so nested/has_child/has_parent can use it.

PhrasePrefix boost type changed from int to float for consistency
with every other query. GeoDistanceSort ignore_unmapped is now a
constructor arg instead of hard-coded true.

Tests now use AbstractElasticTestCase, indexing real docs through
a typed mapping and asserting the round-tripped hit count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight full-text query types gained the constructor args that were
missing against the current Elasticsearch reference:

- ElasticMatch: zero_terms_query, auto_generate_synonyms_phrase_query,
  lenient, prefix_length, max_expansions, fuzzy_transpositions, fuzzy_rewrite
- MultiMatch: tie_breaker, slop, prefix_length, max_expansions, lenient,
  zero_terms_query, auto_generate_synonyms_phrase_query, fuzzy_transpositions,
  fuzzy_rewrite
- MatchPhrase: zero_terms_query
- PhrasePrefix: analyzer, max_expansions, zero_terms_query
- MatchBoolPrefix: fuzziness, prefix_length, max_expansions,
  fuzzy_transpositions, fuzzy_rewrite
- QueryString: 17 new args including fuzziness, lenient, type, tie_breaker,
  rewrite, time_zone, minimum_should_match
- SimpleQueryString: 8 new args including lenient, fuzzy_*, quote_field_suffix
- CombinedFields: auto_generate_synonyms_phrase_query

Tests migrated to AbstractElasticTestCase and each carries a
testCreateWithAllOptions integration test that round-trips the
fully-loaded query against the ES container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Term: case_insensitive
- Terms: terms_lookup (new TermsLookup sub-object — cross-document
  values fetched from {index, id, path, routing})
- Range: gt, lt (strict bounds), format, relation (new Relation
  constants class with INTERSECTS/CONTAINS/WITHIN), time_zone
- Exists: boost
- WildCard: case_insensitive, rewrite
- Prefix: rewrite (case_insensitive was already there)
- Fuzzy: transpositions, rewrite
- Regexp: rewrite
- TermSet: boost

Tests migrated to AbstractElasticTestCase. Each new arg is exercised
end-to-end via a testCreateWithAllOptions round-trip. Terms gains
testCreateWithLookup that indexes a lookup document in a separate
index and verifies the query resolves its values.
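
As a rough illustration of what a terms lookup looks like on the wire, the sketch below builds the request body in Python. The helper name and the index, id, and path values are hypothetical; only the {index, id, path, routing} keys come from the commit message.

```python
import json


def terms_lookup_query(field, index, doc_id, path, routing=None):
    """Return a terms query whose values are fetched from another document."""
    lookup = {"index": index, "id": doc_id, "path": path}
    if routing is not None:
        lookup["routing"] = routing
    # The lookup object takes the place of the usual list of terms.
    return {"terms": {field: lookup}}


# Hypothetical names: match "color" against the values stored in
# document "1" of the "palettes" index, under its "colors" field.
query = terms_lookup_query("color", "palettes", "1", "colors")
print(json.dumps(query))
```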

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- HasChild: accepts InnerHits via constructor
- HasParent: accepts InnerHits via constructor
- ParentId: gains a boost arg

Tests now build a real parent-join index (relations: blog -> comment),
index a parent and a routed child, and round-trip the query through
ES with inner_hits to confirm the joined hit comes back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GeoBoundingBox: validation_method, ignore_unmapped, boost
- GeoShape: indexed_shape (new IndexedShape sub-object), boost
- Shape: indexed_shape, boost

IndexedShape resolves pre-indexed shapes by {id, index, path, routing}.
GeoShape/Shape now require either inline shape or indexedShape.

Tests cover both inline and indexed-shape paths end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MoreLikeThis: boost_terms, include, min_doc_freq, max_doc_freq,
min_word_length, max_word_length, stop_words, analyzer, boost,
fail_on_unsupported_field.

Percolate: documents (multi-doc plural form), name, routing,
preference, version. Percolate test now PUTs a real percolator
mapping and a stored query, then percolates a candidate document
through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Knn: k-nearest neighbour vector similarity with field, query_vector,
  k, num_candidates, similarity, filter, boost. Integration test
  round-trips against a dense_vector mapping.
- SparseVector: ELSER-style sparse vector query — accepts either
  inference_id+query or direct queryVector token weights. Integration
  test against a sparse_vector mapping.
- TextExpansion: legacy ELSER form (model_id + model_text).
- Semantic: queries a semantic_text field.
- RuleQuery: applies Search Application query rules over an organic
  query.
- WeightedTokens: token weights against a sparse_vector field.
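
To make the Knn arguments concrete, here is a hedged Python sketch of the knn query body they map to. The helper, the field name "embedding", and the vector values are assumptions for illustration; the argument names follow the list above.

```python
import json


def knn_query(field, query_vector, k=None, num_candidates=None,
              similarity=None, filter_=None, boost=None):
    """Return a knn query body, omitting options that were not supplied."""
    body = {"field": field, "query_vector": list(query_vector)}
    optional = {
        "k": k,
        "num_candidates": num_candidates,
        "similarity": similarity,
        "filter": filter_,
        "boost": boost,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return {"knn": body}


# "embedding" and the three-dimensional vector are sample values.
query = knn_query("embedding", [0.1, 0.2, 0.3], k=5, num_candidates=50)
print(json.dumps(query))
```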

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New typed Script value object (Spameri\ElasticQuery\Script) — shared
between aggregations, future score functions, and runtime mappings.

Metric aggregations gain missing/script/format consistently:
- Min/Max/Avg/Sum: + missing, script, format
- ValueCount: + script, format
- Stats/ExtendedStats: + missing, script, format
- Cardinality: + script, missing, rehash
- MedianAbsoluteDeviation/StringStats: + missing, script
- BoxPlot: + missing, script, execution_hint
- Percentiles: + tdigest, hdr, missing, script
- PercentileRanks: + hdr, missing, script

WeightedAvg restructured to use typed WeightedAvgValue sub-objects
so each side carries its own field|script + missing. Plus format.

TopHits rewritten — was a single-size wrapper, now exposes from, sort,
_source, highlight, explain, script_fields, docvalue_fields, version,
seq_no_primary_term, stored_fields, track_scores.

The most heavily modified tests were migrated to AbstractElasticTestCase;
the remaining tests still pass because the new constructor args are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Term: min_doc_count, shard_size, shard_min_doc_count,
  show_term_doc_count_error, script, collect_mode, execution_hint,
  value_type, format; include/exclude accept array|string
- MultiTerms: order, min_doc_count, shard_size, shard_min_doc_count,
  collect_mode, format; terms accept array<field|{field,missing}>
- RareTerms: include, exclude, missing
- SignificantTerms: shard_size, shard_min_doc_count, execution_hint,
  background_filter (LeafQueryInterface), heuristic constants
- SignificantText: shard_size, shard_min_doc_count, min_doc_count,
  background_filter, source_fields
- Range: script, missing, format
- DateRange: script, missing
- Histogram: min_doc_count, extended_bounds, hard_bounds (new
  Histogram\Bounds sub-object), offset, order, script, missing,
  keyed, format
- DateHistogram: extended_bounds, hard_bounds, keyed, order, script,
  missing
- IpRange: rewritten with new IpRangeValue (mask/CIDR support)
- Filter: rewritten to accept any LeafQueryInterface
- Missing: script
- Composite: typed CompositeSourceInterface with TermsSource,
  HistogramSource, DateHistogramSource, GeotileGridSource — each
  with order/missing_bucket
- AdjacencyMatrix: separator, accepts LeafQueryInterface for filters
- GeoDistance (agg): keyed, script, missing
- GeoHashGrid/GeoTileGrid: bounds
- DiversifiedSampler: execution_hint, script

ResultMapper updated to handle composite buckets (array keys
JSON-encoded) and ip/date string ranges in Bucket.from/to.

Three pre-existing tests that asserted invalid ES output were
corrected — IpRange now uses a real ip mapping, ReverseNested
correctly nests inside a Nested agg, AdjacencyMatrix uses the
new LeafQueryInterface filters API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bucket aggs:
- Filters (plural, generic named-filters bucket)
- AutoDateHistogram (auto-tunes interval)
- VariableWidthHistogram
- CategorizeText (ML)
- FrequentItemSets (ML)
- IpPrefix
- TimeSeries

Metric aggs:
- TopMetrics
- GeoLine (gold license)
- TTest
- Rate
- MatrixStats

Pipeline / sampler / ML:
- RandomSampler
- CumulativeCardinality
- ExtendedStatsBucket
- Inference

ResultMapper updated to handle named buckets (string keys from
Filters aggregation), preserving the bucket name as the key.

License-gated aggs (GeoLine, CategorizeText) gracefully skip in
testCreate when running against basic-tier ES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New decay score functions (numeric/date/geo):
- FunctionScore/ScoreFunction/Decay/Gauss
- FunctionScore/ScoreFunction/Decay/Linear
- FunctionScore/ScoreFunction/Decay/Exp
- Shared AbstractDecay parent with field/origin/scale/offset/decay/
  multi_value_mode args
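
The three decay curves differ only in their shape. The sketch below reimplements the formulas from the Elasticsearch function_score documentation for a single numeric value, in Python; it is an illustration of the arithmetic, not the library's code, and multi_value_mode is left out.

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin, with everything inside `offset` counted as zero."""
    return max(0.0, abs(value - origin) - offset)


def gauss(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def linear(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


# At a distance of exactly offset + scale, every curve returns `decay`.
for fn in (gauss, exp_decay, linear):
    print(fn.__name__, fn(value=30.0, origin=0.0, scale=20.0, offset=10.0, decay=0.5))
```

In all three cases a document within offset of the origin scores 1.0, and scale sets the distance beyond the offset at which the score has fallen to decay.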

New FunctionScore/ScoreFunction/ScriptScore — distinct from the
top-level Query/ScriptScore. Wraps a Script value object.

FunctionScore container gains boost, boost_mode, max_boost, min_score
plus BOOST_MODE_* constants. boost_mode and score_mode are different
things; boost_mode controls how the function score combines with the
query score, score_mode controls how multiple functions combine.
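
A small worked example may help keep the two modes apart. The Python sketch below applies score_mode first and boost_mode second; it is a simplified model of the arithmetic (for instance, "avg" here is an unweighted mean, and max_boost and min_score are ignored), with made-up scores.

```python
import math
from statistics import mean


def combine_functions(scores, score_mode="multiply"):
    """score_mode: how the individual function scores are combined."""
    modes = {
        "multiply": math.prod,
        "sum": sum,
        "avg": mean,  # simplification: ignores per-function weights
        "first": lambda values: values[0],
        "max": max,
        "min": min,
    }
    return modes[score_mode](scores)


def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """boost_mode: how the combined function score meets the query score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2.0,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


# Made-up numbers: two functions scoring 2.0 and 3.0, query score 1.5.
combined = combine_functions([2.0, 3.0], score_mode="sum")
final = apply_boost_mode(1.5, combined, boost_mode="multiply")
print(combined, final)
```

With score_mode "sum" the two functions combine to 5.0, and boost_mode "multiply" then scales the query score of 1.5 to 7.5; switching boost_mode to "replace" would discard the query score and return 5.0.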

Integration tests round-trip the new functions against ES, including
a geo_point Gauss decay over a real distance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Sort: mode (avg/min/max/sum/median), nested (NestedSort), numeric_type,
  unmapped_type, format. Removed readonly so script_sort etc can
  coexist in the SortCollection without subclass restrictions.
- ScriptSort: new — sorts by a Spameri\ElasticQuery\Script with
  type (number/string), order, mode, nested. Emits _script body.
- NestedSort: new sub-object — path, filter (LeafQueryInterface),
  max_children, recursive nested.

Options/SortCollection emit logic updated to handle non-Sort items
(ScriptSort/GeoDistanceSort) — _score short-circuit only fires for
plain Sort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old Highlight class hard-coded number_of_fragments to 0 and
exposed no per-field configuration. Rewritten:

- Highlight/HighlightField — per-field config (type, number_of_fragments,
  fragment_size, boundary_scanner, boundary_chars, boundary_max_scan,
  boundary_scanner_locale, encoder, force_source, fragmenter,
  highlight_query, matched_fields, no_match_size, order, phrase_limit,
  require_field_match, tags_schema, pre_tags, post_tags).
- Highlight/HighlightFieldCollection — typed collection of fields.
- Highlight — accepts either HighlightFieldCollection or a simple
  array<string> of field names (BC convenience). Adds all top-level
  options (type, fragment_size, boundary_*, encoder, force_source,
  fragmenter, highlight_query, matched_fields, no_match_size, order,
  phrase_limit, require_field_match, tags_schema).

Existing test still passes — the simple string-array path is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Options gains many new search-body fields:
- _source (Source value object with includes/excludes)
- track_total_hits, track_scores, explain
- terminate_after, timeout
- search_after, pit (point-in-time)
- stored_fields, docvalue_fields, fields, script_fields
- runtime_mappings, seq_no_primary_term
- indices_boost
- profile, stats, ext

New top-level body features wired through ElasticQuery::toArray():
- Collapse (field collapsing) with InnerHits support
- Rescore (multiple, secondary query over windowSize hits)
- Suggest with typed suggesters:
  - TermSuggester (token suggestions)
  - PhraseSuggester (phrase suggestions)
  - CompletionSuggester (completion suggestions)
  - SuggesterInterface for extensibility
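
For orientation, the top-level keys these features add to a search request look roughly like the body below. This is a hand-written Python dict mirroring the Elasticsearch search API, with hypothetical field names and values; it does not show how the library's classes are constructed.

```python
import json

# All field names, suggestion names, and numbers here are sample values.
search_body = {
    "query": {"match": {"title": "elasticsearch"}},
    # Field collapsing: one top hit per user, with the grouped hits attached.
    "collapse": {
        "field": "user_id",
        "inner_hits": {"name": "latest", "size": 3},
    },
    # Rescore: a secondary query over the top window_size hits.
    "rescore": [
        {
            "window_size": 50,
            "query": {
                "rescore_query": {"match_phrase": {"title": "elasticsearch query"}},
                "query_weight": 0.7,
                "rescore_query_weight": 1.2,
            },
        }
    ],
    # Suggesters are keyed by a caller-chosen name.
    "suggest": {
        "title-term": {"text": "elasticsaerch", "term": {"field": "title"}},
        "title-complete": {"prefix": "elas", "completion": {"field": "title_suggest"}},
    },
}

print(json.dumps(search_body, indent=2))
```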

Integration tests cover field collapsing, source filtering, rescore,
and three suggesters against real ES indices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The filter container previously only exposed must(), so filter
contexts could not express must_not / should / filter clauses
without dropping into raw arrays.

FilterCollection now mirrors QueryCollection: must(), should(),
mustNot(), filter() — each returns the appropriate typed
collection. The bool body emits all four arms when populated.
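
The "emits all four arms when populated" behaviour can be sketched as follows. This Python helper is a hypothetical stand-in for the serialization step, not the FilterCollection implementation; the sample clauses are invented.

```python
import json


def bool_body(must=(), should=(), must_not=(), filter_=()):
    """Return a bool query containing only the arms that have clauses."""
    arms = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"bool": {name: list(clauses) for name, clauses in arms.items() if clauses}}


# Sample clauses; the "should" and "filter" arms are left empty on purpose.
body = bool_body(
    must=[{"term": {"status": "published"}}],
    must_not=[{"term": {"hidden": True}}],
)
print(json.dumps(body))
```

Omitting empty arms keeps the emitted body minimal and avoids sending empty arrays for clauses the caller never used.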

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New CHANGELOG.md documenting the full v2 surface: test
  infrastructure, the 4 bug fixes, ~150 new constructor arguments
  across existing classes, ~25 new query/aggregation/score-function/
  sort/option types, and BC-affecting rewrites (GeoDistance, Nested,
  WeightedAvg, TopHits, Filter agg, IpRange, Composite, Highlight,
  FilterCollection).
- README features list rewritten to reflect v2 coverage.
- doc/02-query-objects.md updated where breaking changes landed:
  GeoDistance now takes distance + validation_method + ignore_unmapped
  + boost; Nested takes score_mode/ignore_unmapped/inner_hits;
  Terms accepts TermsLookup. New sections for Knn, SparseVector,
  Semantic, TextExpansion, RuleQuery, WeightedTokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
curl_close() has been a no-op since PHP 8.0 and is deprecated in PHP 8.5, so the
Generic.PHP.DeprecatedFunctions sniff flags it via reflection and fails
`make cs` on the 8.5 CI matrix. The CurlHandle is freed automatically when
$ch goes out of scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>