[bot] Merge master/c028c311 into rel/dev by yenkins-admin · Pull Request #1641 · gooddata/gooddata-python-sdk

yenkins-admin · 2026-06-03T14:27:02Z

🚀 Automated PR to perform merge from master into rel/dev with changes up to c028c31 (created by https://github.com/gooddata/gooddata-python-sdk/actions/runs/26891358313).

… + Langfuse)

New public package `gooddata-eval` with a `gd-eval` CLI that evaluates the GoodData AI agent against a dataset of natural-language questions.

Phase 1 - visualization evaluation:
- Layered core + thin argparse CLI; SSE agentic chat client (httpx); workspace LLM provider/model resolution and activation via GoodData SDK; local-folder and Langfuse dataset sources; visualization evaluator with strict checks (metrics/dimensions/filters/type, cross-ref, pass@K); console + JSON reports.
- Streaming per-item progress with latency (total, avg) and quality score.
- Provider flag accepts name or id; auto-switches workspace to the provider that offers the requested model.
- SSE fallback: captures visualization from create_adhoc_visualization tool call args when the data source is inaccessible.

Phase 2 - remaining agentic test kinds:
- metric_skill, alert_skill, search_tool: scored via tool call arguments.
- general_question + guardrail: LLM-as-judge via openai [llm-judge] extra, lazily imported so CLI starts without openai installed.
- Shared helpers: _deep_subset, LLMJudge, _text_utils.

Langfuse integration:
- Dataset source uses REST API via httpx (no Langfuse SDK; it is broken on Python 3.14). Requires LANGFUSE_PUBLIC_KEY / SECRET_KEY / HOST env vars.
- Scoring sink (--langfuse, requires --langfuse-dataset): posts trace + 4 scores + dataset-run-item per evaluated item, creating the named experiment run automatically in Langfuse.
- Scores: pass_at_k, quality_score, value_score, latency_s.

Bug fixes from code review:
- general_question/guardrail items SKIPPED (not ERRORED) when openai absent: supported_test_kinds() now checks openai availability via find_spec().
- Guardrail quality_score was inverted: visualization_returned renamed to no_visualization (True=good); judge_passed added so prose compliance scores 0.5 rather than 1.0.
- _coerce_number truncated float thresholds: float(int(x)) -> float(x).
- Falsy-zero threshold: 'or' fallback replaced with 'in' key check.
- conversationId KeyError on malformed 200: raises ValueError with body.
- Scoring math in sink.py was duplicated inline: now calls compute_scores().
- _deep_subset docstring corrected: greedy first-fit, not bipartite match.

Infra wiring:
- Add to fossa.yaml matrix, build-release/dev-release COMPONENTS, codecov.
- Add Makefile (include ../../project_common.mk).

102 tests, ruff + ty clean. CLI starts without openai installed.

JIRA: GDAI-1766
Risk: low; new isolated package, no changes to existing packages.
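For readers skimming the review fixes, the two threshold bugs named in the commit message can be illustrated with a minimal sketch. This is not the package's code: `coerce_number`, `threshold_or_default`, the `"threshold"` key and the `0.5` default are all hypothetical names and values chosen for illustration.

```python
from typing import Any, Mapping


def coerce_number(x: Any) -> float:
    """Convert a numeric value to float without dropping the fraction.

    The buggy form described in the commit message, float(int(x)),
    turns 0.75 into 0.0 because int() truncates first.
    """
    return float(x)


def threshold_or_default(cfg: Mapping[str, Any], default: float = 0.5) -> float:
    """Read an optional threshold, treating an explicit 0 as a real value.

    `cfg.get("threshold") or default` would silently replace 0 with the
    default, since 0 is falsy. A key-presence check avoids that.
    The "threshold" key and the 0.5 default are assumptions for this sketch.
    """
    if "threshold" in cfg:
        return coerce_number(cfg["threshold"])
    return default
```

With these definitions, `threshold_or_default({"threshold": 0})` keeps the explicit zero, while the `or`-based fallback would have returned the default.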

feat(eval): add gooddata-eval model-evaluation CLI
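The commit message says the judge-based test kinds are skipped rather than errored when `openai` is not installed, with availability checked via `find_spec()`. A rough sketch of that pattern follows. The test kind names come from the commit message; the `judge_module` parameter, the constant names and `module_available` are assumptions added so the sketch can be exercised without `openai`.

```python
from importlib.util import find_spec

# Test kinds listed in the commit message, split by whether they need the judge.
BASE_KINDS = ("visualization", "metric_skill", "alert_skill", "search_tool")
JUDGE_KINDS = ("general_question", "guardrail")


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


def supported_test_kinds(judge_module: str = "openai") -> tuple[str, ...]:
    """List the test kinds that can run in the current environment.

    The judge_module parameter is hypothetical; it only makes the check
    testable. Judge-based kinds are offered only when the module exists,
    so callers can mark those items as skipped instead of failing.
    """
    if module_available(judge_module):
        return BASE_KINDS + JUDGE_KINDS
    return BASE_KINDS
```

Because `find_spec()` only locates the module, the CLI can start and report supported kinds without paying the import cost or failing on a missing optional extra.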

codecov · 2026-06-03T14:40:56Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.10%. Comparing base (a21f854) to head (c028c31).
⚠️ Report is 517 commits behind head on rel/dev.

Additional details and impacted files

@@           Coverage Diff            @@
##           rel/dev    #1641   +/-   ##
========================================
  Coverage    79.10%   79.10%           
========================================
  Files          231      231           
  Lines        15718    15718           
========================================
  Hits         12433    12433           
  Misses        3285     3285

zdenekmusil-gd added 2 commits June 3, 2026 15:58

Merge pull request #1639 from gooddata/zmu/gdai-1766-gooddata-eval-cli

c028c31

feat(eval): add gooddata-eval model-evaluation CLI

yenkins-admin requested review from hkad98, jaceksan, lupko and pcerny as code owners June 3, 2026 14:27

yenkins-admin merged commit 4b0c738 into rel/dev Jun 3, 2026
1 of 2 checks passed

yenkins-admin deleted the snapshot-master-c028c311-to-rel/dev branch June 3, 2026 14:27

yenkins-admin merged 2 commits into rel/dev from snapshot-master-c028c311-to-rel/dev
