TRT-2703: update jobRunsReportMatView post partitioning by neisw · Pull Request #3616 · openshift/sippy

neisw · 2026-06-12T15:36:12Z

Updates matview to be aware of release and prow_job_run_timestamps
Adds fallback to prow_job_runs.test_failures if we are missing failed_test_results.test_count
Adds option to enable partitionwise planning optimizations

Summary by CodeRabbit

New Features
- Added a flag/option to enable PostgreSQL partitionwise optimizations for improved query performance.
Improvements
- Refined job-runs report to combine failed and flaked test results, limit input to the last 90 days, and produce more accurate test-failure counts.
Tests
- Added benchmarks for materialized-view operations and data-refresh performance.

openshift-merge-bot · 2026-06-12T15:36:15Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

openshift-ci-robot · 2026-06-12T15:36:16Z

@neisw: This pull request references TRT-2703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Updates matview to be aware of release and prow_job_run_timestamps
Adds fallback to prow_job_runs.test_failures if we are missing failed_test_results.test_count
Adds option to enable partitionwise planning optimizations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-12T15:36:26Z

Walkthrough

Adds a functional-option (WithPartitionwise) to DB creation to toggle Postgres partitionwise runtime params, updates a job-run materialized view to aggregate by release, wires a --enable-partitionwise flag, and adds benchmarks exercising the changes.

Changes

PostgreSQL Partitionwise Optimization Feature

Layer / File(s)	Summary
Database configuration option mechanism `pkg/db/db.go`, `pkg/flags/postgres.go`	Adds exported `Option` type and `WithPartitionwise` constructor; `New` accepts `opts ...Option` and conditionally sets `pgxConfig.RuntimeParams` to enable partitionwise aggregate/join; `PostgresFlags` gains `EnablePartitionwise`, binds `--enable-partitionwise`, and passes the option into `db.New` in `GetDBClient`.
Materialized view schema refinement `pkg/db/views.go`	Replaces failed/flaked CTEs with a single `test_results` CTE grouped by `prow_job_run_id` and `prow_job_run_release`, limits input to last 90 days, tightens JOINs, and uses COALESCE for failure counts.
Benchmark test for partitionwise feature `pkg/flags/postgres_benchmarking_test.go`	Imports `sippyserver`, reads `enable_partitionwise` env to set benchmark flags and logs when enabled, adds `Test_BenchmarkJobRunsReportMatview` (materialized view lifecycle + queries) and `Test_BenchmarkRefreshData` (schema update + sippyserver.RefreshData) benchmarks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ready-for-human-review, ok-to-test

Suggested reviewers

smg247
stbenjam

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

Ignore

❌ Failed checks (2 errors, 4 warnings)

Check name	Status	Explanation	Resolution
Sql Injection Prevention	❌ Error	pkg/db/views.go concatenates reportEnd (from pinned-date-time flag) into DDL SQL: `TO_TIMESTAMP('"+reportEnd.UTC().Format(...)"', ...)`, violating the no user-input SQL formatting rule.	Refactor syncPostgresViews/syncPostgresMaterializedViews to avoid string concatenation: use a SQL placeholder (e.g., `TO_TIMESTAMP($1, ...)`) and pass reportEnd as an Exec argument.
No-Sensitive-Data-In-Logs	❌ Error	PR logs DB connection errors via log.WithError(err).Error("could not connect to db") and benchmark code logs full file paths containing connName derived from the DSN host.	Avoid logging raw connection errors that may include DSN/host; sanitize/redact errors and prevent logging benchmark file paths/names that embed connName (DSN host).
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling	⚠️ Warning	Errors aren’t consistently wrapped and pointers are dereferenced without nil checks: pkg/flags/postgres.go PinnedTime.Set returns `err` directly; pkg/db/db.go New returns `nil, err` from ParseConfi...	Wrap returned errors with `fmt.Errorf(...: %w, err)` (e.g., time.Parse, pgx.ParseConfig, gorm.Open) and add nil guards before dereferencing pointer receivers (e.g., `if l==nil` / validate Option not nil).
Excessive Css In React Should Use Styles	⚠️ Warning	Inline React style={{...}} objects with 5 style properties found in sippy-ng/src/chat/ChatMessage.js and sippy-ng/src/tests/TestAnalysis.js (violates >3–4 props rule).	Refactor those JSX inline style objects into a useStyles() hook or CSS module/styled component; keep only simple/dynamic inline styles.
Test Coverage For New Features	⚠️ Warning	PR adds exported db.Option/WithPartitionwise + updated jobRunsReportMatView SQL fallback, but repo has no unit tests covering either (only integration benchmarks gated by env).	Add unit tests for db.New option application (partitionwise runtime params) and a regression test validating jobRunsReportMatView computes test_failures via failed_test_results or prow_job_runs fallback.

✅ Passed checks (14 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Single Responsibility And Clear Naming	✅ Passed	Changes keep responsibilities focused: db.New now uses a clear functional-option API (Option/WithPartitionwise), PostgresFlags/GetDBClient stay cohesive, and the matview SQL uses descriptive CTE na...
Stable And Deterministic Test Names	✅ Passed	Checked repo for Ginkgo usage (no *_test.go imports of onsi/ginkgo or ginkgo/v2); therefore no Ginkgo test titles in this PR to flag.
Test Structure And Quality	✅ Passed	PR changes only pkg/db/.go and pkg/flags/.go (incl. postgres_benchmarking_test.go); that file contains no Ginkgo Describe/It/BeforeEach/Eventually blocks, so quality check is N/A.
Microshift Test Compatibility	✅ Passed	PR `#3616` modifies only pkg/db/* and pkg/flags/* (incl. a Go benchmark); no new Ginkgo e2e tests detected (no ginkgo/Describe patterns found).
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR `#3616` changes only pkg/db and pkg/flags (incl. Go benchmarks in postgres_benchmarking_test.go); no new Ginkgo e2e tests were added, so no SNO multi-node/HA assumptions to flag.
Topology-Aware Scheduling Compatibility	✅ Passed	PR changes only DB/SQL and Postgres connection/flags; no manifests, controllers, or scheduling constraints (affinity/topology spread/replicas/node selectors) were added.
Ote Binary Stdout Contract	✅ Passed	Checked PR-touching files (pkg/db/db.go, pkg/db/views.go, pkg/flags/postgres.go, pkg/flags/postgres_benchmarking_test.go) for fmt/os.Stdout/klog stdout writes in main/init/TestMain/BeforeSuite/RunS...
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR adds Go benchmark/matview changes (pkg/db/db.go, pkg/db/views.go, pkg/flags/postgres.go, pkg/flags/postgres_benchmarking_test.go) and no Ginkgo e2e tests; scanned postgres_benchmarking_test.go f...
No-Weak-Crypto	✅ Passed	Scanned PR’s changed files (9 total) for md5/sha1/rc4/DES/3DES/blowfish/ECB and crypto/custom crypto + ConstantTimeCompare; none found. Only non-ct compare was schema-hash strings (non-secret).
Container-Privileges	✅ Passed	PR `#3616` only changes Go files; scanning the PR diff shows no Kubernetes/container privilege fields (privileged, hostPID/hostNetwork/hostIPC, SYS_ADMIN, allowPrivilegeEscalation, runAsUser/root).
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title specifically references the main change: updating jobRunsReportMatView post partitioning, which aligns with the primary modifications across the changeset.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-12T15:36:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [neisw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/flags/postgres_benchmarking_test.go (1)

502-524: 💤 Low value

Consider verifying partitionwise settings are actually applied.

The benchmark logs when partitionwise mode is requested, but doesn't verify the PostgreSQL runtime parameters were successfully set. Consider adding a verification query to confirm the settings:

if dbFlags.EnablePartitionwise {
    var aggValue, joinValue string
    dbc.DB.Raw("SHOW enable_partitionwise_aggregate").Scan(&aggValue)
    dbc.DB.Raw("SHOW enable_partitionwise_join").Scan(&joinValue)
    t.Logf("partitionwise settings: aggregate=%s join=%s", aggValue, joinValue)
}

This would catch any issues with parameter application and provide confirmation in benchmark output.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/flags/postgres_benchmarking_test.go` around lines 502 - 524, When
dbFlags.EnablePartitionwise is true (in the test that constructs PostgresFlags
and calls GetDBClient), add a verification step that queries the runtime
parameters and logs them: use the returned DB client (dbc.DB) to run Raw("SHOW
enable_partitionwise_aggregate") and Raw("SHOW enable_partitionwise_join"), Scan
the results into local variables (e.g., aggValue, joinValue), check/handle any
scan errors, and t.Logf the retrieved values so the test confirms the settings
were actually applied.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/db/views.go`:
- Line 166: The new 90-day filter (prow_job_run_tests.prow_job_run_timestamp >=
CURRENT_TIMESTAMP - interval '90 days') in the CTEs changes behavior for older
job runs (making failed_test_results.test_count and
flaked_test_results.test_count NULL and causing the COALESCE fallback to
prow_job_runs.test_failures) and must be documented or reverted; update the SQL
file by adding a concise comment above the CTEs explaining whether this 90-day
window is intentional for partition/performance optimization, a data retention
policy, or an accidental change, and update the PR description to state the
rationale and any expected data/behavioral consequences (also apply the same
documentation to the second occurrence of the filter that mirrors this logic).

---

Nitpick comments:
In `@pkg/flags/postgres_benchmarking_test.go`:
- Around line 502-524: When dbFlags.EnablePartitionwise is true (in the test
that constructs PostgresFlags and calls GetDBClient), add a verification step
that queries the runtime parameters and logs them: use the returned DB client
(dbc.DB) to run Raw("SHOW enable_partitionwise_aggregate") and Raw("SHOW
enable_partitionwise_join"), Scan the results into local variables (e.g.,
aggValue, joinValue), check/handle any scan errors, and t.Logf the retrieved
values so the test confirms the settings were actually applied.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 00d19967-86fc-4af0-8f77-4655fd06ad77

📥 Commits

Reviewing files that changed from the base of the PR and between b6d8159 and f9e69d9.

📒 Files selected for processing (4)

pkg/db/db.go
pkg/db/views.go
pkg/flags/postgres.go
pkg/flags/postgres_benchmarking_test.go

openshift-merge-bot · 2026-06-12T16:28:45Z

Scheduling required tests:
/test e2e

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/db/db.go`:
- Around line 69-70: The New constructor currently iterates options with "for _,
o := range opts { o(&cfg) }" which will panic if any Option is nil; update the
loop in New (pkg/db/db.go) to guard each Option by skipping nils (e.g., "if o ==
nil { continue }" before calling o(&cfg)) so the constructor is crash-safe when
given malformed or conditionally composed option slices.

In `@pkg/flags/postgres_benchmarking_test.go`:
- Line 1155: The DROP INDEX Exec call currently ignores errors; change the call
to capture the result and check the error returned from
dbc.DB.Exec(fmt.Sprintf("DROP INDEX IF EXISTS %s", indexName)) (e.g., assign _,
err := dbc.DB.Exec(...)) and, consistent with surrounding checks (lines
~1120,1124,1134), fail the test on error (use t.Fatalf or the same
error-handling helper used elsewhere), including the indexName and the error in
the message.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 72b6d1b8-4035-449f-8cda-cfd155e1dcbe

📥 Commits

Reviewing files that changed from the base of the PR and between f9e69d9 and 1cb9932.

📒 Files selected for processing (4)

pkg/db/db.go
pkg/db/views.go
pkg/flags/postgres.go
pkg/flags/postgres_benchmarking_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

pkg/flags/postgres.go
pkg/db/views.go

openshift-merge-bot · 2026-06-12T17:11:49Z

Scheduling required tests:
/test e2e

neisw · 2026-06-12T18:05:04Z

@coderabbitai - In general reviews of postgres_benchmarking_test.go are not necessary as these tests are not run by default and limited to explicit functional benchmark comparisons. But the db error logging was a valid concern and has been addressed.

coderabbitai · 2026-06-12T18:05:23Z

Tip

For best results, initiate chat on the files or code changes.

@neisw: Thanks for the context! I'll keep that in mind for future reviews — I'll skip detailed review of postgres_benchmarking_test.go since those tests are not run by default and are scoped to explicit functional benchmark comparisons. Glad the db error logging concern was useful and has been addressed.

✏️ Learnings added

Learnt from: neisw
URL: https://github.com/openshift/sippy/pull/3616

Timestamp: 2026-06-12T18:05:21.953Z
Learning: In the `github.com/openshift/sippy` repository, `pkg/flags/postgres_benchmarking_test.go` does not need to be reviewed. These tests are not run by default and are limited to explicit functional benchmark comparisons. Skip detailed review of this file in future PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

openshift-merge-bot · 2026-06-12T18:17:46Z

Scheduling required tests:
/test e2e

openshift-ci · 2026-06-12T18:33:26Z

@neisw: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

neisw · 2026-06-12T18:36:43Z

@coderabbitai can you do a followup review and update your responses based on the current state. Have the DB connection errors logging issues been resolved?

coderabbitai · 2026-06-12T18:37:33Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat pkg/flags/postgres.go
echo "---"
cat pkg/db/db.go
echo "---"
cat pkg/db/views.go

Length of output: 35350

@neisw: Here's the follow-up assessment:

DB connection error logging — mostly addressed, with one minor nit

In pkg/flags/postgres.go, GetDBClient() now reads:

if err != nil {
    log.Error("could not connect to db")
    return nil, fmt.Errorf("could not connect to db: %w", err)
}

The original concern is substantially addressed — an error is now logged and the error is wrapped before being returned. However, log.Error("could not connect to db") doesn't attach the underlying error to the log entry itself. For easier debugging (e.g., in log aggregators that surface structured fields), it would be slightly more informative to use:

log.WithError(err).Error("could not connect to db")

The wrapped return value (%w) does preserve the error for callers, so this is a minor nit rather than a blocking issue.

Overall state of the other changes looks good:

pkg/db/db.go — The functional-options pattern (WithPartitionwise) is clean and the pgxConfig.RuntimeParams approach for enable_partitionwise_aggregate/enable_partitionwise_join is appropriate.
pkg/db/views.go — The consolidated test_results CTE grouping by both prow_job_run_id and prow_job_run_release, the 90-day window constraint, the COALESCE(test_results.failed_test_count, prow_job_runs.test_failures) fallback, and the tightened LEFT JOIN are all correct and well-motivated by the partitioning changes.

mstaeble · 2026-06-12T18:49:40Z

+   test_results.flaked_test_names AS flaked_test_names,
+   test_results.flaked_test_count AS test_flakes,
+   test_results.failed_test_names AS failed_test_names,
+   COALESCE(test_results.failed_test_count, prow_job_runs.test_failures) AS test_failures,


Can you help me to understand the purpose of this COALESCE? Are we trying to avoid a NULL in the scenario where there are no failed tests in the past 90 days? Or are we trying to use prow_job_runs.test_failures as a fallback in the case of data quality issues?

I see it as fallback. If we don't have a value in our our query result then fallback to prow_job_runs value. The main issue in the bug was a lack of explanation as to why the data wasn't present in the matview when we did have the data in the tables.

How does the value in prow_job_runs.test_failures compare with the value in test_results.failed_test_count in the happy scenario where the data is present? If the data is the same, can we always use prow_job_runs.test_failures instead? If the data is not the same, what are the implications for the consumer of the data if they get one versus the other?

Moreover, I am concerned that we are putting a bandaid in for an issue that we are fixing via other means. If an issue of this nature happens again in the future for another reason, will we lose visibility into their even being an issue by serving data from the other table? In other words, if this COALESCE had been present already, would we have even known that there was a problem with the matview that needed to be fixed?

mstaeble · 2026-06-12T19:01:49Z

        FROM
                prow_pull_requests
        INNER JOIN
                prow_job_run_prow_pull_requests ON prow_job_run_prow_pull_requests.prow_pull_request_id = prow_pull_requests.id


Have you looked into whether there would be improvements from adding the 90-day time constraint here, to allow the seq scan to eliminate rows that could not possibly be contenders?

No but we want to prevent any full scans going forward as we look to bring in more historical data so I will look into that.

mstaeble · 2026-06-12T19:09:19Z

 	dbc, err := dbFlags.GetDBClient()
 	if err != nil {
-		t.Fatalf("couldn't get DB client: %v", err)
+		t.Fatal("couldn't get DB client")


What is the rationale behind removing the error from the log event?

This came from the concern that dsn might be leaked via the error. I am open to err on the side of caution though not overly concerned by the previous fatal with error.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2026

openshift-ci Bot requested review from deepsm007 and smg247 June 12, 2026 15:36

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 12, 2026

coderabbitai Bot requested changes Jun 12, 2026

View reviewed changes

Comment thread pkg/db/views.go Outdated

neisw added 2 commits June 12, 2026 12:48

TRT-2703: update jobRunsReportMatView post partitioning

ba6e724

TRT-2703: update jobRunsReportMatView optimizations

1cb9932

neisw force-pushed the trt-2703-matview-partitions branch from f9e69d9 to 1cb9932 Compare June 12, 2026 16:54

coderabbitai Bot requested changes Jun 12, 2026

View reviewed changes

Comment thread pkg/db/db.go

Comment thread pkg/flags/postgres_benchmarking_test.go

TRT-2703: address logging concerns

147be3a

mstaeble reviewed Jun 12, 2026

View reviewed changes

Conversation

neisw commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jun 12, 2026

Uh oh!

openshift-ci-robot commented Jun 12, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (2 errors, 4 warnings)

Uh oh!

openshift-ci Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openshift-merge-bot Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

openshift-merge-bot Bot commented Jun 12, 2026

Uh oh!

neisw commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Uh oh!

openshift-merge-bot Bot commented Jun 12, 2026

Uh oh!

openshift-ci Bot commented Jun 12, 2026

Uh oh!

neisw commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Uh oh!

mstaeble Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

neisw Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

mstaeble Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

mstaeble Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

neisw Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

mstaeble Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

neisw Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

neisw commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Jun 12, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading