Skip to content

TRT-2703: update jobRunsReportMatView post partitioning#3616

Open
neisw wants to merge 3 commits into
openshift:mainfrom
neisw:trt-2703-matview-partitions
Open

TRT-2703: update jobRunsReportMatView post partitioning#3616
neisw wants to merge 3 commits into
openshift:mainfrom
neisw:trt-2703-matview-partitions

Conversation

@neisw

@neisw neisw commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Updates matview to be aware of release and prow_job_run_timestamps
Adds fallback to prow_job_runs.test_failures if we are missing failed_test_results.test_count
Adds option to enable partitionwise planning optimizations

Summary by CodeRabbit

  • New Features

    • Added a flag/option to enable PostgreSQL partitionwise optimizations for improved query performance.
  • Improvements

    • Refined job-runs report to combine failed and flaked test results, limit input to the last 90 days, and produce more accurate test-failure counts.
  • Tests

    • Added benchmarks for materialized-view operations and data-refresh performance.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 12, 2026

Copy link
Copy Markdown

@neisw: This pull request references TRT-2703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Updates matview to be aware of release and prow_job_run_timestamps
Adds fallback to prow_job_runs.test_failures if we are missing failed_test_results.test_count
Adds option to enable partitionwise planning optimizations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a functional-option (WithPartitionwise) to DB creation to toggle Postgres partitionwise runtime params, updates a job-run materialized view to aggregate by release, wires a --enable-partitionwise flag, and adds benchmarks exercising the changes.

Changes

PostgreSQL Partitionwise Optimization Feature

Layer / File(s) Summary
Database configuration option mechanism
pkg/db/db.go, pkg/flags/postgres.go
Adds exported Option type and WithPartitionwise constructor; New accepts opts ...Option and conditionally sets pgxConfig.RuntimeParams to enable partitionwise aggregate/join; PostgresFlags gains EnablePartitionwise, binds --enable-partitionwise, and passes the option into db.New in GetDBClient.
Materialized view schema refinement
pkg/db/views.go
Replaces failed/flaked CTEs with a single test_results CTE grouped by prow_job_run_id and prow_job_run_release, limits input to last 90 days, tightens JOINs, and uses COALESCE for failure counts.
Benchmark test for partitionwise feature
pkg/flags/postgres_benchmarking_test.go
Imports sippyserver, reads enable_partitionwise env to set benchmark flags and logs when enabled, adds Test_BenchmarkJobRunsReportMatview (materialized view lifecycle + queries) and Test_BenchmarkRefreshData (schema update + sippyserver.RefreshData) benchmarks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ready-for-human-review, ok-to-test

Suggested reviewers

  • smg247
  • stbenjam

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (2 errors, 4 warnings)

Check name Status Explanation Resolution
Sql Injection Prevention ❌ Error pkg/db/views.go concatenates reportEnd (from pinned-date-time flag) into DDL SQL: TO_TIMESTAMP('"+reportEnd.UTC().Format(...)"', ...), violating the no user-input SQL formatting rule. Refactor syncPostgresViews/syncPostgresMaterializedViews to avoid string concatenation: use a SQL placeholder (e.g., TO_TIMESTAMP($1, ...)) and pass reportEnd as an Exec argument.
No-Sensitive-Data-In-Logs ❌ Error PR logs DB connection errors via log.WithError(err).Error("could not connect to db") and benchmark code logs full file paths containing connName derived from the DSN host. Avoid logging raw connection errors that may include DSN/host; sanitize/redact errors and prevent logging benchmark file paths/names that embed connName (DSN host).
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling ⚠️ Warning Errors aren’t consistently wrapped and pointers are dereferenced without nil checks: pkg/flags/postgres.go PinnedTime.Set returns err directly; pkg/db/db.go New returns nil, err from ParseConfi... Wrap returned errors with fmt.Errorf(...: %w, err) (e.g., time.Parse, pgx.ParseConfig, gorm.Open) and add nil guards before dereferencing pointer receivers (e.g., if l==nil / validate Option not nil).
Excessive Css In React Should Use Styles ⚠️ Warning Inline React style={{...}} objects with 5 style properties found in sippy-ng/src/chat/ChatMessage.js and sippy-ng/src/tests/TestAnalysis.js (violates >3–4 props rule). Refactor those JSX inline style objects into a useStyles() hook or CSS module/styled component; keep only simple/dynamic inline styles.
Test Coverage For New Features ⚠️ Warning PR adds exported db.Option/WithPartitionwise + updated jobRunsReportMatView SQL fallback, but repo has no unit tests covering either (only integration benchmarks gated by env). Add unit tests for db.New option application (partitionwise runtime params) and a regression test validating jobRunsReportMatView computes test_failures via failed_test_results or prow_job_runs fallback.
✅ Passed checks (14 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Single Responsibility And Clear Naming ✅ Passed Changes keep responsibilities focused: db.New now uses a clear functional-option API (Option/WithPartitionwise), PostgresFlags/GetDBClient stay cohesive, and the matview SQL uses descriptive CTE na...
Stable And Deterministic Test Names ✅ Passed Checked repo for Ginkgo usage (no *_test.go imports of onsi/ginkgo or ginkgo/v2); therefore no Ginkgo test titles in this PR to flag.
Test Structure And Quality ✅ Passed PR changes only pkg/db/.go and pkg/flags/.go (incl. postgres_benchmarking_test.go); that file contains no Ginkgo Describe/It/BeforeEach/Eventually blocks, so quality check is N/A.
Microshift Test Compatibility ✅ Passed PR #3616 modifies only pkg/db/* and pkg/flags/* (incl. a Go benchmark); no new Ginkgo e2e tests detected (no ginkgo/Describe patterns found).
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR #3616 changes only pkg/db and pkg/flags (incl. Go benchmarks in postgres_benchmarking_test.go); no new Ginkgo e2e tests were added, so no SNO multi-node/HA assumptions to flag.
Topology-Aware Scheduling Compatibility ✅ Passed PR changes only DB/SQL and Postgres connection/flags; no manifests, controllers, or scheduling constraints (affinity/topology spread/replicas/node selectors) were added.
Ote Binary Stdout Contract ✅ Passed Checked PR-touching files (pkg/db/db.go, pkg/db/views.go, pkg/flags/postgres.go, pkg/flags/postgres_benchmarking_test.go) for fmt/os.Stdout/klog stdout writes in main/init/TestMain/BeforeSuite/RunS...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds Go benchmark/matview changes (pkg/db/db.go, pkg/db/views.go, pkg/flags/postgres.go, pkg/flags/postgres_benchmarking_test.go) and no Ginkgo e2e tests; scanned postgres_benchmarking_test.go f...
No-Weak-Crypto ✅ Passed Scanned PR’s changed files (9 total) for md5/sha1/rc4/DES/3DES/blowfish/ECB and crypto/custom crypto + ConstantTimeCompare; none found. Only non-ct compare was schema-hash strings (non-secret).
Container-Privileges ✅ Passed PR #3616 only changes Go files; scanning the PR diff shows no Kubernetes/container privilege fields (privileged, hostPID/hostNetwork/hostIPC, SYS_ADMIN, allowPrivilegeEscalation, runAsUser/root).
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title specifically references the main change: updating jobRunsReportMatView post partitioning, which aligns with the primary modifications across the changeset.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from deepsm007 and smg247 June 12, 2026 15:36
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 12, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/flags/postgres_benchmarking_test.go (1)

502-524: 💤 Low value

Consider verifying partitionwise settings are actually applied.

The benchmark logs when partitionwise mode is requested, but doesn't verify the PostgreSQL runtime parameters were successfully set. Consider adding a verification query to confirm the settings:

if dbFlags.EnablePartitionwise {
    var aggValue, joinValue string
    dbc.DB.Raw("SHOW enable_partitionwise_aggregate").Scan(&aggValue)
    dbc.DB.Raw("SHOW enable_partitionwise_join").Scan(&joinValue)
    t.Logf("partitionwise settings: aggregate=%s join=%s", aggValue, joinValue)
}

This would catch any issues with parameter application and provide confirmation in benchmark output.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/flags/postgres_benchmarking_test.go` around lines 502 - 524, When
dbFlags.EnablePartitionwise is true (in the test that constructs PostgresFlags
and calls GetDBClient), add a verification step that queries the runtime
parameters and logs them: use the returned DB client (dbc.DB) to run Raw("SHOW
enable_partitionwise_aggregate") and Raw("SHOW enable_partitionwise_join"), Scan
the results into local variables (e.g., aggValue, joinValue), check/handle any
scan errors, and t.Logf the retrieved values so the test confirms the settings
were actually applied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/db/views.go`:
- Line 166: The new 90-day filter (prow_job_run_tests.prow_job_run_timestamp >=
CURRENT_TIMESTAMP - interval '90 days') in the CTEs changes behavior for older
job runs (making failed_test_results.test_count and
flaked_test_results.test_count NULL and causing the COALESCE fallback to
prow_job_runs.test_failures) and must be documented or reverted; update the SQL
file by adding a concise comment above the CTEs explaining whether this 90-day
window is intentional for partition/performance optimization, a data retention
policy, or an accidental change, and update the PR description to state the
rationale and any expected data/behavioral consequences (also apply the same
documentation to the second occurrence of the filter that mirrors this logic).

---

Nitpick comments:
In `@pkg/flags/postgres_benchmarking_test.go`:
- Around line 502-524: When dbFlags.EnablePartitionwise is true (in the test
that constructs PostgresFlags and calls GetDBClient), add a verification step
that queries the runtime parameters and logs them: use the returned DB client
(dbc.DB) to run Raw("SHOW enable_partitionwise_aggregate") and Raw("SHOW
enable_partitionwise_join"), Scan the results into local variables (e.g.,
aggValue, joinValue), check/handle any scan errors, and t.Logf the retrieved
values so the test confirms the settings were actually applied.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 00d19967-86fc-4af0-8f77-4655fd06ad77

📥 Commits

Reviewing files that changed from the base of the PR and between b6d8159 and f9e69d9.

📒 Files selected for processing (4)
  • pkg/db/db.go
  • pkg/db/views.go
  • pkg/flags/postgres.go
  • pkg/flags/postgres_benchmarking_test.go

Comment thread pkg/db/views.go Outdated
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@neisw neisw force-pushed the trt-2703-matview-partitions branch from f9e69d9 to 1cb9932 Compare June 12, 2026 16:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/db/db.go`:
- Around line 69-70: The New constructor currently iterates options with "for _,
o := range opts { o(&cfg) }" which will panic if any Option is nil; update the
loop in New (pkg/db/db.go) to guard each Option by skipping nils (e.g., "if o ==
nil { continue }" before calling o(&cfg)) so the constructor is crash-safe when
given malformed or conditionally composed option slices.

In `@pkg/flags/postgres_benchmarking_test.go`:
- Line 1155: The DROP INDEX Exec call currently ignores errors; change the call
to capture the result and check the error returned from
dbc.DB.Exec(fmt.Sprintf("DROP INDEX IF EXISTS %s", indexName)) (e.g., assign _,
err := dbc.DB.Exec(...)) and, consistent with surrounding checks (lines
~1120,1124,1134), fail the test on error (use t.Fatalf or the same
error-handling helper used elsewhere), including the indexName and the error in
the message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 72b6d1b8-4035-449f-8cda-cfd155e1dcbe

📥 Commits

Reviewing files that changed from the base of the PR and between f9e69d9 and 1cb9932.

📒 Files selected for processing (4)
  • pkg/db/db.go
  • pkg/db/views.go
  • pkg/flags/postgres.go
  • pkg/flags/postgres_benchmarking_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/flags/postgres.go
  • pkg/db/views.go

Comment thread pkg/db/db.go
Comment thread pkg/flags/postgres_benchmarking_test.go
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@neisw

neisw commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai - In general reviews of postgres_benchmarking_test.go are not necessary as these tests are not run by default and limited to explicit functional benchmark comparisons. But the db error logging was a valid concern and has been addressed.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@neisw: Thanks for the context! I'll keep that in mind for future reviews — I'll skip detailed review of postgres_benchmarking_test.go since those tests are not run by default and are scoped to explicit functional benchmark comparisons. Glad the db error logging concern was useful and has been addressed.


✏️ Learnings added
Learnt from: neisw
URL: https://github.com/openshift/sippy/pull/3616

Timestamp: 2026-06-12T18:05:21.953Z
Learning: In the `github.com/openshift/sippy` repository, `pkg/flags/postgres_benchmarking_test.go` does not need to be reviewed. These tests are not run by default and are limited to explicit functional benchmark comparisons. Skip detailed review of this file in future PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

@neisw: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@neisw

neisw commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai can you do a followup review and update your responses based on the current state. Have the DB connection errors logging issues been resolved?

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat pkg/flags/postgres.go
echo "---"
cat pkg/db/db.go
echo "---"
cat pkg/db/views.go

Length of output: 35350



@neisw: Here's the follow-up assessment:

DB connection error logging — mostly addressed, with one minor nit

In pkg/flags/postgres.go, GetDBClient() now reads:

if err != nil {
    log.Error("could not connect to db")
    return nil, fmt.Errorf("could not connect to db: %w", err)
}

The original concern is substantially addressed — an error is now logged and the error is wrapped before being returned. However, log.Error("could not connect to db") doesn't attach the underlying error to the log entry itself. For easier debugging (e.g., in log aggregators that surface structured fields), it would be slightly more informative to use:

log.WithError(err).Error("could not connect to db")

The wrapped return value (%w) does preserve the error for callers, so this is a minor nit rather than a blocking issue.


Overall state of the other changes looks good:

  • pkg/db/db.go — The functional-options pattern (WithPartitionwise) is clean and the pgxConfig.RuntimeParams approach for enable_partitionwise_aggregate/enable_partitionwise_join is appropriate.
  • pkg/db/views.go — The consolidated test_results CTE grouping by both prow_job_run_id and prow_job_run_release, the 90-day window constraint, the COALESCE(test_results.failed_test_count, prow_job_runs.test_failures) fallback, and the tightened LEFT JOIN are all correct and well-motivated by the partitioning changes.

Comment thread pkg/db/views.go
test_results.flaked_test_names AS flaked_test_names,
test_results.flaked_test_count AS test_flakes,
test_results.failed_test_names AS failed_test_names,
COALESCE(test_results.failed_test_count, prow_job_runs.test_failures) AS test_failures,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me to understand the purpose of this COALESCE? Are we trying to avoid a NULL in the scenario where there are no failed tests in the past 90 days? Or are we trying to use prow_job_runs.test_failures as a fallback in the case of data quality issues?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as fallback. If we don't have a value in our our query result then fallback to prow_job_runs value. The main issue in the bug was a lack of explanation as to why the data wasn't present in the matview when we did have the data in the tables.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the value in prow_job_runs.test_failures compare with the value in test_results.failed_test_count in the happy scenario where the data is present? If the data is the same, can we always use prow_job_runs.test_failures instead? If the data is not the same, what are the implications for the consumer of the data if they get one versus the other?

Moreover, I am concerned that we are putting a bandaid in for an issue that we are fixing via other means. If an issue of this nature happens again in the future for another reason, will we lose visibility into their even being an issue by serving data from the other table? In other words, if this COALESCE had been present already, would we have even known that there was a problem with the matview that needed to be fixed?

Comment thread pkg/db/views.go
FROM
prow_pull_requests
INNER JOIN
prow_job_run_prow_pull_requests ON prow_job_run_prow_pull_requests.prow_pull_request_id = prow_pull_requests.id

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked into whether there would be improvements from adding the 90-day time constraint here, to allow the seq scan to eliminate rows that could not possibly be contenders?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No but we want to prevent any full scans going forward as we look to bring in more historical data so I will look into that.

dbc, err := dbFlags.GetDBClient()
if err != nil {
t.Fatalf("couldn't get DB client: %v", err)
t.Fatal("couldn't get DB client")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale behind removing the error from the log event?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from the concern that dsn might be leaked via the error. I am open to err on the side of caution though not overly concerned by the previous fatal with error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants