TRT-2703: update jobRunsReportMatView post partitioning#3616
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@neisw: This pull request references TRT-2703 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds a functional-option ( ChangesPostgreSQL Partitionwise Optimization Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (2 errors, 4 warnings)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: neisw The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/flags/postgres_benchmarking_test.go (1)
502-524: 💤 Low valueConsider verifying partitionwise settings are actually applied.
The benchmark logs when partitionwise mode is requested, but doesn't verify the PostgreSQL runtime parameters were successfully set. Consider adding a verification query to confirm the settings:
if dbFlags.EnablePartitionwise { var aggValue, joinValue string dbc.DB.Raw("SHOW enable_partitionwise_aggregate").Scan(&aggValue) dbc.DB.Raw("SHOW enable_partitionwise_join").Scan(&joinValue) t.Logf("partitionwise settings: aggregate=%s join=%s", aggValue, joinValue) }This would catch any issues with parameter application and provide confirmation in benchmark output.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/flags/postgres_benchmarking_test.go` around lines 502 - 524, When dbFlags.EnablePartitionwise is true (in the test that constructs PostgresFlags and calls GetDBClient), add a verification step that queries the runtime parameters and logs them: use the returned DB client (dbc.DB) to run Raw("SHOW enable_partitionwise_aggregate") and Raw("SHOW enable_partitionwise_join"), Scan the results into local variables (e.g., aggValue, joinValue), check/handle any scan errors, and t.Logf the retrieved values so the test confirms the settings were actually applied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/db/views.go`:
- Line 166: The new 90-day filter (prow_job_run_tests.prow_job_run_timestamp >=
CURRENT_TIMESTAMP - interval '90 days') in the CTEs changes behavior for older
job runs (making failed_test_results.test_count and
flaked_test_results.test_count NULL and causing the COALESCE fallback to
prow_job_runs.test_failures) and must be documented or reverted; update the SQL
file by adding a concise comment above the CTEs explaining whether this 90-day
window is intentional for partition/performance optimization, a data retention
policy, or an accidental change, and update the PR description to state the
rationale and any expected data/behavioral consequences (also apply the same
documentation to the second occurrence of the filter that mirrors this logic).
---
Nitpick comments:
In `@pkg/flags/postgres_benchmarking_test.go`:
- Around line 502-524: When dbFlags.EnablePartitionwise is true (in the test
that constructs PostgresFlags and calls GetDBClient), add a verification step
that queries the runtime parameters and logs them: use the returned DB client
(dbc.DB) to run Raw("SHOW enable_partitionwise_aggregate") and Raw("SHOW
enable_partitionwise_join"), Scan the results into local variables (e.g.,
aggValue, joinValue), check/handle any scan errors, and t.Logf the retrieved
values so the test confirms the settings were actually applied.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 00d19967-86fc-4af0-8f77-4655fd06ad77
📒 Files selected for processing (4)
pkg/db/db.gopkg/db/views.gopkg/flags/postgres.gopkg/flags/postgres_benchmarking_test.go
|
Scheduling required tests: |
f9e69d9 to
1cb9932
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/db/db.go`:
- Around line 69-70: The New constructor currently iterates options with "for _,
o := range opts { o(&cfg) }" which will panic if any Option is nil; update the
loop in New (pkg/db/db.go) to guard each Option by skipping nils (e.g., "if o ==
nil { continue }" before calling o(&cfg)) so the constructor is crash-safe when
given malformed or conditionally composed option slices.
In `@pkg/flags/postgres_benchmarking_test.go`:
- Line 1155: The DROP INDEX Exec call currently ignores errors; change the call
to capture the result and check the error returned from
dbc.DB.Exec(fmt.Sprintf("DROP INDEX IF EXISTS %s", indexName)) (e.g., assign _,
err := dbc.DB.Exec(...)) and, consistent with surrounding checks (lines
~1120,1124,1134), fail the test on error (use t.Fatalf or the same
error-handling helper used elsewhere), including the indexName and the error in
the message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 72b6d1b8-4035-449f-8cda-cfd155e1dcbe
📒 Files selected for processing (4)
pkg/db/db.gopkg/db/views.gopkg/flags/postgres.gopkg/flags/postgres_benchmarking_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/flags/postgres.go
- pkg/db/views.go
|
Scheduling required tests: |
|
@coderabbitai - In general reviews of postgres_benchmarking_test.go are not necessary as these tests are not run by default and limited to explicit functional benchmark comparisons. But the db error logging was a valid concern and has been addressed. |
|
Tip For best results, initiate chat on the files or code changes.
✏️ Learnings added
|
|
Scheduling required tests: |
|
@neisw: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@coderabbitai can you do a followup review and update your responses based on the current state. Have the |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
cat pkg/flags/postgres.go
echo "---"
cat pkg/db/db.go
echo "---"
cat pkg/db/views.goLength of output: 35350
DB connection error logging — mostly addressed, with one minor nit In if err != nil {
log.Error("could not connect to db")
return nil, fmt.Errorf("could not connect to db: %w", err)
}The original concern is substantially addressed — an error is now logged and the error is wrapped before being returned. However, log.WithError(err).Error("could not connect to db")The wrapped return value ( Overall state of the other changes looks good:
|
| test_results.flaked_test_names AS flaked_test_names, | ||
| test_results.flaked_test_count AS test_flakes, | ||
| test_results.failed_test_names AS failed_test_names, | ||
| COALESCE(test_results.failed_test_count, prow_job_runs.test_failures) AS test_failures, |
There was a problem hiding this comment.
Can you help me to understand the purpose of this COALESCE? Are we trying to avoid a NULL in the scenario where there are no failed tests in the past 90 days? Or are we trying to use prow_job_runs.test_failures as a fallback in the case of data quality issues?
There was a problem hiding this comment.
I see it as fallback. If we don't have a value in our our query result then fallback to prow_job_runs value. The main issue in the bug was a lack of explanation as to why the data wasn't present in the matview when we did have the data in the tables.
There was a problem hiding this comment.
How does the value in prow_job_runs.test_failures compare with the value in test_results.failed_test_count in the happy scenario where the data is present? If the data is the same, can we always use prow_job_runs.test_failures instead? If the data is not the same, what are the implications for the consumer of the data if they get one versus the other?
Moreover, I am concerned that we are putting a bandaid in for an issue that we are fixing via other means. If an issue of this nature happens again in the future for another reason, will we lose visibility into their even being an issue by serving data from the other table? In other words, if this COALESCE had been present already, would we have even known that there was a problem with the matview that needed to be fixed?
| FROM | ||
| prow_pull_requests | ||
| INNER JOIN | ||
| prow_job_run_prow_pull_requests ON prow_job_run_prow_pull_requests.prow_pull_request_id = prow_pull_requests.id |
There was a problem hiding this comment.
Have you looked into whether there would be improvements from adding the 90-day time constraint here, to allow the seq scan to eliminate rows that could not possibly be contenders?
There was a problem hiding this comment.
No but we want to prevent any full scans going forward as we look to bring in more historical data so I will look into that.
| dbc, err := dbFlags.GetDBClient() | ||
| if err != nil { | ||
| t.Fatalf("couldn't get DB client: %v", err) | ||
| t.Fatal("couldn't get DB client") |
There was a problem hiding this comment.
What is the rationale behind removing the error from the log event?
There was a problem hiding this comment.
This came from the concern that dsn might be leaked via the error. I am open to err on the side of caution though not overly concerned by the previous fatal with error.
Updates matview to be aware of release and prow_job_run_timestamps
Adds fallback to prow_job_runs.test_failures if we are missing failed_test_results.test_count
Adds option to enable partitionwise planning optimizations
Summary by CodeRabbit
New Features
Improvements
Tests