OPNET-602: Fail haproxy monitor test when log collection fails#31292
OPNET-602: Fail haproxy monitor test when log collection fails#31292mkowalski wants to merge 1 commit into
Conversation
So far the on-prem haproxy monitor test could not distinguish between "no data because the platform has no haproxy" and "no data because collection silently broke". On AWS and other cloud platforms it produced vacuous test passes, and on on-prem clusters a renamed namespace or label would silently disable the test forever. Gate the monitor test on the platform: read the Infrastructure config in PrepareCollection and skip all phases with NotSupportedError unless the cluster is BareMetal/OpenStack/vSphere with an API VIP and the OpenShift-managed loadbalancer, i.e. the deployments that run the haproxy api-lb static pods. Clusters without the infrastructure config (e.g. MicroShift) are skipped as well. On supported platforms finding no haproxy pods at all is now reported as an error. Together with the errors already returned when reading the logs fails, the framework turns those into a failed junit for the collection phase, so a failed test now always means collection errors. Implements Vadim's suggestion from openshift#29149 (comment)
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@mkowalski: This pull request references OPNET-602 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR adds platform detection to skip the on-premises HA proxy monitor test when the cluster infrastructure does not support OpenShift-managed on-prem API loadbalancing. It introduces a helper function to check platform eligibility, adds comprehensive test coverage, and integrates the check into the monitor test collection pipeline to short-circuit early when unsupported. ChangesOn-Premises Monitor Test Platform Gating
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mkowalski The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/monitortests/network/onpremhaproxy/platform_test.go (1)
67-142: ⚡ Quick winAdd LB-type coverage for OpenStack and vSphere branches.
Line 81 validates
LoadBalancerTypeUserManagedonly for BareMetal, butnotSupportedPlatformReasonapplies the same LB-type gating to OpenStack and vSphere. Add equivalent cases so branch behavior can’t regress silently.Proposed test additions
{ name: "openstack without API VIP", platformStatus: &configv1.PlatformStatus{ Type: configv1.OpenStackPlatformType, OpenStack: &configv1.OpenStackPlatformStatus{}, }, expectSupported: false, }, + { + name: "openstack with user-managed loadbalancer", + platformStatus: &configv1.PlatformStatus{ + Type: configv1.OpenStackPlatformType, + OpenStack: &configv1.OpenStackPlatformStatus{ + APIServerInternalIPs: []string{"10.0.0.5"}, + LoadBalancer: &configv1.OpenStackPlatformLoadBalancer{ + Type: configv1.LoadBalancerTypeUserManaged, + }, + }, + }, + expectSupported: false, + }, { name: "vsphere IPI with API VIPs", platformStatus: &configv1.PlatformStatus{ Type: configv1.VSpherePlatformType, VSphere: &configv1.VSpherePlatformStatus{ APIServerInternalIPs: []string{"10.0.0.5"}, }, }, expectSupported: true, }, + { + name: "vsphere with user-managed loadbalancer", + platformStatus: &configv1.PlatformStatus{ + Type: configv1.VSpherePlatformType, + VSphere: &configv1.VSpherePlatformStatus{ + APIServerInternalIPs: []string{"10.0.0.5"}, + LoadBalancer: &configv1.VSpherePlatformLoadBalancer{ + Type: configv1.LoadBalancerTypeUserManaged, + }, + }, + }, + expectSupported: false, + },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/monitortests/network/onpremhaproxy/platform_test.go` around lines 67 - 142, The table-driven tests in platform_test.go add load-balancer type checks for BareMetal but miss equivalent cases for OpenStack and VSphere even though notSupportedPlatformReason enforces LB-type gating for those platforms; update the test vector in platform_test.go to include cases for OpenStack and VSphere with a LoadBalancer set to LoadBalancerTypeUserManaged (expectSupported: false) and with LoadBalancerTypeOpenShiftManagedDefault or an APIServerInternalIPs entry (expectSupported: true) so the test covers LB-type gating the same way as BareMetal and prevents silent regressions in notSupportedPlatformReason.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/monitortests/network/onpremhaproxy/monitortest.go`:
- Around line 106-111: The check currently only fails when len(infraPods) == 0,
but if pods exist and none include a haproxy container the test still passes;
update the logic after the haproxy container filter (the code that iterates
infraPods and picks containers named "haproxy") to count total haproxy
containers (e.g., totalHaproxyContainers) and return the same fmt.Errorf when
that count == 0; locate the filtering loop that references infraPods and the
haproxy container name and add the zero-container check there so the monitor
fails when no haproxy containers are scanned.
---
Nitpick comments:
In `@pkg/monitortests/network/onpremhaproxy/platform_test.go`:
- Around line 67-142: The table-driven tests in platform_test.go add
load-balancer type checks for BareMetal but miss equivalent cases for OpenStack
and VSphere even though notSupportedPlatformReason enforces LB-type gating for
those platforms; update the test vector in platform_test.go to include cases for
OpenStack and VSphere with a LoadBalancer set to LoadBalancerTypeUserManaged
(expectSupported: false) and with LoadBalancerTypeOpenShiftManagedDefault or an
APIServerInternalIPs entry (expectSupported: true) so the test covers LB-type
gating the same way as BareMetal and prevents silent regressions in
notSupportedPlatformReason.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 7a591e67-ccf4-424e-acd3-8bc551c392eb
📒 Files selected for processing (3)
pkg/monitortests/network/onpremhaproxy/monitortest.gopkg/monitortests/network/onpremhaproxy/platform.gopkg/monitortests/network/onpremhaproxy/platform_test.go
|
Scheduling required tests: |
|
@mkowalski: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Implements https://issues.redhat.com/browse/OPNET-602 (mirrored at https://redhat.atlassian.net/browse/OPNET-602).
As suggested in #29149 (comment), the haproxy reachability test itself should flake, but a failed test should mean collection errors. So far the on-prem haproxy monitor test could not distinguish between "no data because the platform has no haproxy" and "no data because collection silently broke": on AWS and other cloud platforms it produced vacuous test passes, and on on-prem clusters a renamed namespace or label would silently disable the test forever. This also answers the follow-up question in that thread about a programmatic
platform.isOnPrem()check.Changes:
notSupportedPlatformReason()(newplatform.go): the monitor test is supported only where the haproxy api-lb static pods are actually deployed - platform BareMetal/OpenStack/vSphere with an API VIP configured and theOpenShiftManagedDefaultloadbalancer. vSphere UPI (no VIP), user-managed LB deployments and all other platforms are not.PrepareCollectionreads the Infrastructure config and stores amonitortestframework.NotSupportedError, which all phases return on unsupported platforms - the framework turns this into junit skips instead of today's vacuous passes. Clusters without the infrastructure config (e.g. MicroShift) are skipped via theIsNotFoundcheck.scanAllOperatorPodsnow returns an error when it finds no haproxy pods at all, guarding against silent rot of the namespace/label assumptions. Together with the errors already returned when reading pod logs fails, the framework converts those into a failed junit for the collection phase ("failed during collection"), which fails the run - a failed test now always means collection errors.Verified with
go build,go vetandgo test ./pkg/monitortests/network/onpremhaproxy/...(all tests pass).Summary by CodeRabbit
Bug Fixes
Tests