OPNET-602: Fail haproxy monitor test when log collection fails by mkowalski · Pull Request #31292 · openshift/origin

mkowalski · 2026-06-11T16:12:12Z

Implements https://issues.redhat.com/browse/OPNET-602 (mirrored at https://redhat.atlassian.net/browse/OPNET-602).

As suggested in #29149 (comment), the haproxy reachability test itself should flake, but a failed test should mean collection errors. So far the on-prem haproxy monitor test could not distinguish between "no data because the platform has no haproxy" and "no data because collection silently broke": on AWS and other cloud platforms it produced vacuous test passes, and on on-prem clusters a renamed namespace or label would silently disable the test forever. This also answers the follow-up question in that thread about a programmatic platform.isOnPrem() check.

Changes:

notSupportedPlatformReason() (new platform.go): the monitor test is supported only where the haproxy api-lb static pods are actually deployed - platform BareMetal/OpenStack/vSphere with an API VIP configured and the OpenShiftManagedDefault loadbalancer. vSphere UPI (no VIP), user-managed LB deployments and all other platforms are not.
PrepareCollection reads the Infrastructure config and stores a monitortestframework.NotSupportedError, which all phases return on unsupported platforms - the framework turns this into junit skips instead of today's vacuous passes. Clusters without the infrastructure config (e.g. MicroShift) are skipped via the IsNotFound check.
On supported platforms, scanAllOperatorPods now returns an error when it finds no haproxy pods at all, guarding against silent rot of the namespace/label assumptions. Together with the errors already returned when reading pod logs fails, the framework converts those into a failed junit for the collection phase ("failed during collection"), which fails the run - a failed test now always means collection errors.
Unit tests for the platform gating.

Verified with go build, go vet and go test ./pkg/monitortests/network/onpremhaproxy/... (all tests pass).

Summary by CodeRabbit

Bug Fixes
- Improved platform detection for on-premises API loadbalancer monitoring to properly identify unsupported platforms and report reasons accordingly.
- Enhanced error handling to fail collection when HAProxy pods are not found, replacing silent failures.
Tests
- Added comprehensive test coverage for platform compatibility validation across various cluster configurations.

So far the on-prem haproxy monitor test could not distinguish between "no data because the platform has no haproxy" and "no data because collection silently broke". On AWS and other cloud platforms it produced vacuous test passes, and on on-prem clusters a renamed namespace or label would silently disable the test forever. Gate the monitor test on the platform: read the Infrastructure config in PrepareCollection and skip all phases with NotSupportedError unless the cluster is BareMetal/OpenStack/vSphere with an API VIP and the OpenShift-managed loadbalancer, i.e. the deployments that run the haproxy api-lb static pods. Clusters without the infrastructure config (e.g. MicroShift) are skipped as well. On supported platforms finding no haproxy pods at all is now reported as an error. Together with the errors already returned when reading the logs fails, the framework turns those into a failed junit for the collection phase, so a failed test now always means collection errors. Implements Vadim's suggestion from openshift#29149 (comment)

openshift-merge-bot · 2026-06-11T16:12:15Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

openshift-ci-robot · 2026-06-11T16:12:16Z

@mkowalski: This pull request references OPNET-602 which is a valid jira issue.

Details

In response to this:

Implements https://issues.redhat.com/browse/OPNET-602 (mirrored at https://redhat.atlassian.net/browse/OPNET-602).

As suggested in #29149 (comment), the haproxy reachability test itself should flake, but a failed test should mean collection errors. So far the on-prem haproxy monitor test could not distinguish between "no data because the platform has no haproxy" and "no data because collection silently broke": on AWS and other cloud platforms it produced vacuous test passes, and on on-prem clusters a renamed namespace or label would silently disable the test forever. This also answers the follow-up question in that thread about a programmatic platform.isOnPrem() check.

Changes:

notSupportedPlatformReason() (new platform.go): the monitor test is supported only where the haproxy api-lb static pods are actually deployed - platform BareMetal/OpenStack/vSphere with an API VIP configured and the OpenShiftManagedDefault loadbalancer. vSphere UPI (no VIP), user-managed LB deployments and all other platforms are not.

PrepareCollection reads the Infrastructure config and stores a monitortestframework.NotSupportedError, which all phases return on unsupported platforms - the framework turns this into junit skips instead of today's vacuous passes. Clusters without the infrastructure config (e.g. MicroShift) are skipped via the IsNotFound check.

On supported platforms, scanAllOperatorPods now returns an error when it finds no haproxy pods at all, guarding against silent rot of the namespace/label assumptions. Together with the errors already returned when reading pod logs fails, the framework converts those into a failed junit for the collection phase ("failed during collection"), which fails the run - a failed test now always means collection errors.

Unit tests for the platform gating.

Verified with go build, go vet and go test ./pkg/monitortests/network/onpremhaproxy/... (all tests pass).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-11T16:12:34Z

Walkthrough

This PR adds platform detection to skip the on-premises HA proxy monitor test when the cluster infrastructure does not support OpenShift-managed on-prem API loadbalancing. It introduces a helper function to check platform eligibility, adds comprehensive test coverage, and integrates the check into the monitor test collection pipeline to short-circuit early when unsupported.

Changes

On-Premises Monitor Test Platform Gating

Layer / File(s)	Summary
Platform detection helper and unit tests `pkg/monitortests/network/onpremhaproxy/platform.go`, `pkg/monitortests/network/onpremhaproxy/platform_test.go`	`notSupportedPlatformReason` inspects cluster infrastructure platform type and API VIP configuration to determine on-prem HA proxy monitor eligibility. Table-driven test covers BareMetal, OpenStack, vSphere, AWS, Nutanix variants, API VIP fields, and load balancer configurations.
Monitor test pipeline integration `pkg/monitortests/network/onpremhaproxy/monitortest.go`	`operatorLogAnalyzer` fetches cluster `Infrastructure` config during `PrepareCollection` and uses platform detection to set `notSupportedReason`. All downstream phases (`StartCollection`, `CollectData`, `ConstructComputedIntervals`, `EvaluateTestsFromConstructedIntervals`, `WriteContentToStorage`) return early when unsupported. Pod scanning now fails explicitly when no haproxy pods are found.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (14 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: making the haproxy monitor test fail when log collection fails, which aligns with the PR's primary objective of distinguishing failures from unsupported platforms.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR contains no Ginkgo tests. The only test added is a standard Go table-driven test with static, deterministic names unrelated to Ginkgo framework.
Test Structure And Quality	✅ Passed	PR adds standard Go unit tests, not Ginkgo tests. Table-driven test with 14 focused subtests; each tests one platform scenario. Assertion messages are descriptive. No cluster resources to setup/cle...
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests are added in this PR. The only new test file (platform_test.go) uses standard Go testing (testing.T), not Ginkgo framework. The check does not apply.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No Ginkgo e2e tests (It(), Describe(), Context(), When()) added; PR contains only standard Go unit tests and monitor framework code, so SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility	✅ Passed	PR contains only monitoring test code with no deployment manifests, scheduling constraints, replica assumptions, or topology-specific configurations that would affect workload scheduling.
Ote Binary Stdout Contract	✅ Passed	No stdout writes found in process-level code. Files contain only interface implementations, helper functions, and unit tests with no main(), init(), BeforeSuite(), or similar process-level functions.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests are added in this PR. The new platform_test.go is a standard Go unit test (using testing.T), not a Ginkgo e2e test, so the check does not apply.
No-Weak-Crypto	✅ Passed	No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected. The PR modifies haproxy monitoring logi...
Container-Privileges	✅ Passed	PR contains only Go source files with no container/Kubernetes manifests or privileged container configurations to flag.
No-Sensitive-Data-In-Logs	✅ Passed	All logging in the PR uses safe, non-sensitive data: only node names, platform/LB types, and parsed hostnames/IPs via restricted regex. No passwords, tokens, API keys, PII, or customer data are log...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-11T16:13:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mkowalski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [mkowalski]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/monitortests/network/onpremhaproxy/platform_test.go (1)

67-142: ⚡ Quick win

Add LB-type coverage for OpenStack and vSphere branches.

Line 81 validates LoadBalancerTypeUserManaged only for BareMetal, but notSupportedPlatformReason applies the same LB-type gating to OpenStack and vSphere. Add equivalent cases so branch behavior can’t regress silently.

Proposed test additions

 		{
 			name: "openstack without API VIP",
 			platformStatus: &configv1.PlatformStatus{
 				Type:      configv1.OpenStackPlatformType,
 				OpenStack: &configv1.OpenStackPlatformStatus{},
 			},
 			expectSupported: false,
 		},
+		{
+			name: "openstack with user-managed loadbalancer",
+			platformStatus: &configv1.PlatformStatus{
+				Type: configv1.OpenStackPlatformType,
+				OpenStack: &configv1.OpenStackPlatformStatus{
+					APIServerInternalIPs: []string{"10.0.0.5"},
+					LoadBalancer: &configv1.OpenStackPlatformLoadBalancer{
+						Type: configv1.LoadBalancerTypeUserManaged,
+					},
+				},
+			},
+			expectSupported: false,
+		},
 		{
 			name: "vsphere IPI with API VIPs",
 			platformStatus: &configv1.PlatformStatus{
 				Type: configv1.VSpherePlatformType,
 				VSphere: &configv1.VSpherePlatformStatus{
 					APIServerInternalIPs: []string{"10.0.0.5"},
 				},
 			},
 			expectSupported: true,
 		},
+		{
+			name: "vsphere with user-managed loadbalancer",
+			platformStatus: &configv1.PlatformStatus{
+				Type: configv1.VSpherePlatformType,
+				VSphere: &configv1.VSpherePlatformStatus{
+					APIServerInternalIPs: []string{"10.0.0.5"},
+					LoadBalancer: &configv1.VSpherePlatformLoadBalancer{
+						Type: configv1.LoadBalancerTypeUserManaged,
+					},
+				},
+			},
+			expectSupported: false,
+		},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortests/network/onpremhaproxy/platform_test.go` around lines 67 -
142, The table-driven tests in platform_test.go add load-balancer type checks
for BareMetal but miss equivalent cases for OpenStack and VSphere even though
notSupportedPlatformReason enforces LB-type gating for those platforms; update
the test vector in platform_test.go to include cases for OpenStack and VSphere
with a LoadBalancer set to LoadBalancerTypeUserManaged (expectSupported: false)
and with LoadBalancerTypeOpenShiftManagedDefault or an APIServerInternalIPs
entry (expectSupported: true) so the test covers LB-type gating the same way as
BareMetal and prevents silent regressions in notSupportedPlatformReason.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/monitortests/network/onpremhaproxy/monitortest.go`:
- Around line 106-111: The check currently only fails when len(infraPods) == 0,
but if pods exist and none include a haproxy container the test still passes;
update the logic after the haproxy container filter (the code that iterates
infraPods and picks containers named "haproxy") to count total haproxy
containers (e.g., totalHaproxyContainers) and return the same fmt.Errorf when
that count == 0; locate the filtering loop that references infraPods and the
haproxy container name and add the zero-container check there so the monitor
fails when no haproxy containers are scanned.

---

Nitpick comments:
In `@pkg/monitortests/network/onpremhaproxy/platform_test.go`:
- Around line 67-142: The table-driven tests in platform_test.go add
load-balancer type checks for BareMetal but miss equivalent cases for OpenStack
and VSphere even though notSupportedPlatformReason enforces LB-type gating for
those platforms; update the test vector in platform_test.go to include cases for
OpenStack and VSphere with a LoadBalancer set to LoadBalancerTypeUserManaged
(expectSupported: false) and with LoadBalancerTypeOpenShiftManagedDefault or an
APIServerInternalIPs entry (expectSupported: true) so the test covers LB-type
gating the same way as BareMetal and prevents silent regressions in
notSupportedPlatformReason.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7a591e67-ccf4-424e-acd3-8bc551c392eb

📥 Commits

Reviewing files that changed from the base of the PR and between bde16f5 and b4a2d0f.

📒 Files selected for processing (3)

pkg/monitortests/network/onpremhaproxy/monitortest.go
pkg/monitortests/network/onpremhaproxy/platform.go
pkg/monitortests/network/onpremhaproxy/platform_test.go

openshift-merge-bot · 2026-06-11T16:40:50Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

openshift-ci · 2026-06-11T20:43:52Z

@mkowalski: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-gcp-ovn-upgrade	`b4a2d0f`	link	true	`/test e2e-gcp-ovn-upgrade`
ci/prow/e2e-aws-ovn-microshift	`b4a2d0f`	link	true	`/test e2e-aws-ovn-microshift`
ci/prow/e2e-vsphere-ovn	`b4a2d0f`	link	true	`/test e2e-vsphere-ovn`
ci/prow/e2e-vsphere-ovn-upi	`b4a2d0f`	link	true	`/test e2e-vsphere-ovn-upi`
ci/prow/e2e-aws-ovn-serial-2of2	`b4a2d0f`	link	true	`/test e2e-aws-ovn-serial-2of2`
ci/prow/e2e-gcp-ovn	`b4a2d0f`	link	true	`/test e2e-gcp-ovn`
ci/prow/e2e-aws-ovn-serial-1of2	`b4a2d0f`	link	true	`/test e2e-aws-ovn-serial-1of2`
ci/prow/e2e-aws-ovn-fips	`b4a2d0f`	link	true	`/test e2e-aws-ovn-fips`
ci/prow/e2e-aws-ovn-microshift-serial	`b4a2d0f`	link	true	`/test e2e-aws-ovn-microshift-serial`
ci/prow/e2e-metal-ipi-ovn-ipv6	`b4a2d0f`	link	true	`/test e2e-metal-ipi-ovn-ipv6`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2026

openshift-ci Bot requested review from deads2k and sjenning June 11, 2026 16:13

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026

coderabbitai Bot requested changes Jun 11, 2026

View reviewed changes

Comment thread pkg/monitortests/network/onpremhaproxy/monitortest.go

coderabbitai Bot approved these changes Jun 11, 2026

View reviewed changes

openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OPNET-602: Fail haproxy monitor test when log collection fails#31292

OPNET-602: Fail haproxy monitor test when log collection fails#31292
mkowalski wants to merge 1 commit into
openshift:mainfrom
mkowalski:opnet-602-haproxy-collection-failure

mkowalski commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

openshift-merge-bot Bot commented Jun 11, 2026

Uh oh!

openshift-ci-robot commented Jun 11, 2026 •

edited by openshift-ci Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

openshift-ci Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

openshift-merge-bot Bot commented Jun 11, 2026

Uh oh!

openshift-ci Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mkowalski commented Jun 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jun 11, 2026

Uh oh!

openshift-ci-robot commented Jun 11, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

openshift-ci Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openshift-merge-bot Bot commented Jun 11, 2026

Uh oh!

openshift-ci Bot commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mkowalski commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Jun 11, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading