feat(k8s): add k8s_wait_for_condition tool #58
Open
mesutoezdil wants to merge 42 commits into
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* fix all linter errors * add buildx --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
# Conflicts: # Makefile
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
- added telemetry - security validations - structured logging - e2e tests Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* - 🐛 Fix stdio implementation - 🚀 Add quickstart guide for agentgateway - 📝 Update cursor MCP documentation Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * - 🐛 Fix stdio implementation - 🚀 Add quickstart guide for agentgateway - 📝 Update cursor MCP documentation Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * add homebrew path Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * increase default timeout Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * quickstart updated Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* set json format optional Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * set json format optional Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * set json format optional Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* updated dependencies Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * ci go-version: "1.25" Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * fix agentgateway config Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Sara Qasmi <saraqasmi@Saras-MacBook-Pro.local> Co-authored-by: Sara Qasmi <saraqasmi@Saras-MacBook-Pro.local>
* dependencies update Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * readme Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * go mod Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * check latest GO version Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * actions/setup-go@v6 Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * actions/setup-go@v6 Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* TOOLS_ISTIO_VERSION ?= 1.28.3 TOOLS_KUBECTL_VERSION ?= 1.35.0 TOOLS_HELM_VERSION ?= 4.1.0 TOOLS_CILIUM_VERSION ?= 0.19.0 Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * helm-unittest install --verify=false Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> --------- Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
* Add Kubescape integration - Introduced Kubescape tool support, including registration of various tools for health checks, vulnerability manifests, and configuration scans. - Implemented specific error handling for Kubescape-related operations, providing detailed suggestions based on error types. Signed-off-by: Ben <ben@armosec.io> * Enhance Kubescape tool by adding runtime observability features - Introduced checks for ApplicationProfiles and NetworkNeighborhoods CRDs in health checks, with corresponding recommendations for enabling runtime observability. - Added handlers for listing and retrieving ApplicationProfiles and NetworkNeighborhoods, capturing runtime behavior and network communication patterns of workloads. Signed-off-by: Ben <ben@armosec.io> * Fix linter errors: remove unused SBOM functions and suppress deprecated test warnings Signed-off-by: Ben <ben@armosec.io> * ci: increase golangci-lint timeout to 5m to prevent context deadline errors Signed-off-by: Ben <ben@armosec.io> * Updating timeouts for golint Signed-off-by: Ben <ben@armosec.io> --------- Signed-off-by: Ben <ben@armosec.io>
…ent-dev#43) * feat(helm): add enabledTools and extraArgs configuration options Add support for configuring tool-server CLI arguments via Helm values: - `tools.enabledTools`: List of tool providers to enable (maps to --tools flag) - `tools.extraArgs`: Additional command-line arguments for future flags Example usage: ```yaml tools: enabledTools: - k8s - helm - prometheus extraArgs: - "--some-future-flag" ``` This is a non-breaking change - empty lists (default) preserve current behavior. Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk> * refactor(helm): rename tools.extraArgs to tools.args Simplify the Helm values key name for additional CLI arguments. Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk> --------- Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk>
…ev#41) * feat: add --read-only flag to disable write operations Add a new `--read-only` CLI flag that disables tools which perform write operations (delete, patch, scale, create, apply, etc.). This enables deploying the MCP server in read-only mode for: - Observability-only use cases (monitoring, troubleshooting) - Environments with read-only service accounts - Compliance requirements separating read/write capabilities Tools are categorized as read-only or write operations: - K8s: 8 read-only, 14 write tools - Helm: 3 read-only, 3 write tools - Istio: 9 read-only, 3 write tools - Cilium: ~25 read-only, ~15 write tools - Argo: 4 read-only, 4 write tools - Prometheus/Kubescape/Utils: all read-only (unchanged)Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk> * fix: disable shell tool in read-only mode The utils provider exposes a `shell` tool that executes arbitrary commands, bypassing read-only restrictions. In read-only mode, this tool is now disabled. Also pass readOnly to all providers (kubescape, prometheus, utils) for consistency with the existing providers. Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk> --------- Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk>
…gent-dev#44) Upgrade all Go dependencies to latest versions and bump bundled CLI tools (kubectl 1.35.1, helm 4.1.1) to address HIGH severity vulnerabilities flagged by security scanning. Pin kubescape/storage to v0.0.239 (latest compatible release) as v0.2.0 removed APIs we depend on. 8 remaining HIGHs cannot be addressed as they originate from upstream pre-compiled binaries (istioctl 1.28.3, kubectl-argo-rollouts 1.8.3) which are already at their latest releases: ✅ TOOLS_ARGO_ROLLOUTS_VERSION=1.8.3 == v1.8.3 ✅ TOOLS_CILIUM_VERSION=0.19.0 == v0.19.0 ✅ TOOLS_ISTIO_VERSION=1.28.3 == 1.28.3 ❌ TOOLS_HELM_VERSION=4.1.0 != v4.1.1 (bumped) ❌ TOOLS_KUBECTL_VERSION=1.35.0 != v1.35.1 (bumped) Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk>
* feat: add token support for kubectl commands Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io> * use pre-v4 helm version Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io> * Add configuration to disable service token automount Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * Remove automountServiceAccountToken config Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * helm config for using default service account Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * Add tools.k8s.tokenPassthrough for requiring token from auth header Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * Fix helm version Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * Remove automountServiceAccountToken from helm test Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> * Redact tokens Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> --------- Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io> Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io> Co-authored-by: Jeremy Alvis <jeremy.alvis@solo.io>
…ent-dev#46) Signed-off-by: Matteo Mori <matteo.mori@rvu.co.uk>
* feat(metrics): implement Prometheus observability with dedicated server Replace generateRuntimeMetrics() with prometheus/client_golang and add flexible metrics server architecture supporting same-port or dedicated port deployment. Changes: - Add internal/metrics package with custom Prometheus registry - Configurable metrics port via --metrics-port flag (default: 8084) - Two-server architecture with proper WaitGroup coordination - Graceful shutdown for both main and metrics servers - Export kagent_tools_mcp_server_info (version metadata) - Export kagent_tools_mcp_registered_tools (tool providers) - Include Go runtime metrics (goroutines, memory, GC stats) - Include process metrics (CPU, memory, file descriptors) Architecture improvement: Move http.Server instantiation outside goroutines to prevent race condition between assignment and shutdown. Test coverage: 5 unit tests validating registry, collectors, and metrics.Signed-off-by: MatteoMori <morimatteo14@gmail.com> * feat(metrics): auto-register tool metrics using ListTools() diff Use MCPServer.ListTools() to automatically detect which tools each provider registers, eliminating the need to modify individual tool packages. The approach snapshots the tool list before and after each provider's RegisterTools() call, then records the newly added tools in Prometheus with the correct tool_provider label. This means: - Zero changes required in any pkg/ file - Future tools are automatically tracked - No risk of forgetting to add a metric for a new toolSigned-off-by: MatteoMori <morimatteo14@gmail.com> * feat(metrics): instrument tool handlers with invocation counters Add kagent_tools_mcp_invocations_total and kagent_tools_mcp_invocations_failure_total counters using the wrapper/middleware pattern. All handlers are centrally instrumented in wrapToolHandlersWithMetrics with zero changes to pkg/ files. 
Update README with Observability section and CLI flags reference.Signed-off-by: MatteoMori <morimatteo14@gmail.com> * feat(observability): add Helm chart support and Grafana dashboard Add comprehensive Prometheus Operator integration via Helm chart: - ServiceMonitor resource for automatic target discovery - Dedicated metrics service (kagent-tools-metrics) - Deployment args for --metrics-port configuration - Configurable scrape interval, timeout, and labels Include Grafana dashboard with 8 panels visualizing: - Server version and health metrics - Tool invocation rates by provider - Success/failure rates and trends - Top invoked tools table with heat mapping Add CLAUDE.md with architecture documentation covering: - Tool provider pattern and MCP server lifecycle - Observability architecture (metrics wrapper pattern) - Development commands and key implementation patterns - Helm chart structure and troubleshooting guideSigned-off-by: MatteoMori <morimatteo14@gmail.com> * fix(metrics): default metrics-port to 0 (same as --port) Previously --metrics-port defaulted to 8084, causing a mismatch when the server ran on any other port (e.g. E2E tests use port 18190). The metrics server would start on 8084 instead of sharing the main port, so /metrics was unreachable at the expected address. Change the default to 0, resolved at runtime as "same as --port". Update Helm templates to fall back to the main targetPort when tools.metrics.port is unset. Signed-off-by: MatteoMori <morimatteo14@gmail.com> * fix(metrics): count result.IsError as invocation failure The failure counter previously only incremented on non-nil Go errors. Handlers in this codebase signal tool-level failures by returning NewToolResultError(...), nil — result.IsError=true, err=nil — a pattern used 214 times across pkg/. This meant the failure metric was always 0 for tool-level errors. 
Fix the wrapper condition to check both: err != nil || (result != nil && result.IsError) Add three tests in cmd/metrics_wrap_test.go: - IsError=true increments failure counter (regression test) - Successful call does not increment failure counter - Real Go error increments failure counter Remove CLAUDE.md from the repository. Signed-off-by: MatteoMori <morimatteo14@gmail.com> --------- Signed-off-by: MatteoMori <morimatteo14@gmail.com>
…rade (kagent-dev#47) * fix(helm): use fullname in selector labels to prevent mismatch on upgrade Use kagent.fullname instead of kagent.name in selectorLabels so that changing nameOverride does not alter the app.kubernetes.io/name selector label. Deployment spec.selector.matchLabels is immutable in Kubernetes, so any label change causes a Service/Deployment selector mismatch after helm upgrade, leaving the Service with zero endpoints. With this fix, both the old config (fullnameOverride: kagent-tools) and the new config (nameOverride: tools) resolve to the same fullname "kagent-tools" for the default release name, keeping selectors stable across upgrades. Fixes kagent-dev/kagent#1427 Signed-off-by: Jaison Paul <paul.jaison@gmail.com> * fix(e2e): update label selectors to match fullname-based selector labels Update E2E tests to use app.kubernetes.io/instance label selector instead of app.kubernetes.io/name since the PR changes selectorLabels to use kagent.fullname. The fullname template returns the release name (kagent-tools-e2e), so the tests now use app.kubernetes.io/instance=<releaseName> which remains stable and matches the updated selector labels in the Helm chart. This fixes the E2E test failures where pods weren't being found because the label selector no longer matched after the selectorLabels change.Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io> --------- Signed-off-by: Jaison Paul <paul.jaison@gmail.com> Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io> Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
) Renames all helper templates from kagent.* to kagent-tools.* prefix to prevent naming conflicts with the parent kagent chart. When Helm renders subcharts, template definitions are global, causing the parent chart's helpers to override the subchart's helpers with the same names. This fixes: - Selector label mismatch when using nameOverride (was using parent's logic instead of subchart's fullname logic) - Helm upgrade failures due to immutable selector field changes - Enables proper use of nameOverride instead of requiring fullnameOverride workaround All helper references updated across all template files: - _helpers.tpl: Renamed 10 helper definitions - deployment.yaml, service.yaml, serviceaccount.yaml: Updated references - clusterrole.yaml, clusterrolebinding.yaml: Updated references - servicemonitor.yaml, NOTES.txt: Updated references Backward compatible: existing fullnameOverride usage continues to work. Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
…resource (kagent-dev#50) Signed-off-by: Felipe Vicens <felipejose.vicensgonzalez@telefonica.com>
…t-dev#52) Bump google.golang.org/grpc v1.78.0 -> v1.79.3 to fix CRITICAL CVE-2026-33186 (authorization bypass). Bump all bundled CLI tools to latest releases (kubectl 1.35.3, helm 4.1.3, istioctl 1.28.5, argo-rollouts 1.8.4, cilium 0.19.2) to reduce CVE surface area. Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
* namespaced rbac Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io> * oops forgot i renamed it Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io> --------- Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com> * Fix incorrect cilium-dbg subcommands * Bump outdated tools: - Argo Rollouts: 1.8.4 → 1.9.0 - Istio: 1.28.5 → 1.29.1
…c.namespaces (kagent-dev#57) * namespaced rbac update with kagent Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io> * use proper health check Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io> --------- Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
Force-pushed from b8a975c to 33efb25.
Wraps kubectl wait so agents can block on a resource condition in one call instead of polling with repeated kubectl get turns. Closes kagent-dev#56 Co-authored-by: alexis-brettes <133014848+alexis-brettes@users.noreply.github.com> Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
Force-pushed from 33efb25 to d157f4f.
Author: @EItanya any news?
Force-pushed from d157f4f to e1d968e.
Force-pushed from e1d968e to 2a72881.
Adds a new MCP tool, `k8s_wait_for_condition`, that wraps `kubectl wait` and blocks until a Kubernetes resource reaches a specified condition or the timeout expires.

Agents that deploy resources currently have to poll with repeated `kubectl get` calls in a loop. Each iteration is a full LLM turn, wasting tokens and adding latency.

With this tool, a single blocking call replaces the loop:

Before:

After:

Parameters:

- `resource_type`
- `resource_name`
- `condition`
- `namespace`
- `timeout_seconds`

Seven unit tests cover: success path, custom namespace and timeout, missing required parameters, zero timeout rejection, and kubectl timeout propagation.
Closes #56