[MLI-6891] feat: wire forwarder_max_concurrency through K8s forwarder template#836
Open
diazagasatya wants to merge 2 commits into
Open
[MLI-6891] feat: wire forwarder_max_concurrency through K8s forwarder template#836diazagasatya wants to merge 2 commits into
diazagasatya wants to merge 2 commits into
Conversation
Adds `--set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"` to all 6
http-forwarder containers in the service template (CircleCI + chart
variants). The flag is rendered conditionally — when the kwarg is
None, a new `_strip_optional_set_pairs` preprocessor in
`load_k8s_yaml` drops the `--set` arg-pair entirely, so existing
endpoints fall back to the forwarder's config-file default (100)
with zero behavior change.
Also fixes the description on `forwarder_max_concurrency` to reflect
the actual fallback (config-file default, not per_worker).
Follow-up to MLI-6876 (#835). Without this, the API field is
silently dropped at the manifest-rendering layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lilyz-ai
approved these changes
May 29, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Linear: MLI-6891
Follow-up to: #835 (MLI-6876)
Why
MLI-6876 added
forwarder_max_concurrencyto the API schema and plumbed it through toResourceArguments(11 injection sites). But the K8s manifest templates never referenced${FORWARDER_MAX_CONCURRENCY}— so the field was silently dropped at the rendering layer. API callers could set the field with no actual effect on the running forwarder.This PR is the load-bearing piece that makes the field actually work end-to-end on the llm-engine API path.
What changes
--set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"to all 6 http-forwarder containers inservice_template_config_map_circleci.yamland the corresponding 4 forwarder containers incharts/model-engine/templates/service_template_config_map.yaml_strip_optional_set_pairspreprocessor ink8s_endpoint_resource_delegate.pythat drops any- --set/- "...${KEY}..."arg pair from the template when KEY's value is None. Generic — usable by any future optional kwarg using the same patternforwarder_max_concurrency's description in the DTOs to reflect the actual fallback (config-file default, notper_worker— earlier docstring was inaccurate)Backward compatibility
Zero behavior change for any existing endpoint that doesn't explicitly set
forwarder_max_concurrency:forwarder_max_concurrency=None(current default)--set max_concurrencylineforwarder_max_concurrency=5--set "max_concurrency=5"Tests
Added three focused unit tests for
_strip_optional_set_pairs:test_strip_optional_set_pairs_drops_when_nonetest_strip_optional_set_pairs_keeps_when_settest_strip_optional_set_pairs_handles_unknown_keyAlso ran the full
test_k8s_endpoint_resource_delegate.pysuite locally — 59 passed, 0 failed, including the cross-cuttingtest_resource_arguments_type_and_add_datadog_env_to_main_containerthat renders every forwarder container with full substitution.Out of scope
descriptionstring on existing fields (no schema-level changes), so I skipped the regen step. Happy to add a follow-up commit with regen if reviewers prefer.Rollout chain status
After MLI-6891 + MLI-6876 are both in the next llm-engine release,
forwarder_max_concurrencyis functional end-to-end on the API path.🤖 Generated with Claude Code
Greptile Summary
This PR completes the
forwarder_max_concurrencyend-to-end wire-up by adding--set "max_concurrency=${FORWARDER_MAX_CONCURRENCY}"to forwarder container templates and introducing a_strip_optional_set_pairspreprocessor that cleanly removes the arg-pair from the rendered manifest when the value isNone, preserving backward compatibility.max_concurrencyarg to 6 of 7 http-forwarder containers in the CircleCI template and all 4 in the Helm chart; the LWS streaming GPU container in the CircleCI template was missed (see comment)._strip_optional_set_pairsas a generic preprocessor for optional--setkwargs — correctly placed beforesafe_substituteand covered by three new unit tests.forwarder_max_concurrencydocstring on both Create and Update DTOs to reflect the actual config-file fallback default (100) rather than the previously incorrectper_workerdescription.Confidence Score: 4/5
Safe to merge for all production traffic; only the CircleCI test template for LWS streaming GPU endpoints would render an incorrect manifest when
forwarder_max_concurrencyis explicitly set.The Helm chart (production) is correctly updated for all four forwarder containers. The one gap is the LWS streaming http-forwarder in the CircleCI template, which was not updated to include
max_concurrency. Any CI test that renders an LWS streaming GPU deployment with a non-Noneforwarder_max_concurrencywould produce a manifest that diverges from the production Helm output, potentially masking the omission during review.model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml — the LWS streaming GPU forwarder block (~line 2797) is missing the two-line
max_concurrencyaddition present in every other updated container.Important Files Changed
_strip_optional_set_pairspreprocessor that removes--set "...${KEY}..."arg-pairs for None-valued kwargs before template substitution; correctly placed beforesafe_substitute. The generic nature of the preprocessor also silently changes behavior for other existing Optional kwargs (e.g.FORWARDER_TYPE=None), though that prior behavior was already broken.max_concurrency=${FORWARDER_MAX_CONCURRENCY}to 6 http-forwarder containers, but misses the LWS streaming GPU forwarder container (~line 2797), leaving it inconsistent with the Helm chart which correctly updated all 4 equivalent containers including its LWS streaming path.max_concurrency=${FORWARDER_MAX_CONCURRENCY}to all 4 forwarder containers (sync, streaming, LWS sync, LWS streaming), matching the expected coverage.forwarder_max_concurrencyon both Create and Update DTOs to say the fallback is the config-file default (100), notper_worker._strip_optional_set_pairscovering the drop-when-None, keep-when-set, and unknown-key no-op cases.Comments Outside Diff (1)
model-engine/model_engine_server/infra/gateways/resources/templates/service_template_config_map_circleci.yaml, line 2797-2800 (link)max_concurrencyin LWS streaming GPU forwarderThe LeaderWorkerSet streaming GPU http-forwarder container (starting ~line 2769) was not updated, while the Helm chart's equivalent path (
{{- else if eq $mode "streaming" }}in the LWS section, line 677) was correctly updated.LeaderWorkerSetRunnableImageStreamingGpuArgumentsextends_BaseDeploymentArguments, which declaresFORWARDER_MAX_CONCURRENCY: Optional[int], so when a non-None value is passed the rendered CI manifest will silently omit the--set max_concurrencyarg while the production manifest includes it. Add the two lines below after line 2799 to match every other forwarder container in this file:Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "[MLI-6891] style: apply black 24.8.0 for..." | Re-trigger Greptile