Skip to content

improvement(mcp): bound MCP memory and lifecycle concurrency#4751

Merged
icecrasher321 merged 5 commits into
stagingfrom
improvement/mcp-mem-perf
May 27, 2026
Merged

improvement(mcp): bound MCP memory and lifecycle concurrency#4751
icecrasher321 merged 5 commits into
stagingfrom
improvement/mcp-mem-perf

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

Summary

MCP memory load caused high memory on ecs task and almost killed it

Enforce memory bounds similar to rest of platform

Type of Change

  • Other: Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 27, 2026 6:59pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 27, 2026

PR Summary

High Risk
Touches authentication bridging, workflow execute, and large payload paths across MCP serve and management APIs—mistakes could break tool calls or leak trust boundaries.

Overview
This PR hardens MCP and workflow execution against unbounded memory and races by adding 10MB-class limits on request/response bodies, tool metadata, and serialized tool results, with streaming reads that cancel on oversize or client disconnect (413 / 499).

The workflow MCP serve route (/api/mcp/serve/[serverId]) gains bounded JSON-RPC parsing, paginated tools/list, duplicate tool name detection (409), clearer transport/auth behavior (public GET metadata, SSE 405, DELETE always authenticated), and internal JWT bridge calls to workflow execute instead of forwarding client API keys. tools/call propagates upstream HTTP statuses, preserves falsy outputs, and aborts in-flight workflow fetches when the MCP client disconnects.

Workflow execute applies the same body limits, treats only trusted internal JWT + bridge headers as MCP bridge traffic (ignoring spoofed bridge headers on API keys), rejects large inline MCP outputs without large-value refs, and improves client-cancel handling for sync and SSE paths.

Management MCP routes use shared readMcpJsonBodyWithLimit / mcpBodyReadErrorResponse; refresh discovery is capped and concurrency-limited. Workflow MCP lifecycle adds per-server/per-workflow tool counts, metadata budgets, transactional advisory locks, and stricter validation on create/update/delete. Connection manager adds connect timeouts, jittered reconnect, and fixes cleanup so intentional disconnects do not reconnect. Execution payload compaction can reject oversized values instead of spilling when configured for MCP responses.

Reviewed by Cursor Bugbot for commit 54af13e. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR addresses real memory pressure on ECS tasks by enforcing 10 MB caps at every MCP boundary: incoming request bodies, tools/list responses, workflow execution request/response, and tool call results. It also tightens lifecycle concurrency by wrapping sync operations in explicit transactions with advisory locks and SET LOCAL lock_timeout, adding cursor-based pagination to bulk queries, and introducing a per-server metadata budget enforced both at write time and at read time.

  • Body/response size limits are applied consistently across the MCP serve route, workflow execute route, and all MCP management routes via shared stream-limits utilities; the MCP bridge now strips raw API keys from forwarded headers and replaces them with short-lived internal JWTs plus an X-Sim-MCP-Tool-Actor header (trusted only on INTERNAL_JWT auth).
  • Lifecycle concurrency is guarded by FOR UPDATE row locks on workflow rows inside transactions, optimistic snapshot validation to detect races, WorkflowMcpServerFanoutError for membership budget overflows, and setWorkflowMcpTransactionLockTimeout added to all relevant transactions to prevent indefinite lock waits.
  • Abort-signal propagation is threaded through the entire MCP tool-call path and the synchronous workflow execute path, returning 499 responses on client disconnect rather than completing unnecessary work.

Confidence Score: 5/5

Safe to merge — the changes are well-bounded, thoroughly tested, and the behavioral impact is limited to adding enforcement that was previously absent.

The memory-bounding logic is implemented through shared utilities with good test coverage for all limit paths (content-length fast-path, streaming body, response size). The MCP bridge security gate correctly ties the trusted headers to INTERNAL_JWT auth. The abort-signal plumbing is consistent and all acquired resources are released in finally blocks. The two observations flagged are minor accuracy/documentation concerns with no runtime impact on correctness.

No files require special attention; the most complex file is route.ts for the MCP serve endpoint, but its behavior is well-covered by the expanded test suite.

Important Files Changed

Filename Overview
apps/sim/app/api/mcp/serve/[serverId]/route.ts Large rewrite adding body-size enforcement (10 MB cap on request, response, and tool-call), cursor-based tools/list pagination, duplicate-tool detection, MCP bridge actor semantics, and abort-signal propagation through the entire handler chain.
apps/sim/app/api/workflows/[id]/execute/route.ts Adds MCP bridge trust gate (INTERNAL_JWT + X-Sim-MCP-Tool-Call header required), 10 MB body/response limits, abort-signal propagation with 499 responses for client disconnect, and compacted output rejection for large-value refs when called from MCP.
apps/sim/lib/mcp/middleware.ts Replaces ad-hoc body parsing with readMcpJsonBodyWithLimit (10 MB cap with WeakMap caching to avoid double-reads) and adds mcpBodyReadErrorResponse for uniform 413/499/400 handling across all MCP management routes.
apps/sim/lib/mcp/connection-manager.ts Adds 15-second connection timeout with graceful in-flight promise cleanup, moves state deletion before disconnect to avoid phantom state, and replaces manual exponential backoff with the shared backoffWithJitter utility (equivalent delays, adds ±20% jitter).
apps/sim/lib/mcp/orchestration/workflow-mcp-lifecycle.ts Adds pre-flight budget checks, optimistic snapshot validation inside FOR UPDATE transactions, per-server membership limits, and a WorkflowMcpExpectedError type hierarchy to eliminate silent skips and turn validation failures into explicit errors.
apps/sim/lib/mcp/tool-limits.ts New file centralising MCP tool metadata budget logic (per-tool name/description/schema byte limits, per-server aggregate schema and metadata caps, DB usage row helpers).
apps/sim/lib/mcp/workflow-mcp-sync.ts Converts direct bulk queries to cursor-paginated scans with advisory locks, adds fanout guard (MAX_MCP_SERVERS_PER_WORKFLOW), enforces schema size limits before sync, and wraps callers in explicit transactions when none is provided.
apps/sim/lib/core/utils/stream-limits.ts Adds abort-signal integration to readStreamToBufferWithLimit: registers a cancel-on-abort listener before reading, checks signal after each read chunk, and cleans up the listener in the finally block.
apps/sim/lib/execution/payloads/serializer.ts Adds rejectLargeValues/rejectLargeValueLabel options; compactEntriesWithEarlyReject processes object/array entries sequentially with incremental size checks using assertRejectSize, allowing the MCP path to fail fast before spilling large values to storage.
apps/sim/lib/mcp/constants.ts New shared constants file for MCP byte limits, tool counts, and bridge header names, eliminating scattered magic numbers across routes.

Sequence Diagram

sequenceDiagram
    participant Client
    participant MCPServe as MCP Serve Route
    participant WorkflowExec as Workflow Execute Route
    participant DB

    Client->>MCPServe: POST (tools/call)
    MCPServe->>MCPServe: assertContentLengthWithinLimit (10 MB)
    MCPServe->>MCPServe: readStreamToBufferWithLimit (10 MB)
    MCPServe->>MCPServe: authorizeMcpServeRequest
    MCPServe->>DB: SELECT workflowMcpTool (LIMIT 2)
    Note over MCPServe: duplicate check → 409 if >1
    MCPServe->>DB: SELECT workflow (isDeployed check)
    MCPServe->>MCPServe: createManagedAbortSignal (timeout + client disconnect)
    MCPServe->>MCPServe: assertKnownSizeWithinLimit request body (10 MB)
    MCPServe->>WorkflowExec: POST /execute (Bearer internal-token, X-Sim-MCP-Tool-Call: true)
    WorkflowExec->>WorkflowExec: "isMcpBridgeRequest = INTERNAL_JWT and header"
    WorkflowExec->>WorkflowExec: "rejectLargeInlineOutput = true"
    WorkflowExec-->>MCPServe: Response (streaming or JSON)
    MCPServe->>MCPServe: readResponseTextWithLimit (10 MB)
    MCPServe->>MCPServe: createJsonRpcResponseWithLimit (10 MB)
    MCPServe-->>Client: JSON-RPC result or 413/408/499
Loading

Reviews (3): Last reviewed commit: "address comments" | Re-trigger Greptile

Comment thread apps/sim/app/api/mcp/serve/[serverId]/route.ts
Comment thread apps/sim/lib/mcp/workflow-mcp-sync.ts Outdated
Comment thread apps/sim/lib/mcp/server-locks.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321 icecrasher321 merged commit 59792c0 into staging May 27, 2026
9 checks passed
@waleedlatif1 waleedlatif1 deleted the improvement/mcp-mem-perf branch May 27, 2026 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant