fix(tunnel-node): bound per-session memory to prevent OOM #1401
Open
yyoyoian-pixel wants to merge 1 commit into
The reader_task appended upstream data into each session's read_buf with no size limit. With Apps Script RTT of 2-7s, fast upstreams (video, downloads) could push tens of MB between drains; multiple sessions compounded to exhaust the VM and trigger an OOM kill.

- READ_BUF_CAP (32 MB): reader_task pauses when the buffer is full and resumes once the client drains it, bounding per-session memory.
- Conditional last_active bump: TCP data ops only refresh last_active on real uplink writes or when a drain returns data, so empty polls no longer keep idle sessions alive past the reaper (matches udp_data).
- abort_all() in reaper and batch-drain cleanup: previously only reader_handle was aborted, leaking udpgw_handle on virtual sessions.
- Diagnostic logging in cleanup_task: logs tcp session count and total read_buf size every 30s so memory pressure is observable.

Validated in production: memory held at ~26% over 21h / 72 GB of traffic where the prior build OOM-killed itself at the same age.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
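For readers who want to see the backpressure idea in isolation, here is a minimal sketch. It is not the code in this PR: it uses std threads and a Condvar in place of whatever async primitives tunnel-node actually uses, and BoundedBuf, push, drain, and demo are hypothetical names. It also assumes a strict cap (a chunk is split so the buffer never exceeds the limit), which the PR description does not confirm. The cap is shrunk to 8 bytes so the pause/resume cycle is visible.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Cap quoted in the PR description (32 MB). The demo uses a much smaller value.
const READ_BUF_CAP: usize = 32 * 1024 * 1024;

/// Hypothetical stand-in for a session's read_buf plus its wake-up signal.
struct BoundedBuf {
    cap: usize,
    data: Mutex<Vec<u8>>,
    drained: Condvar,
}

impl BoundedBuf {
    fn new(cap: usize) -> Self {
        Self { cap, data: Mutex::new(Vec::new()), drained: Condvar::new() }
    }

    /// Reader side: wait while the buffer is at the cap, then append as many
    /// bytes as fit. Returns how many bytes of `chunk` were accepted.
    fn push(&self, chunk: &[u8]) -> usize {
        let mut buf = self.data.lock().unwrap();
        while buf.len() >= self.cap {
            // Paused here until a drain signals that space is available.
            buf = self.drained.wait(buf).unwrap();
        }
        let n = (self.cap - buf.len()).min(chunk.len());
        buf.extend_from_slice(&chunk[..n]);
        n
    }

    /// Client side: take everything buffered so far and wake a paused reader.
    fn drain(&self) -> Vec<u8> {
        let mut buf = self.data.lock().unwrap();
        let out = std::mem::take(&mut *buf);
        self.drained.notify_all();
        out
    }
}

/// Pushes 20 bytes through an 8-byte buffer.
/// Returns (total bytes delivered, largest single drain).
fn demo() -> (usize, usize) {
    let buf = Arc::new(BoundedBuf::new(8));
    let producer = {
        let buf = Arc::clone(&buf);
        thread::spawn(move || {
            let upstream = [7u8; 20];
            let mut sent = 0;
            while sent < upstream.len() {
                sent += buf.push(&upstream[sent..]);
            }
        })
    };
    let (mut total, mut max_seen) = (0, 0);
    while total < 20 {
        let got = buf.drain();
        max_seen = max_seen.max(got.len());
        total += got.len();
        thread::yield_now();
    }
    producer.join().unwrap();
    (total, max_seen)
}

fn main() {
    let (total, max_seen) = demo();
    println!("delivered {} bytes, largest drain {} bytes", total, max_seen);
    println!("production cap per the PR: {} bytes", READ_BUF_CAP);
}
```

However fast the producer runs, no single drain can return more than the cap, which is the property the PR relies on to bound per-session memory.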
Summary
The tunnel-node reader_task appended upstream data into each session's read_buf with no size limit. With Apps Script RTT of 2-7s, fast upstreams (video, downloads) could push tens of MB between drains. Multiple concurrent sessions compounded until the VM was exhausted and the process was OOM-killed.

Changes
- READ_BUF_CAP (32 MB) backpressure: reader_task pauses reads when the per-session buffer reaches the cap and resumes once the client drains it. This bounds per-session memory regardless of upstream throughput.
- Conditional last_active bump: TCP data ops now only refresh last_active on real uplink writes or when a drain returns downstream data. Empty long-poll batches no longer keep idle sessions alive past the 300s reaper (matches the existing udp_data had_uplink pattern).
- abort_all() in reaper + batch-drain cleanup: previously only reader_handle.abort() was called, leaking udpgw_handle on virtual (udpgw) sessions.
- Diagnostic logging in cleanup_task: logs TCP session count and total read_buf size every 30s so memory pressure is observable in logs.

Validation
cargo test in tunnel-node: 38 passed, 0 failed.

Test plan
cargo test)

🤖 Generated with Claude Code
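As an illustration of the conditional last_active bump listed under Changes, here is a rough sketch under stated assumptions. The Session struct, handle_data_op, is_expired, and demo are hypothetical and heavily trimmed; only the 300s timeout and the "bump on uplink or non-empty drain" rule come from the description above.

```rust
use std::time::{Duration, Instant};

/// Idle timeout quoted in the PR description (300s reaper).
const IDLE_TIMEOUT: Duration = Duration::from_secs(300);

/// Hypothetical, trimmed-down session record.
struct Session {
    last_active: Instant,
    read_buf: Vec<u8>,
}

/// Handle one TCP data op: optional uplink payload in, buffered downstream
/// bytes out. last_active is refreshed only when data actually moved.
fn handle_data_op(s: &mut Session, uplink: &[u8], now: Instant) -> Vec<u8> {
    // Real code would write `uplink` to the upstream socket here.
    let had_uplink = !uplink.is_empty();
    let drained = std::mem::take(&mut s.read_buf);
    if had_uplink || !drained.is_empty() {
        s.last_active = now;
    }
    drained
}

/// Reaper check: true once the session has been idle for the full timeout.
fn is_expired(s: &Session, now: Instant) -> bool {
    now.duration_since(s.last_active) >= IDLE_TIMEOUT
}

/// Returns (expired after an empty poll, expired after a real uplink write),
/// both evaluated 301 seconds after the session was last active.
fn demo() -> (bool, bool) {
    let t0 = Instant::now();
    let mut s = Session { last_active: t0, read_buf: Vec::new() };
    let later = t0 + Duration::from_secs(301);

    // Empty poll: no uplink, nothing to drain, so last_active stays at t0.
    handle_data_op(&mut s, &[], later);
    let after_empty_poll = is_expired(&s, later);

    // Real uplink write: last_active moves to `later`.
    handle_data_op(&mut s, b"hello", later);
    let after_uplink = is_expired(&s, later);

    (after_empty_poll, after_uplink)
}

fn main() {
    let (after_empty_poll, after_uplink) = demo();
    println!("expired after empty poll: {}", after_empty_poll);
    println!("expired after uplink: {}", after_uplink);
}
```

With an unconditional bump, the empty poll would have reset the timer and the idle session would never reach the reaper.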