feat(quota): per-account Apps Script quota tracking with hard-stop and UI display by CaptainMirage · Pull Request #1396 · therealaleph/MasterHttpRelayVPN-RUST

CaptainMirage · 2026-05-25T13:45:26Z

Summary

Adds a full per-account quota tracking system to protect the proxy against
Apps Script daily quota exhaustion. When an account is running low or fully
exhausted, it is blocked before the next request — not after — so the proxy
never silently fails mid-session. When all accounts are exhausted, a global
hard stop fires and returns 503 to the client immediately rather than
burning remaining quota on requests that will fail anyway.

What changed

`src/quota_tracker.rs` — new module

New module. Implements AccountBucket (per-account rolling 24h window state),
QuotaState (serializable snapshot of all buckets + persistent relay counter),
and QuotaTracker (the live runtime handle shared across tasks).

Key behaviour:

- Rolling 24h windows: each account tracks usage in a rolling window anchored to the first call of the day, not calendar midnight. Reset time is per-account, not global.
- Safety buffer pre-check: before each relay call the tracker checks `remaining < safety_buffer`. If true, the account is hard-stopped and removed from the dispatch rotation. This means an account goes dark while it still has headroom, rather than failing at zero.
- Startup safety check: `check_all_safety_buffers()` runs at load time, so near-limit accounts from the previous session are blocked before the first request of the new session.
- Persistent relay counter: `total_relay_calls` is written to `quota_state.json` and restored on restart. It resets at UTC midnight via a stored day number and drives the "fetches today" UI counter.
- Startup summary line: `startup_summary()` builds a human-readable line that logs quota state right after the Listening lines on startup.
- `impl Drop`: saves state to disk when the tracker is dropped, so no in-memory data is lost on a clean shutdown.
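A minimal sketch of the rolling-window bookkeeping described above, assuming a bucket only needs a window start time and a call count. `AccountBucket` here is a simplified stand-in, and `roll_if_expired`, `record`, and `utc_day_number` are hypothetical names rather than the module's actual API; the persisted counter's "day number" is assumed to be whole days since the Unix epoch.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Length of one rolling quota window.
pub const WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// Simplified, hypothetical stand-in for the per-account bucket.
#[derive(Debug, Default, Clone)]
pub struct AccountBucket {
    /// Start of the current window; `None` until the first call opens one.
    pub window_start: Option<SystemTime>,
    /// Calls recorded in the current window.
    pub requests_used: u32,
}

impl AccountBucket {
    /// Clears the bucket once its 24h window has elapsed.
    /// Returns `true` if a reset happened (also callable from an idle timer).
    pub fn roll_if_expired(&mut self, now: SystemTime) -> bool {
        let expired = match self.window_start {
            Some(start) => now.duration_since(start).unwrap_or(Duration::ZERO) >= WINDOW,
            None => false,
        };
        if expired {
            self.window_start = None;
            self.requests_used = 0;
        }
        expired
    }

    /// Records one call; the first call after a reset anchors the new window.
    pub fn record(&mut self, now: SystemTime) {
        self.roll_if_expired(now);
        if self.window_start.is_none() {
            self.window_start = Some(now);
        }
        self.requests_used += 1;
    }
}

/// Whole days since the Unix epoch (UTC); a change in this value marks
/// UTC midnight for the persisted relay counter.
pub fn utc_day_number(now: SystemTime) -> u64 {
    now.duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() / 86_400)
        .unwrap_or(0)
}
```

Because the window is anchored to the first call rather than to midnight, two accounts first used at different times of day reset at different times, which matches the per-account reset described above.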

`src/config.rs`

Added two fields with sane defaults. No TOML changes are required; the
proxy works out of the box:

| Field | Default | Meaning |
|---|---|---|
| `quota_daily_limit` | `20000` | Apps Script quota per account per day |
| `quota_safety_buffer` | `500` | Hard-stop N calls before the limit |

These can be overridden in config.toml if needed, but the defaults match
the standard Apps Script quota ceiling with a reasonable safety margin.
Documentation will be updated in a separate docs revision pass.
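For reference, here is a hypothetical `QuotaConfig` holding the two fields with their defaults, plus the safety-buffer pre-check expressed against them. The real fields presumably live on the main config struct and are filled by serde defaults; that wiring is assumed and not shown. With the defaults, an account is still allowed through at 19,500 used calls and is blocked from 19,501 on; with `quota_daily_limit = 1` it is blocked immediately, which is what the test plan relies on.

```rust
/// Hypothetical holder for the two quota fields (defaults from the table above).
#[derive(Debug, Clone, PartialEq)]
pub struct QuotaConfig {
    /// Apps Script quota per account per day.
    pub quota_daily_limit: u32,
    /// Hard-stop this many calls before the limit.
    pub quota_safety_buffer: u32,
}

impl Default for QuotaConfig {
    fn default() -> Self {
        Self {
            quota_daily_limit: 20_000,
            quota_safety_buffer: 500,
        }
    }
}

impl QuotaConfig {
    /// Pre-check run before each relay call: block while
    /// `remaining < safety_buffer`. `saturating_sub` keeps an
    /// over-limit count from underflowing.
    pub fn is_blocked(&self, requests_used: u32) -> bool {
        let remaining = self.quota_daily_limit.saturating_sub(requests_used);
        remaining < self.quota_safety_buffer
    }
}
```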

`src/domain_fronter.rs`

- `record_relay()` is called at the top of `relay()`, before any early return, so every proxied request is counted regardless of path (exit node, Apps Script, or hard stop).
- The global hard stop is checked before the exit node path, so the exit node cannot bypass a fully exhausted quota state.
- Exit node byte tracking: `bytes_relayed` now accumulates `body.len() + response.len()` on exit node success, not just on Apps Script responses. This makes the "data transferred" estimate accurate across both paths.
- The `relay_calls` and `relay_failures` counters remain on `DomainFronter` and are used for per-session stats (not persisted).
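The ordering in `relay()` can be sketched as follows. `Route`, `RelayStats`, and the boolean inputs are hypothetical simplifications (no networking, success path only); the point is that the call is counted before the hard-stop early return, and that bytes accumulate on either successful path.

```rust
/// Routing decision for one request (hypothetical simplification).
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Global hard stop: respond 503 without touching any backend.
    HardStop503,
    ExitNode,
    AppsScript,
}

/// Stand-in for the counters touched by `relay()`.
#[derive(Debug, Default)]
pub struct RelayStats {
    /// Persisted daily counter bumped by `record_relay()`.
    pub total_relay_calls: u64,
    /// Request + response bytes on successful paths.
    pub bytes_relayed: u64,
}

/// Sketch of the ordering: count first, then hard stop, then dispatch.
pub fn relay(
    stats: &mut RelayStats,
    globally_stopped: bool,
    use_exit_node: bool,
    body: &[u8],
    response: &[u8],
) -> Route {
    // Counted before any early return, so hard-stopped requests show up too.
    stats.total_relay_calls += 1;
    // Checked before the exit node branch so it cannot bypass exhaustion.
    if globally_stopped {
        return Route::HardStop503;
    }
    // Both successful paths contribute to the "data transferred" estimate.
    stats.bytes_relayed += (body.len() + response.len()) as u64;
    if use_exit_node {
        Route::ExitNode
    } else {
        Route::AppsScript
    }
}
```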

`src/proxy_server.rs`

- Startup summary logged once after the Listening lines, showing quota state for all configured accounts.
- Stats task interval reduced from 60s to 15s; it fires immediately on start and calls `roll_expired_windows()`, so 24h windows reset even when the proxy is idle (no traffic is required to trigger a reset).
- 1-second save task: a separate tokio task flushes dirty quota state every second. It is decoupled from the stats log, so disk writes are not held hostage to the 15s interval.
- Exhaustion detail logging: on the stats cycle after a global hard stop, each exhausted account's masked ID and remaining count are logged. A `was_hard_stopped` flag makes this fire once on the transition, not every 15s.
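Two small pieces from the list above, sketched with std only: a dirty flag that the 1-second save task can test-and-clear, and the `was_hard_stopped` edge detector that makes the exhaustion log fire once per transition. `DirtyFlag` and `HardStopLogger` are hypothetical names; the actual tokio tasks and file writes are omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the relay path whenever quota state changes (hypothetical).
#[derive(Debug, Default)]
pub struct DirtyFlag(AtomicBool);

impl DirtyFlag {
    pub fn mark(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Called by the 1-second save task: returns `true` and clears the flag
    /// only when there is something to flush, so idle ticks skip the write.
    pub fn take(&self) -> bool {
        self.0.swap(false, Ordering::AcqRel)
    }
}

/// Edge detector behind the one-shot exhaustion detail log.
#[derive(Debug, Default)]
pub struct HardStopLogger {
    was_hard_stopped: bool,
}

impl HardStopLogger {
    /// Called on every stats tick; `true` only on the tick where the
    /// state flips from running to hard-stopped.
    pub fn should_log(&mut self, now_hard_stopped: bool) -> bool {
        let fire = now_hard_stopped && !self.was_hard_stopped;
        self.was_hard_stopped = now_hard_stopped;
        fire
    }
}
```

Keeping the flush decision in an atomic swap means the save task never needs the tracker's lock just to find out that nothing changed.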

`src/bin/ui.rs`

New Usage Today grid in the UI sidebar:

| Row | Left | Right |
|---|---|---|
| 1 | fetches today | X / Y (Z%) · resets in Xh Ym |
| 2 | relay calls | N (M failed) · cache — |
| 3 | PT day | YYYY-MM-DD · accounts N/N active |
| 4 (conditional) | data transferred | X MB / Y GB est. (shown after 5+ calls) |

- "Fetches today" is driven by the persisted `total_relay_calls`, not the in-memory `relay_calls` counter, so it survives restarts.
- "Resets in" shows the rolling 24h countdown to the earliest account reset.
- "Data transferred" only renders after 5+ calls to avoid a meaningless estimate on a cold start. The average per-request size used for the estimate is clamped to 50 KB–500 KB.
- A red QUOTA HARD STOP banner appears above the grid when all accounts are exhausted.
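The display rules above reduce to a few pure helpers. This is a sketch under assumptions: 1 KB is taken as 1024 bytes, the percentage is rounded down, and `Y` in row 1 is treated as an already computed capacity. `clamped_avg_request_bytes`, `format_resets_in`, and `format_fetches` are hypothetical names, not the UI's actual functions.

```rust
/// Minimum calls before the data estimate is shown.
const MIN_CALLS_FOR_ESTIMATE: u64 = 5;
/// Clamp bounds for the average per-request size (assumes 1 KB = 1024 bytes).
const MIN_AVG: u64 = 50 * 1024;
const MAX_AVG: u64 = 500 * 1024;

/// Clamped average bytes per request, or `None` while there are too few calls.
pub fn clamped_avg_request_bytes(bytes_relayed: u64, calls: u64) -> Option<u64> {
    if calls < MIN_CALLS_FOR_ESTIMATE {
        return None;
    }
    Some((bytes_relayed / calls).clamp(MIN_AVG, MAX_AVG))
}

/// Formats seconds until the earliest account reset as "Xh Ym".
pub fn format_resets_in(secs: u64) -> String {
    format!("{}h {}m", secs / 3600, (secs % 3600) / 60)
}

/// Formats the "fetches today" cell as "X / Y (Z%)", rounding down.
pub fn format_fetches(used: u64, cap: u64) -> String {
    let pct = if cap == 0 { 0 } else { used * 100 / cap };
    format!("{used} / {cap} ({pct}%)")
}
```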

Compatibility note with PR #1346

PR #1346 (large download resilience, stream timeout decoupling, compact log
timestamps) and this PR both touch src/bin/ui.rs, src/config.rs,
src/domain_fronter.rs, and src/proxy_server.rs — but in completely
separate areas of each file. #1346 adds a stream timeout config field and
log timestamp formatting; this PR adds quota fields and a new UI section.
There is no logical overlap. Whichever merges second will have a trivial
conflict that resolves cleanly by keeping both diffs.

Test plan

- Start the proxy with no `quota_state.json`; verify the file is created within 1 second of the first relay call.
- Stop and restart; verify `total_relay_calls` is restored from disk and "fetches today" in the UI matches the pre-restart value.
- Set `quota_safety_buffer` high enough to block an account at startup; verify that account is excluded before the first request.
- Exhaust all accounts (or set `quota_daily_limit = 1`); verify the global hard stop fires, subsequent requests return 503, and the red banner appears in the UI.
- Let the proxy idle past a 24h window; verify `roll_expired_windows()` resets the account without needing any traffic.
- Confirm the UI grid renders correct counts, the resets-in countdown, and a data estimate that is only visible after 5+ calls.

P.S. I had around 20 commits that I squashed down to 7, which is why the commits all have the exact same time, lol.

…lay cleanup

CaptainMirage · 2026-05-25T13:53:08Z

Related work -- PR #1388

After opening this PR I noticed #1388 ("feat(relay): prioritize mux dispatch
and expose script health") independently implements a local rolling 24h call
ledger per Apps Script deployment inside domain_fronter.rs, along with a
Script health panel in the UI showing masked deployment IDs, usage, saturation
status, and failure quarantine.

There is conceptual overlap worth flagging so you can decide how these two interact:

| | This PR (feat/quota-tracking) | #1388 |
|---|---|---|
| Scope | Per-account (Google account) | Per-deployment (script URL) |
| Persistence | Written to quota_state.json, survives restarts | In-memory, resets on restart |
| Hard stop | Global hard stop blocks all traffic when exhausted | Steering: prefers non-saturated, falls through |
| Failure handling | Hard stop + was_hard_stopped flag, 503 to client | 429/403 -> 24h quarantine; 5xx -> cooldown |
| UI | Usage Today grid (fetches, resets, relay calls, etc) | Script health panel (per-deployment stats) |

These are not mutually exclusive -- tracking quota per-account (this PR) and
per-deployment (#1388) are complementary. But the deployment selection logic
in #1388 and the global hard stop in this PR would need to be aware of each
other if both land. Flagging it early so you can coordinate, aleph. I don't think there is another maintainer, is there?

CaptainMirage · 2026-05-25T13:59:21Z

Also, one thing I noticed: when you merge PRs you tend to squash all the commits into a single one, and the message isn't always very detailed. It would be really helpful to keep the individual commits from each PR, so contributors like me can see exactly what changed at each step without having to open the PR itself, and rolling back something specific becomes much more precise than reverting an entire PR at once. Just something to think about, many thanks!

therealaleph

Thanks for the very detailed PR and the notes about #1388. I tested the branch locally with cargo test --all-targets --features ui, and the suite is green, but I cannot merge this one yet because I found one quota-state correctness bug.

QuotaTracker::is_globally_hard_stopped() computes the secondary aggregate check from st.buckets.values(), while total_cap is based only on the currently configured script_ids. Since load() preserves buckets for script IDs that were removed from config, a user who rotates away old exhausted IDs can still carry those stale requests_used / quota_error_count values in quota_state.json. That can trip the global hard stop even when the currently configured IDs are fresh.

Please either filter the aggregate sums to self.script_ids or prune removed buckets on load, and add a regression test for this shape: one stale exhausted persisted bucket that is no longer configured, plus one fresh configured ID, should not globally hard-stop.

Small follow-up while you are in there: the hard-stop response currently returns 502, while the PR description says 503. 503 Service Unavailable fits this case better.

On commit preservation: for user-facing release history I usually squash, but for a feature-sized PR like this I can keep a more detailed merge commit/message once the blocker is fixed.

Answered via LLM, Supervised @therealaleph

CaptainMirage · 2026-05-25T17:49:40Z

I see, I'll fix them right away.

And on commit preservation: the JSON-to-TOML change would've been a nice one to keep, since it changed many docs and added a lot of stuff. I did make sure everything is translated automatically, but I mostly mean it for others, in case they are doing something specific that I happened to touch. Sure, blame exists, but I mean more of a version-rewind situation than a see-who-changed-what-and-when one.

…Ds, fix 503 response

is_globally_hard_stopped() was summing quota_error_count and requests_used over all persisted buckets, including stale ones from script IDs removed from config. A user rotating away exhausted IDs would still carry their usage in quota_state.json, causing the aggregate check to falsely trip a global hard stop against fresh accounts. Fixed by filtering both sums to self.script_ids only. The all_stopped primary check was already correct (it iterates script_ids, not bucket values).

Also corrects the hard-stop HTTP response from 502 Bad Gateway to 503 Service Unavailable, which is the accurate status for a request deliberately refused due to resource exhaustion.

Regression test: one stale exhausted persisted bucket not in the current config plus one fresh configured bucket must not trigger a global hard stop.
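The fix described in this commit can be sketched as below: the aggregate sum iterates the configured `script_ids` and looks each one up in the persisted bucket map, instead of summing every bucket value. `Tracker`, `Bucket`, and `aggregate_exhausted` are hypothetical simplifications of `QuotaTracker`; only `requests_used` is modelled here, and `quota_error_count` would be filtered the same way.

```rust
use std::collections::HashMap;

/// Hypothetical per-account bucket; only the usage count is modelled.
#[derive(Debug, Default, Clone)]
pub struct Bucket {
    pub requests_used: u32,
}

/// Hypothetical simplification of the tracker state after `load()`.
pub struct Tracker {
    /// IDs in the current config.
    pub script_ids: Vec<String>,
    /// Buckets restored from quota_state.json; may include removed IDs.
    pub buckets: HashMap<String, Bucket>,
    pub daily_limit: u32,
}

impl Tracker {
    /// Aggregate exhaustion check summed over configured IDs only, so stale
    /// buckets for removed IDs cannot trip the global hard stop.
    pub fn aggregate_exhausted(&self) -> bool {
        if self.script_ids.is_empty() {
            return false; // assumption: nothing configured, nothing to exhaust
        }
        let total_cap = u64::from(self.daily_limit) * self.script_ids.len() as u64;
        let used: u64 = self
            .script_ids
            .iter()
            .filter_map(|id| self.buckets.get(id))
            .map(|b| u64::from(b.requests_used))
            .sum();
        used >= total_cap
    }
}
```

With the regression-test shape (a stale exhausted `"old"` bucket plus a fresh configured `"fresh"` one), summing over all bucket values would give 20,000 against a 20,000 cap and trip the stop; the filtered sum gives 0.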

CaptainMirage · 2026-05-25T17:59:43Z

I wanted to keep those for a later update I'm working on, but it makes more sense to just remove them here so they don't confuse others, and add them later together with the other stuff.

therealaleph

Thanks, this fixes the blocker I found.

I rechecked head f05601619ac550484f70300801fb2b45077b29bf: is_globally_hard_stopped() now filters the aggregate quota/error sums to the currently configured script_ids, the regression test covers the stale-removed-bucket case, and the hard-stop HTTP response is now 503.

Local verification:

cargo test --all-targets --features ui
249 passed

Approved. I am not merging it this minute only because the v1.9.35 release workflow is currently running from the previous merge; I do not want to stack another main change while that release is mid-flight.

Answered via LLM, Supervised @therealaleph

CaptainMirage · 2026-05-25T18:04:59Z

Alrighty then. In the meantime, could we discuss a way to communicate better? Could we open the Discussions tab on this repo so people stop using the Issues tab for help? Also, what are your thoughts on a Discord channel? Maybe a contributor chat too, so we can communicate a little more easily and avoid doing extra work when someone is already working on something.

maybeknott · 2026-05-26T11:52:42Z

Hey Mirage, thanks for flagging this clearly.

If you agree, I would like to keep your quota tracker as the canon. Persistent per-account quota state, safety-buffer hard stops, startup restoration, and the 503 global stop form a proper bundle and are cleanly implemented, and QuotaTracker is a better home for that than the lightweight in-memory ledger I had.

My plan is to not push #1388 as-is. I’ll split it up and remove the overlapping quota pieces:

- keep the TunnelMux interactive-priority work as a separate small PR, since that is transport scheduling and does not depend on quota tracking;
- drop/supersede the local rolling 24h quota ledger from #1388 (feat(relay): prioritize mux dispatch and expose script health), since your quota_state.json tracker covers the durable quota/account model better;
- rework the failure-classification part, if still useful, so quota-like failures feed into QuotaTracker instead of maintaining a second quota path;
- keep transient deployment/route failures separate from quota hard-stops, so a bad route can cool down without marking the account exhausted;
- rename any future UI concept from “quota/script health” toward “deployment route health” if it only shows transient network route state, cooldowns, timeout strikes, and last failure class.

So the intended layering would be:

- Quota/account truth: your QuotaTracker
- Dispatch hard stop: your global hard-stop / per-account hard-stop checks
- Transient deployment health: a small route-health layer only for non-quota network failures, if needed later
- Mux scheduling: separate TunnelMux priority PR, independent from quota

That avoids two ledgers trying to answer the same question.
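A hypothetical sketch of the quota-vs-transient split proposed above, reusing the status-code mapping from #1388's table (429/403 treated as quota-like, 5xx as transient) and treating a missing status (timeout or connection error) as transient. `FailureClass` and `classify` are illustrative names, not existing APIs in either PR.

```rust
/// How a failed relay attempt is routed (hypothetical).
#[derive(Debug, PartialEq)]
pub enum FailureClass {
    /// Feeds the quota tracker (account-level capacity).
    QuotaLike,
    /// Cools down the deployment route; the account stays usable.
    Transient,
    /// Neither quota nor route health is affected.
    Other,
}

/// `None` stands for "no HTTP status" (timeout or connection error).
pub fn classify(status: Option<u16>) -> FailureClass {
    match status {
        Some(429) | Some(403) => FailureClass::QuotaLike,
        Some(500..=599) | None => FailureClass::Transient,
        Some(_) => FailureClass::Other,
    }
}
```

Only `QuotaLike` would ever reach the quota ledger, so a flaky route can be cooled down without marking its account exhausted.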

If you agree, I can build a small follow-up directly on top of your feat/quota-tracking branch instead of waiting for it to land. I would keep it narrow and additive: probably extra quota-vs-transient failure classification coverage, a reusable helper cleanup if needed, or a tiny integration point that makes later route-health work consume QuotaTracker cleanly. Then I will separately commit TunnelMux priority, clearer quota-vs-transient failure boundaries, and possibly a later deployment route-health view that only describes network health rather than quota capacity.

So I think the clean path is: your PR owns quota truth, I split my PR into smaller non-overlapping parts, and any quota-adjacent follow-up is either built on your branch with your okay or stacked after your PR lands.

CaptainMirage · 2026-05-26T12:59:16Z

It would make more sense if you wait for my PR to be merged, then from your own fork make a branch off main/upstream for your TunnelMux PR. Keeps both our histories clean, and I'd rather it that way - no stacking.

So please don't branch off my PR or build on top of it without my go-ahead. Wait for it to merge, then work from main.

CaptainMirage · 2026-05-26T13:01:21Z

> Alrighty then. In the meantime, could we discuss a way to communicate better? Could we open the Discussions tab on this repo so people stop using the Issues tab for help? Also, what are your thoughts on a Discord channel? Maybe a contributor chat too, so we can communicate a little more easily and avoid doing extra work when someone is already working on something.

Also, aleph, I would love an answer on this. And if you're able to, it would be nice to merge this by the end of the day: I'm working on some optimizations to the relay logic and need this merged before I change a few things, since the merge diff would be a mess if I edit the files this PR has touched. Thank you!

CaptainMirage added 7 commits May 25, 2026 16:59

feat(quota_tracker): add per-account quota tracking with config fields

16aab7d

feat: wire quota tracking into relay dispatch and startup logging

b010a6a

feat(ui): add quota usage display to Usage Today section

425e309

fix: quota safety buffer on load, idle window rollover, startup flush

a80a69e

fix(ui): usage today section — hard-stop banner, resets, data estimate

c7e860c

fix: global hard stop coverage, 1s save task, exhaustion detail logging

92b20d0

fix: persist relay call count, exit node bytes tracking, UI data disp…

957208a

…lay cleanup

github-actions bot added the `type: feature` label (feat: PR, auto-applied by release-drafter) on May 25, 2026

therealaleph requested changes May 25, 2026

therealaleph approved these changes May 25, 2026

CaptainMirage wants to merge 8 commits into therealaleph:main from CaptainMirage:feat/quota-tracking
