feat(quota): per-account Apps Script quota tracking with hard-stop and UI display #1396
CaptainMirage wants to merge 8 commits into
Related work -- PR #1388 After opening this PR I noticed #1388 ("feat(relay): prioritize mux dispatch There is conceptual overlap worth flagging so you can decide how these two interact:
These are not mutually exclusive -- tracking quota per-account (this PR) and |
|
also one thing i noticed, when you merge PRs you tend to squash all the commits into a single one, and the message isn't always super detailed either. it would be really helpful to keep the individual commits from each PR so contributors like me can see exactly what changed at each step without having to open the PR itself, and rolling back something specific becomes a lot more precise rather than having to revert an entire PR at once. just something to think about, many thanks! |
therealaleph
left a comment
Thanks for the very detailed PR and the notes about #1388. I tested the branch locally with cargo test --all-targets --features ui, and the suite is green, but I cannot merge this one yet because I found one quota-state correctness bug.
QuotaTracker::is_globally_hard_stopped() computes the secondary aggregate check from st.buckets.values(), while total_cap is based only on the currently configured script_ids. Since load() preserves buckets for script IDs that were removed from config, a user who rotates away old exhausted IDs can still carry those stale requests_used / quota_error_count values in quota_state.json. That can trip the global hard stop even when the currently configured IDs are fresh.
Please either filter the aggregate sums to self.script_ids or prune removed buckets on load, and add a regression test for this shape: one stale exhausted persisted bucket that is no longer configured, plus one fresh configured ID, should not globally hard-stop.
Small follow-up while you are in there: the hard-stop response currently returns 502, while the PR description says 503. 503 Service Unavailable fits this case better.
On commit preservation: for user-facing release history I usually squash, but for a feature-sized PR like this I can keep a more detailed merge commit/message once the blocker is fixed.
Answered via LLM, Supervised @therealaleph
|
i see, i'll fix them right away. on the commit preservation, the JSON to TOML change would've been a nice thing to keep, since it changed many docs and added a lot of stuff. i did make sure everything is automatically translated, but i mostly mean for others in case they are doing something specific that i happened to touch. sure, blame exists, but i mean more of a version-rewind situation than a see-who-changed-what-and-when one |
…Ds, fix 503 response

is_globally_hard_stopped() was summing quota_error_count and requests_used over all persisted buckets, including stale ones from script IDs removed from config. A user rotating away exhausted IDs would still carry their usage in quota_state.json, causing the aggregate check to falsely trip a global hard stop against fresh accounts.

Fixed by filtering both sums to self.script_ids only. The all_stopped primary check was already correct (iterates script_ids, not bucket values).

Also corrects the hard-stop HTTP response from 502 Bad Gateway to 503 Service Unavailable, which is the accurate status for a deliberately refused request due to resource exhaustion.

Regression test: one stale exhausted persisted bucket not in the current config plus one fresh configured bucket must not trigger a global hard stop.
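For readers following the thread, here is a minimal sketch of the shape of that fix. It is not the code from the branch: the struct layouts, the `daily_limit` field, and the single `requests_used` sum are simplifying assumptions, and the real check also filters the `quota_error_count` sum the same way.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down bucket; the real one carries more fields.
struct AccountBucket {
    requests_used: u64,
}

/// Hypothetical, trimmed-down tracker for illustration only.
struct QuotaTracker {
    /// Script IDs present in the current config.
    script_ids: Vec<String>,
    /// Persisted buckets, which may still hold IDs removed from the config.
    buckets: HashMap<String, AccountBucket>,
    /// Assumed per-account daily request limit.
    daily_limit: u64,
}

impl QuotaTracker {
    /// Aggregate check that only sums buckets for configured IDs,
    /// so stale persisted buckets cannot trip the global hard stop.
    fn is_globally_hard_stopped(&self) -> bool {
        if self.script_ids.is_empty() {
            return false;
        }
        let total_cap = self.daily_limit * self.script_ids.len() as u64;
        let used: u64 = self
            .script_ids
            .iter()
            .filter_map(|id| self.buckets.get(id))
            .map(|b| b.requests_used)
            .sum();
        used >= total_cap
    }
}
```

Iterating `script_ids` and looking each one up keeps the sum and `total_cap` defined over the same set of accounts, which is the property the stale bucket broke.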
|
i wanted to keep those for a later update im working on but it makes more sense to just remove it here so it doesnt confuse others and add it straight up with the other stuff |
therealaleph
left a comment
Thanks, this fixes the blocker I found.
I rechecked head f05601619ac550484f70300801fb2b45077b29bf: is_globally_hard_stopped() now filters the aggregate quota/error sums to the currently configured script_ids, the regression test covers the stale-removed-bucket case, and the hard-stop HTTP response is now 503.
Local verification:
cargo test --all-targets --features ui
249 passed
Approved. I am not merging it this minute only because the v1.9.35 release workflow is currently running from the previous merge; I do not want to stack another main change while that release is mid-flight.
Answered via LLM, Supervised @therealaleph
|
alrighty then, in the meantime, could we discuss a way to communicate better? could we have the discussions tab open on this repo so people stop using the issues tab for help? and also what are your thoughts on a discord channel? and maybe a contributor chat too so we can communicate a little easier so we dont do extra work if someone is already doing something and stuff like that |
Hey Mirage, thanks for flagging this clearly. If you agree I would like to keep your quota tracker as the canon. Persistent per-account quota state, safety-buffer hard stops, startup restoration, and the My plan is to not push #1388 as-is. I’ll split it up and remove the overlapping quota pieces:
So the intended layering would be:
That avoids two ledgers trying to answer the same question. If you agree, I can build a small follow-up directly on top of your So I think the clean path is: your PR owns quota truth, I split my PR into smaller non-overlapping parts, and any quota-adjacent follow-up is either built on your branch with your okay or stacked after your PR lands. |
|
It would make more sense if you wait for my PR to be merged, then from your own fork make a branch off so please don't branch off my PR or build on top of it without my go-ahead. Wait for it to merge, then work from |
also aleph i would love an answer for this, and if you are able to it would be nice to merge this commit till the end of the day, im working on some optimizations on the relay logic and i need this to be merged before i change a few things, the merge diff would be a mess if i edit the files this PR has edited, thank you! |
Summary
Adds a full per-account quota tracking system to protect the proxy against
Apps Script daily quota exhaustion. When an account is running low or fully
exhausted, it is blocked before the next request — not after — so the proxy
never silently fails mid-session. When all accounts are exhausted, a global
hard stop fires and returns 503 to the client immediately rather than
burning remaining quota on requests that will fail anyway.
What changed
`src/quota_tracker.rs` (new module)
Implements `AccountBucket` (per-account rolling 24h window state), `QuotaState` (serializable snapshot of all buckets + persistent relay counter), and `QuotaTracker` (the live runtime handle shared across tasks).
Key behaviour:
- … anchored to the first call of the day, not calendar midnight. Reset time is per-account, not global.
- … `remaining < safety_buffer`. If true, the account is hard-stopped and removed from the dispatch rotation. This means an account goes dark while it still has headroom, rather than failing at zero.
- `check_all_safety_buffers()` runs at load time so near-limit accounts from the previous session are blocked before the first request of the new session.
- `total_relay_calls` is written to `quota_state.json` and restored on restart. Resets at UTC midnight via a stored day number. Drives the "fetches today" UI counter.
- `startup_summary()` builds a human-readable line that logs quota state right after the Listening lines on startup.
- `impl Drop`: saves state to disk on process exit so no in-memory data is lost on clean shutdown.
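The window and safety-buffer behaviour listed above can be sketched roughly as follows. This is an illustration under assumptions, not the module's code: the field names, the `record` and `roll_if_expired` methods, and the `utc_day_number` helper are hypothetical, and the real `AccountBucket` is also serializable.

```rust
use std::time::{Duration, SystemTime};

/// Length of the rolling per-account window.
const WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical, simplified per-account state.
struct AccountBucket {
    /// Time of the first call in the current window, if any.
    window_start: Option<SystemTime>,
    requests_used: u64,
    hard_stopped: bool,
}

impl AccountBucket {
    fn new() -> Self {
        AccountBucket { window_start: None, requests_used: 0, hard_stopped: false }
    }

    /// Clears the bucket once 24h have passed since the window's first call.
    fn roll_if_expired(&mut self, now: SystemTime) {
        if let Some(start) = self.window_start {
            if now.duration_since(start).map_or(false, |elapsed| elapsed >= WINDOW) {
                self.window_start = None;
                self.requests_used = 0;
                self.hard_stopped = false;
            }
        }
    }

    /// Counts one request and returns true if the account should leave rotation.
    fn record(&mut self, now: SystemTime, daily_limit: u64, safety_buffer: u64) -> bool {
        self.roll_if_expired(now);
        if self.window_start.is_none() {
            // The window is anchored to the first call, not calendar midnight.
            self.window_start = Some(now);
        }
        self.requests_used += 1;
        let remaining = daily_limit.saturating_sub(self.requests_used);
        self.hard_stopped = remaining < safety_buffer;
        self.hard_stopped
    }
}

/// Whole days since the Unix epoch in UTC; a change in this value marks UTC midnight.
fn utc_day_number(now: SystemTime) -> u64 {
    now.duration_since(SystemTime::UNIX_EPOCH)
        .map_or(0, |d| d.as_secs() / 86_400)
}
```

Passing `now` in explicitly, rather than reading the clock inside the methods, is a choice made here so that the window logic can be exercised without waiting 24 hours.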
`src/config.rs`
Added two fields with sane defaults. No TOML changes are required; the proxy works out of the box:

| Field | Default |
|---|---|
| `quota_daily_limit` | `20000` |
| `quota_safety_buffer` | `500` |

These can be overridden in `config.toml` if needed, but the defaults match the standard Apps Script quota ceiling with a reasonable safety margin.
Documentation will be updated in a separate docs revision pass.
`src/domain_fronter.rs`
- `record_relay()` called at the top of `relay()`, before any early return. Every proxied request is counted regardless of path (exit node, Apps Script, or hard stop).
- … bypass a fully exhausted quota state.
- `bytes_relayed` now accumulates `body.len() + response.len()` on exit node success, not just Apps Script responses. This makes the "data transferred" estimate accurate across both paths.
- `relay_calls` and `relay_failures` counters remain on `DomainFronter` and are used for per-session stats (not persisted).
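A rough sketch of the counting order described above, assuming a much simpler `relay()` than the real one. The `RelayCounters` struct, the `Route` enum, and the numeric status return are all hypothetical stand-ins used only to show that the call counter is bumped before any early return and that bytes are accumulated on both success paths.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters; the real ones live on different structs.
#[derive(Default)]
struct RelayCounters {
    total_relay_calls: AtomicU64,
    bytes_relayed: AtomicU64,
}

/// Hypothetical outcome of routing one request.
enum Route<'a> {
    HardStopped,
    ExitNode { response: &'a [u8] },
    AppsScript { response: &'a [u8] },
}

/// Returns an HTTP-style status code for the relayed request.
fn relay(counters: &RelayCounters, body: &[u8], route: Route<'_>) -> u16 {
    // Counted first, so the early return below cannot skip it.
    counters.total_relay_calls.fetch_add(1, Ordering::Relaxed);

    let response = match route {
        Route::HardStopped => return 503,
        Route::ExitNode { response } | Route::AppsScript { response } => response,
    };

    // Both success paths contribute request and response bytes.
    counters
        .bytes_relayed
        .fetch_add((body.len() + response.len()) as u64, Ordering::Relaxed);
    200
}
```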
`src/proxy_server.rs`
- … state for all configured accounts.
- … start. Calls `roll_expired_windows()` so 24h windows reset even when the proxy is idle (no traffic required to trigger a reset).
- … state every second. Decoupled from the stats log so disk writes are not held hostage to the 15s interval.
- … stop, logs each exhausted account's masked ID and remaining count. Uses a `was_hard_stopped` flag so this fires once on transition, not every 15s.
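The `was_hard_stopped` flag amounts to edge-triggered logging. A minimal sketch, assuming a hypothetical `HardStopLogger` wrapper and a placeholder message (the real code logs masked IDs and remaining counts from inside the stats task):

```rust
/// Remembers the previous hard-stop state so a message is produced only on change.
struct HardStopLogger {
    was_hard_stopped: bool,
}

impl HardStopLogger {
    /// Called on every stats tick; returns a message only when the
    /// state flips from running to hard-stopped.
    fn on_tick(&mut self, hard_stopped_now: bool) -> Option<&'static str> {
        let entered = hard_stopped_now && !self.was_hard_stopped;
        self.was_hard_stopped = hard_stopped_now;
        if entered {
            Some("global hard stop: all accounts exhausted")
        } else {
            None
        }
    }
}
```

Because the flag is updated on every tick, a later recovery followed by a second exhaustion produces a second message rather than staying silent.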
`src/bin/ui.rs`
New Usage Today grid in the UI sidebar:
- … `total_relay_calls`, not the in-memory `relay_calls` counter, so it survives restarts.
- … estimate on a cold start. Clamped average per-request size of 50 KB–500 KB.
- … are exhausted.
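One way the clamped data estimate could work is sketched below. The formula is an assumption, as are the function name, the use of session totals for the average, the 1024-byte kilobyte, and applying the 5-call threshold from the test plan inside the function. Only the 50 KB to 500 KB clamp and the 5+ call threshold come from this PR.

```rust
/// Assumed bounds on the average per-request size, in bytes.
const MIN_AVG_BYTES: u64 = 50 * 1024;
const MAX_AVG_BYTES: u64 = 500 * 1024;
/// Assumed minimum number of session calls before an estimate is shown.
const MIN_CALLS_FOR_ESTIMATE: u64 = 5;

/// Estimates bytes transferred today from the persisted call count and the
/// session's observed average size. Returns None while the sample is too small.
fn estimate_bytes_today(
    session_bytes: u64,
    session_calls: u64,
    total_relay_calls: u64,
) -> Option<u64> {
    if session_calls < MIN_CALLS_FOR_ESTIMATE {
        return None;
    }
    // Clamp so a few tiny or huge responses cannot skew the estimate.
    let avg = (session_bytes / session_calls).clamp(MIN_AVG_BYTES, MAX_AVG_BYTES);
    Some(avg * total_relay_calls)
}
```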
Compatibility note with PR #1346
PR #1346 (large download resilience, stream timeout decoupling, compact log
timestamps) and this PR both touch
`src/bin/ui.rs`, `src/config.rs`, `src/domain_fronter.rs`, and `src/proxy_server.rs`, but in completely separate areas of each file. #1346 adds a stream timeout config field and
log timestamp formatting; this PR adds quota fields and a new UI section.
There is no logical overlap. Whichever merges second will have a trivial
conflict that resolves cleanly by keeping both diffs.
Test plan
- … `quota_state.json`: verify file is created within 1 second of the first relay call
- … `total_relay_calls` is restored from disk and "fetches today" in the UI matches the pre-restart value
- … `quota_safety_buffer` high enough to trigger a blocked account at startup: verify that account is excluded before the first request
- … `quota_daily_limit = 1`): verify global hard stop fires, subsequent requests return 503, and the red banner appears in the UI
- … `roll_expired_windows` resets the account without needing any traffic
- … estimate only visible after 5+ calls
P.S. - i had around 20 commits that i squashed down to 7 thats why the commits are all in the same exact time lol