Merged
89 commits
d92ae6b
feat: Hugging Face model search and thinking-capability detection
quiet-node Jun 19, 2026
da15b51
feat: restructure Settings to a premium left sidebar with a running-m…
quiet-node Jun 19, 2026
699d50e
feat: add the Models segmented Library/Discover/Providers control
quiet-node Jun 19, 2026
afad609
feat: wire the Models segmented control into the Model tab
quiet-node Jun 19, 2026
0b1b1c0
feat: build the Models surface with Active-Hero providers, Library, a…
quiet-node Jun 19, 2026
55db7c7
style: reskin the standard settings tabs to the premium tokens
quiet-node Jun 19, 2026
f58f6f1
feat: HF search text-gen default, paginated Load-more, and RAM-fit an…
quiet-node Jun 19, 2026
9a85500
style: restyle the Models segmented control to icon-above-label tabs
quiet-node Jun 19, 2026
cce84ee
feat: redesign the Library pane with quiet rows, a popover menu, and …
quiet-node Jun 19, 2026
60ad330
feat: redesign the Discover pane with RAM-fit, HF links, icon downloa…
quiet-node Jun 19, 2026
a1dbcd3
fix: Providers pane row padding, switch confirmation, single prompt h…
quiet-node Jun 19, 2026
96151f9
style: apply prettier and rustfmt, and align the search hook setter name
quiet-node Jun 19, 2026
f4aa41f
refactor: drop unused est_runtime_gb from search rows and share the R…
quiet-node Jun 19, 2026
f82b5ed
fix: Discover search returns downloadable GGUF chat repos and add rev…
quiet-node Jun 19, 2026
7ae917a
fix: refresh the provider model dropdown on switch, consistent names,…
quiet-node Jun 19, 2026
ab09401
feat: RAM-fit tooltips, capability pills, clickable Discover titles, …
quiet-node Jun 19, 2026
d29e16a
fix: lift config on Ollama model switch so Running card updates
quiet-node Jun 19, 2026
ac5adcc
refactor: drop unreliable row-level RAM-fit, keep accurate per-quant fit
quiet-node Jun 19, 2026
b179d9c
polish: shorten RAM-fit hover tooltips to one clean line each
quiet-node Jun 19, 2026
868b6d9
polish: calm capability pills, add Text, rename Reasoning to Thinking…
quiet-node Jun 19, 2026
1e08d34
polish: restyle Models tabs to match Settings nav, drop the container…
quiet-node Jun 19, 2026
3d831a6
fix: make built-in reasoning opt-in via /think with honest thinking UX
quiet-node Jun 19, 2026
bf14fdc
fix: drop reasoning output when thinking is off for a model-agnostic …
quiet-node Jun 19, 2026
d4363f1
fix: extend reasoning-off to all kwarg-controllable model families
quiet-node Jun 19, 2026
891bf9b
fix: show model reasoning instead of hiding it when thinking is off
quiet-node Jun 19, 2026
58f897c
feat: badge models whose reasoning cannot be turned off
quiet-node Jun 19, 2026
c872929
feat: dynamically classify reasoning capability of downloaded GGUF mo…
quiet-node Jun 19, 2026
2a0ff75
feat: add family grouping field to the curated starter registry
quiet-node Jun 20, 2026
ca582bc
refactor: relocate the raw Hugging Face browser to BrowseAllPane
quiet-node Jun 20, 2026
15d2db1
feat: add curated Staff picks family accordion for Discover
quiet-node Jun 20, 2026
ee13401
feat: front Discover with a Staff picks and Browse all pathway toggle
quiet-node Jun 20, 2026
7afffc1
refactor: derive Staff picks open state without a seeding effect
quiet-node Jun 20, 2026
4da28ab
fix: chevron disclosure on repo rows, download icon on quant rows in …
quiet-node Jun 20, 2026
57b047e
feat: flatten Staff picks to an alphabetical list of rich model cards
quiet-node Jun 20, 2026
e6507f5
feat: group Staff picks into compact use-case sections, drop capabili…
quiet-node Jun 20, 2026
35ca942
test: match the Discover host probe to the new Staff picks hint
quiet-node Jun 20, 2026
8926018
fix: hover-activate the Settings and update panels after defocus
quiet-node Jun 20, 2026
f5c33b3
feat: rework the Generation settings rows and provider hero
quiet-node Jun 20, 2026
ae1e418
polish: underline Discover toggle, drop the staff hint and fit dots
quiet-node Jun 20, 2026
ea70228
polish: warm capability-pill colors, drop the installed badge on Disc…
quiet-node Jun 20, 2026
ff7f4ab
polish: remove the Running model sidebar card, hug RAM-fit tooltips t…
quiet-node Jun 20, 2026
97778ce
polish: render RAM-fit tooltips on a single line
quiet-node Jun 20, 2026
6d8418a
feat: decouple Staff Picks onto an id-keyed catalog over the starter …
quiet-node Jun 20, 2026
0877bfa
feat: repoint Staff Picks UI onto the id-keyed catalog via useStaffPi…
quiet-node Jun 20, 2026
e3274e3
feat: expand the Staff Picks catalog with seven deeply-vetted models
quiet-node Jun 20, 2026
8f9fb50
refactor: derive the manifest id through a single installed_model_id …
quiet-node Jun 20, 2026
7e4b71c
feat: balance the Staff Picks catalog to three models per category
quiet-node Jun 20, 2026
765931e
fix: word built-in keep-warm status like Ollama (model in VRAM / Load…
quiet-node Jun 20, 2026
1af58d7
feat: warm-load the built-in engine on summon and first keystroke, ma…
quiet-node Jun 20, 2026
913aae4
fix: spawn llama-server with --parallel 1 so the warm-up prime and fi…
quiet-node Jun 20, 2026
9ea6467
polish: quiet the confirm dialog's primary action to a tinted accent …
quiet-node Jun 20, 2026
51678a2
fix: evict the Ollama model from VRAM when switching away from Ollama…
quiet-node Jun 20, 2026
e2eef81
polish: drop the status dot from the keep-warm line, leaving just the…
quiet-node Jun 20, 2026
ad7c705
feat: record each catalog model's vetted context window in the registry
quiet-node Jun 20, 2026
b089030
feat: show each model's context window as a pill in Staff Picks
quiet-node Jun 20, 2026
9e4fa82
feat: surface the context window for Browse-all repos from sanitized …
quiet-node Jun 20, 2026
5884ed5
feat: move the Staff Picks context window into the size and maker sub…
quiet-node Jun 20, 2026
da8fa49
feat: show the Browse-all context window on the repo row via the sear…
quiet-node Jun 20, 2026
ec7f8fc
feat: show the context window on Library rows, healed from the registry
quiet-node Jun 20, 2026
2ee29e8
feat: redesign the model download into an inline hairline with a unif…
quiet-node Jun 20, 2026
32d6a63
fix: keep the Browse all quant list visible while one gguf downloads
quiet-node Jun 20, 2026
c35c545
feat: render the failed download state as an inline hairline
quiet-node Jun 20, 2026
f53fde0
feat: show interrupted partials as a quiet Paused row in Staff picks
quiet-node Jun 20, 2026
40cc0b1
fix: flip Staff picks to Paused on cancel and keep the fit hint
quiet-node Jun 20, 2026
bceb0ed
feat: surface interrupted partials as Paused/Resume/Discard in Browse…
quiet-node Jun 20, 2026
9bac5f0
fix: keep Discover model downloads alive across tab switches and the …
quiet-node Jun 20, 2026
f5c94c8
fix: flip the Library model menu above its trigger when space is tight
quiet-node Jun 20, 2026
dcea71c
fix: name the built-in engine's actual resident model in keep-warm st…
quiet-node Jun 20, 2026
7f47e2f
feat: download multiple models in parallel from Settings Discover
quiet-node Jun 21, 2026
8163e59
feat(settings): gate Library and Discover for non-built-in providers
quiet-node Jun 21, 2026
af052c2
fix(models): hide the download control for installed quants in Browse…
quiet-node Jun 21, 2026
6dab3c7
feat: unify Models info, link names to Hugging Face, fix menu clipping
quiet-node Jun 21, 2026
150ea81
fix: truncate long resident-model name in keep-warm status
quiet-node Jun 21, 2026
43f64f9
feat: capability pills in Browse-all, shared capability derivation, a…
quiet-node Jun 21, 2026
a79e359
refactor(discover): remove Browse-all result count label
quiet-node Jun 21, 2026
837cab3
feat(discover): show per-family download status pills on Browse-all rows
quiet-node Jun 21, 2026
dd1e571
fix: broadcast active-model changes so the Settings panel and overlay…
quiet-node Jun 21, 2026
839fd04
fix: dedup built-in warm-up primes and surface a warming status
quiet-node Jun 21, 2026
0a4af79
fix: allow discarding a paused download while another download runs
quiet-node Jun 21, 2026
c30d69a
feat(settings): themed model picker popover for Providers
quiet-node Jun 21, 2026
3bb13b0
fix(engine): surface llama-server load failures and flag unsupported …
quiet-node Jun 21, 2026
6724386
feat(settings): caution notice and per-download confirm in Browse all
quiet-node Jun 21, 2026
9030528
chore(onboarding): note more models live in Settings
quiet-node Jun 21, 2026
2f180d4
docs(openai): correct reasoning note; off-mode thinking is shown, not…
quiet-node Jun 21, 2026
750cf8b
fix(discover): stop HF search Load more refetching the capped page fo…
quiet-node Jun 21, 2026
f9b1d88
fix(engine): reset built-in warm-up dedup when the engine unloads
quiet-node Jun 21, 2026
cda5025
fix(engine): prefer the actionable stderr line in engine-start errors
quiet-node Jun 21, 2026
9187d9a
fix(discover): ignore re-entrant download starts for an in-flight model
quiet-node Jun 21, 2026
14d81ae
fix(discover): cap the Hugging Face search input at the backend query…
quiet-node Jun 21, 2026
13 changes: 12 additions & 1 deletion docs/configurations.md
@@ -151,7 +151,7 @@ Upgrading from an older version is automatic: a pre-providers config with a flat
| Constant | Default | Tunable? | Bounds | Description |
| :---------------- | :--------- | :------- | :------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `active_provider` | `"builtin"` | Yes | id of a provider | Which provider receives inference. Must match the `id` of one of the `[[inference.providers]]` entries; an empty or dangling value resets to `builtin`. Exception: a config that predates the providers list is pinned to `ollama` on load, because no working built-in provider existed when that file was written. |
- | `num_ctx` | `16384` | Yes | `[2048, 1048576]` | Context window size in tokens sent to the active provider with every request. For the built-in engine, the value becomes `--ctx-size` when the `llama-server` process starts, so changing it restarts the engine. For Ollama, warmup and chat share this value so the same runner instance and its cached KV prefix for the system prompt are reused: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum. For OpenAI-compatible providers the value is informational only; the server controls the actual context. Raise to fit longer conversations: each doubling roughly doubles VRAM for the KV cache; lower to reclaim GPU memory. See [Tuning the Context Window](./tuning-context-window.md). |
+ | `num_ctx` | `16384` | Yes | `[2048, 1048576]` | Context window size in tokens sent to the active provider with every request. For the built-in engine, the value becomes `--ctx-size` when the `llama-server` process starts, so changing it restarts the engine. For Ollama, warmup and chat share this value so the same runner instance and its cached KV prefix for the system prompt are reused: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum. For OpenAI-compatible providers the value is informational only; the server controls the actual context. Raise to fit longer conversations: the KV cache grows roughly linearly with the context size (the model weights stay the same), so each doubling roughly doubles its memory footprint; benchmark on your hardware before pushing it high, and lower to reclaim memory. See [Tuning the Context Window](./tuning-context-window.md). |
| `keep_warm_inactivity_minutes` | `0` | Yes | `-1` or `[0, 1440]` | Minutes of inactivity before Thuki releases the active model from memory. Governs both local providers: the built-in engine stops its sidecar to free RAM, and Ollama is told to release the model from VRAM. Not applicable to a remote OpenAI-compatible server, whose residency Thuki does not manage. `0` uses the provider's natural short default (about 5 minutes): Ollama defers to its own timer, the built-in engine applies its own ~5-minute timer (`DEFAULT_BUILTIN_IDLE_MINUTES`). `-1` keeps the model resident forever. Raise for longer sessions between uses; lower to reclaim memory sooner. |
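
The `keep_warm_inactivity_minutes` row describes three cases for the built-in engine: `-1` (resident forever), `0` (the fixed ~5-minute default), and `1..=1440` (an explicit window). A minimal sketch of that mapping follows. `IdlePolicy` and `resolve_builtin_idle` are hypothetical names, not taken from the Thuki source, and returning `None` for out-of-range values is an assumption; the real loader may clamp or reset instead.

```rust
/// Value from the constants table: the built-in translation of the `0` sentinel.
const DEFAULT_BUILTIN_IDLE_MINUTES: u32 = 5;

/// Hypothetical type: how long the built-in engine may idle before its sidecar stops.
#[derive(Debug, PartialEq)]
enum IdlePolicy {
    /// `-1`: keep the model resident forever.
    Forever,
    /// Stop the sidecar after this many idle minutes.
    After(u32),
}

/// Hypothetical helper mapping the config value to an idle policy.
/// Returns `None` for values outside `-1` or `[0, 1440]` (an assumption).
fn resolve_builtin_idle(keep_warm_inactivity_minutes: i32) -> Option<IdlePolicy> {
    match keep_warm_inactivity_minutes {
        -1 => Some(IdlePolicy::Forever),
        0 => Some(IdlePolicy::After(DEFAULT_BUILTIN_IDLE_MINUTES)),
        n @ 1..=1440 => Some(IdlePolicy::After(n as u32)),
        _ => None,
    }
}
```

Under these assumptions, `resolve_builtin_idle(0)` and `resolve_builtin_idle(5)` yield the same policy, which matches the row's note that `0` is a sentinel rather than a separate preference.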

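The revised `num_ctx` wording says the KV cache grows roughly linearly with context size while the weights stay fixed. The sketch below illustrates that with the standard per-token KV size for a transformer (keys plus values, per layer, per KV head). `ModelShape` and its field values are hypothetical and not tied to any particular model, and clamping to the documented bounds is an assumption about how an out-of-range value might be handled.

```rust
/// Bounds from the `num_ctx` row.
const NUM_CTX_MIN: u64 = 2_048;
const NUM_CTX_MAX: u64 = 1_048_576;

/// Hypothetical model shape; the numbers used with it are illustrative only.
struct ModelShape {
    n_layers: u64,
    n_kv_heads: u64,
    head_dim: u64,
    /// Bytes per cached element, e.g. 2 for f16.
    bytes_per_elem: u64,
}

/// Approximate KV-cache size in bytes for a given context window.
/// The factor 2 accounts for storing both keys and values.
fn kv_cache_bytes(m: &ModelShape, num_ctx: u64) -> u64 {
    // Assumption: out-of-range values are clamped to the documented bounds.
    let ctx = num_ctx.clamp(NUM_CTX_MIN, NUM_CTX_MAX);
    2 * m.n_layers * m.n_kv_heads * m.head_dim * m.bytes_per_elem * ctx
}
```

For an illustrative shape of 32 layers, 8 KV heads, head dimension 128, and f16 elements, this comes to 128 KiB per token: 2 GiB at the default `16384` and 4 GiB at `32768`. Real engines add runtime buffers on top, so treat this as a lower-bound estimate.
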
Each `[[inference.providers]]` block has these fields:
@@ -185,11 +185,22 @@ The table below also lists the baked-in safety limits that govern Thuki's commun
| `DEFAULT_BUILTIN_IDLE_MINUTES` | `5 min` | No | The fixed translation of the `keep_warm_inactivity_minutes = 0` sentinel for the built-in engine, not a separate preference. The built-in engine has no external daemon to defer to, so `0` ("use the provider's natural short default") resolves to this value. Users who want a different timeout set `keep_warm_inactivity_minutes` directly (`N` minutes, or `-1` for forever). | — | The idle window the built-in engine applies when `keep_warm_inactivity_minutes` is `0`. After this many minutes of inactivity the sidecar is stopped to free RAM. |
| `ENGINE_HEALTH_PROBE_TIMEOUT_SECS` | `5 s` | No | Internal lifecycle contract between the runner and the engine process. A wedged-but-connected server must not park the poll loop forever; loopback probes are normally instant so 5 s is generous. The poll interval and deadline are the user-facing knobs. | — | How long a single `/health` GET is allowed to take inside the startup poll loop. If the engine has accepted the TCP connection but stopped responding, this timeout causes the probe to return an error (treated as Wait and retried after `ENGINE_HEALTH_POLL_INTERVAL_MS`). |
| `ENGINE_COMMAND_QUEUE_CAPACITY` | `64` | No | Bounds memory under command bursts; 64 slots is ample for all UI-driven traffic (Ensure, Touch, SetIdleMinutes, Shutdown) under any realistic usage pattern. | — | Capacity of the bounded `mpsc` channel that carries commands from `EngineHandle` to the runner actor task. Back-pressure from a full queue is not observable in normal use. |
| `ENGINE_STDERR_TAIL_LINES` | `20` | No | Defense-in-depth bound on captured subprocess output: 20 lines cover the load-error block `llama-server` prints on exit without retaining its whole log. | — | Number of trailing `llama-server` stderr lines the runner keeps so a crash can report the engine's own reason (e.g. `unknown model architecture`) instead of a generic message. |
| `ENGINE_STDERR_TAIL_LINE_MAX_BYTES` | `500` | No | Defense-in-depth bound on attacker-influenced data: a single pathological newline-less stderr line (e.g. an enormous architecture string echoed from crafted GGUF metadata) is capped during the read, so neither peak read buffering nor the retained tail can grow without limit. | — | Maximum bytes buffered and retained per captured engine stderr line. |
| `ENGINE_CRASH_FALLBACK_MESSAGE` | `"engine process exited unexpectedly"` | No | Internal diagnostic fallback surfaced only when the real reason is unavailable; not meaningful to expose. | n/a | Reason reported when the built-in engine process exits without leaving any stderr to capture (e.g. an external `SIGKILL`). |
| `DOWNLOAD_PROGRESS_MIN_INTERVAL_MS` | `500 ms` | No | Pure IPC hygiene: a fast local connection can deliver thousands of chunks per second and the UI only needs a few updates per second, so throttling below the UI refresh rate is invisible to the user. | — | Minimum interval between `Progress` events emitted while a model file downloads. An update is also emitted whenever at least 1% of the file has arrived since the last one, whichever comes first, and a final 100% update always precedes verification. |
| `BLOB_HASH_BUFFER_BYTES` | `4 MiB` | No | Internal I/O buffer with no user-visible effect beyond verify speed. A few-MB buffer turns hashing a multi-GB blob into a few hundred reads instead of hundreds of thousands. | — | Read-buffer size for streaming a downloaded blob through SHA-256 during verification. The common path hashes bytes as they download, so this applies only to a full-length partial left from a prior run or a resumed download's on-disk prefix. |
| `MAX_HF_API_BODY_BYTES` | `4 MiB` | No | Defense-in-depth bound on attacker-controlled data from a remote service, mirroring `MAX_OLLAMA_TAGS_BODY_BYTES`. | — | The largest Hugging Face API response body (repo file listings) Thuki will accept while resolving a model to download. Larger responses are rejected mid-stream and the request returns an error. |
| `MAX_GGUF_KV_COUNT` | `4096` | No | Defense-in-depth bound on a downloaded GGUF's metadata-key count. A corrupt or hostile `metadata_kv_count` could otherwise drive an unbounded scan; real models carry a few dozen entries, so 4096 never truncates legitimate metadata. | — | The most GGUF metadata key-value pairs the reasoning classifier scans when reading a downloaded model's chat template. Scanning stops at the cap. |
| `MAX_GGUF_KEY_BYTES` | `1 KiB` | No | Defense-in-depth bound on a downloaded GGUF's metadata-key length. Keys are short dotted identifiers (`tokenizer.chat_template`); capping the length stops a corrupt length field from forcing a large allocation. | — | The longest GGUF metadata key the reasoning classifier will read. A longer key stops the scan. |
| `MAX_GGUF_STRING_BYTES` | `4 MiB` | No | Defense-in-depth bound on a downloaded GGUF's string values. Real chat templates run a few KB to ~100 KB; 4 MiB never truncates one while bounding the memory a corrupt length field can demand. | — | The largest GGUF string value (the chat template or architecture) the reasoning classifier will materialize. A larger value stops the scan and the model relies on the runtime backstop instead. |
| `HF_API_TIMEOUT_SECS` | `15 s` | No | Protocol cap on a hung remote service so the download UI cannot stall on metadata resolution; 15 s is generous for a small metadata call over the internet. | — | How long Thuki waits for a Hugging Face API metadata call (repo file listing) to respond before giving up. Applies to resolving pasted repo ids and listing a repo's GGUF files, not to the model download itself. |
| `HF_BASE_URL` | `https://huggingface.co` | No | Single origin for model metadata and downloads. Provenance comes from the pinned repo revisions in the curated starter registry, and those pins are only meaningful against the canonical Hub; an arbitrary mirror could serve different content under the same revision ids. | — | The Hugging Face origin Thuki uses for all model metadata calls and blob downloads. Every starter in the registry pins a repo at an exact revision and carries a compiled-in sha256 digest checked after download; the digest catches truncation, bit rot, and resume corruption, while the pinned revision on the canonical Hub is what fixes which content is fetched. |
| `HF_SEARCH_LIMIT` | `30` | No | The per-page step for the in-app model browser. The "Load more" control raises the requested page size in multiples of this value, so it is a layout step rather than a user preference. | — | How many GGUF model repos the first page of an in-app Hugging Face search returns, most-downloaded first. |
| `HF_SEARCH_LIMIT_MAX` | `120` | No | Defense-in-depth bound on request size: "Load more" grows the requested page size in `HF_SEARCH_LIMIT` steps, and this caps the largest single request so a runaway page count cannot ask the Hub for an unbounded result set. | — | The largest page size a single in-app Hugging Face search request may ask for, regardless of how many times "Load more" was pressed. |
| `MAX_MODEL_CONTEXT_LENGTH` | `1 M` | No | Defense-in-depth bound on attacker-controlled GGUF metadata: a repo's `context_length` is editable (`gguf_set_metadata.py`) and occasionally inflated, so a value above this sane ceiling is treated as untrustworthy and dropped rather than shown. Mirrors the `num_ctx` upper bound; 1 M tokens covers every current model. | — | The largest model context window Thuki will trust and display from a Browse-all repo's parsed GGUF metadata. A larger declared value is dropped (no context window shown) rather than rendered. Curated Staff Picks models carry a hand-vetted value in the registry instead. |
| `RUNTIME_OVERHEAD_GB` | `2.0` | No | Feeds the approximate RAM-fit hint shown in Library and Discover only; the authoritative per-starter memory estimates live in the model registry. A user-tunable overhead would imply a precision the hint does not claim. | — | Resident-memory overhead added on top of a model's weights size (KV cache plus runtime buffers) when estimating whether it fits in this Mac's RAM. |
| `MAX_HF_SEARCH_QUERY_LEN` | `200 bytes` | No | Defense-in-depth bound on attacker-influenced input: the query reaches the fixed Hub host (no SSRF) and is percent-encoded by the client, but an unbounded string is still rejected to cap request size. | — | The longest search string Thuki sends to the Hugging Face model search. A longer query is rejected before any network call. |
| `OPENAI_MODELS_TIMEOUT_SECS` | `5 s` | No | Protocol cap on a hung server so the Settings model dropdown cannot stall; the OpenAI-compatible server is local or LAN-hosted in the common case, so 5 s is generous. | — | How long Thuki waits for an OpenAI-compatible server's `/v1/models` listing to respond before giving up. Applies to the Settings model dropdown for that provider, not to chat requests. |
| `MAX_SSE_LINE_BYTES` | `1 MiB` | No | Defense-in-depth bound on attacker-controlled stream data. A malicious or broken chat server could otherwise grow a single stream line without limit and exhaust memory. | — | The longest single Server-Sent-Events line Thuki accepts while streaming a chat response from an OpenAI-compatible (`/v1`) server. A stream line exceeding this aborts the response with an error. |
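
The `DOWNLOAD_PROGRESS_MIN_INTERVAL_MS` row describes a throttle with two triggers (500 ms elapsed, or at least 1% of the file received since the last event, whichever comes first) plus a guaranteed final 100% update. One way to express that rule is sketched below. `ProgressThrottle` and its methods are hypothetical names; the actual downloader may structure this differently.

```rust
use std::time::{Duration, Instant};

/// Value from the constants table.
const DOWNLOAD_PROGRESS_MIN_INTERVAL_MS: u64 = 500;

/// Hypothetical throttle deciding when a `Progress` event is worth emitting.
struct ProgressThrottle {
    total_bytes: u64,
    last_emit_at: Instant,
    last_emit_bytes: u64,
}

impl ProgressThrottle {
    fn new(total_bytes: u64, now: Instant) -> Self {
        Self { total_bytes, last_emit_at: now, last_emit_bytes: 0 }
    }

    /// Returns true when an event should be emitted for `received` bytes at `now`.
    fn should_emit(&mut self, received: u64, now: Instant) -> bool {
        let min_interval = Duration::from_millis(DOWNLOAD_PROGRESS_MIN_INTERVAL_MS);
        let interval_elapsed = now.duration_since(self.last_emit_at) >= min_interval;
        // At least 1% of the file has arrived since the last emitted event.
        let delta = received.saturating_sub(self.last_emit_bytes);
        let percent_step = self.total_bytes > 0 && delta.saturating_mul(100) >= self.total_bytes;
        // The final 100% update is always emitted.
        let finished = received >= self.total_bytes;

        if interval_elapsed || percent_step || finished {
            self.last_emit_at = now;
            self.last_emit_bytes = received;
            true
        } else {
            false
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the rule deterministic under test.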

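The `HF_SEARCH_LIMIT` and `HF_SEARCH_LIMIT_MAX` rows together describe how "Load more" grows the requested page size in steps of 30, capped at 120 per request. A minimal sketch of that arithmetic follows; `requested_page_size` and its `load_more_presses` parameter are hypothetical names used for illustration.

```rust
/// Values from the constants table.
const HF_SEARCH_LIMIT: u32 = 30;
const HF_SEARCH_LIMIT_MAX: u32 = 120;

/// Hypothetical helper: page size to request after `load_more_presses`
/// presses of "Load more" (0 presses is the first page).
fn requested_page_size(load_more_presses: u32) -> u32 {
    HF_SEARCH_LIMIT
        .saturating_mul(load_more_presses.saturating_add(1))
        .min(HF_SEARCH_LIMIT_MAX)
}
```

The saturating arithmetic means even an absurd press count cannot overflow before the cap applies, which fits the "runaway page count" rationale in the table.
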