diff --git a/docs/configurations.md b/docs/configurations.md index 6d6c0d75..f47dca05 100644 --- a/docs/configurations.md +++ b/docs/configurations.md @@ -151,7 +151,7 @@ Upgrading from an older version is automatic: a pre-providers config with a flat | Constant | Default | Tunable? | Bounds | Description | | :---------------- | :--------- | :------- | :------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `active_provider` | `"builtin"` | Yes | id of a provider | Which provider receives inference. Must match the `id` of one of the `[[inference.providers]]` entries; an empty or dangling value resets to `builtin`. Exception: a config that predates the providers list is pinned to `ollama` on load, because no working built-in provider existed when that file was written. | -| `num_ctx` | `16384` | Yes | `[2048, 1048576]` | Context window size in tokens sent to the active provider with every request. For the built-in engine, the value becomes `--ctx-size` when the `llama-server` process starts, so changing it restarts the engine. For Ollama, warmup and chat share this value so the same runner instance and its cached KV prefix for the system prompt are reused: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum. For OpenAI-compatible providers the value is informational only; the server controls the actual context. 
Raise to fit longer conversations: each doubling roughly doubles VRAM for the KV cache; lower to reclaim GPU memory. See [Tuning the Context Window](./tuning-context-window.md). | +| `num_ctx` | `16384` | Yes | `[2048, 1048576]` | Context window size in tokens sent to the active provider with every request. For the built-in engine, the value becomes `--ctx-size` when the `llama-server` process starts, so changing it restarts the engine. For Ollama, warmup and chat share this value so the same runner instance and its cached KV prefix for the system prompt are reused: they must match or Ollama creates a second runner and the warmup saves nothing. Ollama silently clamps this to the model's physical maximum. For OpenAI-compatible providers the value is informational only; the server controls the actual context. Raise to fit longer conversations: the KV cache grows roughly linearly with the context size (the model weights stay the same), so each doubling roughly doubles its memory footprint; benchmark on your hardware before pushing it high, and lower to reclaim memory. See [Tuning the Context Window](./tuning-context-window.md). | | `keep_warm_inactivity_minutes` | `0` | Yes | `-1` or `[0, 1440]` | Minutes of inactivity before Thuki releases the active model from memory. Governs both local providers: the built-in engine stops its sidecar to free RAM, and Ollama is told to release the model from VRAM. Not applicable to a remote OpenAI-compatible server, whose residency Thuki does not manage. `0` uses the provider's natural short default (about 5 minutes): Ollama defers to its own timer, the built-in engine applies its own ~5-minute timer (`DEFAULT_BUILTIN_IDLE_MINUTES`). `-1` keeps the model resident forever. Raise for longer sessions between uses; lower to reclaim memory sooner. 
| Each `[[inference.providers]]` block has these fields: @@ -185,11 +185,22 @@ The table below also lists the baked-in safety limits that govern Thuki's commun | `DEFAULT_BUILTIN_IDLE_MINUTES` | `5 min` | No | The fixed translation of the `keep_warm_inactivity_minutes = 0` sentinel for the built-in engine, not a separate preference. The built-in engine has no external daemon to defer to, so `0` ("use the provider's natural short default") resolves to this value. Users who want a different timeout set `keep_warm_inactivity_minutes` directly (`N` minutes, or `-1` for forever). | — | The idle window the built-in engine applies when `keep_warm_inactivity_minutes` is `0`. After this many minutes of inactivity the sidecar is stopped to free RAM. | | `ENGINE_HEALTH_PROBE_TIMEOUT_SECS` | `5 s` | No | Internal lifecycle contract between the runner and the engine process. A wedged-but-connected server must not park the poll loop forever; loopback probes are normally instant so 5 s is generous. The poll interval and deadline are the user-facing knobs. | — | How long a single `/health` GET is allowed to take inside the startup poll loop. If the engine has accepted the TCP connection but stopped responding, this timeout causes the probe to return an error (treated as Wait and retried after `ENGINE_HEALTH_POLL_INTERVAL_MS`). | | `ENGINE_COMMAND_QUEUE_CAPACITY` | `64` | No | Bounds memory under command bursts; 64 slots is ample for all UI-driven traffic (Ensure, Touch, SetIdleMinutes, Shutdown) under any realistic usage pattern. | — | Capacity of the bounded `mpsc` channel that carries commands from `EngineHandle` to the runner actor task. Back-pressure from a full queue is not observable in normal use. | +| `ENGINE_STDERR_TAIL_LINES` | `20` | No | Defense-in-depth bound on captured subprocess output: 20 lines cover the load-error block `llama-server` prints on exit without retaining its whole log. 
| — | Number of trailing `llama-server` stderr lines the runner keeps so a crash can report the engine's own reason (e.g. `unknown model architecture`) instead of a generic message. | +| `ENGINE_STDERR_TAIL_LINE_MAX_BYTES` | `500` | No | Defense-in-depth bound on attacker-influenced data: a single pathological newline-less stderr line (e.g. an enormous architecture string echoed from crafted GGUF metadata) is capped during the read, so neither peak read buffering nor the retained tail can grow without limit. | — | Maximum bytes buffered and retained per captured engine stderr line. | +| `ENGINE_CRASH_FALLBACK_MESSAGE` | `"engine process exited unexpectedly"` | No | Internal diagnostic fallback surfaced only when the real reason is unavailable; not meaningful to expose. | n/a | Reason reported when the built-in engine process exits without leaving any stderr to capture (e.g. an external `SIGKILL`). | | `DOWNLOAD_PROGRESS_MIN_INTERVAL_MS` | `500 ms` | No | Pure IPC hygiene: a fast local connection can deliver thousands of chunks per second and the UI only needs a few updates per second, so throttling below the UI refresh rate is invisible to the user. | — | Minimum interval between `Progress` events emitted while a model file downloads. An update is also emitted whenever at least 1% of the file has arrived since the last one, whichever comes first, and a final 100% update always precedes verification. | | `BLOB_HASH_BUFFER_BYTES` | `4 MiB` | No | Internal I/O buffer with no user-visible effect beyond verify speed. A few-MB buffer turns hashing a multi-GB blob into a few hundred reads instead of hundreds of thousands. | — | Read-buffer size for streaming a downloaded blob through SHA-256 during verification. The common path hashes bytes as they download, so this applies only to a full-length partial left from a prior run or a resumed download's on-disk prefix. 
| | `MAX_HF_API_BODY_BYTES` | `4 MiB` | No | Defense-in-depth bound on attacker-controlled data from a remote service, mirroring `MAX_OLLAMA_TAGS_BODY_BYTES`. | — | The largest Hugging Face API response body (repo file listings) Thuki will accept while resolving a model to download. Larger responses are rejected mid-stream and the request returns an error. | +| `MAX_GGUF_KV_COUNT` | `4096` | No | Defense-in-depth bound on a downloaded GGUF's metadata-key count. A corrupt or hostile `metadata_kv_count` could otherwise drive an unbounded scan; real models carry a few dozen entries, so 4096 never truncates legitimate metadata. | — | The most GGUF metadata key-value pairs the reasoning classifier scans when reading a downloaded model's chat template. Scanning stops at the cap. | +| `MAX_GGUF_KEY_BYTES` | `1 KiB` | No | Defense-in-depth bound on a downloaded GGUF's metadata-key length. Keys are short dotted identifiers (`tokenizer.chat_template`); capping the length stops a corrupt length field from forcing a large allocation. | — | The longest GGUF metadata key the reasoning classifier will read. A longer key stops the scan. | +| `MAX_GGUF_STRING_BYTES` | `4 MiB` | No | Defense-in-depth bound on a downloaded GGUF's string values. Real chat templates run a few KB to ~100 KB; 4 MiB never truncates one while bounding the memory a corrupt length field can demand. | — | The largest GGUF string value (the chat template or architecture) the reasoning classifier will materialize. A larger value stops the scan and the model relies on the runtime backstop instead. | | `HF_API_TIMEOUT_SECS` | `15 s` | No | Protocol cap on a hung remote service so the download UI cannot stall on metadata resolution; 15 s is generous for a small metadata call over the internet. | — | How long Thuki waits for a Hugging Face API metadata call (repo file listing) to respond before giving up. Applies to resolving pasted repo ids and listing a repo's GGUF files, not to the model download itself. 
| | `HF_BASE_URL` | `https://huggingface.co` | No | Single origin for model metadata and downloads. Provenance comes from the pinned repo revisions in the curated starter registry, and those pins are only meaningful against the canonical Hub; an arbitrary mirror could serve different content under the same revision ids. | — | The Hugging Face origin Thuki uses for all model metadata calls and blob downloads. Every starter in the registry pins a repo at an exact revision and carries a compiled-in sha256 digest checked after download; the digest catches truncation, bit rot, and resume corruption, while the pinned revision on the canonical Hub is what fixes which content is fetched. | +| `HF_SEARCH_LIMIT` | `30` | No | The per-page step for the in-app model browser. The "Load more" control raises the requested page size in multiples of this value, so it is a layout step rather than a user preference. | — | How many GGUF model repos the first page of an in-app Hugging Face search returns, most-downloaded first. | +| `HF_SEARCH_LIMIT_MAX` | `120` | No | Defense-in-depth bound on request size: "Load more" grows the requested page size in `HF_SEARCH_LIMIT` steps, and this caps the largest single request so a runaway page count cannot ask the Hub for an unbounded result set. | — | The largest page size a single in-app Hugging Face search request may ask for, regardless of how many times "Load more" was pressed. | +| `MAX_MODEL_CONTEXT_LENGTH` | `1 M` | No | Defense-in-depth bound on attacker-controlled GGUF metadata: a repo's `context_length` is editable (`gguf_set_metadata.py`) and occasionally inflated, so a value above this sane ceiling is treated as untrustworthy and dropped rather than shown. Mirrors the `num_ctx` upper bound; 1 M tokens covers every current model. | — | The largest model context window Thuki will trust and display from a Browse-all repo's parsed GGUF metadata. A larger declared value is dropped (no context window shown) rather than rendered. 
Curated Staff Picks models carry a hand-vetted value in the registry instead. | +| `RUNTIME_OVERHEAD_GB` | `2.0` | No | Feeds the approximate RAM-fit hint shown in Library and Discover only; the authoritative per-starter memory estimates live in the model registry. A user-tunable overhead would imply a precision the hint does not claim. | — | Resident-memory overhead added on top of a model's weights size (KV cache plus runtime buffers) when estimating whether it fits in this Mac's RAM. | +| `MAX_HF_SEARCH_QUERY_LEN` | `200 bytes` | No | Defense-in-depth bound on attacker-influenced input: the query reaches the fixed Hub host (no SSRF) and is percent-encoded by the client, but an unbounded string is still rejected to cap request size. | — | The longest search string Thuki sends to the Hugging Face model search. A longer query is rejected before any network call. | | `OPENAI_MODELS_TIMEOUT_SECS` | `5 s` | No | Protocol cap on a hung server so the Settings model dropdown cannot stall; the OpenAI-compatible server is local or LAN-hosted in the common case, so 5 s is generous. | — | How long Thuki waits for an OpenAI-compatible server's `/v1/models` listing to respond before giving up. Applies to the Settings model dropdown for that provider, not to chat requests. | | `MAX_SSE_LINE_BYTES` | `1 MiB` | No | Defense-in-depth bound on attacker-controlled stream data. A malicious or broken chat server could otherwise grow a single stream line without limit and exhaust memory. | — | The longest single Server-Sent-Events line Thuki accepts while streaming a chat response from an OpenAI-compatible (`/v1`) server. A stream line exceeding this aborts the response with an error. 
| diff --git a/src-tauri/src/commands.rs b/src-tauri/src/commands.rs index 0fb7e5e5..8c1e796e 100644 --- a/src-tauri/src/commands.rs +++ b/src-tauri/src/commands.rs @@ -96,6 +96,11 @@ pub enum EngineErrorKind { /// The bundled engine's sidecar process failed to launch or crashed before /// passing its health check. EngineStartFailed, + /// The selected model's architecture is not supported by the bundled + /// engine build, so `llama-server` refused to load it. A setup nudge (try + /// another model), not a crash: the frontend renders it with the amber + /// warning accent rather than the red failure accent. + ModelUnsupported, /// The requested model has not been pulled yet (HTTP 404). ModelNotFound, /// No active model has been selected. The user must pick a model from @@ -259,6 +264,76 @@ async fn fetch_builtin_vision(client: &reqwest::Client, base_url: &str) -> bool } } +/// Condenses a multi-line engine failure detail into the single most +/// informative line for the error subtitle (which renders as one paragraph). +/// The captured stderr tail can be many timestamped lines, so this prefers the +/// FIRST line that reads like an actual error message ("error:", "error +/// loading", "failed to", "failed:") over one that merely contains the word +/// (a startup banner such as "log level: error"). llama.cpp prints the specific +/// root cause first ("error loading model: ") then generic trailers +/// ("failed to load", "exiting due to model loading error"), so the first +/// actionable match is the one to show. It falls back to any error/failure +/// mention, then to the last non-empty line; a single-line detail (e.g. a +/// health-check message) is returned unchanged. Classification upstream still +/// sees the full detail. 
+fn concise_detail(detail: &str) -> String {
+    let lines: Vec<&str> = detail
+        .lines()
+        .map(str::trim)
+        .filter(|line| !line.is_empty())
+        .collect();
+    match lines.as_slice() {
+        [] => detail.trim().to_string(),
+        [single] => (*single).to_string(),
+        many => many
+            .iter()
+            .find(|line| {
+                let lower = line.to_ascii_lowercase();
+                lower.contains("error:")
+                    || lower.contains("error loading")
+                    || lower.contains("failed to")
+                    || lower.contains("failed:")
+            })
+            .or_else(|| {
+                many.iter().find(|line| {
+                    let lower = line.to_ascii_lowercase();
+                    lower.contains("error") || lower.contains("failed")
+                })
+            })
+            .copied()
+            .unwrap_or(many[many.len() - 1])
+            .to_string(),
+    }
+}
+
+/// Maps a built-in engine start failure (the engine's own captured stderr, or
+/// a health-check message) onto a user-facing [`EngineError`]. A llama.cpp
+/// "unknown model architecture" failure means the bundled engine cannot run
+/// this model, so it becomes a `ModelUnsupported` nudge to pick another model;
+/// every other failure surfaces the concise reason under the generic
+/// engine-start title so OOM, context-size, and projector mismatches stay
+/// actionable.
+///
+/// Pure so the classification and exact copy are unit-tested without a Tauri
+/// runtime. Shared by `stream_builtin_chat` and `resolve_llm_transport`.
+pub fn engine_start_error(detail: &str) -> EngineError {
+    let lower = detail.to_ascii_lowercase();
+    if lower.contains("unknown model architecture") || lower.contains("unknown architecture") {
+        EngineError {
+            kind: EngineErrorKind::ModelUnsupported,
+            message: "Unsupported model\nThuki's engine doesn't support this arch yet. Try another model. Engine improves over time and may support it down the road.".to_string(),
+        }
+    } else {
+        EngineError {
+            kind: EngineErrorKind::EngineStartFailed,
+            message: format!(
+                "Thuki's engine could not start.\n{}",
+                concise_detail(detail)
+            ),
+        }
+    }
+}
+
 /// Runs the built-in-engine stage of a chat turn: mark activity, ensure the
 /// engine serves `target`, then stream via the `/v1` client at the engine's
 /// port. An engine activity guard is held for the whole turn (ensure,
@@ -280,10 +355,12 @@ async fn fetch_builtin_vision(client: &reqwest::Client, base_url: &str) -> bool
 ///
 /// Returns the accumulated assistant content (empty on the error paths) so
 /// the caller's persistence tail treats every route identically.
+#[allow(clippy::too_many_arguments)]
 pub(crate) async fn stream_builtin_chat(
     engine: &crate::engine::runner::EngineHandle,
     target: crate::engine::state::Target,
     model_id: String,
+    think: bool,
     mut messages: Vec,
     client: &reqwest::Client,
     cancel_token: CancellationToken,
@@ -326,6 +403,7 @@ pub(crate) async fn stream_builtin_chat(
                     messages,
                     api_key: None,
                     flavor: crate::openai::V1Flavor::Builtin,
+                    enable_thinking: think,
                 },
                 client,
                 cancel_token,
@@ -338,15 +416,71 @@ pub(crate) async fn stream_builtin_chat(
             String::new()
         }
         Some(Err(crate::engine::runner::EnsureError::StartFailed(detail))) => {
-            on_chunk(StreamChunk::Error(EngineError {
-                kind: EngineErrorKind::EngineStartFailed,
-                message: format!("Thuki's engine could not start.\n{detail}"),
-            }));
+            on_chunk(StreamChunk::Error(engine_start_error(&detail)));
             String::new()
         }
     }
 }
 
+/// Sets `flag` when `chunk` carries reasoning output. The built-in runtime
+/// backstop wires this into the chunk pump so it learns whether a model emitted
+/// reasoning tokens even though reasoning was requested OFF.
+pub(crate) fn observe_reasoning_chunk(chunk: &StreamChunk, flag: &std::sync::atomic::AtomicBool) {
+    if matches!(chunk, StreamChunk::ThinkingToken(_)) {
+        flag.store(true, std::sync::atomic::Ordering::Relaxed);
+    }
+}
+
+/// Decides whether the runtime backstop should mark a built-in model as
+/// always-reasoning. True only when reasoning was requested OFF (`!think`) yet
+/// the model still streamed reasoning (`reasoning_seen`), the manifest does not
+/// already record it as always (`!current_reasoning_always`), and the model is
+/// not a curated starter (`!is_curated`, whose class is registry truth and must
+/// never be overridden from behavior).
+pub(crate) fn should_backstop_mark(
+    think: bool,
+    reasoning_seen: bool,
+    current_reasoning_always: bool,
+    is_curated: bool,
+) -> bool {
+    !think && reasoning_seen && !current_reasoning_always && !is_curated
+}
+
+/// Best-effort runtime backstop for the built-in engine: when a chat streamed
+/// reasoning while reasoning was OFF, persist `reasoning_always` so the picker
+/// badge and `/think` gate self-correct on the next read. Coverage-off: the
+/// decision lives in [`should_backstop_mark`]; this wrapper only reads the row
+/// and writes the flag. Never fails the turn (every error is logged and
+/// swallowed).
+#[cfg_attr(coverage_nightly, coverage(off))]
+fn backstop_mark_reasoning_always(
+    db: &crate::history::Database,
+    model_id: &str,
+    think: bool,
+    reasoning_seen: bool,
+) {
+    // Cheap exit before locking: only an OFF request that still saw reasoning
+    // can change anything.
+    if think || !reasoning_seen {
+        return;
+    }
+    let Ok(conn) = db.0.lock() else { return };
+    let Ok(Some(row)) = crate::models::manifest::get(&conn, model_id) else {
+        return;
+    };
+    let is_curated = crate::models::curated_reasoning_flags(&row.repo, &row.file_name).is_some();
+    if should_backstop_mark(think, reasoning_seen, row.reasoning_always, is_curated) {
+        match crate::models::manifest::mark_reasoning_always(&conn, model_id) {
+            Ok(()) => {
+                eprintln!("thuki: [models] reasoning backstop: marked {model_id} always-reasoning")
+            }
+            Err(e) => {
+                eprintln!("thuki: [models] reasoning backstop: failed to mark {model_id}: {e}")
+            }
+        }
+    }
+}
+
 /// Reads the API key for an `openai`-kind provider from the secret store.
 /// Errors degrade to `None` with a stderr log: a missing or unreadable key
 /// must not block a keyless local `/v1` server.
@@ -526,10 +660,7 @@ pub(crate) async fn resolve_llm_transport(
             Err(TransportError::Superseded)
         }
         Some(Err(crate::engine::runner::EnsureError::StartFailed(detail))) => {
-            Err(TransportError::Engine(EngineError {
-                kind: EngineErrorKind::EngineStartFailed,
-                message: format!("Thuki's engine could not start.\n{detail}"),
-            }))
+            Err(TransportError::Engine(engine_start_error(&detail)))
         }
     }
 }
@@ -1177,16 +1308,35 @@ pub async fn ask_model(
             };
             match target {
                 Ok(target) => {
-                    stream_builtin_chat(
+                    // Observe whether reasoning streamed this turn so the
+                    // runtime backstop can mark a model that reasons even with
+                    // reasoning requested OFF (see `backstop_mark_reasoning_always`).
+                    let reasoning_seen =
+                        std::sync::Arc::new(std::sync::atomic::AtomicBool::new(false));
+                    let seen_for_pump = std::sync::Arc::clone(&reasoning_seen);
+                    let backstop_model_id = model_id.clone();
+                    let builtin_pump = move |chunk: StreamChunk| {
+                        observe_reasoning_chunk(&chunk, &seen_for_pump);
+                        pump(chunk);
+                    };
+                    let content = stream_builtin_chat(
                         &engine,
                         target,
                         model_id,
+                        think,
                         messages,
                         &client,
                         cancel_token.clone(),
-                        pump,
+                        builtin_pump,
                     )
-                    .await
+                    .await;
+                    backstop_mark_reasoning_always(
+                        &db,
+                        &backstop_model_id,
+                        think,
+                        reasoning_seen.load(std::sync::atomic::Ordering::Relaxed),
+                    );
+                    content
                 }
                 Err(err) => {
                     pump(StreamChunk::Error(err));
@@ -1206,6 +1356,9 @@ pub async fn ask_model(
                         messages,
                         api_key,
                         flavor: crate::openai::V1Flavor::Remote,
+                        // `/think` reasoning control is built-in only; a remote
+                        // OpenAI-compatible server uses its own server-side defaults.
+                        enable_thinking: false,
                     },
                     &client,
                     cancel_token.clone(),
@@ -1336,6 +1489,89 @@ mod tests {
         )
     }
 
+    #[test]
+    fn engine_start_error_unknown_architecture_is_model_unsupported() {
+        let err = engine_start_error(
+            "error loading model: unknown model architecture: 'deepseek4_mtp_support'",
+        );
+        assert_eq!(err.kind, EngineErrorKind::ModelUnsupported);
+        assert_eq!(
+            err.message,
+            "Unsupported model\nThuki's engine doesn't support this arch yet. Try another model. Engine improves over time and may support it down the road."
+        );
+    }
+
+    #[test]
+    fn engine_start_error_matches_short_unknown_architecture_phrasing() {
+        assert_eq!(
+            engine_start_error("llama_model_load: unknown architecture").kind,
+            EngineErrorKind::ModelUnsupported
+        );
+    }
+
+    #[test]
+    fn engine_start_error_other_failures_surface_raw_reason() {
+        let err = engine_start_error("engine health check returned HTTP 500");
+        assert_eq!(err.kind, EngineErrorKind::EngineStartFailed);
+        assert_eq!(
+            err.message,
+            "Thuki's engine could not start.\nengine health check returned HTTP 500"
+        );
+    }
+
+    #[test]
+    fn engine_start_error_condenses_a_multiline_non_arch_tail() {
+        let detail = "0.06 I log_info: loading\n0.06 E common_init: error loading model: out of memory\n0.06 I srv exiting";
+        let err = engine_start_error(detail);
+        assert_eq!(err.kind, EngineErrorKind::EngineStartFailed);
+        assert_eq!(
+            err.message,
+            "Thuki's engine could not start.\n0.06 E common_init: error loading model: out of memory"
+        );
+    }
+
+    #[test]
+    fn concise_detail_returns_a_single_line_unchanged() {
+        assert_eq!(
+            concise_detail("engine did not become healthy before the deadline"),
+            "engine did not become healthy before the deadline"
+        );
+    }
+
+    #[test]
+    fn concise_detail_falls_back_to_the_last_line_without_an_error_marker() {
+        assert_eq!(concise_detail("first\nsecond\nthird"), "third");
+    }
+
+    #[test]
+    fn concise_detail_prefers_the_first_error_line_over_a_generic_trailer() {
+        // llama.cpp prints the root cause first, then generic "exiting due to
+        // ... error" trailers: the first match must win.
+        let tail = "I loading model\nE error loading model: out of memory\nE failed to load model\nE exiting due to model loading error";
+        assert_eq!(concise_detail(tail), "E error loading model: out of memory");
+    }
+
+    #[test]
+    fn concise_detail_skips_a_benign_error_word_for_the_real_cause() {
+        // A startup banner mentions "error" as a log level; the actionable line
+        // is the real loading failure further down. The banner must not win.
+        let tail = "I log level set to error\nI loading model\nE error loading model: bad magic";
+        assert_eq!(concise_detail(tail), "E error loading model: bad magic");
+    }
+
+    #[test]
+    fn concise_detail_falls_back_to_any_failure_mention() {
+        // No "error:"/"error loading"/"failed to" line, but a bare mention is
+        // still more informative than the last line, so it is preferred.
+        let tail = "I starting up\nW cuda error detected\nI shutting down";
+        assert_eq!(concise_detail(tail), "W cuda error detected");
+    }
+
+    #[test]
+    fn concise_detail_empty_detail_is_empty() {
+        assert_eq!(concise_detail(" \n "), "");
+    }
+
     #[tokio::test]
     async fn streams_tokens_from_valid_response() {
         let mut server = mockito::Server::new_async().await;
@@ -2756,6 +2992,7 @@ mod tests {
         let caps = Capabilities {
             vision: false,
             thinking: false,
+            reasoning_always: false,
             max_images: None,
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -2773,6 +3010,7 @@ mod tests {
         let caps = Capabilities {
             vision: true,
             thinking: false,
+            reasoning_always: false,
             max_images: None,
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -2794,6 +3032,7 @@ mod tests {
         let caps = Capabilities {
             vision: true,
             thinking: false,
+            reasoning_always: false,
             max_images: Some(1),
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -2809,6 +3048,7 @@ mod tests {
         let caps = Capabilities {
             vision: true,
             thinking: false,
+            reasoning_always: false,
             max_images: Some(2),
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -2822,6 +3062,7 @@ mod tests {
         let caps = Capabilities {
             vision: false,
             thinking: false,
+            reasoning_always: false,
             max_images: None,
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -2841,6 +3082,7 @@ mod tests {
         let caps = Capabilities {
             vision: true,
             thinking: false,
+            reasoning_always: false,
             max_images: Some(1),
         };
         let stats = apply_capability_filter(&mut messages, &caps);
@@ -3094,6 +3336,7 @@ mod tests {
             quant: "Q4_K_M".to_string(),
             vision: mmproj_sha256.is_some(),
             thinking: false,
+            reasoning_always: false,
             mmproj_file: mmproj_sha256.map(|_| format!("{id}-mmproj.gguf")),
             mmproj_sha256: mmproj_sha256.map(str::to_string),
         }
@@ -3288,6 +3531,16 @@ mod tests {
         async fn kill(&mut self) {
             let _ = self.exit_tx.send(true);
         }
+        fn stderr_tail(&self) -> String {
+            String::new()
+        }
+    }
+
+    #[test]
+    fn scripted_child_has_no_stderr_tail() {
+        let (exit_tx, exit_rx) = tokio::sync::watch::channel(false);
+        let child = ScriptedChild { exit_tx, exit_rx };
+        assert_eq!(crate::engine::process::EngineChild::stderr_tail(&child), "");
     }
 
     #[async_trait::async_trait]
@@ -3362,6 +3615,7 @@ mod tests {
             &engine,
             engine_target(),
             "org/repo:m.gguf".to_string(),
+            false,
             vec![],
             &client,
             CancellationToken::new(),
@@ -3399,6 +3653,7 @@ mod tests {
             &engine,
             engine_target(),
             "org/repo:m.gguf".to_string(),
+            false,
             vec![],
             &client,
             CancellationToken::new(),
@@ -3452,6 +3707,7 @@ mod tests {
             &engine,
             engine_target(),
             "org/repo:m.gguf".to_string(),
+            false,
             vec![],
             &client,
             cancel_token,
@@ -3496,6 +3752,7 @@ mod tests {
             &engine,
             engine_target(),
             "org/repo:m.gguf".to_string(),
+            false,
             vec![],
             &client,
             CancellationToken::new(),
@@ -3531,6 +3788,32 @@ mod tests {
         assert!(!parse_props_vision(b"not json"), "malformed body");
     }
 
+    #[test]
+    fn observe_reasoning_chunk_sets_flag_only_on_thinking_token() {
+        let flag = std::sync::atomic::AtomicBool::new(false);
+        observe_reasoning_chunk(&StreamChunk::Token("hi".into()), &flag);
+        assert!(!flag.load(std::sync::atomic::Ordering::Relaxed));
+        observe_reasoning_chunk(&StreamChunk::Done, &flag);
+        assert!(!flag.load(std::sync::atomic::Ordering::Relaxed));
+        observe_reasoning_chunk(&StreamChunk::ThinkingToken("step".into()), &flag);
+        assert!(flag.load(std::sync::atomic::Ordering::Relaxed));
+    }
+
+    #[test]
+    fn should_backstop_mark_only_fires_for_surprising_pasted_reasoning() {
+        // Reasoning requested OFF, model still reasoned, not yet recorded, not
+        // curated: the one case that should mark.
+        assert!(should_backstop_mark(false, true, false, false));
+        // /think was on: expected reasoning, never a surprise.
+        assert!(!should_backstop_mark(true, true, false, false));
+        // No reasoning streamed: nothing to learn.
+        assert!(!should_backstop_mark(false, false, false, false));
+        // Already recorded as always: no redundant write.
+        assert!(!should_backstop_mark(false, true, true, false));
+        // Curated starter: registry is truth, never override from behavior.
+        assert!(!should_backstop_mark(false, true, false, true));
+    }
+
     #[tokio::test]
     async fn fetch_builtin_vision_transport_error_fails_closed() {
         let client = reqwest::Client::new();
@@ -3612,6 +3895,7 @@ mod tests {
             &engine,
             engine_target(),
             "org/repo:m.gguf".to_string(),
+            false,
             image_message(),
             &client,
             CancellationToken::new(),
diff --git a/src-tauri/src/config/defaults.rs b/src-tauri/src/config/defaults.rs
index ed0c8b30..d3f3bc84 100644
--- a/src-tauri/src/config/defaults.rs
+++ b/src-tauri/src/config/defaults.rs
@@ -67,6 +67,16 @@ pub const DEFAULT_NUM_CTX: u32 = 16384;
 /// current consumer model including the largest 1 M-context variants.
 pub const BOUNDS_NUM_CTX: (u32, u32) = (2048, 1_048_576);
 
+/// Upper bound on a model's context window that Thuki will trust and display
+/// from external GGUF metadata (the `context_length` field of an arbitrary
+/// Hugging Face repo, shown in the Browse-all listing). Defense-in-depth: the
+/// field is attacker-controllable and editable (`gguf_set_metadata.py`), so a
+/// value above this sane ceiling is treated as untrustworthy and dropped rather
+/// than rendered. Mirrors the [`BOUNDS_NUM_CTX`] upper bound: 1 M tokens covers
+/// every current model. Why not tunable: it bounds attacker-controlled data, a
+/// security guard rather than a user preference.
+pub const MAX_MODEL_CONTEXT_LENGTH: u32 = 1_048_576;
+
 /// Accepted range for `keep_warm_inactivity_minutes`.
 /// -1 = keep resident forever, 0 = provider's natural short default (~5 min),
 /// 1..=1440 = explicit timeout. Values below -1 or above 1440 are clamped to
@@ -119,6 +129,25 @@ pub const ENGINE_IDLE_CHECK_INTERVAL_SECS: u64 = 30;
 /// normal use.
 pub const ENGINE_COMMAND_QUEUE_CAPACITY: usize = 64;
 
+/// Number of trailing `llama-server` stderr lines the runner retains so a
+/// crash can report the engine's own reason (e.g. "unknown model
+/// architecture") instead of a generic message. Not user-tunable:
+/// defense-in-depth bound on subprocess output; 20 lines covers the final
+/// load-error block llama.cpp prints without retaining its whole log.
+pub const ENGINE_STDERR_TAIL_LINES: usize = 20;
+
+/// Maximum bytes buffered (and retained) per captured engine stderr line. Not
+/// user-tunable: defense-in-depth bound so one pathological newline-less line
+/// (e.g. an enormous architecture string echoed from crafted GGUF metadata)
+/// cannot force an unbounded read allocation; bytes past the cap are dropped.
+pub const ENGINE_STDERR_TAIL_LINE_MAX_BYTES: usize = 500;
+
+/// Reason reported when the built-in engine process exits without leaving any
+/// stderr we could capture (e.g. an external SIGKILL). Not user-tunable:
+/// internal diagnostic fallback surfaced only when the real reason is
+/// unavailable.
+pub const ENGINE_CRASH_FALLBACK_MESSAGE: &str = "engine process exited unexpectedly";
+
 /// Minimum interval between Progress events emitted during a model download.
 /// Bounds IPC channel traffic: a fast local connection can deliver thousands
 /// of chunks per second and the UI only needs a few updates per second. Not
@@ -404,11 +433,52 @@ pub const OPENAI_MODELS_TIMEOUT_SECS: u64 = 5;
 /// the integrity guarantees that make the curated starter registry safe.
 pub const HF_BASE_URL: &str = "https://huggingface.co";
 
+/// Page size for the in-app Hugging Face GGUF model search. The Discover
+/// "Load more" control raises the requested limit in multiples of this value.
+/// Baked-in: the per-page step for the browser, not a user preference.
+pub const HF_SEARCH_LIMIT: usize = 30;
+
+/// Hard cap on a single Hugging Face search request's page size. "Load more"
+/// grows the requested limit in [`HF_SEARCH_LIMIT`] steps; this bounds the
+/// largest single request so a runaway page count cannot ask the Hub for an
+/// unbounded result set. Baked-in: defense-in-depth bound on request size.
+pub const HF_SEARCH_LIMIT_MAX: usize = 120;
+
+/// Approximate resident-memory overhead in GiB added on top of a model's
+/// weights size when estimating whether it fits in this Mac's RAM (the KV
+/// cache at the default context plus runtime buffers). Baked-in: feeds the
+/// RAM-fit *hint* in Library/Discover only; the authoritative per-starter
+/// estimates live in the model registry.
+pub const RUNTIME_OVERHEAD_GB: f64 = 2.0;
+
+/// Maximum accepted byte length for a Hugging Face search query before it is
+/// sent upstream. Defense-in-depth bound on attacker-influenced input: the
+/// query reaches the fixed Hub host (no SSRF) and is percent-encoded by the
+/// client, but an unbounded string is still rejected to cap request size.
+pub const MAX_HF_SEARCH_QUERY_LEN: usize = 200;
+
 /// Maximum accepted byte length for a model slug passed to `set_active_model`.
 /// Real Ollama slugs are a handful of characters; 256 is generous while still
 /// capping adversarial inputs long before any network or database work.
 pub const MAX_MODEL_SLUG_LEN: usize = 256;
 
+/// Maximum metadata key-value pairs the GGUF reader will scan before giving
+/// up. Real GGUF models carry a few dozen KV entries; 4096 never truncates a
+/// legitimate header while bounding a malformed `metadata_kv_count` so the
+/// reasoning-classifier scan cannot loop on a corrupt or hostile file.
+pub const MAX_GGUF_KV_COUNT: u64 = 4096; + +/// Maximum accepted byte length for a single GGUF metadata key. Keys are short +/// dotted identifiers (`tokenizer.chat_template`); 1 KiB is far above any real +/// key and stops a corrupt length field from forcing a huge allocation. +pub const MAX_GGUF_KEY_BYTES: u64 = 1024; + +/// Maximum accepted byte length for a GGUF string value the reader actually +/// materializes (the chat template and architecture). Real chat templates run +/// a few KB to ~100 KB; 4 MiB never truncates one while bounding the memory a +/// corrupt or hostile length field can demand. +pub const MAX_GGUF_STRING_BYTES: u64 = 4 * 1024 * 1024; + /// Authoritative allowlist of `(section, key)` pairs the Settings GUI is /// permitted to write via the `set_config_field` Tauri command. /// diff --git a/src-tauri/src/database.rs b/src-tauri/src/database.rs index 9bb412bc..329e1c8a 100644 --- a/src-tauri/src/database.rs +++ b/src-tauri/src/database.rs @@ -229,6 +229,11 @@ fn run_migrations(conn: &Connection) -> SqlResult<()> { // this migration. ensure_column(conn, "messages", "model_name", "TEXT")?; + // Reasoning-capability class for installed models. NULL for rows written + // before the dynamic classifier existed; the startup heal re-classifies + // those, and every new install writes a non-NULL 0/1. + ensure_column(conn, "installed_models", "reasoning_always", "INTEGER")?; + Ok(()) } diff --git a/src-tauri/src/engine/process.rs b/src-tauri/src/engine/process.rs index 50a4df0f..c5c1432f 100644 --- a/src-tauri/src/engine/process.rs +++ b/src-tauri/src/engine/process.rs @@ -8,13 +8,17 @@ //! poll loop, command-line construction) lives in pure functions tested //! directly. 
+use std::collections::VecDeque; use std::future::Future; use std::path::PathBuf; +use std::sync::{Arc, Mutex}; use std::time::Duration; use async_trait::async_trait; -use crate::config::defaults::ENGINE_HEALTH_PROBE_TIMEOUT_SECS; +use crate::config::defaults::{ + ENGINE_HEALTH_PROBE_TIMEOUT_SECS, ENGINE_STDERR_TAIL_LINES, ENGINE_STDERR_TAIL_LINE_MAX_BYTES, +}; /// Everything needed to launch one engine process. #[derive(Debug, Clone, PartialEq)] @@ -36,6 +40,12 @@ pub trait EngineChild: Send { async fn wait_exit(&mut self); /// Kills the process and waits for the exit to land. async fn kill(&mut self); + /// The captured tail of the process's stderr, ready to read once + /// `wait_exit` has resolved (which drains the stream to EOF). The runner + /// surfaces this as the crash reason so an engine load failure reports the + /// engine's own message instead of a generic string. Empty when the + /// process left no stderr. + fn stderr_tail(&self) -> String; } /// Spawn-and-probe seam between the runner actor and the operating system. @@ -118,7 +128,7 @@ pub struct TokioEngineProcess { } /// Pure: the `llama-server` command line for one spawn: -/// `-m [--mmproj

] --ctx-size --host 127.0.0.1 --port --no-webui`. +/// `-m [--mmproj ] --ctx-size --host 127.0.0.1 --port

--no-webui --parallel 1`. fn llama_server_args(args: &SpawnArgs) -> Vec { let mut argv: Vec = vec!["-m".into(), args.model_path.clone().into()]; if let Some(mmproj) = &args.mmproj_path { @@ -132,12 +142,93 @@ fn llama_server_args(args: &SpawnArgs) -> Vec { argv.push("--port".into()); argv.push(args.port.to_string().into()); argv.push("--no-webui".into()); + // Single decode slot. Thuki is single-user, so it never needs parallel + // slots, and the default (n_parallel = 4) actively hurts: the summon-time + // warm-up prime and the user's first message can land on different KV + // slots, so the first message re-does the full system-prompt prefill cold + // instead of reusing the prime's cache (slow first turn, fast after). One + // slot also gives the conversation the full --ctx-size instead of ctx / 4. + argv.push("--parallel".into()); + argv.push("1".into()); argv } -/// A spawned `llama-server` process. +/// Pure: turns one captured line's raw bytes into a stored tail line. Lossy +/// UTF-8 so invalid bytes from a corrupt stream never panic, with trailing +/// whitespace (e.g. a `\r` from CRLF) trimmed. +fn finalize_stderr_line(bytes: &[u8]) -> String { + String::from_utf8_lossy(bytes).trim_end().to_string() +} + +/// Pure: feeds one chunk of stderr bytes through the bounded line accumulator. +/// `current` carries the in-progress line across chunk boundaries; on each +/// `\n` the completed line is finalized and pushed into the tail ring. Bytes +/// past `ENGINE_STDERR_TAIL_LINE_MAX_BYTES` on a single line are dropped, so +/// peak per-line memory stays bounded regardless of how rarely the stream +/// emits a newline (a hard cap on read buffering, not just retained memory). 
+fn ingest_stderr_chunk(chunk: &[u8], current: &mut Vec, tail: &mut VecDeque) { + for &byte in chunk { + if byte == b'\n' { + push_stderr_line( + tail, + finalize_stderr_line(current), + ENGINE_STDERR_TAIL_LINES, + ); + current.clear(); + } else if current.len() < ENGINE_STDERR_TAIL_LINE_MAX_BYTES { + current.push(byte); + } + } +} + +/// Pure: appends a captured line to the bounded tail ring, dropping the oldest +/// line once `max_lines` is exceeded so only the trailing window is kept. +fn push_stderr_line(buf: &mut VecDeque, line: String, max_lines: usize) { + buf.push_back(line); + while buf.len() > max_lines { + buf.pop_front(); + } +} + +/// Pure: joins the retained tail lines into one newline-separated string. +fn join_stderr_tail(buf: &VecDeque) -> String { + buf.iter().cloned().collect::>().join("\n") +} + +/// Drains a child's stderr pipe into the bounded tail ring until EOF. Reads in +/// fixed-size chunks (not unbounded lines) and delegates all splitting and +/// bounding to [`ingest_stderr_chunk`], so a stream that never emits a newline +/// cannot force an unbounded allocation. Coverage-off: thin I/O over the tested +/// ingester; a trailing newline-less line (e.g. a process killed mid-line) is +/// flushed after EOF. +#[cfg_attr(coverage_nightly, coverage(off))] +async fn pump_stderr(pipe: tokio::process::ChildStderr, tail: Arc>>) { + use tokio::io::AsyncReadExt; + let mut reader = tokio::io::BufReader::new(pipe); + let mut chunk = [0u8; 4096]; + let mut current: Vec = Vec::new(); + loop { + match reader.read(&mut chunk).await { + Ok(0) | Err(_) => break, + Ok(n) => ingest_stderr_chunk(&chunk[..n], &mut current, &mut tail.lock().unwrap()), + } + } + if !current.is_empty() { + push_stderr_line( + &mut tail.lock().unwrap(), + finalize_stderr_line(&current), + ENGINE_STDERR_TAIL_LINES, + ); + } +} + +/// A spawned `llama-server` process.
`stderr_tail` is the shared bounded ring +/// the reader task fills; the reader handle is joined on exit so the tail is +/// complete before the runner reads it. struct TokioChild { inner: tokio::process::Child, + stderr_tail: Arc>>, + reader: Option>, } #[async_trait] @@ -145,12 +236,26 @@ impl EngineChild for TokioChild { #[cfg_attr(coverage_nightly, coverage(off))] async fn wait_exit(&mut self) { let _ = self.inner.wait().await; + // Join the reader so the stderr tail is fully drained to EOF (which + // coincides with the pipe closing at process exit) before the runner + // reads the crash reason. + if let Some(reader) = self.reader.take() { + let _ = reader.await; + } } #[cfg_attr(coverage_nightly, coverage(off))] async fn kill(&mut self) { let _ = self.inner.start_kill(); let _ = self.inner.wait().await; + if let Some(reader) = self.reader.take() { + let _ = reader.await; + } + } + + #[cfg_attr(coverage_nightly, coverage(off))] + fn stderr_tail(&self) -> String { + join_stderr_tail(&self.stderr_tail.lock().unwrap()) } } @@ -158,14 +263,29 @@ impl EngineChild for TokioChild { impl EngineProcess for TokioEngineProcess { #[cfg_attr(coverage_nightly, coverage(off))] async fn spawn(&self, args: &SpawnArgs) -> Result, String> { - let child = tokio::process::Command::new(&self.binary) + let mut child = tokio::process::Command::new(&self.binary) .args(llama_server_args(args)) .stdout(std::process::Stdio::null()) - .stderr(std::process::Stdio::null()) + // Capture stderr so a load failure (e.g. "unknown model + // architecture") reaches the user instead of being discarded. + .stderr(std::process::Stdio::piped()) .kill_on_drop(true) .spawn() .map_err(|e| e.to_string())?; - Ok(Box::new(TokioChild { inner: child })) + + let stderr_tail = Arc::new(Mutex::new(VecDeque::new())); + // Drain stderr into the bounded tail ring. The task ends when the pipe + // closes at process exit; `wait_exit`/`kill` join it. 
+ let reader = child + .stderr + .take() + .map(|pipe| tokio::spawn(pump_stderr(pipe, Arc::clone(&stderr_tail)))); + + Ok(Box::new(TokioChild { + inner: child, + stderr_tail, + reader, + })) } #[cfg_attr(coverage_nightly, coverage(off))] @@ -215,10 +335,68 @@ mod tests { "--port", "4242", "--no-webui", + "--parallel", + "1", ] ); } + #[test] + fn finalize_stderr_line_is_lossy_and_trims_trailing() { + assert_eq!(finalize_stderr_line(b"hello"), "hello"); + // Trailing CR (CRLF) and spaces are trimmed. + assert_eq!(finalize_stderr_line(b"hello\r"), "hello"); + // Invalid UTF-8 never panics; it becomes the replacement char. + assert_eq!(finalize_stderr_line(&[b'h', b'i', 0xFF]), "hi\u{FFFD}"); + } + + #[test] + fn ingest_stderr_chunk_splits_on_newlines_and_carries_across_chunks() { + let mut tail = VecDeque::new(); + let mut current = Vec::new(); + // No newline yet: nothing pushed, line held in `current`. + ingest_stderr_chunk(b"ab", &mut current, &mut tail); + assert!(tail.is_empty()); + // Completes "abc", then starts "d". + ingest_stderr_chunk(b"c\nd", &mut current, &mut tail); + assert_eq!(tail.iter().cloned().collect::>(), vec!["abc"]); + ingest_stderr_chunk(b"\n", &mut current, &mut tail); + assert_eq!(tail.iter().cloned().collect::>(), vec!["abc", "d"]); + } + + #[test] + fn ingest_stderr_chunk_caps_an_overlong_newlineless_line() { + let mut tail = VecDeque::new(); + let mut current = Vec::new(); + // A flood longer than the per-line cap, with no newline, must not grow + // `current` past the cap: peak read buffering is bounded. 
+ let flood = vec![b'x'; ENGINE_STDERR_TAIL_LINE_MAX_BYTES + 100]; + ingest_stderr_chunk(&flood, &mut current, &mut tail); + assert_eq!(current.len(), ENGINE_STDERR_TAIL_LINE_MAX_BYTES); + assert!(tail.is_empty()); + ingest_stderr_chunk(b"\n", &mut current, &mut tail); + assert_eq!(tail.len(), 1); + assert_eq!(tail[0].len(), ENGINE_STDERR_TAIL_LINE_MAX_BYTES); + } + + #[test] + fn push_stderr_line_keeps_only_the_trailing_window() { + let mut buf = VecDeque::new(); + push_stderr_line(&mut buf, "a".to_string(), 2); + push_stderr_line(&mut buf, "b".to_string(), 2); + push_stderr_line(&mut buf, "c".to_string(), 2); + assert_eq!(buf.iter().cloned().collect::>(), vec!["b", "c"]); + } + + #[test] + fn join_stderr_tail_newline_joins_in_order() { + let mut buf = VecDeque::new(); + assert_eq!(join_stderr_tail(&buf), ""); + push_stderr_line(&mut buf, "first".to_string(), 8); + push_stderr_line(&mut buf, "second".to_string(), 8); + assert_eq!(join_stderr_tail(&buf), "first\nsecond"); + } + #[test] fn llama_server_args_with_mmproj() { assert_eq!( @@ -235,6 +413,8 @@ mod tests { "--port", "4242", "--no-webui", + "--parallel", + "1", ] ); } diff --git a/src-tauri/src/engine/runner.rs b/src-tauri/src/engine/runner.rs index e8ea05f4..c97d0e92 100644 --- a/src-tauri/src/engine/runner.rs +++ b/src-tauri/src/engine/runner.rs @@ -22,7 +22,8 @@ use tokio::sync::{mpsc, oneshot, watch}; use super::process::{poll_until_healthy, EngineChild, EngineProcess, SpawnArgs}; use super::state::{step, Effect, EngineState, Event, Target}; use crate::config::defaults::{ - ENGINE_COMMAND_QUEUE_CAPACITY, ENGINE_HEALTH_DEADLINE_SECS, ENGINE_HEALTH_POLL_INTERVAL_MS, + ENGINE_COMMAND_QUEUE_CAPACITY, ENGINE_CRASH_FALLBACK_MESSAGE, ENGINE_HEALTH_DEADLINE_SECS, + ENGINE_HEALTH_POLL_INTERVAL_MS, }; /// Snapshot of the engine lifecycle published through the status watch. @@ -340,6 +341,18 @@ impl Core { } } +/// Pure: the crash reason for a `ChildCrashed` event. 
Uses the engine's +/// captured stderr tail when it carries anything, otherwise the generic +/// fallback (an external SIGKILL leaves no stderr to report). +fn crash_reason(stderr_tail: &str) -> String { + let trimmed = stderr_tail.trim(); + if trimmed.is_empty() { + ENGINE_CRASH_FALLBACK_MESSAGE.to_string() + } else { + trimmed.to_string() + } +} + /// What woke the actor loop. enum Wake { Cmd(Option), @@ -436,12 +449,19 @@ async fn run_actor( core.dispatch(Event::SpawnFailed(error)).await; } Wake::ChildExit => { + // Read the captured stderr tail before dropping the child so + // the crash reports the engine's own message (e.g. "unknown + // model architecture") instead of a generic string. + let reason = crash_reason( + &core + .child + .as_ref() + .map(|child| child.stderr_tail()) + .unwrap_or_default(), + ); core.child = None; core.health = None; - core.dispatch(Event::ChildCrashed( - "engine process exited unexpectedly".to_string(), - )) - .await; + core.dispatch(Event::ChildCrashed(reason)).await; } Wake::Tick => { if in_flight.load(Ordering::SeqCst) > 0 { @@ -485,6 +505,9 @@ mod tests { probes_served: usize, log: Vec, current_exit: Option>>, + /// Stderr tail handed to the next spawned child, mirroring the real + /// process's captured stderr. + next_stderr_tail: String, } /// Scriptable [`EngineProcess`]: records every spawn, hands out @@ -524,6 +547,11 @@ mod tests { .push_back(message.to_string()); } + /// Sets the stderr tail the next spawned child reports on exit. + fn push_stderr_tail(&self, tail: &str) { + self.inner.lock().unwrap().next_stderr_tail = tail.to_string(); + } + /// Makes the live child exit without a kill being issued. 
fn crash_current(&self) { let exit = { @@ -544,6 +572,7 @@ mod tests { inner: Arc>, exit_tx: Arc>, exit_rx: watch::Receiver, + stderr_tail: String, } #[async_trait::async_trait] @@ -552,6 +581,10 @@ mod tests { let _ = self.exit_rx.wait_for(|exited| *exited).await; } + fn stderr_tail(&self) -> String { + self.stderr_tail.clone() + } + async fn kill(&mut self) { { let mut inner = self.inner.lock().unwrap(); @@ -580,10 +613,12 @@ mod tests { let (exit_tx, exit_rx) = watch::channel(false); let exit_tx = Arc::new(exit_tx); inner.current_exit = Some(Arc::clone(&exit_tx)); + let stderr_tail = std::mem::take(&mut inner.next_stderr_tail); Ok(Box::new(FakeChild { inner: Arc::clone(&self.inner), exit_tx, exit_rx, + stderr_tail, })) } @@ -1016,6 +1051,38 @@ mod tests { assert_eq!(process.snapshot(|i| i.kills), 0); } + #[test] + fn crash_reason_prefers_stderr_tail_over_fallback() { + assert_eq!(crash_reason(""), ENGINE_CRASH_FALLBACK_MESSAGE); + assert_eq!(crash_reason(" \n "), ENGINE_CRASH_FALLBACK_MESSAGE); + assert_eq!( + crash_reason("error loading model: unknown model architecture: 'x'\n"), + "error loading model: unknown model architecture: 'x'" + ); + } + + /// A crash surfaces the engine's captured stderr as the failure reason, so + /// an unsupported-architecture load failure reaches the user verbatim + /// instead of collapsing to the generic exit message. 
+ #[tokio::test(start_paused = true)] + async fn crash_surfaces_captured_stderr_reason() { + let process = FakeProcess::new(); + let handle = spawn_handle(&process, 0); + + process.push_stderr_tail( + "error loading model: unknown model architecture: 'deepseek4_mtp_support'", + ); + load(&handle, &process, "a").await; + process.crash_current(); + + let mut rx = handle.status(); + wait_for_state(&mut rx, "failed").await; + assert_eq!( + rx.borrow().error.as_deref(), + Some("error loading model: unknown model architecture: 'deepseek4_mtp_support'") + ); + } + // ── Runner: idle unload ──────────────────────────────────────────── #[tokio::test(start_paused = true)] diff --git a/src-tauri/src/history.rs b/src-tauri/src/history.rs index a4c8a104..54d5dc27 100644 --- a/src-tauri/src/history.rs +++ b/src-tauri/src/history.rs @@ -296,6 +296,10 @@ pub(crate) async fn generate_title_text( messages: title_messages, api_key: api_key.clone(), flavor: *flavor, + // Title generation answers directly; the built-in engine + // must not run a thinking pass (it would burn the token + // budget before producing the title). + enable_thinking: false, }, client, cancel_token, diff --git a/src-tauri/src/lib.rs b/src-tauri/src/lib.rs index 57f725fe..5994d11d 100644 --- a/src-tauri/src/lib.rs +++ b/src-tauri/src/lib.rs @@ -135,13 +135,55 @@ use _thuki_panel::ThukiPanel; #[cfg(target_os = "macos")] mod _settings_panel { use tauri::Manager; + use tauri_nspanel::TrackingAreaOptions; tauri_nspanel::tauri_panel! { panel!(ThukiSettingsPanel { config: { can_become_key_window: true, is_floating_panel: true } + with: { + // Same hover-activate rationale as ThukiPanel. Settings is a + // nonactivating panel with hides_on_deactivate(false), so once + // it is defocused (the user clicks another app) a plain click + // can never regain key on modern macOS and the webview drops + // clicks, drag, and hover - the form inputs go dead. 
An + // `active_always` tracking area keeps mouse events flowing while + // the app is inactive, and the mouse-entered callback (wired in + // `init_settings_panel`) makes the panel key on cursor-enter so + // the inputs come back without activating the app. + tracking_area: { + options: TrackingAreaOptions::new() + .active_always() + .mouse_entered_and_exited() + .mouse_moved() + .cursor_update(), + auto_resize: true + } + } }) + panel_event!(ThukiSettingsEventsInner {}) + } + + /// Constructs the mouse-event handler and attaches it to the Settings panel. + /// + /// Mirrors `attach_overlay_event_handler` for ThukiPanel: the mouse-entered + /// callback makes the Settings overlay the key window the instant the cursor + /// enters it, restoring clicks/drag/typing after the panel has been + /// defocused (see the tracking-area comment on the panel). + pub fn attach_settings_event_handler(app_handle: tauri::AppHandle) { + use tauri_nspanel::ManagerExt; + let Ok(panel) = app_handle.get_webview_panel("settings") else { + return; + }; + let cb_handle = app_handle.clone(); + let events = ThukiSettingsEventsInner::new(); + events.on_mouse_entered(move |_event| { + if let Ok(p) = cb_handle.get_webview_panel("settings") { + p.make_key_window(); + } + }); + panel.set_event_handler(Some(events.as_ref())); } } #[cfg(target_os = "macos")] @@ -157,13 +199,55 @@ use _settings_panel::ThukiSettingsPanel; #[cfg(target_os = "macos")] mod _update_panel { use tauri::Manager; + use tauri_nspanel::TrackingAreaOptions; tauri_nspanel::tauri_panel! { panel!(ThukiUpdatePanel { config: { can_become_key_window: true, is_floating_panel: true } + with: { + // Same hover-activate rationale as ThukiPanel. The update panel + // is nonactivating with hides_on_deactivate(false), so after it + // is defocused a plain click can never regain key on modern + // macOS and the webview drops clicks, drag, and hover - the four + // action buttons go dead. 
An `active_always` tracking area keeps + // mouse events flowing while the app is inactive, and the + // mouse-entered callback (wired in `init_update_panel`) makes the + // panel key on cursor-enter so the buttons come back without + // activating the app. + tracking_area: { + options: TrackingAreaOptions::new() + .active_always() + .mouse_entered_and_exited() + .mouse_moved() + .cursor_update(), + auto_resize: true + } + } }) + panel_event!(ThukiUpdateEventsInner {}) + } + + /// Constructs the mouse-event handler and attaches it to the update panel. + /// + /// Mirrors `attach_overlay_event_handler` for ThukiPanel: the mouse-entered + /// callback makes the update overlay the key window the instant the cursor + /// enters it, restoring clicks after the panel has been defocused (see the + /// tracking-area comment on the panel). + pub fn attach_update_event_handler(app_handle: tauri::AppHandle) { + use tauri_nspanel::ManagerExt; + let Ok(panel) = app_handle.get_webview_panel("update") else { + return; + }; + let cb_handle = app_handle.clone(); + let events = ThukiUpdateEventsInner::new(); + events.on_mouse_entered(move |_event| { + if let Ok(p) = cb_handle.get_webview_panel("update") { + p.make_key_window(); + } + }); + panel.set_event_handler(Some(events.as_ref())); } } #[cfg(target_os = "macos")] @@ -433,10 +517,9 @@ fn show_overlay(app_handle: &tauri::AppHandle, ctx: crate::context::ActivationCo // Pre-load the active model so the user's first message does not pay // the cold-start penalty. Fires on all show paths: double-tap, tray, // and first-launch auto-show. Branches by the active provider's kind: - // Ollama keeps its native /api/chat warmup, the built-in engine gets a - // /v1 prime ONLY when it is already serving (summoning the overlay must - // never load a model implicitly), and openai providers get no warmup - // (nothing local to warm). 
+ // Ollama warms via its native /api/chat, the built-in engine starts + // (or reuses) its sidecar and primes the KV cache, and openai providers + // get no warmup (nothing local to warm). let warmup_kind = app_handle .state::>() .read() @@ -484,29 +567,45 @@ fn show_overlay(app_handle: &tauri::AppHandle, ctx: crate::context::ActivationCo } } crate::config::defaults::PROVIDER_KIND_BUILTIN => { - let status = app_handle - .state::() - .status() - .borrow() - .clone(); - if let Some(port) = warmup::builtin_prime_port(&status) { - let (model, system_prompt) = { - let cfg = app_handle - .state::>() - .read() - .clone(); - ( - cfg.inference.active_provider_model().to_string(), - cfg.prompt.resolved_system.clone(), - ) + let (model_id, num_ctx, system_prompt) = { + let cfg_state = app_handle.state::>(); + let cfg = cfg_state.read(); + ( + cfg.inference.active_provider_model().to_string(), + cfg.inference.num_ctx, + cfg.prompt.resolved_system.clone(), + ) + }; + if warmup::builtin_should_warm(&model_id) { + let store = app_handle.state::(); + let db = app_handle.state::(); + // Resolve the manifest row to an engine Target inside a scope so + // the connection guard drops before the spawned load. A poisoned + // lock is recovered: an unrelated panic does not invalidate it. + let target = { + let conn = match db.0.lock() { + Ok(conn) => conn, + Err(poisoned) => poisoned.into_inner(), + }; + crate::commands::builtin_target(&conn, &store, &model_id, num_ctx) }; - let client = app_handle.state::().inner().clone(); - tauri::async_runtime::spawn(warmup::prime_builtin( - port, - model, - system_prompt, - client, - )); + // A missing/uninstalled model yields an Err; warmup is + // best-effort, so just skip rather than surfacing anything. 
+ if let Ok(target) = target { + let engine = app_handle + .state::() + .inner() + .clone(); + let client = app_handle.state::().inner().clone(); + tauri::async_runtime::spawn(warmup::warm_builtin( + app_handle.clone(), + engine, + target, + model_id, + system_prompt, + client, + )); + } } } _ => {} @@ -595,7 +694,7 @@ fn show_overlay(app_handle: &tauri::AppHandle, ctx: crate::context::ActivationCo /// the OS-default spawn position or previous moves. #[cfg_attr(coverage_nightly, coverage(off))] fn position_settings_window(window: &tauri::WebviewWindow) { - const SETTINGS_WIDTH: f64 = 580.0; + const SETTINGS_WIDTH: f64 = 760.0; // macOS menu bar is ~24 px logical on standard displays; notched MacBooks // push it to ~37 px. 72 px gives a comfortable ~35-48 px visual gap below // the menu bar on all hardware. @@ -1470,6 +1569,11 @@ fn init_settings_panel(app_handle: &tauri::AppHandle) { .can_join_all_spaces() .into(), ); + // Hover-activate: take key focus the moment the cursor enters the + // Settings overlay, mirroring init_panel. Pairs with the + // `active_always` tracking area on ThukiSettingsPanel so a defocused + // nonactivating panel regains key without activating the app. + _settings_panel::attach_settings_event_handler(app_handle.clone()); } Err(e) => { eprintln!("thuki: [settings] NSPanel conversion failed: {e:?}"); @@ -1515,6 +1619,11 @@ fn init_update_panel(app_handle: &tauri::AppHandle) { .can_join_all_spaces() .into(), ); + // Hover-activate: take key focus the moment the cursor enters the + // update overlay, mirroring init_panel. Pairs with the + // `active_always` tracking area on ThukiUpdatePanel so a defocused + // nonactivating panel regains key without activating the app. 
+ _update_panel::attach_update_event_handler(app_handle.clone()); } Err(e) => { eprintln!("thuki: [update] NSPanel conversion failed: {e:?}"); @@ -1942,6 +2051,8 @@ pub fn run() { let _ = warmup_handle.emit("warmup:model-loaded", model); }, ))); + // Port-keyed dedup + cue state for the built-in engine warm-up. + app.manage(warmup::BuiltinWarmState::default()); // ── Configuration (TOML file at app_config_dir) ───────── // Loaded once at startup. Missing file -> seed defaults. @@ -2144,11 +2255,18 @@ pub fn run() { initial_active_model, ))); app.manage(models::ModelCapabilitiesCache::default()); - app.manage(history::Database(std::sync::Mutex::new(db_conn))); // ── Model blob store + download slot for the built-in engine ── let model_store = models::storage::ModelStore::new(app_data_dir.join("models")) .expect("failed to initialise model blob store"); + + // One-time heal: classify any installed models recorded before the + // dynamic reasoning classifier existed (reasoning_always IS NULL), + // reading each model's local GGUF, so the picker badge and /think + // gate are correct without waiting for the first chat. + models::heal_unclassified_reasoning(&db_conn, &model_store); + + app.manage(history::Database(std::sync::Mutex::new(db_conn))); app.manage(model_store); app.manage(models::DownloadState::default()); @@ -2192,6 +2310,14 @@ pub fn run() { tauri::async_runtime::spawn(async move { while status_rx.changed().await.is_ok() { let status = status_rx.borrow_and_update().clone(); + // A load left memory (idle-unload, model switch, crash): + // drop the built-in warm-up dedup so the next load primes + // fresh even when the OS reuses the same port. The dedup + // is keyed on port, so a stale primed record would + // otherwise skip the cold reload's prime. 
+ if status.state != "loaded" { + status_handle.state::().reset(); + } let _ = status_handle.emit("engine:status", status); } }); @@ -2249,16 +2375,22 @@ pub fn run() { #[cfg(not(coverage))] models::get_starter_options, #[cfg(not(coverage))] + models::get_staff_picks, + #[cfg(not(coverage))] models::get_system_ram_bytes, #[cfg(not(coverage))] models::get_models_dir_free_bytes, #[cfg(not(coverage))] models::download_starter, #[cfg(not(coverage))] + models::download_staff_pick, + #[cfg(not(coverage))] models::download_repo_model, #[cfg(not(coverage))] models::list_hf_repo_ggufs, #[cfg(not(coverage))] + models::search_hf_models, + #[cfg(not(coverage))] models::list_openai_models, #[cfg(not(coverage))] models::cancel_model_download, @@ -2271,6 +2403,8 @@ pub fn run() { #[cfg(not(coverage))] models::delete_installed_model, #[cfg(not(coverage))] + models::reveal_model_in_finder, + #[cfg(not(coverage))] history::save_conversation, #[cfg(not(coverage))] history::persist_message, @@ -2329,6 +2463,8 @@ pub fn run() { warmup::get_loaded_model, #[cfg(not(coverage))] warmup::get_engine_status, + #[cfg(not(coverage))] + warmup::get_builtin_warm_state, updater::commands::get_updater_state, #[cfg(not(coverage))] updater::commands::check_for_update, diff --git a/src-tauri/src/models/gguf.rs b/src-tauri/src/models/gguf.rs new file mode 100644 index 00000000..174ed115 --- /dev/null +++ b/src-tauri/src/models/gguf.rs @@ -0,0 +1,532 @@ +/*! + * Minimal, panic-safe GGUF metadata reader. + * + * The reasoning classifier ([`crate::models::reasoning`]) needs a model's + * embedded chat template (`tokenizer.chat_template`) and architecture + * (`general.architecture`). Both live in the GGUF metadata key-value header at + * the very start of the file, before any tensor data, so they can be read + * straight off the downloaded blob with no engine load and no network. 
+ * + * This reader extracts ONLY those two string values; every other value is + * skipped by computing its on-disk size and seeking past it (the giant + * tokenizer arrays are never materialized). It is deliberately forgiving: any + * malformed, truncated, or hostile input resolves to "what was found so far" + * (often `None`) rather than panicking, matching Thuki's never-panic-on-input + * contract. A miss is harmless because the runtime behavioral backstop + * self-corrects an `Always` model from its real output. + * + * Format reference: the GGUF header is `magic("GGUF") | version(u32) | + * tensor_count(u64) | metadata_kv_count(u64)`, followed by `metadata_kv_count` + * key-value pairs. A key is `len(u64) | bytes`; a value is `type(u32)` then a + * type-dependent payload. Only versions 2 and 3 (u64 counts) are accepted; the + * obsolete v1 layout (u32 counts) is rejected. + */ + +use std::io::{Read, Seek, SeekFrom}; +use std::path::Path; + +use crate::config::defaults::{MAX_GGUF_KEY_BYTES, MAX_GGUF_KV_COUNT, MAX_GGUF_STRING_BYTES}; + +/// GGUF value type tag for a UTF-8 string (`len(u64) | bytes`). +const GGUF_TYPE_STRING: u32 = 8; +/// GGUF value type tag for an array (`elem_type(u32) | count(u64) | elements`). +const GGUF_TYPE_ARRAY: u32 = 9; + +/// Metadata extracted from a GGUF header. Either field is `None` when the +/// model does not carry it (or the reader stopped before reaching it). +#[derive(Debug, Default, Clone, PartialEq, Eq)] +pub struct GgufMetadata { + /// The embedded Jinja chat template (`tokenizer.chat_template`). + pub chat_template: Option, + /// The model architecture (`general.architecture`, e.g. `qwen3`, `gpt-oss`). + pub architecture: Option, +} + +/// Reads `general.architecture` and `tokenizer.chat_template` from a GGUF +/// stream. 
Returns `None` only when the stream is not a GGUF the reader +/// understands (bad magic, unsupported version, or a header too short to carry +/// the counts); a stream that is a valid GGUF but is truncated or malformed +/// partway through returns `Some` with whatever was decoded before the fault. +/// +/// Generic over [`Read`] + [`Seek`] so it is driven by an in-memory +/// [`std::io::Cursor`] in tests and a [`std::io::BufReader`] over the blob +/// file in production. +pub fn read_gguf_metadata(r: &mut R) -> Option { + let mut magic = [0u8; 4]; + r.read_exact(&mut magic).ok()?; + if &magic != b"GGUF" { + return None; + } + let version = read_u32_le(r)?; + if version != 2 && version != 3 { + return None; + } + // tensor_count is not needed: the metadata KV block precedes the tensor + // info, so we never have to walk the tensors to reach the template. + let _tensor_count = read_u64_le(r)?; + let kv_count = read_u64_le(r)?; + + let mut meta = GgufMetadata::default(); + // Clamp the loop so a corrupt `metadata_kv_count` cannot drive an + // unbounded scan; real models sit far below the cap. + let limit = kv_count.min(MAX_GGUF_KV_COUNT); + for _ in 0..limit { + // Past this point every read failure is treated as "end of usable + // metadata": break and return what was decoded so far, never `?` (a + // truncation after the template was read must not discard it). 
+ let Some(key_len) = read_u64_le(r) else { break }; + if key_len > MAX_GGUF_KEY_BYTES { + break; + } + let mut key = vec![0u8; key_len as usize]; + if r.read_exact(&mut key).is_err() { + break; + } + let Some(value_type) = read_u32_le(r) else { + break; + }; + + if value_type == GGUF_TYPE_STRING && key == b"tokenizer.chat_template" { + match read_string_value(r) { + Some(s) => meta.chat_template = Some(s), + None => break, + } + } else if value_type == GGUF_TYPE_STRING && key == b"general.architecture" { + match read_string_value(r) { + Some(s) => meta.architecture = Some(s), + None => break, + } + } else if skip_value(r, value_type).is_none() { + break; + } + + // Both targets found: no reason to walk the rest of the header. + if meta.chat_template.is_some() && meta.architecture.is_some() { + break; + } + } + Some(meta) +} + +/// Opens `path`, wraps it in a buffered reader, and extracts its GGUF +/// metadata. Returns `None` when the file cannot be opened or is not a +/// readable GGUF. Coverage-off: a thin filesystem wrapper around +/// [`read_gguf_metadata`], which carries all the tested parsing logic. +#[cfg_attr(coverage_nightly, coverage(off))] +pub fn read_gguf_metadata_from_file(path: &Path) -> Option { + let file = std::fs::File::open(path).ok()?; + let mut reader = std::io::BufReader::new(file); + read_gguf_metadata(&mut reader) +} + +/// Reads a little-endian `u32`, or `None` on a short read. +fn read_u32_le(r: &mut R) -> Option { + let mut b = [0u8; 4]; + r.read_exact(&mut b).ok()?; + Some(u32::from_le_bytes(b)) +} + +/// Reads a little-endian `u64`, or `None` on a short read. +fn read_u64_le(r: &mut R) -> Option { + let mut b = [0u8; 8]; + r.read_exact(&mut b).ok()?; + Some(u64::from_le_bytes(b)) +} + +/// Reads a GGUF string value (`len(u64) | bytes`) the reader wants to keep. +/// Refuses a length above [`MAX_GGUF_STRING_BYTES`] so a corrupt length cannot +/// force a huge allocation. 
Decodes lossily so a non-UTF-8 byte never drops an +/// otherwise-usable template. +fn read_string_value(r: &mut R) -> Option { + let len = read_u64_le(r)?; + if len > MAX_GGUF_STRING_BYTES { + return None; + } + let mut buf = vec![0u8; len as usize]; + r.read_exact(&mut buf).ok()?; + Some(String::from_utf8_lossy(&buf).into_owned()) +} + +/// On-disk byte size of a fixed-width GGUF scalar value type, or `None` for a +/// non-scalar (string, array) or unknown type tag. +fn scalar_size(value_type: u32) -> Option { + match value_type { + // UINT8, INT8, BOOL + 0 | 1 | 7 => Some(1), + // UINT16, INT16 + 2 | 3 => Some(2), + // UINT32, INT32, FLOAT32 + 4..=6 => Some(4), + // UINT64, INT64, FLOAT64 + 10..=12 => Some(8), + _ => None, + } +} + +/// Advances the stream past a value of `value_type` without materializing it. +/// Returns `None` on an unknown type, a malformed array, or a seek/read fault. +fn skip_value(r: &mut R, value_type: u32) -> Option<()> { + match value_type { + GGUF_TYPE_STRING => { + let len = read_u64_le(r)?; + seek_forward(r, len) + } + GGUF_TYPE_ARRAY => skip_array(r), + scalar => { + let n = scalar_size(scalar)?; + seek_forward(r, n) + } + } +} + +/// Skips an array value: `elem_type(u32) | count(u64) | elements`. A scalar +/// element array is skipped in one seek; a string element array is walked +/// element by element (each string is length-prefixed). Nested arrays and +/// unknown element types are unsupported and return `None`. +fn skip_array(r: &mut R) -> Option<()> { + let elem_type = read_u32_le(r)?; + let count = read_u64_le(r)?; + match elem_type { + GGUF_TYPE_STRING => { + for _ in 0..count { + let len = read_u64_le(r)?; + seek_forward(r, len)?; + } + Some(()) + } + GGUF_TYPE_ARRAY => None, + scalar => { + let size = scalar_size(scalar)?; + let total = size.checked_mul(count)?; + seek_forward(r, total) + } + } +} + +/// Seeks `n` bytes forward from the current position. 
Refuses a `n` that does +/// not fit in the seek offset type so a corrupt length cannot wrap. +fn seek_forward(r: &mut R, n: u64) -> Option<()> { + let offset = i64::try_from(n).ok()?; + r.seek(SeekFrom::Current(offset)).ok()?; + Some(()) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::io::Cursor; + + // ── GGUF byte builders (mirror the on-disk layout) ─────────────────────── + + /// Encodes a GGUF string: `len(u64) | bytes`. + fn enc_string(s: &[u8]) -> Vec { + let mut v = (s.len() as u64).to_le_bytes().to_vec(); + v.extend_from_slice(s); + v + } + + /// Encodes a string-valued KV pair: `key | type(8) | value`. + fn kv_string(key: &str, value: &[u8]) -> Vec { + let mut v = enc_string(key.as_bytes()); + v.extend_from_slice(&GGUF_TYPE_STRING.to_le_bytes()); + v.extend_from_slice(&enc_string(value)); + v + } + + /// Encodes a scalar KV pair with a raw `value_type` and raw payload bytes. + fn kv_scalar(key: &str, value_type: u32, payload: &[u8]) -> Vec { + let mut v = enc_string(key.as_bytes()); + v.extend_from_slice(&value_type.to_le_bytes()); + v.extend_from_slice(payload); + v + } + + /// Encodes a `key | type(9) | elem_type | count | elements` array KV. + fn kv_array(key: &str, elem_type: u32, count: u64, elements: &[u8]) -> Vec { + let mut v = enc_string(key.as_bytes()); + v.extend_from_slice(&GGUF_TYPE_ARRAY.to_le_bytes()); + v.extend_from_slice(&elem_type.to_le_bytes()); + v.extend_from_slice(&count.to_le_bytes()); + v.extend_from_slice(elements); + v + } + + /// Assembles a full GGUF header from `version` and pre-encoded KV blobs. 
+ fn build_gguf(version: u32, kvs: &[Vec]) -> Vec { + let mut v = b"GGUF".to_vec(); + v.extend_from_slice(&version.to_le_bytes()); + v.extend_from_slice(&0u64.to_le_bytes()); // tensor_count + v.extend_from_slice(&(kvs.len() as u64).to_le_bytes()); // metadata_kv_count + for kv in kvs { + v.extend_from_slice(kv); + } + v + } + + fn read(bytes: &[u8]) -> Option { + read_gguf_metadata(&mut Cursor::new(bytes.to_vec())) + } + + // ── Happy paths ────────────────────────────────────────────────────────── + + #[test] + fn extracts_template_and_architecture() { + let bytes = build_gguf( + 3, + &[ + kv_string("general.architecture", b"qwen3"), + kv_string("tokenizer.chat_template", b"{%- if enable_thinking %}"), + ], + ); + let meta = read(&bytes).unwrap(); + assert_eq!(meta.architecture.as_deref(), Some("qwen3")); + assert_eq!( + meta.chat_template.as_deref(), + Some("{%- if enable_thinking %}") + ); + } + + #[test] + fn version_2_is_accepted() { + let bytes = build_gguf(2, &[kv_string("tokenizer.chat_template", b"")]); + let meta = read(&bytes).unwrap(); + assert_eq!(meta.chat_template.as_deref(), Some("")); + } + + #[test] + fn skips_scalar_kv_before_target() { + let bytes = build_gguf( + 3, + &[ + kv_scalar("some.u16", 2, &7u16.to_le_bytes()), + kv_scalar("some.i16", 3, &(-3i16).to_le_bytes()), + kv_scalar("some.u32", 4, &7u32.to_le_bytes()), + kv_scalar("some.bool", 7, &[1]), + kv_scalar("some.f64", 12, &1.5f64.to_le_bytes()), + kv_string("tokenizer.chat_template", b"<|channel|>"), + ], + ); + assert_eq!( + read(&bytes).unwrap().chat_template.as_deref(), + Some("<|channel|>") + ); + } + + #[test] + fn skips_scalar_array_before_target() { + // token_type-style INT32 array: 3 elements, skipped in one seek. 
+ let elems: Vec = [1i32, 2, 3].iter().flat_map(|n| n.to_le_bytes()).collect(); + let bytes = build_gguf( + 3, + &[ + kv_array("tokenizer.ggml.token_type", 5, 3, &elems), + kv_string("tokenizer.chat_template", b""), + ], + ); + assert_eq!( + read(&bytes).unwrap().chat_template.as_deref(), + Some("") + ); + } + + #[test] + fn skips_string_array_before_target() { + // tokens-style string array walked element by element. + let mut elems = Vec::new(); + elems.extend_from_slice(&enc_string(b"a")); + elems.extend_from_slice(&enc_string(b"bb")); + let bytes = build_gguf( + 3, + &[ + kv_array("tokenizer.ggml.tokens", GGUF_TYPE_STRING, 2, &elems), + kv_string("tokenizer.chat_template", b""), + ], + ); + assert_eq!( + read(&bytes).unwrap().chat_template.as_deref(), + Some("") + ); + } + + #[test] + fn architecture_only_is_returned_without_template() { + let bytes = build_gguf(3, &[kv_string("general.architecture", b"gemma3")]); + let meta = read(&bytes).unwrap(); + assert_eq!(meta.architecture.as_deref(), Some("gemma3")); + assert_eq!(meta.chat_template, None); + } + + #[test] + fn stops_after_both_found_ignoring_trailing_malformed_kv() { + // A nested-array KV (unsupported) AFTER both targets must not matter: + // the early-exit returns before the reader reaches it. + let bad_nested = kv_array("trailing.bad", GGUF_TYPE_ARRAY, 1, &[]); + let bytes = build_gguf( + 3, + &[ + kv_string("general.architecture", b"qwen3"), + kv_string("tokenizer.chat_template", b""), + bad_nested, + ], + ); + let meta = read(&bytes).unwrap(); + assert_eq!(meta.architecture.as_deref(), Some("qwen3")); + assert_eq!(meta.chat_template.as_deref(), Some("")); + } + + #[test] + fn lossy_decode_keeps_non_utf8_template() { + // An invalid UTF-8 byte (0xff) is replaced, not dropped. 
+ let bytes = build_gguf(3, &[kv_string("tokenizer.chat_template", b"\xff")]); + let template = read(&bytes).unwrap().chat_template.unwrap(); + assert!(template.starts_with("\u{FFFD}")); + } + + // ── Header rejections (return None) ────────────────────────────────────── + + #[test] + fn bad_magic_is_none() { + assert_eq!(read(b"NOPExxxxxxxxxxxxxxxxxxxx"), None); + } + + #[test] + fn unsupported_version_is_none() { + let bytes = build_gguf(1, &[kv_string("tokenizer.chat_template", b"")]); + assert_eq!(read(&bytes), None); + } + + #[test] + fn truncated_before_counts_is_none() { + // "GGUF" + version only, no tensor/kv counts. + let mut bytes = b"GGUF".to_vec(); + bytes.extend_from_slice(&3u32.to_le_bytes()); + assert_eq!(read(&bytes), None); + } + + // ── Mid-scan faults (return partial Some) ──────────────────────────────── + + #[test] + fn claimed_kv_but_no_body_returns_empty() { + // metadata_kv_count says 1 but the stream ends right after the counts. + let mut bytes = b"GGUF".to_vec(); + bytes.extend_from_slice(&3u32.to_le_bytes()); + bytes.extend_from_slice(&0u64.to_le_bytes()); // tensor_count + bytes.extend_from_slice(&1u64.to_le_bytes()); // kv_count = 1, but no KV follows + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn oversized_key_length_stops_scan() { + let mut huge_key = (MAX_GGUF_KEY_BYTES + 1).to_le_bytes().to_vec(); + huge_key.extend_from_slice(&0u32.to_le_bytes()); // a stray type, never reached + let bytes = build_gguf(3, &[huge_key]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn truncated_key_bytes_stops_scan() { + // key_len claims 10 bytes but only 2 follow. + let mut kv = 10u64.to_le_bytes().to_vec(); + kv.extend_from_slice(b"ab"); + let bytes = build_gguf(3, &[kv]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn truncated_before_value_type_stops_scan() { + // A complete key but the stream ends before the value type u32. 
+ let kv = enc_string(b"general.architecture"); + let bytes = build_gguf(3, &[kv]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn target_string_value_too_large_stops_scan() { + let mut kv = enc_string(b"tokenizer.chat_template"); + kv.extend_from_slice(&GGUF_TYPE_STRING.to_le_bytes()); + kv.extend_from_slice(&(MAX_GGUF_STRING_BYTES + 1).to_le_bytes()); + let bytes = build_gguf(3, &[kv]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn target_string_value_truncated_stops_scan() { + // Architecture value claims 20 bytes but only 3 are present. + let mut kv = enc_string(b"general.architecture"); + kv.extend_from_slice(&GGUF_TYPE_STRING.to_le_bytes()); + kv.extend_from_slice(&20u64.to_le_bytes()); + kv.extend_from_slice(b"abc"); + let bytes = build_gguf(3, &[kv]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn unknown_value_type_stops_scan() { + // Value type 99 is not a real GGUF type: the skip fails and the scan + // stops, but a target read before it is still returned. + let bytes = build_gguf( + 3, + &[ + kv_string("tokenizer.chat_template", b""), + kv_scalar("weird", 99, &[0, 0, 0, 0]), + ], + ); + // chat_template was read first, then early-exit never triggers (arch + // missing) so the unknown type is reached and stops the scan; the + // template is preserved. + let meta = read(&bytes).unwrap(); + assert_eq!(meta.chat_template.as_deref(), Some("")); + } + + #[test] + fn nested_array_element_stops_scan() { + // An array whose elements are themselves arrays is unsupported. + let bytes = build_gguf(3, &[kv_array("bad.nested", GGUF_TYPE_ARRAY, 1, &[])]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn array_count_overflow_stops_scan() { + // count * elem_size overflows u64; the checked multiply bails. 
+ let mut kv = enc_string(b"bad.overflow"); + kv.extend_from_slice(&GGUF_TYPE_ARRAY.to_le_bytes()); + kv.extend_from_slice(&12u32.to_le_bytes()); // FLOAT64, size 8 + kv.extend_from_slice(&u64::MAX.to_le_bytes()); // count + let bytes = build_gguf(3, &[kv]); + assert_eq!(read(&bytes), Some(GgufMetadata::default())); + } + + #[test] + fn skip_string_value_advances_to_next_kv() { + // A non-target string KV is skipped (not kept), then the target read. + let bytes = build_gguf( + 3, + &[ + kv_string("general.name", b"Some Model"), + kv_string("tokenizer.chat_template", b""), + ], + ); + let meta = read(&bytes).unwrap(); + assert_eq!(meta.chat_template.as_deref(), Some("")); + } + + #[test] + fn file_wrapper_reads_a_written_gguf() { + let dir = std::env::temp_dir().join(format!("thuki-gguf-test-{}", std::process::id())); + std::fs::create_dir_all(&dir).unwrap(); + let path = dir.join("model.gguf"); + let bytes = build_gguf(3, &[kv_string("tokenizer.chat_template", b"<|channel|>")]); + std::fs::write(&path, &bytes).unwrap(); + + let meta = read_gguf_metadata_from_file(&path).unwrap(); + assert_eq!(meta.chat_template.as_deref(), Some("<|channel|>")); + + std::fs::remove_dir_all(&dir).ok(); + } + + #[test] + fn file_wrapper_missing_file_is_none() { + let path = std::env::temp_dir().join("thuki-gguf-does-not-exist.gguf"); + assert_eq!(read_gguf_metadata_from_file(&path), None); + } +} diff --git a/src-tauri/src/models/manifest.rs b/src-tauri/src/models/manifest.rs index 94c0290d..9b4fbc7b 100644 --- a/src-tauri/src/models/manifest.rs +++ b/src-tauri/src/models/manifest.rs @@ -39,6 +39,11 @@ pub struct InstalledModel { pub vision: bool, /// Whether the model exposes a thinking/scratchpad token stream. pub thinking: bool, + /// Whether the model's reasoning cannot be turned off (it always reasons). + /// Set by the reasoning classifier at install (and corrected by the runtime + /// backstop). 
For rows written before the column existed the stored value + /// is `NULL`, read here as `false` and re-classified by the startup heal. + pub reasoning_always: bool, /// Filename of the vision projection blob, if any. pub mmproj_file: Option, /// SHA-256 hex digest of the mmproj blob, if any. @@ -79,8 +84,8 @@ pub fn insert(conn: &Connection, model: &InstalledModel) -> SqlResult SqlResult SqlResult SqlResult> { let mut stmt = conn.prepare( "SELECT id, display_name, repo, revision, file_name, sha256, \ - size_bytes, quant, vision, thinking, mmproj_file, mmproj_sha256 \ + size_bytes, quant, vision, thinking, mmproj_file, mmproj_sha256, \ + reasoning_always \ FROM installed_models ORDER BY display_name", )?; let rows = stmt.query_map([], row_to_model)?; rows.collect() } +/// Returns the installed models whose `reasoning_always` is `NULL`: rows +/// written before the column existed, never touched by the classifier. The +/// startup heal re-classifies each from its local blob (or the registry for a +/// curated row) and persists the result via [`update_classification`], so a +/// subsequent call returns an empty list. +pub fn list_unclassified(conn: &Connection) -> SqlResult> { + let mut stmt = conn.prepare( + "SELECT id, display_name, repo, revision, file_name, sha256, \ + size_bytes, quant, vision, thinking, mmproj_file, mmproj_sha256, \ + reasoning_always \ + FROM installed_models WHERE reasoning_always IS NULL ORDER BY display_name", + )?; + let rows = stmt.query_map([], row_to_model)?; + rows.collect() +} + +/// Persists a reasoning classification onto an existing row: sets both +/// `thinking` and `reasoning_always`. Used by the startup heal to populate a +/// previously-`NULL` row. A no-op (zero rows changed) when `id` is absent. 
+pub fn update_classification( + conn: &Connection, + id: &str, + thinking: bool, + reasoning_always: bool, +) -> SqlResult<()> { + conn.execute( + "UPDATE installed_models SET thinking = ?2, reasoning_always = ?3 WHERE id = ?1", + params![id, thinking as i32, reasoning_always as i32], + )?; + Ok(()) +} + +/// Marks a model as always-reasoning from observed runtime behavior (the +/// backstop saw reasoning stream while reasoning was requested off). Forces +/// both `reasoning_always` and `thinking` true, since a model that always +/// reasons necessarily emits thinking tokens. Idempotent. +pub fn mark_reasoning_always(conn: &Connection, id: &str) -> SqlResult<()> { + conn.execute( + "UPDATE installed_models SET reasoning_always = 1, thinking = 1 WHERE id = ?1", + params![id], + )?; + Ok(()) +} + /// Returns the model with the given `id`, or `None` if not present. /// /// # Errors @@ -145,7 +196,8 @@ pub fn list(conn: &Connection) -> SqlResult> { pub fn get(conn: &Connection, id: &str) -> SqlResult> { conn.query_row( "SELECT id, display_name, repo, revision, file_name, sha256, \ - size_bytes, quant, vision, thinking, mmproj_file, mmproj_sha256 \ + size_bytes, quant, vision, thinking, mmproj_file, mmproj_sha256, \ + reasoning_always \ FROM installed_models WHERE id = ?1", params![id], row_to_model, @@ -227,6 +279,12 @@ fn row_to_model(row: &rusqlite::Row<'_>) -> SqlResult { thinking: row.get::<_, i32>(9)? != 0, mmproj_file: row.get(10)?, mmproj_sha256: row.get(11)?, + // NULL (a pre-column row) reads as `false`; the startup heal then + // re-classifies it. A stored 0/1 is the classifier's verdict. + reasoning_always: row + .get::<_, Option>(12)? 
+ .map(|v| v != 0) + .unwrap_or(false), }) } @@ -249,6 +307,7 @@ mod tests { quant: "Q4_K_M".to_string(), vision: false, thinking: false, + reasoning_always: false, mmproj_file: None, mmproj_sha256: None, } @@ -479,6 +538,122 @@ mod tests { assert!(found.thinking); } + #[test] + fn reasoning_always_flag_roundtrips() { + let conn = open_in_memory().unwrap(); + let m = InstalledModel { + thinking: true, + reasoning_always: true, + ..make_model("org/repo:ra.gguf", "sha_ra") + }; + insert(&conn, &m).unwrap(); + let found = get(&conn, "org/repo:ra.gguf").unwrap().unwrap(); + assert!(found.reasoning_always); + } + + #[test] + fn fresh_insert_is_not_unclassified() { + // insert always writes a non-NULL reasoning_always, so a freshly + // installed model is never picked up by the heal. + let conn = open_in_memory().unwrap(); + insert(&conn, &make_model("org/repo:fresh.gguf", "sha_f")).unwrap(); + assert!(list_unclassified(&conn).unwrap().is_empty()); + } + + /// Forces a row's `reasoning_always` back to NULL to simulate a row written + /// before the column existed. + fn null_out_reasoning(conn: &Connection, id: &str) { + conn.execute( + "UPDATE installed_models SET reasoning_always = NULL WHERE id = ?1", + params![id], + ) + .unwrap(); + } + + #[test] + fn null_reasoning_row_is_unclassified_and_reads_false() { + let conn = open_in_memory().unwrap(); + let m = InstalledModel { + reasoning_always: true, + ..make_model("org/repo:legacy.gguf", "sha_l") + }; + insert(&conn, &m).unwrap(); + null_out_reasoning(&conn, "org/repo:legacy.gguf"); + + // NULL reads as false through row_to_model. + let found = get(&conn, "org/repo:legacy.gguf").unwrap().unwrap(); + assert!(!found.reasoning_always); + + // ...and the row surfaces in the heal list. 
+ let pending = list_unclassified(&conn).unwrap(); + assert_eq!(pending.len(), 1); + assert_eq!(pending[0].id, "org/repo:legacy.gguf"); + } + + #[test] + fn update_classification_persists_and_clears_unclassified() { + let conn = open_in_memory().unwrap(); + insert(&conn, &make_model("org/repo:u.gguf", "sha_u")).unwrap(); + null_out_reasoning(&conn, "org/repo:u.gguf"); + + update_classification(&conn, "org/repo:u.gguf", true, true).unwrap(); + + let found = get(&conn, "org/repo:u.gguf").unwrap().unwrap(); + assert!(found.thinking); + assert!(found.reasoning_always); + assert!(list_unclassified(&conn).unwrap().is_empty()); + } + + #[test] + fn update_classification_can_set_none_class() { + let conn = open_in_memory().unwrap(); + let m = InstalledModel { + thinking: true, + ..make_model("org/repo:n.gguf", "sha_n") + }; + insert(&conn, &m).unwrap(); + null_out_reasoning(&conn, "org/repo:n.gguf"); + + update_classification(&conn, "org/repo:n.gguf", false, false).unwrap(); + let found = get(&conn, "org/repo:n.gguf").unwrap().unwrap(); + assert!(!found.thinking); + assert!(!found.reasoning_always); + // No longer NULL, so cleared from the heal list. 
+ assert!(list_unclassified(&conn).unwrap().is_empty()); + } + + #[test] + fn mark_reasoning_always_forces_both_flags() { + let conn = open_in_memory().unwrap(); + insert(&conn, &make_model("org/repo:b.gguf", "sha_b")).unwrap(); + + mark_reasoning_always(&conn, "org/repo:b.gguf").unwrap(); + let found = get(&conn, "org/repo:b.gguf").unwrap().unwrap(); + assert!(found.reasoning_always); + assert!(found.thinking); + } + + #[test] + fn list_unclassified_propagates_sql_error_when_table_absent() { + let conn = open_in_memory().unwrap(); + conn.execute_batch("DROP TABLE installed_models;").unwrap(); + assert!(list_unclassified(&conn).is_err()); + } + + #[test] + fn update_classification_propagates_sql_error_when_table_absent() { + let conn = open_in_memory().unwrap(); + conn.execute_batch("DROP TABLE installed_models;").unwrap(); + assert!(update_classification(&conn, "x:y.gguf", true, true).is_err()); + } + + #[test] + fn mark_reasoning_always_propagates_sql_error_when_table_absent() { + let conn = open_in_memory().unwrap(); + conn.execute_batch("DROP TABLE installed_models;").unwrap(); + assert!(mark_reasoning_always(&conn, "x:y.gguf").is_err()); + } + #[test] fn size_bytes_roundtrip_large_value() { let conn = open_in_memory().unwrap(); diff --git a/src-tauri/src/models/mod.rs b/src-tauri/src/models/mod.rs index 316251bd..aecd40fe 100644 --- a/src-tauri/src/models/mod.rs +++ b/src-tauri/src/models/mod.rs @@ -16,7 +16,9 @@ */ pub mod download; +pub mod gguf; pub mod manifest; +pub mod reasoning; pub mod registry; pub mod storage; @@ -25,13 +27,15 @@ use std::sync::Mutex; use futures_util::StreamExt; use serde::{Deserialize, Serialize}; -use tauri::Manager; +use tauri::{Emitter, Manager}; use crate::config::defaults::{ DEFAULT_OLLAMA_SHOW_REQUEST_TIMEOUT_SECS, DEFAULT_OLLAMA_TAGS_REQUEST_TIMEOUT_SECS, - HF_API_TIMEOUT_SECS, HF_BASE_URL, MAX_HF_API_BODY_BYTES, MAX_MODEL_SLUG_LEN, + HF_API_TIMEOUT_SECS, HF_BASE_URL, HF_SEARCH_LIMIT_MAX, MAX_HF_API_BODY_BYTES, + 
MAX_HF_SEARCH_QUERY_LEN, MAX_MODEL_CONTEXT_LENGTH, MAX_MODEL_SLUG_LEN, MAX_OLLAMA_SHOW_BODY_BYTES, MAX_OLLAMA_TAGS_BODY_BYTES, OPENAI_MODELS_TIMEOUT_SECS, PROVIDER_ID_BUILTIN, PROVIDER_KIND_BUILTIN, PROVIDER_KIND_OLLAMA, PROVIDER_KIND_OPENAI, + RUNTIME_OVERHEAD_GB, }; use crate::config::AppConfig; @@ -394,6 +398,13 @@ fn persist_active_provider_model( let mut guard = active.0.lock().map_err(|e| e.to_string())?; *guard = mirror; } + // Broadcast the same config-change event every settings_commands writer + // emits, so the other webview (the overlay's picker, or the Settings panel) + // resyncs live. set_active_model is otherwise the only model-write path + // that left other windows stale; this also covers finalize_install's + // auto-select and the delete-clear path. The listeners refresh via the + // read-only get_config, never reload_config_from_disk, so this cannot loop. + let _ = app.emit(crate::settings_commands::CONFIG_UPDATED_EVENT, ()); Ok(()) } @@ -733,6 +744,12 @@ pub struct Capabilities { /// ThinkingBlock UI. #[serde(default)] pub thinking: bool, + /// Reasoning is structural and cannot be turned off (gpt-oss/Harmony, + /// DeepSeek-R1, QwQ, ...). Thuki still shows such a model's reasoning + /// cleanly and marks it in the picker so the user is not surprised by the + /// latency. `false` when reasoning is optional (off by default) or absent. + #[serde(default)] + pub reasoning_always: bool, /// Maximum number of images the model accepts in a single request, when /// known. `None` means "unknown / unbounded by Thuki" and the gate lets /// the request through. Today this is keyed off the model architecture @@ -987,20 +1004,33 @@ pub async fn get_model_capabilities( } } -/// Capability map for the built-in provider, derived from the installed-model -/// manifest. Each row carries the curated vision/thinking flags recorded at -/// download time; `max_images` stays `None` because llama-server imposes no -/// fixed per-request image cap. 
+/// Capability map for the built-in provider. For a curated starter the flags +/// come from the current registry, not the manifest row: the row freezes the +/// flags recorded at download time, so a later flag correction (e.g. a +/// reasoning model previously recorded as non-thinking) would otherwise stay +/// wrong for already-installed models. Reading the registry heals those rows on +/// every read with no manifest migration. A pasted (non-curated) repo has no +/// registry entry and keeps the flags its row recorded. `max_images` stays +/// `None` because llama-server imposes no fixed per-request image cap. pub(crate) fn builtin_capabilities_from_manifest( rows: &[manifest::InstalledModel], ) -> HashMap { rows.iter() .map(|row| { + // Curated starters heal `vision`/`thinking`/`reasoning_always` from + // the registry (highest confidence). A pasted repo has no registry + // entry and keeps its row's classified flags: the install-time GGUF + // classifier populates them, and the runtime backstop corrects them. + let (vision, thinking, reasoning_always) = + registry::by_repo_file(&row.repo, &row.file_name) + .map(|s| (s.vision, s.thinking, s.reasoning_always)) + .unwrap_or((row.vision, row.thinking, row.reasoning_always)); ( row.id.clone(), Capabilities { - vision: row.vision, - thinking: row.thinking, + vision, + thinking, + reasoning_always, max_images: None, }, ) @@ -1022,6 +1052,7 @@ pub(crate) fn openai_capabilities(model: &str, vision: bool) -> HashMap, +} + #[derive(Default)] -pub struct DownloadState(pub std::sync::Mutex>); +pub struct DownloadState(pub std::sync::Mutex>); -/// Atomically claims the single download slot. Returns a fresh cancellation -/// token on success; an error when another download already holds the slot -/// (or the lock is poisoned). +/// Atomically claims a download slot for `key`, recording the blob `shas` it +/// will write. 
Returns a fresh cancellation token on success; an error when +/// `key` already has an in-flight download (or the lock is poisoned). pub fn claim_download( state: &DownloadState, + key: &str, + shas: Vec, ) -> Result { let mut guard = state.0.lock().map_err(|e| e.to_string())?; - if guard.is_some() { + if guard.contains_key(key) { return Err("a download is already in progress".to_string()); } let token = tokio_util::sync::CancellationToken::new(); - *guard = Some(token.clone()); + guard.insert( + key.to_string(), + DownloadSlot { + token: token.clone(), + shas, + }, + ); Ok(token) } -/// Clears the download slot. Best-effort: a poisoned lock is ignored because -/// release runs on the task teardown path where there is nothing left to do. -pub fn release_download(state: &DownloadState) { +/// Releases the slot held by `key`. Best-effort: a poisoned lock is ignored +/// because release runs on the task teardown path where there is nothing left +/// to do. +pub fn release_download(state: &DownloadState, key: &str) { if let Ok(mut guard) = state.0.lock() { - *guard = None; + guard.remove(key); } } -/// True while a model download holds the slot. Read before quitting so the app -/// can warn that quitting discards the in-flight download. +/// True while any model download holds a slot. Read before quitting so the app +/// can warn that quitting discards the in-flight download(s). pub fn download_in_flight(state: &DownloadState) -> bool { - state.0.lock().map(|guard| guard.is_some()).unwrap_or(false) + state + .0 + .lock() + .map(|guard| !guard.is_empty()) + .unwrap_or(false) } -/// Cancels the in-flight download's token, if one is claimed. Does NOT clear -/// the slot: the download task notices the cancellation, emits `Cancelled`, -/// and releases the slot itself. -pub fn cancel_active_download(state: &DownloadState) { +/// Cancels the download held by `key`, if one is in flight. 
Does NOT remove the +/// slot: the download task notices the cancellation, emits `Cancelled`, and +/// releases its own slot. A missing key is a harmless no-op. +pub fn cancel_download(state: &DownloadState, key: &str) { if let Ok(guard) = state.0.lock() { - if let Some(token) = guard.as_ref() { - token.cancel(); + if let Some(slot) = guard.get(key) { + slot.token.cancel(); } } } @@ -1168,42 +1229,73 @@ pub struct StarterOption { pub partial_bytes: Option, } -/// Builds the starter picker rows from the manifest, the blob store's partial -/// slots, and the machine's RAM. A manifest read error degrades to "not -/// installed" rather than failing the whole picker. +/// Annotates one registry entry with the machine-specific facts the picker +/// renders next to it: RAM fit, installed state, and resumable-partial size. A +/// manifest read error degrades to "not installed" rather than failing the row. +fn annotate_starter( + s: &registry::Starter, + conn: &rusqlite::Connection, + store: &storage::ModelStore, + ram_bytes: u64, +) -> StarterOption { + StarterOption { + starter: s.clone(), + fit: registry::ram_fit(s.est_runtime_gb, ram_bytes), + installed: matches!( + manifest::get(conn, &registry::installed_model_id(s)), + Ok(Some(_)) + ), + partial_bytes: store.existing_partial_len(s.sha256), + } +} + +/// The onboarding starter picker rows: exactly the three tier heroes, annotated +/// for this machine. Onboarding's 3-up comparison is fixed at one model per +/// tier, so it draws only the heroes even as the Staff Picks catalog grows. pub fn build_starter_options( conn: &rusqlite::Connection, store: &storage::ModelStore, ram_bytes: u64, +) -> Vec { + registry::onboarding_heroes() + .into_iter() + .map(|s| annotate_starter(s, conn, store, ram_bytes)) + .collect() +} + +/// The full Staff Picks catalog: every curated registry entry annotated for +/// this machine. 
The frontend groups the rows by `starter.category`; unlike +/// [`build_starter_options`] this is not capped at one model per tier. +pub fn build_staff_picks( + conn: &rusqlite::Connection, + store: &storage::ModelStore, + ram_bytes: u64, ) -> Vec { registry::STARTERS .iter() - .map(|s| StarterOption { - starter: s.clone(), - fit: registry::ram_fit(s.est_runtime_gb, ram_bytes), - installed: matches!( - manifest::get(conn, &registry::to_installed_model(s).id), - Ok(Some(_)) - ), - partial_bytes: store.existing_partial_len(s.sha256), - }) + .map(|s| annotate_starter(s, conn, store, ram_bytes)) .collect() } -/// Maps a frontend tier string (`"fast" | "balanced" | "smartest"`) onto its -/// curated starter. Every [`registry::Tier`] has exactly one `STARTERS` -/// entry (asserted by registry tests), so the lookup is total. +/// Maps a Staff Picks `id` onto its curated registry entry. An unknown id +/// yields an error rather than a panic, so a stale frontend id can never crash +/// the download path. +pub fn starter_for_id(id: &str) -> Result<&'static registry::Starter, String> { + registry::by_id(id).ok_or_else(|| format!("unknown staff pick id: {id}")) +} + +/// Maps a frontend tier string (`"fast" | "balanced" | "smartest"`) onto the +/// onboarding hero for that tier. The hero is resolved by id from +/// [`registry::ONBOARDING_HERO_IDS`], so adding more models of the same tier to +/// the Staff Picks catalog never changes which model onboarding downloads. 
pub fn starter_for_tier(tier: &str) -> Result<&'static registry::Starter, String> { - let tier = match tier { - "fast" => registry::Tier::Fast, - "balanced" => registry::Tier::Balanced, - "smartest" => registry::Tier::Smartest, + let idx = match tier { + "fast" => 0, + "balanced" => 1, + "smartest" => 2, other => return Err(format!("unknown starter tier: {other}")), }; - Ok(registry::STARTERS - .iter() - .find(|s| s.tier == tier) - .expect("every tier has a starter")) + starter_for_id(registry::ONBOARDING_HERO_IDS[idx]) } /// The builtin provider's currently configured model id (empty when none). @@ -1252,6 +1344,35 @@ pub fn quant_from_filename(file: &str) -> String { .unwrap_or_default() } +/// Marker substrings that flag a GGUF model as emitting explicit reasoning +/// tokens (rendered in the ThinkingBlock UI). There is no machine-readable +/// thinking signal in GGUF metadata or the Hugging Face API, so detection reads +/// the publisher's own naming: an explicit reasoning self-label +/// (`thinking`/`reasoning`/`reasoner`) or a known reasoning-first family. The +/// list is kept narrow to avoid false positives; curated starters set the flag +/// explicitly in the registry and never consult it, and a user override is the +/// authority whenever the guess is wrong. +const THINKING_MARKERS: &[&str] = &[ + "thinking", + "reasoning", + "reasoner", + "deepseek-r1", + "qwq", + "gpt-oss", + "magistral", +]; + +/// Best-effort detection of whether an arbitrary GGUF model is a reasoning +/// model, matching [`THINKING_MARKERS`] case-insensitively against both the +/// repo id and the file name. Returns `false` when nothing matches. +pub fn detect_thinking(repo: &str, file: &str) -> bool { + let repo = repo.to_ascii_lowercase(); + let file = file.to_ascii_lowercase(); + THINKING_MARKERS + .iter() + .any(|marker| repo.contains(marker) || file.contains(marker)) +} + /// A `.gguf` entry in a Hugging Face repo listing, for the paste-a-repo UI. 
#[derive(Debug, Clone, PartialEq, Serialize)] pub struct HfGgufFile { @@ -1259,6 +1380,12 @@ pub struct HfGgufFile { pub file: String, /// File size in bytes; 0 when the API reports no size. pub size_bytes: u64, + /// LFS content digest: the blob key used to resume or discard the partial. + /// Empty when the repo file is not LFS-backed (rare for GGUF weights). + pub sha256: String, + /// Length of an interrupted partial for this file on disk, or `None` when + /// there is none. Drives the Browse-all Paused / Resume / Discard row. + pub partial_bytes: Option, } /// Subset of the HF `/api/models/?blobs=true` response Thuki consumes. @@ -1272,6 +1399,34 @@ struct HfRepoInfo { siblings: Vec, } +/// The slice of HF's parsed `gguf` metadata block Thuki reads. Present on a +/// search row when the query requests `expand[]=gguf`. Untrusted external input: +/// the context window is sanitized before use, and the chat template is only fed +/// to the never-panicking [`reasoning::classify_reasoning`] classifier. +#[derive(Deserialize)] +struct HfGgufMeta { + #[serde(default)] + context_length: Option, + /// The model's embedded chat template, the highest-signal reasoning class + /// source. Already carried by `expand[]=gguf` (the same block that holds the + /// context window), so reading it costs no extra request. + #[serde(default)] + chat_template: Option, + /// `general.architecture`, a secondary reasoning signal (e.g. gpt-oss is + /// always-on even when its template omits the channel marker). + #[serde(default)] + architecture: Option, +} + +/// Trust check for an externally-reported context window. Accepts a positive +/// value no larger than [`MAX_MODEL_CONTEXT_LENGTH`] and narrows it to `u32`; +/// anything missing, zero, or implausibly large is dropped to `None` so a +/// hand-edited or malicious GGUF cannot inject an absurd figure into the UI. 
+pub fn sanitize_context_length(raw: Option) -> Option { + raw.filter(|&n| n >= 1 && n <= MAX_MODEL_CONTEXT_LENGTH as u64) + .map(|n| n as u32) +} + /// One repo file in the HF listing. Only LFS-backed `.gguf` files matter. #[derive(Deserialize)] struct HfSibling { @@ -1321,9 +1476,20 @@ pub struct MmprojCompanion { pub size_bytes: u64, } +/// True when `name` is an `mmproj*.gguf` vision projection companion. The +/// presence of one is Thuki's ground-truth vision signal: llama.cpp cannot do +/// image input without it, regardless of how the base model is tagged. +fn is_mmproj(name: &str) -> bool { + name.starts_with("mmproj") && name.ends_with(".gguf") +} + /// Pure parse of an HF repo listing into the spec for one target `file`. /// Capability rule for pasted repos: vision = an `mmproj*.gguf` sibling with -/// complete LFS metadata exists; thinking = false (full detection is not yet implemented). +/// complete LFS metadata exists. The reasoning class is recorded in two stages: +/// [`repo_installed_model`] seeds `thinking` from the model name via +/// [`detect_thinking`], then `finalize_install` refines `thinking` and sets +/// `reasoning_always` from the downloaded GGUF's chat template (falling back to +/// the name guess when the template cannot be read). 
pub fn resolve_listing(body: &[u8], file: &str) -> Result { let info: HfRepoInfo = serde_json::from_slice(body) .map_err(|e| format!("failed to decode Hugging Face API response: {e}"))?; @@ -1345,7 +1511,7 @@ pub fn resolve_listing(body: &[u8], file: &str) -> Result let mmproj = info .siblings .iter() - .filter(|s| s.rfilename.starts_with("mmproj") && s.rfilename.ends_with(".gguf")) + .filter(|s| is_mmproj(&s.rfilename)) .find_map(|s| { lfs_digest(s).map(|(sha256, size_bytes)| MmprojCompanion { file: s.rfilename.clone(), @@ -1370,12 +1536,19 @@ pub fn parse_gguf_listing(body: &[u8]) -> Result, String> { Ok(info .siblings .into_iter() - .filter(|s| s.rfilename.ends_with(".gguf") && !s.rfilename.starts_with("mmproj")) + .filter(|s| s.rfilename.ends_with(".gguf") && !is_mmproj(&s.rfilename)) .map(|s| { let size_bytes = s.lfs.as_ref().and_then(|l| l.size).or(s.size).unwrap_or(0); + let sha256 = s + .lfs + .as_ref() + .and_then(|l| l.sha256.clone()) + .unwrap_or_default(); HfGgufFile { file: s.rfilename, size_bytes, + sha256, + partial_bytes: None, } }) .collect()) @@ -1481,6 +1654,408 @@ pub async fn fetch_repo_gguf_listing( parse_gguf_listing(&body) } +// ─── Hugging Face model search ─────────────────────────────────────────────── + +/// One repo row from a Hugging Face model search, trimmed to the fields the +/// in-app browser needs to identify, rank, gate, and label a model. +#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct HfModelSummary { + /// Repo id, e.g. `unsloth/Qwen3.5-9B-GGUF`; the install target. + pub id: String, + /// Lifetime download count. The search is sorted by it and the UI shows it + /// as a trust signal; `0` when the API omits the field. + pub downloads: u64, + /// True when the repo is access-gated (license click-through or manual + /// approval). Gated repos cannot be fetched anonymously, so the UI can flag + /// them instead of offering a download that would fail. 
+ pub gated: bool, + /// Model's trained context window in tokens, from the repo's parsed GGUF + /// `context_length` metadata (a per-repo property, identical across quants). + /// `None` when the API omits it or the value fails [`sanitize_context_length`]. + pub context_length: Option, + /// True when the repo ships an `mmproj*.gguf` vision companion (see + /// [`is_mmproj`]). A capability of the model, shared by every quant, so the + /// pill belongs on the repo row, not the per-quant list. + pub vision: bool, + /// True when the model emits reasoning tokens, from its chat template via + /// [`reasoning::classify_reasoning`], or the repo name via [`detect_thinking`] + /// when the template is absent. Also a per-repo capability. + pub thinking: bool, +} + +/// A page of search rows plus whether the Hub holds more. The flag is derived +/// from the raw entry count, not the kept-row count, so the per-row pipeline +/// allowlist (which drops non-chat repos) cannot prematurely end pagination. +#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct HfSearchPage { + pub rows: Vec, + pub has_more: bool, +} + +/// HF `pipeline_tag`s Thuki surfaces in Browse-all: plain text chat and +/// multimodal (image+text) chat. Every other tag (embeddings, translation, +/// text-to-video, ...) is not a usable chat model and is dropped. This is an +/// allowlist applied per row, replacing a single server-side `pipeline_tag` +/// filter that could not express "text OR image-text" and so hid vision repos. +const SEARCHABLE_PIPELINE_TAGS: &[&str] = &["text-generation", "image-text-to-text"]; + +/// One entry in the Hugging Face `/api/models` search response. Only the fields +/// surfaced by [`HfModelSummary`] (and the `pipeline_tag` allowlist gate) are +/// decoded; everything else is ignored so upstream additions cannot break it. 
+#[derive(Deserialize)] +struct HfSearchEntry { + #[serde(default)] + id: String, + #[serde(default)] + downloads: u64, + /// HF reports `gated` as `false` or a strategy string (`"auto"`/`"manual"`); + /// [`deserialize_gated`] normalizes it to a bool. Absent on some rows, so it + /// defaults to `false`. + #[serde(default, deserialize_with = "deserialize_gated")] + gated: bool, + /// HF pipeline tag, present because the search requests `expand[]=pipeline_tag`. + /// Gated against [`SEARCHABLE_PIPELINE_TAGS`]; an absent tag drops the row. + #[serde(default)] + pipeline_tag: Option, + /// HF-parsed GGUF metadata, present because the search requests + /// `expand[]=gguf`: the context window and the chat template / architecture. + #[serde(default)] + gguf: Option, + /// Repo file listing, present because the search requests `expand[]=siblings`. + /// Scanned for an `mmproj*.gguf` companion to derive the vision flag. + #[serde(default)] + siblings: Vec, +} + +/// Projects one raw search entry onto a summary row, or `None` when the row is +/// not a usable chat model (empty id, or a `pipeline_tag` outside +/// [`SEARCHABLE_PIPELINE_TAGS`]). +fn search_entry_to_summary(entry: HfSearchEntry) -> Option { + let HfSearchEntry { + id, + downloads, + gated, + pipeline_tag, + gguf, + siblings, + } = entry; + if id.is_empty() { + return None; + } + if !pipeline_tag + .as_deref() + .is_some_and(|tag| SEARCHABLE_PIPELINE_TAGS.contains(&tag)) + { + return None; + } + let vision = siblings.iter().any(|s| is_mmproj(&s.rfilename)); + // Reasoning runs through the one shared derivation so a search row and the + // install it leads to can never disagree. A search row has no chosen file, + // so the name fallback (used only when no template ships) sees the repo only. 
+ let chat_template = gguf.as_ref().and_then(|g| g.chat_template.as_deref()); + let architecture = gguf.as_ref().and_then(|g| g.architecture.as_deref()); + let (thinking, _) = reasoning_flags_from_metadata(chat_template, architecture, &id, ""); + let context_length = sanitize_context_length(gguf.and_then(|g| g.context_length)); + Some(HfModelSummary { + id, + downloads, + gated, + context_length, + vision, + thinking, + }) +} + +/// Normalizes Hugging Face's polymorphic `gated` field (a bool `false` or a +/// strategy string like `"manual"`) into a plain bool: any string means gated, +/// `true` means gated, everything else (including `null`) means not gated. +fn deserialize_gated<'de, D>(deserializer: D) -> Result +where + D: serde::Deserializer<'de>, +{ + Ok(match serde_json::Value::deserialize(deserializer)? { + serde_json::Value::Bool(b) => b, + serde_json::Value::String(_) => true, + _ => false, + }) +} + +/// Pure parse of an `/api/models` search body into a page of summary rows. +/// Non-chat and empty-id rows are dropped per [`search_entry_to_summary`]; +/// `has_more` is set from the raw entry count against `limit` so dropped rows +/// never cut pagination short, and is forced `false` once `limit` reaches the +/// [`HF_SEARCH_LIMIT_MAX`] ceiling: requests are clamped to that ceiling, so a +/// full page there would refetch the same capped rows forever. "Load more" +/// stops at the ceiling instead. 
+pub fn parse_search_results(body: &[u8], limit: usize) -> Result { + let entries: Vec = serde_json::from_slice(body) + .map_err(|e| format!("failed to decode Hugging Face search response: {e}"))?; + let has_more = entries.len() >= limit && limit < HF_SEARCH_LIMIT_MAX; + let rows = entries + .into_iter() + .filter_map(search_entry_to_summary) + .collect(); + Ok(HfSearchPage { rows, has_more }) +} + +// ─── RAM-fit estimation + annotated view rows ──────────────────────────────── +// +// The model-settings UI surfaces a "will this fit in your Mac's RAM" hint in +// both Discover and Library. The authoritative per-starter estimate lives in +// the registry; for arbitrary downloaded/searched models there is no curated +// number, so these helpers estimate the resident footprint (weights + a fixed +// KV/runtime overhead) and reuse `registry::ram_fit` for the threshold. They +// are deliberately approximate: the result is a hint, never a hard gate. + +/// A repo `.gguf` file annotated with the accurate per-quant RAM-fit computed +/// from its real file size. `fit` is `None` when host RAM or the file size is +/// unknown (both are required to judge fit). +#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct HfGgufFileRow { + #[serde(flatten)] + pub file: HfGgufFile, + pub fit: Option, + /// Whether this exact repo file is already recorded in the installed + /// manifest. Lets Browse-all show an "Installed" marker instead of a + /// download button once a quant finishes downloading. + pub installed: bool, +} + +/// An installed model annotated with its RAM-fit on the host, computed from the +/// recorded weights size. `fit` is `None` when host RAM or the size is unknown. +#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct InstalledModelView { + #[serde(flatten)] + pub model: manifest::InstalledModel, + pub fit: Option, + /// Trained context window in tokens, healed from the curated registry by + /// repo + file. 
`None` for a pasted model with no registry entry (its + /// context is not recorded in the manifest). + pub context_length: Option, + /// Vision projector size in bytes, healed from the registry so the listed + /// total (weights + mmproj) matches Discover's. `0` for a text model or a + /// pasted repo with no registry entry (the manifest records only weights). + pub mmproj_bytes: u64, + /// Model maker (e.g. "Google"), healed from the registry. `None` for a + /// pasted repo with no entry, where the UI falls back to the repo id. + pub origin: Option, +} + +/// Estimated resident memory (GiB) for a GGUF weights blob of `size_bytes`: +/// the on-disk size plus the fixed [`RUNTIME_OVERHEAD_GB`]. +pub fn estimate_runtime_gb_from_bytes(size_bytes: u64) -> f64 { + size_bytes as f64 / (1u64 << 30) as f64 + RUNTIME_OVERHEAD_GB +} + +/// Clamps a requested search page size to `1..=`[`HF_SEARCH_LIMIT_MAX`] so a +/// runaway page count cannot request an unbounded result set. +pub fn clamp_search_limit(limit: usize) -> usize { + limit.clamp(1, HF_SEARCH_LIMIT_MAX) +} + +/// Annotates repo `.gguf` rows with the accurate per-quant RAM-fit from each +/// file's real size. A row gets `None` when host RAM or the file size is 0. +pub fn annotate_gguf_rows(files: Vec, ram_bytes: u64) -> Vec { + files + .into_iter() + .map(|file| { + let fit = if ram_bytes > 0 && file.size_bytes > 0 { + Some(registry::ram_fit( + estimate_runtime_gb_from_bytes(file.size_bytes), + ram_bytes, + )) + } else { + None + }; + HfGgufFileRow { + file, + fit, + installed: false, + } + }) + .collect() +} + +/// Fills each row's `partial_bytes` from the blob store so Browse-all can offer +/// Resume / Discard for any file with an interrupted partial on disk. A row +/// whose `sha256` is empty (a non-LFS file) has no content-addressed partial +/// and stays `None`. 
+pub fn attach_partials( + rows: Vec, + store: &storage::ModelStore, +) -> Vec { + rows.into_iter() + .map(|mut row| { + if !row.file.sha256.is_empty() { + row.file.partial_bytes = store.existing_partial_len(&row.file.sha256); + } + row + }) + .collect() +} + +/// Marks each row whose `:` is already recorded in the installed +/// manifest, so Browse-all shows an "Installed" marker rather than a download +/// button once a quant finishes. A manifest read error degrades to "not +/// installed" rather than failing the listing, mirroring [`annotate_starter`]. +pub fn attach_installed( + rows: Vec, + repo: &str, + conn: &rusqlite::Connection, +) -> Vec { + rows.into_iter() + .map(|mut row| { + let id = format!("{repo}:{}", row.file.file); + row.installed = matches!(manifest::get(conn, &id), Ok(Some(_))); + row + }) + .collect() +} + +/// Annotates installed models with their RAM-fit on the host, from the recorded +/// weights size. A model gets `None` when host RAM or the size is 0. +pub fn build_installed_views( + models: Vec, + ram_bytes: u64, +) -> Vec { + models + .into_iter() + .map(|model| { + let fit = if ram_bytes > 0 && model.size_bytes > 0 { + Some(registry::ram_fit( + estimate_runtime_gb_from_bytes(model.size_bytes), + ram_bytes, + )) + } else { + None + }; + // Curated models heal their context window, vision-projector size, + // and maker from the registry so the Library row reads the same + // facts Discover does; a pasted repo has no entry, so those stay + // absent (the UI falls back to the repo id for the maker). 
+ let starter = registry::by_repo_file(&model.repo, &model.file_name); + let context_length = starter.map(|s| s.context_length); + let mmproj_bytes = starter.map_or(0, |s| s.mmproj_bytes); + let origin = starter.map(|s| s.origin.to_string()); + InstalledModelView { + model, + fit, + context_length, + mmproj_bytes, + origin, + } + }) + .collect() +} + +/// Validates the query length, runs the Hugging Face GGUF model search against +/// `base_url`, and parses the result. `base_url` is parameterized so tests +/// point at a mock server; production passes [`HF_BASE_URL`]. +pub async fn fetch_hf_search( + client: &reqwest::Client, + base_url: &str, + query: &str, + limit: usize, +) -> Result { + let query = query.trim(); + if query.len() > MAX_HF_SEARCH_QUERY_LEN { + return Err(format!( + "search query exceeds maximum length of {MAX_HF_SEARCH_QUERY_LEN} bytes" + )); + } + let body = fetch_hf_search_inner( + client, + base_url, + query, + std::time::Duration::from_secs(HF_API_TIMEOUT_SECS), + MAX_HF_API_BODY_BYTES, + limit, + ) + .await?; + parse_search_results(&body, limit) +} + +/// Innermost search fetcher with timeout, body cap, and result limit +/// configurable so the cap branches are testable. Every query parameter is +/// percent-encoded by `Url::parse_with_params` (no manual string building) so a +/// query cannot smuggle URL syntax, and the host stays fixed to `base_url` so +/// there is no SSRF surface. The body cap is enforced incrementally during the +/// streaming read, mirroring [`fetch_hf_repo_listing_inner`]. +async fn fetch_hf_search_inner( + client: &reqwest::Client, + base_url: &str, + query: &str, + timeout: std::time::Duration, + max_body_bytes: usize, + limit: usize, +) -> Result, String> { + let endpoint = format!("{}/api/models", base_url.trim_end_matches('/')); + let limit = limit.to_string(); + // `filter=gguf` matches repos *tagged* gguf (the dedicated quant repos that + // actually ship `.gguf` files). 
`library=gguf` is deliberately NOT used: it + // also matches base repos that merely link to GGUF quants elsewhere, so the + // rows would have no downloadable `.gguf` files of their own. The chat-model + // gate is NOT a server `pipeline_tag` filter: that param takes a single value + // and so cannot express "text-generation OR image-text-to-text", which hid + // every multimodal repo. Instead each row's `pipeline_tag` is expanded and + // checked against `SEARCHABLE_PIPELINE_TAGS` in `search_entry_to_summary`. + let mut params: Vec<(&str, &str)> = vec![ + ("filter", "gguf"), + ("sort", "downloads"), + ("direction", "-1"), + ("limit", &limit), + // One expand set carries everything a row needs in a single request, so + // there is no per-repo follow-up call: `gguf` (context window + chat + // template + architecture), `siblings` (the file list, scanned for an + // mmproj vision companion), and `pipeline_tag` (the chat-model allowlist). + ("expand[]", "gguf"), + ("expand[]", "siblings"), + ("expand[]", "pipeline_tag"), + ]; + // An empty query browses the most-downloaded GGUF repos; only attach the + // search term when the user actually typed one. 
+ if !query.is_empty() { + params.push(("search", query)); + } + let url = reqwest::Url::parse_with_params(&endpoint, params) + .map_err(|e| format!("failed to build Hugging Face search URL: {e}"))?; + let response = client + .get(url) + .timeout(timeout) + .send() + .await + .map_err(|e| format!("failed to reach Hugging Face: {e}"))?; + + if !response.status().is_success() { + return Err(format!( + "Hugging Face API returned HTTP {}", + response.status().as_u16() + )); + } + + if let Some(declared_len) = response.content_length() { + if declared_len as usize > max_body_bytes { + return Err(format!( + "Hugging Face search response exceeded {max_body_bytes} bytes" + )); + } + } + + let mut stream = response.bytes_stream(); + let mut buf: Vec = Vec::new(); + while let Some(chunk) = stream.next().await { + let chunk = chunk.map_err(|e| format!("failed to read Hugging Face search body: {e}"))?; + if buf.len() + chunk.len() > max_body_bytes { + return Err(format!( + "Hugging Face search response exceeded {max_body_bytes} bytes" + )); + } + buf.extend_from_slice(&chunk); + } + + Ok(buf) +} + // ─── OpenAI-compatible model listing ───────────────────────────────────────── /// Subset of an OpenAI-compatible `/v1/models` response Thuki consumes. @@ -1643,12 +2218,97 @@ pub fn repo_installed_model( size_bytes: resolved.weights_size_bytes, quant: quant_from_filename(file), vision: resolved.mmproj.is_some(), - thinking: false, + // Name-based first guess; finalize_install refines `thinking` and sets + // `reasoning_always` from the downloaded GGUF's chat template, falling + // back to this guess when the template cannot be read. + thinking: detect_thinking(repo, file), + reasoning_always: false, mmproj_file: resolved.mmproj.as_ref().map(|m| m.file.clone()), mmproj_sha256: resolved.mmproj.as_ref().map(|m| m.sha256.clone()), } } +/// The curated `(thinking, reasoning_always)` flags for a model, when it is a +/// registry starter. `None` for a pasted/arbitrary repo. 
Curated flags are the +/// highest-confidence source, so both the installer and the heal prefer them +/// over a GGUF scan. +pub(crate) fn curated_reasoning_flags(repo: &str, file_name: &str) -> Option<(bool, bool)> { + registry::STARTERS + .iter() + .find(|s| s.repo == repo && s.file_name == file_name) + .map(|s| (s.thinking, s.reasoning_always)) +} + +/// Derives `(thinking, reasoning_always)` for a non-curated model from its GGUF +/// metadata: a readable chat template is classified by +/// [`reasoning::classify_reasoning`]; an absent or empty template falls back to +/// the repo/file name via [`detect_thinking`], leaving `reasoning_always` off +/// for the runtime backstop to correct an always-reasoning model from real +/// output. The single reasoning-derivation point: the Browse-all search rows and +/// the install/heal path both route through it, so identical metadata always +/// yields identical flags. The name is only the inputs here; for a repo-level +/// search row with no chosen file, pass an empty `file`. +pub(crate) fn reasoning_flags_from_metadata( + chat_template: Option<&str>, + architecture: Option<&str>, + repo: &str, + file: &str, +) -> (bool, bool) { + match chat_template { + Some(t) if !t.is_empty() => reasoning::classify_reasoning(t, architecture).flags(), + _ => (detect_thinking(repo, file), false), + } +} + +/// Resolves the final reasoning flags for a model: curated registry flags when +/// it is a starter, otherwise the class derived from the on-disk GGUF blob's +/// chat template (with the name fallback baked into +/// [`reasoning_flags_from_metadata`]). Coverage-off: the registry lookup and the +/// derivation are tested through [`curated_reasoning_flags`] / +/// [`reasoning_flags_from_metadata`]; this wrapper only adds the blob read. 
+#[cfg_attr(coverage_nightly, coverage(off))] +fn resolve_reasoning_flags( + store: &storage::ModelStore, + repo: &str, + file_name: &str, + sha256: &str, +) -> (bool, bool) { + if let Some(curated) = curated_reasoning_flags(repo, file_name) { + return curated; + } + let meta = gguf::read_gguf_metadata_from_file(&store.blob_path(sha256)); + let template = meta.as_ref().and_then(|m| m.chat_template.as_deref()); + let architecture = meta.as_ref().and_then(|m| m.architecture.as_deref()); + reasoning_flags_from_metadata(template, architecture, repo, file_name) +} + +/// Re-classifies installed built-in rows whose `reasoning_always` is `NULL` +/// (rows written before the classifier existed) and persists the result so they +/// stop appearing in [`manifest::list_unclassified`]. Best-effort: any list, +/// blob-read, or write failure is logged and skipped, never fatal. Coverage-off: +/// orchestration over tested helpers (`list_unclassified`, `resolve_reasoning_flags`, +/// `update_classification`). +#[cfg_attr(coverage_nightly, coverage(off))] +pub fn heal_unclassified_reasoning(conn: &rusqlite::Connection, store: &storage::ModelStore) { + let pending = match manifest::list_unclassified(conn) { + Ok(rows) => rows, + Err(e) => { + eprintln!("thuki: [models] reasoning heal: failed to list rows: {e}"); + return; + } + }; + for row in pending { + let (thinking, reasoning_always) = + resolve_reasoning_flags(store, &row.repo, &row.file_name, &row.sha256); + if let Err(e) = manifest::update_classification(conn, &row.id, thinking, reasoning_always) { + eprintln!( + "thuki: [models] reasoning heal: failed to persist {}: {e}", + row.id + ); + } + } +} + /// Deletion outcome consumed by the thin Tauri wrapper. 
#[derive(Debug, Clone, Copy, PartialEq)] pub struct DeleteOutcome { @@ -1671,7 +2331,7 @@ pub fn delete_installed_model_inner( builtin_model: &str, ) -> Result { let guard = state.0.lock().map_err(|e| e.to_string())?; - if guard.is_some() { + if !guard.is_empty() { return Err("a download is already in progress".to_string()); } let orphans = manifest::delete(conn, id).map_err(|e| e.to_string())?; @@ -1683,9 +2343,10 @@ pub fn delete_installed_model_inner( /// Removes the partial file for `sha256` so the next download starts fresh. /// Refuses malformed digests (the digest doubles as a file name) and refuses -/// while a download is running (it may be writing that very partial). Holds -/// the download-state lock across the removal so a concurrent claim cannot -/// race the delete. +/// only while a download is actively writing this very blob (deleting its +/// partial would fail that download's verification with NotFound). Unrelated +/// parallel downloads do not block the discard. Holds the download-state lock +/// across the removal so a concurrent claim cannot race the delete. pub fn discard_partial_inner( state: &DownloadState, store: &storage::ModelStore, @@ -1695,8 +2356,11 @@ pub fn discard_partial_inner( return Err("invalid sha256".to_string()); } let guard = state.0.lock().map_err(|e| e.to_string())?; - if guard.is_some() { - return Err("a download is already in progress".to_string()); + if guard + .values() + .any(|slot| slot.shas.iter().any(|s| s == sha256)) + { + return Err("a download for this file is already in progress".to_string()); } match std::fs::remove_file(store.partial_path(sha256)) { Ok(()) => Ok(()), @@ -1748,6 +2412,19 @@ pub fn get_starter_options( Ok(build_starter_options(&conn, &store, system_ram_bytes())) } +/// Returns the full Staff Picks catalog: every curated registry entry annotated +/// with RAM fit, installed state, and resumable-partial size. The frontend +/// groups the rows by `starter.category` into use-case sections. 
+#[cfg_attr(coverage_nightly, coverage(off))] +#[cfg_attr(not(coverage), tauri::command)] +pub fn get_staff_picks( + db: tauri::State<'_, crate::history::Database>, + store: tauri::State<'_, storage::ModelStore>, +) -> Result, String> { + let conn = db.0.lock().map_err(|e| e.to_string())?; + Ok(build_staff_picks(&conn, &store, system_ram_bytes())) +} + /// Total physical RAM in bytes, for frontend sizing copy. #[cfg_attr(coverage_nightly, coverage(off))] #[cfg_attr(not(coverage), tauri::command)] @@ -1770,16 +2447,46 @@ pub fn get_models_dir_free_bytes(store: tauri::State<'_, storage::ModelStore>) - #[cfg_attr(not(coverage), tauri::command)] pub fn download_starter( tier: String, + key: String, on_event: tauri::ipc::Channel, app: tauri::AppHandle, download_state: tauri::State<'_, DownloadState>, ) -> Result<(), String> { let starter = starter_for_tier(&tier)?; - let token = claim_download(&download_state)?; + let specs = registry::download_specs(starter); + let token = claim_download(&download_state, &key, spec_shas(&specs))?; + spawn_model_download( + app, + specs, + registry::to_installed_model(starter), + key, + token, + on_event, + ); + Ok(()) +} + +/// Starts downloading a Staff Picks catalog entry by its stable `id`. Same +/// verified path as [`download_starter`] (pinned revision + sha256, manifest +/// record on success), but keyed by id so a category can hold any number of +/// models. Progress streams over `on_event`. 
+#[cfg_attr(coverage_nightly, coverage(off))] +#[cfg_attr(not(coverage), tauri::command)] +pub fn download_staff_pick( + id: String, + key: String, + on_event: tauri::ipc::Channel, + app: tauri::AppHandle, + download_state: tauri::State<'_, DownloadState>, +) -> Result<(), String> { + let starter = starter_for_id(&id)?; + let specs = registry::download_specs(starter); + let token = claim_download(&download_state, &key, spec_shas(&specs))?; spawn_model_download( app, - registry::download_specs(starter), + specs, registry::to_installed_model(starter), + key, token, on_event, ); @@ -1793,17 +2500,20 @@ pub fn download_starter( pub async fn download_repo_model( repo: String, file: String, + key: String, on_event: tauri::ipc::Channel, app: tauri::AppHandle, client: tauri::State<'_, reqwest::Client>, download_state: tauri::State<'_, DownloadState>, ) -> Result<(), String> { let resolved = resolve_repo_spec(&client, HF_BASE_URL, &repo, &file).await?; - let token = claim_download(&download_state)?; + let specs = repo_download_specs(HF_BASE_URL, &repo, &file, &resolved); + let token = claim_download(&download_state, &key, spec_shas(&specs))?; spawn_model_download( app, - repo_download_specs(HF_BASE_URL, &repo, &file, &resolved), + specs, repo_installed_model(&repo, &file, &resolved), + key, token, on_event, ); @@ -1816,8 +2526,26 @@ pub async fn download_repo_model( pub async fn list_hf_repo_ggufs( repo: String, client: tauri::State<'_, reqwest::Client>, -) -> Result, String> { - fetch_repo_gguf_listing(&client, HF_BASE_URL, &repo).await + store: tauri::State<'_, storage::ModelStore>, + db: tauri::State<'_, crate::history::Database>, +) -> Result, String> { + let files = fetch_repo_gguf_listing(&client, HF_BASE_URL, &repo).await?; + let rows = attach_partials(annotate_gguf_rows(files, system_ram_bytes()), &store); + let conn = db.0.lock().map_err(|e| e.to_string())?; + Ok(attach_installed(rows, &repo, &conn)) +} + +/// Searches Hugging Face for GGUF model repos matching 
`query`, most-downloaded +/// first. Backs the in-app model browser; an empty query returns the most +/// popular GGUF repos. +#[cfg_attr(coverage_nightly, coverage(off))] +#[cfg_attr(not(coverage), tauri::command)] +pub async fn search_hf_models( + query: String, + limit: usize, + client: tauri::State<'_, reqwest::Client>, +) -> Result { + fetch_hf_search(&client, HF_BASE_URL, &query, clamp_search_limit(limit)).await } /// Lists the models served by the configured OpenAI-compatible provider via @@ -1836,12 +2564,13 @@ pub async fn list_openai_models( fetch_openai_models(&client, &base_url, api_key.as_deref()).await } -/// Cancels the in-flight model download, if any. The download task emits -/// `Cancelled` and keeps the partial for a later resume. +/// Cancels the in-flight model download identified by `key`, if any. The +/// download task emits `Cancelled` and keeps the partial for a later resume. +/// Other concurrent downloads are unaffected. #[cfg_attr(coverage_nightly, coverage(off))] #[cfg_attr(not(coverage), tauri::command)] -pub fn cancel_model_download(download_state: tauri::State<'_, DownloadState>) { - cancel_active_download(&download_state); +pub fn cancel_model_download(key: String, download_state: tauri::State<'_, DownloadState>) { + cancel_download(&download_state, &key); } /// Removes the partial file for `sha256` (the user chose Discard over Resume). 
@@ -1860,9 +2589,10 @@ pub fn discard_partial_download( #[cfg_attr(not(coverage), tauri::command)] pub fn list_installed_models( db: tauri::State<'_, crate::history::Database>, -) -> Result, String> { +) -> Result, String> { let conn = db.0.lock().map_err(|e| e.to_string())?; - manifest::list(&conn).map_err(|e| e.to_string()) + let models = manifest::list(&conn).map_err(|e| e.to_string())?; + Ok(build_installed_views(models, system_ram_bytes())) } /// Deletes an installed model: manifest row, orphaned blobs, and (when it was @@ -1889,6 +2619,32 @@ pub fn delete_installed_model( Ok(()) } +/// Reveals an installed model's weights blob in Finder. Thin FFI wrapper +/// (excluded from coverage) over `open -R`, mirroring +/// [`crate::settings_commands::reveal_config_in_finder`]; the manifest lookup +/// and content-addressed path are covered through `manifest::get` and +/// `storage::ModelStore::blob_path`. +#[cfg_attr(coverage_nightly, coverage(off))] +#[cfg_attr(not(coverage), tauri::command)] +pub fn reveal_model_in_finder( + id: String, + db: tauri::State<'_, crate::history::Database>, + store: tauri::State<'_, storage::ModelStore>, +) -> Result<(), String> { + let model = { + let conn = db.0.lock().map_err(|e| e.to_string())?; + manifest::get(&conn, &id) + .map_err(|e| e.to_string())? + .ok_or_else(|| format!("model not installed: {id}"))? + }; + std::process::Command::new("open") + .arg("-R") + .arg(store.blob_path(&model.sha256)) + .spawn() + .map(|_| ()) + .map_err(|e| e.to_string()) +} + /// Maps the `finalize_install` outcome onto the terminal download event: /// `AllDone` once the install is recorded, `Failed` otherwise. 
AllDone is /// emitted here (after finalize) rather than from `run_download` so the @@ -1904,6 +2660,12 @@ pub(crate) fn finalize_outcome_event(result: Result<(), String>) -> download::Do } } +/// The blob shas a spec list writes, recorded on the download slot so a discard +/// can scope its in-flight check to the exact partial(s) this download owns. +fn spec_shas(specs: &[download::DownloadSpec]) -> Vec { + specs.iter().map(|s| s.sha256.clone()).collect() +} + /// Runs the claimed download on the async runtime: streams events to the /// channel, records the manifest row + builtin provider model on success /// (then emits AllDone, or Failed when recording fails), and releases the @@ -1915,6 +2677,7 @@ fn spawn_model_download( app: tauri::AppHandle, specs: Vec, model: manifest::InstalledModel, + key: String, token: tokio_util::sync::CancellationToken, on_event: tauri::ipc::Channel, ) { @@ -1935,31 +2698,61 @@ fn spawn_model_download( } let _ = on_event_finalize.send(finalize_outcome_event(finalized)); } - release_download(&app.state::()); + release_download(&app.state::(), &key); }); } /// Records a completed download: manifest insert, removal of blobs the /// replaced row no longer references (a re-download whose upstream content -/// changed must not strand the old multi-GB blob), then the builtin -/// provider's `model` field (the active provider is never changed here). +/// changed must not strand the old multi-GB blob), then adopts the model as the +/// builtin provider's selection ONLY when none is chosen yet (the active +/// provider is never changed here). A later install must not steal the active +/// model from a model the user already selected. 
#[cfg_attr(coverage_nightly, coverage(off))] fn finalize_install( app: &tauri::AppHandle, model: &manifest::InstalledModel, ) -> Result<(), String> { + let store = app.state::(); + // Classify reasoning from the just-downloaded GGUF's chat template so the + // picker badge and `/think` gate are correct the instant the install lands. + // Curated starters keep their registry flags; a template that cannot be read + // keeps the placeholder flags for the runtime backstop to correct. + let (thinking, reasoning_always) = + resolve_reasoning_flags(store.inner(), &model.repo, &model.file_name, &model.sha256); + let model = manifest::InstalledModel { + thinking, + reasoning_always, + ..model.clone() + }; let orphans = { let db = app.state::(); let conn = db.0.lock().map_err(|e| e.to_string())?; - manifest::insert(&conn, model).map_err(|e| e.to_string())? + manifest::insert(&conn, &model).map_err(|e| e.to_string())? }; // Best-effort: the install itself succeeded, so a failure to reclaim the // superseded blobs must not fail the download; it only leaks disk space. - if let Err(e) = app.state::().remove_blobs(&orphans) { + if let Err(e) = store.remove_blobs(&orphans) { eprintln!("thuki: [models] failed to remove superseded blobs: {e}"); } let config = app.state::>(); - persist_active_provider_model(app, &config, PROVIDER_ID_BUILTIN, &model.id) + // Auto-select only the first model: adopt this download as the built-in + // model when the provider has none yet; otherwise a completed download just + // installs and leaves the user's active choice alone. Parallel downloads + // finish in arbitrary order, so a last-one-wins overwrite would be + // unpredictable. 
+ if adopt_as_builtin_model(&builtin_provider_model(&config.read())) { + persist_active_provider_model(app, &config, PROVIDER_ID_BUILTIN, &model.id) + } else { + Ok(()) + } +} + +/// Whether a freshly installed model should become the built-in provider's +/// active model: only when the provider has no model selected yet (empty id). +/// Keeps "auto-select the first model" predictable under parallel downloads. +fn adopt_as_builtin_model(current_builtin_model: &str) -> bool { + current_builtin_model.is_empty() } // ─── Tests ────────────────────────────────────────────────────────────────── @@ -3060,6 +3853,7 @@ mod tests { let caps = Capabilities { vision: true, thinking: false, + reasoning_always: false, max_images: Some(1), }; let v = serde_json::to_value(&caps).unwrap(); @@ -3068,6 +3862,7 @@ mod tests { serde_json::json!({ "vision": true, "thinking": false, + "reasoningAlways": false, "maxImages": 1, }) ); @@ -3078,6 +3873,7 @@ mod tests { let caps = Capabilities { vision: true, thinking: false, + reasoning_always: false, max_images: None, }; let v = serde_json::to_value(&caps).unwrap(); @@ -3620,6 +4416,7 @@ mod tests { quant: "Q4_K_M".to_string(), vision, thinking, + reasoning_always: false, mmproj_file: None, mmproj_sha256: None, } @@ -3644,6 +4441,53 @@ mod tests { assert!(caps.values().all(|c| c.max_images.is_none())); } + /// A curated starter installed before its `thinking` flag was corrected + /// still carries the stale flag in its manifest row. The capability view + /// heals it from the current registry, so the model is no longer wrongly + /// told it "does not emit thinking tokens" without a manifest migration. + #[test] + fn builtin_capabilities_heal_curated_flags_from_registry() { + let fast = registry::STARTERS + .iter() + .find(|s| s.tier == registry::Tier::Fast) + .unwrap(); + // Simulate a row written before the flag fix: capabilities recorded + // as the old, wrong values. 
+ let mut stale = registry::to_installed_model(fast); + stale.thinking = false; + stale.vision = false; + + let caps = builtin_capabilities_from_manifest(&[stale]); + + let healed = &caps[&registry::to_installed_model(fast).id]; + assert!( + healed.thinking, + "registry heals the corrected reasoning flag" + ); + assert!( + healed.vision, + "registry capabilities win for curated models" + ); + } + + /// gpt-oss (curated Smartest) reasons unstoppably; its `reasoning_always` + /// capability is healed from the registry so the picker can badge it. A + /// pasted (non-curated) row defaults to not-always (runtime detection is a + /// follow-up). + #[test] + fn builtin_capabilities_reasoning_always_from_registry() { + let smartest = registry::STARTERS + .iter() + .find(|s| s.tier == registry::Tier::Smartest) + .unwrap(); + let caps = builtin_capabilities_from_manifest(&[registry::to_installed_model(smartest)]); + assert!(caps[&registry::to_installed_model(smartest).id].reasoning_always); + + let pasted = + builtin_capabilities_from_manifest(&[manifest_row("org/repo:x.gguf", false, true)]); + assert!(!pasted["org/repo:x.gguf"].reasoning_always); + } + #[test] fn builtin_capabilities_empty_manifest_yields_empty_map() { assert!(builtin_capabilities_from_manifest(&[]).is_empty()); @@ -3702,21 +4546,26 @@ mod tests { } #[test] - fn build_starter_options_marks_installed_and_partial() { + fn build_starter_options_returns_annotated_onboarding_heroes() { let conn = crate::database::open_in_memory().unwrap(); let (_dir, store) = make_store(); - // First starter is installed (manifest row present); second has an - // in-flight partial; third is untouched. - let starters = registry::STARTERS; - manifest::insert(&conn, &registry::to_installed_model(&starters[0])).unwrap(); - std::fs::write(store.partial_path(starters[1].sha256), [0u8; 10]).unwrap(); + // Onboarding draws exactly the three tier heroes, in tier order.
First + // hero is installed (manifest row present); second has an in-flight + // partial; third is untouched. + let heroes = registry::onboarding_heroes(); + manifest::insert(&conn, &registry::to_installed_model(heroes[0])).unwrap(); + std::fs::write(store.partial_path(heroes[1].sha256), [0u8; 10]).unwrap(); const GIB: u64 = 1 << 30; let opts = build_starter_options(&conn, &store, 16 * GIB); - assert_eq!(opts.len(), starters.len()); - assert_eq!(opts[0].starter, starters[0]); + assert_eq!(opts.len(), heroes.len()); + assert_eq!( + opts.iter().map(|o| o.starter.id).collect::>(), + registry::ONBOARDING_HERO_IDS.to_vec() + ); + assert_eq!(&opts[0].starter, heroes[0]); assert!(opts[0].installed); assert_eq!(opts[0].partial_bytes, None); assert!(!opts[1].installed); @@ -3724,7 +4573,7 @@ mod tests { assert!(!opts[2].installed); assert_eq!(opts[2].partial_bytes, None); // Fit hints come straight from registry::ram_fit at the given RAM. - for (opt, s) in opts.iter().zip(starters) { + for (opt, s) in opts.iter().zip(heroes) { assert_eq!(opt.fit, registry::ram_fit(s.est_runtime_gb, 16 * GIB)); } } @@ -3771,38 +4620,91 @@ mod tests { assert!(starter_for_tier("turbo").is_err()); } + #[test] + fn starter_for_id_resolves_and_rejects() { + // The id-keyed Staff Picks download path resolves a real slug and + // rejects an unknown one with an error rather than a panic. + assert_eq!(starter_for_id("qwen3.5-9b").unwrap().id, "qwen3.5-9b"); + assert_eq!(starter_for_id("gpt-oss-20b").unwrap().id, "gpt-oss-20b"); + assert!(starter_for_id("not-a-real-id").is_err()); + } + + #[test] + fn build_staff_picks_covers_every_registry_entry() { + let conn = crate::database::open_in_memory().unwrap(); + let (_dir, store) = make_store(); + // Install the first catalog entry; only it must read back as installed.
+ manifest::insert(&conn, &registry::to_installed_model(&registry::STARTERS[0])).unwrap(); + + const GIB: u64 = 1 << 30; + let opts = build_staff_picks(&conn, &store, 16 * GIB); + + // Every registry entry is present, in registry order. + assert_eq!(opts.len(), registry::STARTERS.len()); + assert_eq!( + opts.iter().map(|o| o.starter.id).collect::>(), + registry::STARTERS.iter().map(|s| s.id).collect::>() + ); + assert!(opts[0].installed); + assert!(opts[1..].iter().all(|o| !o.installed)); + // Fit comes straight from registry::ram_fit at the given RAM. + for (opt, s) in opts.iter().zip(registry::STARTERS) { + assert_eq!(opt.fit, registry::ram_fit(s.est_runtime_gb, 16 * GIB)); + } + } + // ── Model library: download claim ──────────────────────────────────────── #[test] - fn download_claim_rejects_second_concurrent() { + fn download_claim_allows_distinct_keys_and_rejects_a_duplicate() { let state = DownloadState::default(); - let token = claim_download(&state).unwrap(); + let token = claim_download(&state, "model-a", vec![]).unwrap(); assert!(!token.is_cancelled()); - let err = claim_download(&state).unwrap_err(); + // A different model downloads concurrently: its own slot is granted. + assert!(claim_download(&state, "model-b", vec![]).is_ok()); + // The same key cannot start twice while it is in flight. + let err = claim_download(&state, "model-a", vec![]).unwrap_err(); assert_eq!(err, "a download is already in progress"); - // Release clears the claim so a new download can start. - release_download(&state); - assert!(claim_download(&state).is_ok()); + // Releasing one key frees only that slot.
+ release_download(&state, "model-a"); + assert!(claim_download(&state, "model-a", vec![]).is_ok()); + } + + #[test] + fn spec_shas_collects_every_blob_digest() { + let specs = registry::download_specs(registry::onboarding_heroes()[1]); + let shas = spec_shas(&specs); + assert_eq!(shas.len(), specs.len()); + for (sha, spec) in shas.iter().zip(&specs) { + assert_eq!(sha, &spec.sha256); + } } #[test] - fn download_in_flight_tracks_the_claim() { + fn download_in_flight_tracks_any_claim() { let state = DownloadState::default(); assert!(!download_in_flight(&state)); - let _token = claim_download(&state).unwrap(); + let _a = claim_download(&state, "a", vec![]).unwrap(); + let _b = claim_download(&state, "b", vec![]).unwrap(); assert!(download_in_flight(&state)); - release_download(&state); + // One release leaves the other download in flight. + release_download(&state, "a"); + assert!(download_in_flight(&state)); + release_download(&state, "b"); assert!(!download_in_flight(&state)); } #[test] - fn cancel_active_download_cancels_claimed_token_and_tolerates_idle() { + fn cancel_download_cancels_only_the_keyed_token_and_tolerates_idle() { let state = DownloadState::default(); - // No claim yet: cancelling is a harmless no-op. - cancel_active_download(&state); - let token = claim_download(&state).unwrap(); - cancel_active_download(&state); - assert!(token.is_cancelled()); + // No such key: cancelling is a harmless no-op. + cancel_download(&state, "missing"); + let a = claim_download(&state, "a", vec![]).unwrap(); + let b = claim_download(&state, "b", vec![]).unwrap(); + cancel_download(&state, "a"); + assert!(a.is_cancelled()); + // Cancelling one download leaves the others running. 
+ assert!(!b.is_cancelled()); } #[test] @@ -3813,14 +4715,23 @@ mod tests { let _guard = state_ref.0.lock().unwrap(); panic!("poison"); }); - assert!(claim_download(&state).is_err()); + assert!(claim_download(&state, "k", vec![]).is_err()); let (_dir, store) = make_store(); assert!(discard_partial_inner(&state, &store, &"a".repeat(64)).is_err()); let conn = crate::database::open_in_memory().unwrap(); assert!(delete_installed_model_inner(&state, &conn, &store, "x:y.gguf", "").is_err()); // Best-effort operations must not panic on the poisoned lock. - cancel_active_download(&state); - release_download(&state); + cancel_download(&state, "k"); + release_download(&state, "k"); + } + + #[test] + fn adopt_as_builtin_model_only_for_the_first_model() { + // No model selected yet: the first completed download is adopted. + assert!(adopt_as_builtin_model("")); + // A model is already active: a later parallel completion does not steal + // the active slot. + assert!(!adopt_as_builtin_model("google/gemma:gemma-q4.gguf")); } #[test] @@ -3904,20 +4815,102 @@ mod tests { vec![ HfGgufFile { file: "model-Q4_K_M.gguf".to_string(), - size_bytes: 1000 + size_bytes: 1000, + sha256: "a".repeat(64), + partial_bytes: None, }, HfGgufFile { file: "extra.gguf".to_string(), - size_bytes: 7 + size_bytes: 7, + sha256: String::new(), + partial_bytes: None, }, HfGgufFile { file: "bare.gguf".to_string(), - size_bytes: 0 + size_bytes: 0, + sha256: String::new(), + partial_bytes: None, }, ] ); } + #[test] + fn sanitize_context_length_trusts_only_sane_values() { + assert_eq!(sanitize_context_length(None), None); + assert_eq!(sanitize_context_length(Some(0)), None); + assert_eq!(sanitize_context_length(Some(131_072)), Some(131_072)); + assert_eq!( + sanitize_context_length(Some(MAX_MODEL_CONTEXT_LENGTH as u64)), + Some(MAX_MODEL_CONTEXT_LENGTH) + ); + assert_eq!( + sanitize_context_length(Some(MAX_MODEL_CONTEXT_LENGTH as u64 + 1)), + None + ); + } + + #[test] + fn 
attach_partials_reports_planted_and_skips_empty_sha() { + let (_dir, store) = make_store(); + let sha = "a".repeat(64); + // Plant a 9-byte partial for the LFS-backed row. + let path = store.partial_path(&sha); + std::fs::create_dir_all(path.parent().unwrap()).unwrap(); + std::fs::write(&path, [0u8; 9]).unwrap(); + + let rows = vec![ + HfGgufFileRow { + file: HfGgufFile { + file: "weights.gguf".to_string(), + size_bytes: 100, + sha256: sha.clone(), + partial_bytes: None, + }, + fit: None, + installed: false, + }, + HfGgufFileRow { + file: HfGgufFile { + file: "no-lfs.gguf".to_string(), + size_bytes: 50, + sha256: String::new(), + partial_bytes: None, + }, + fit: None, + installed: false, + }, + ]; + let out = attach_partials(rows, &store); + // The LFS-backed row reflects the planted partial; the empty-sha row is + // skipped entirely. + assert_eq!(out[0].file.partial_bytes, Some(9)); + assert_eq!(out[1].file.partial_bytes, None); + } + + #[test] + fn attach_installed_marks_only_manifest_rows() { + let conn = crate::database::open_in_memory().unwrap(); + // Record one of the two files in the manifest under ":". + manifest::insert(&conn, &manifest_row("org/repo:in.gguf", false, false)).unwrap(); + + let row = |name: &str| HfGgufFileRow { + file: HfGgufFile { + file: name.to_string(), + size_bytes: 100, + sha256: String::new(), + partial_bytes: None, + }, + fit: None, + installed: false, + }; + let out = attach_installed(vec![row("in.gguf"), row("out.gguf")], "org/repo", &conn); + + // Only the file recorded in the manifest is marked installed. 
+ assert!(out[0].installed); + assert!(!out[1].installed); + } + #[test] fn parse_gguf_listing_rejects_invalid_json() { let err = parse_gguf_listing(b"not json").unwrap_err(); @@ -3929,9 +4922,19 @@ mod tests { let v = serde_json::to_value(HfGgufFile { file: "x.gguf".to_string(), size_bytes: 5, + sha256: "a".repeat(64), + partial_bytes: Some(3), }) .unwrap(); - assert_eq!(v, serde_json::json!({"file": "x.gguf", "size_bytes": 5})); + assert_eq!( + v, + serde_json::json!({ + "file": "x.gguf", + "size_bytes": 5, + "sha256": "a".repeat(64), + "partial_bytes": 3, + }) + ); } // ── Model library: resolve_listing (pure) ─────────────────────────────── @@ -4197,6 +5200,511 @@ mod tests { assert_eq!(files[0].file, "model-Q4_K_M.gguf"); } + // ── Model library: Hugging Face search ─────────────────────────────────── + + /// Search fixture exercising the capability derivation and the pipeline + /// allowlist: each `gated` shape (bool, strategy string, absent, null), + /// vision from an mmproj sibling, thinking from the chat template (template + /// class wins over a reasoning-y name) with a name fallback when no template + /// is present, a non-chat pipeline that is dropped, an untagged repo that is + /// dropped, and an empty-id row that is dropped. + fn search_fixture() -> serde_json::Value { + serde_json::json!([ + // alpha: chat model that ships an mmproj companion and an optional + // (`enable_thinking`) template -> vision + thinking, with context. + {"id": "org/alpha-GGUF", "downloads": 1000, "gated": false, + "pipeline_tag": "text-generation", + "gguf": {"context_length": 131072, + "chat_template": "{%- if enable_thinking %}{% endif %}", + "architecture": "qwen3"}, + "siblings": [{"rfilename": "alpha-Q4_K_M.gguf"}, + {"rfilename": "mmproj-f16.gguf"}]}, + // beta: a multimodal pipeline tag is allowlisted; no mmproj sibling + // means no vision, and an always-on `` template means thinking. 
+ {"id": "org/beta-GGUF", "downloads": 500, "gated": "manual", + "pipeline_tag": "image-text-to-text", + "gguf": {"chat_template": "<|im_start|>assistant\\n\\n"}, + "siblings": [{"rfilename": "beta.gguf"}]}, + // gamma: no expanded gguf at all, so thinking falls back to the name + // (`QwQ` is a known reasoning family); no mmproj means no vision. + {"id": "org/QwQ-32B-GGUF", "downloads": 7, + "pipeline_tag": "text-generation"}, + // delta: a non-chat pipeline (embeddings) is dropped by the allowlist + // even though it is the most downloaded. + {"id": "org/embed-GGUF", "downloads": 99999, + "pipeline_tag": "feature-extraction"}, + // epsilon: a plain instruct template classifies as non-thinking and + // overrides the reasoning-y repo name; its context is implausibly + // large so it is dropped; no mmproj means no vision. + {"id": "org/Reasoner-GGUF", "downloads": 2, "gated": null, + "pipeline_tag": "text-generation", + "gguf": {"context_length": 9000000000u64, + "chat_template": "<|user|>{{x}}<|assistant|>", + "architecture": "llama"}, + "siblings": [{"rfilename": "r.gguf"}]}, + // zeta: no pipeline tag at all is dropped (the allowlist requires an + // explicit chat-capable tag). + {"id": "org/untagged-GGUF", "downloads": 3}, + // empty id is dropped. + {"id": "", "downloads": 9, "pipeline_tag": "text-generation"} + ]) + } + + #[test] + fn parse_search_results_maps_capabilities_and_drops_non_chat_rows() { + let body = search_fixture().to_string(); + // A generous limit keeps `has_more` false so this case stays about rows. 
+ let page = parse_search_results(body.as_bytes(), 100).unwrap(); + assert!(!page.has_more); + assert_eq!( + page.rows, + vec![ + HfModelSummary { + id: "org/alpha-GGUF".to_string(), + downloads: 1000, + gated: false, + context_length: Some(131072), + vision: true, + thinking: true, + }, + HfModelSummary { + id: "org/beta-GGUF".to_string(), + downloads: 500, + gated: true, + context_length: None, + vision: false, + thinking: true, + }, + // gamma: thinking healed from the `QwQ` name when no template ships. + HfModelSummary { + id: "org/QwQ-32B-GGUF".to_string(), + downloads: 7, + gated: false, + context_length: None, + vision: false, + thinking: true, + }, + // epsilon: the plain template wins over the reasoning-y name. + HfModelSummary { + id: "org/Reasoner-GGUF".to_string(), + downloads: 2, + gated: false, + context_length: None, + vision: false, + thinking: false, + }, + ] + ); + } + + #[test] + fn parse_search_results_flags_has_more_when_the_page_is_full() { + let body = serde_json::json!([ + {"id": "org/a-GGUF", "downloads": 2, "pipeline_tag": "text-generation"}, + {"id": "org/b-GGUF", "downloads": 1, "pipeline_tag": "text-generation"} + ]) + .to_string(); + // Two raw entries: a page of two is full (more may exist on the Hub)... + assert!(parse_search_results(body.as_bytes(), 2).unwrap().has_more); + // ...but a page asking for three was not filled, so the Hub is exhausted. 
+ assert!(!parse_search_results(body.as_bytes(), 3).unwrap().has_more); + } + + #[test] + fn parse_search_results_stops_paginating_at_the_ceiling() { + let page_of = |n: usize| { + let entries: Vec<_> = (0..n) + .map(|i| { + serde_json::json!({ + "id": format!("org/m{i}-GGUF"), + "downloads": 1, + "pipeline_tag": "text-generation" + }) + }) + .collect(); + serde_json::Value::Array(entries).to_string() + }; + // A full page exactly at the clamp ceiling reports no more: requests are + // clamped to HF_SEARCH_LIMIT_MAX, so paging past it would refetch the + // same capped rows forever and never let "Load more" settle. + let full = page_of(HF_SEARCH_LIMIT_MAX); + assert!( + !parse_search_results(full.as_bytes(), HF_SEARCH_LIMIT_MAX) + .unwrap() + .has_more + ); + // One step below the ceiling, a full page still invites another fetch. + let below = page_of(HF_SEARCH_LIMIT_MAX - 1); + assert!( + parse_search_results(below.as_bytes(), HF_SEARCH_LIMIT_MAX - 1) + .unwrap() + .has_more + ); + } + + #[test] + fn parse_search_results_rejects_invalid_json() { + let err = parse_search_results(b"not json", 30).unwrap_err(); + assert!(err.contains("failed to decode"), "got: {err}"); + } + + #[test] + fn hf_model_summary_serializes_snake_case() { + let v = serde_json::to_value(HfModelSummary { + id: "o/r".to_string(), + downloads: 7, + gated: true, + context_length: Some(131072), + vision: true, + thinking: false, + }) + .unwrap(); + assert_eq!( + v, + serde_json::json!({ + "id": "o/r", "downloads": 7, "gated": true, "context_length": 131072, + "vision": true, "thinking": false, + }) + ); + } + + #[test] + fn hf_search_page_serializes_snake_case() { + let v = serde_json::to_value(HfSearchPage { + rows: vec![], + has_more: true, + }) + .unwrap(); + assert_eq!(v, serde_json::json!({ "rows": [], "has_more": true })); + } + + // ── RAM-fit estimation + annotated views ───────────────────────────────── + + #[test] + fn estimate_runtime_gb_from_bytes_adds_overhead() { + // 1 GiB weights 
+ 2.0 overhead. + assert!((estimate_runtime_gb_from_bytes(1 << 30) - 3.0).abs() < 1e-9); + } + + #[test] + fn clamp_search_limit_bounds_the_page_size() { + assert_eq!(clamp_search_limit(0), 1); + assert_eq!(clamp_search_limit(50), 50); + assert_eq!(clamp_search_limit(10_000), HF_SEARCH_LIMIT_MAX); + } + + #[test] + fn annotate_gguf_rows_uses_real_sizes() { + let files = vec![ + HfGgufFile { + file: "a.gguf".to_string(), + size_bytes: 1 << 30, + sha256: String::new(), + partial_bytes: None, + }, + HfGgufFile { + file: "b.gguf".to_string(), + size_bytes: 0, + sha256: String::new(), + partial_bytes: None, + }, + ]; + let rows = annotate_gguf_rows(files.clone(), 64 << 30); + assert_eq!(rows[0].fit, Some(registry::RamFit::Fits)); + // A zero size cannot be judged. + assert_eq!(rows[1].fit, None); + // Unknown host RAM drops every verdict. + let rows = annotate_gguf_rows(files, 0); + assert_eq!(rows[0].fit, None); + } + + #[test] + fn build_installed_views_annotates_fit() { + let model = manifest::InstalledModel { + id: "org/Repo:weights.gguf".to_string(), + display_name: "Repo".to_string(), + repo: "org/Repo".to_string(), + revision: "0".repeat(40), + file_name: "weights.gguf".to_string(), + sha256: "a".repeat(64), + size_bytes: 1 << 30, + quant: "Q4_K_M".to_string(), + vision: false, + thinking: false, + reasoning_always: false, + mmproj_file: None, + mmproj_sha256: None, + }; + let views = build_installed_views(vec![model.clone()], 64 << 30); + assert_eq!(views[0].fit, Some(registry::RamFit::Fits)); + // A pasted repo has no registry entry, so its context window, vision + // projector size, and maker are all unknown. + assert_eq!(views[0].context_length, None); + assert_eq!(views[0].mmproj_bytes, 0); + assert_eq!(views[0].origin, None); + // Unknown host RAM drops the verdict. + let views = build_installed_views(vec![model], 0); + assert_eq!(views[0].fit, None); + + // A curated model heals its context window, projector size, and maker + // from the registry. 
+ let curated = registry::to_installed_model(&registry::STARTERS[0]); + let views = build_installed_views(vec![curated], 64 << 30); + assert_eq!( + views[0].context_length, + Some(registry::STARTERS[0].context_length) + ); + assert_eq!(views[0].mmproj_bytes, registry::STARTERS[0].mmproj_bytes); + assert_eq!( + views[0].origin, + Some(registry::STARTERS[0].origin.to_string()) + ); + } + + #[test] + fn gguf_file_row_serializes_with_flattened_base_and_fit() { + let file_row = HfGgufFileRow { + file: HfGgufFile { + file: "w.gguf".to_string(), + size_bytes: 42, + sha256: String::new(), + partial_bytes: None, + }, + fit: None, + installed: false, + }; + assert_eq!( + serde_json::to_value(file_row).unwrap(), + serde_json::json!({ + "file": "w.gguf", + "size_bytes": 42, + "sha256": "", + "partial_bytes": serde_json::Value::Null, + "fit": serde_json::Value::Null, + "installed": false, + }) + ); + } + + #[tokio::test] + async fn fetch_hf_search_returns_rows_and_sends_widened_query() { + let mut server = mockito::Server::new_async().await; + // The query no longer pins `pipeline_tag=text-generation` (that excluded + // multimodal `image-text-to-text` repos); chat-vs-non-chat is now an + // allowlist applied to each row's expanded `pipeline_tag`. The expand set + // carries the gguf block (context + chat template), the file list (mmproj + // -> vision), and the pipeline tag (the allowlist) in one request. + let mock = server + .mock("GET", "/api/models") + .match_query(mockito::Matcher::AllOf(vec![ + mockito::Matcher::UrlEncoded("filter".into(), "gguf".into()), + mockito::Matcher::UrlEncoded("search".into(), "qwen".into()), + mockito::Matcher::UrlEncoded("sort".into(), "downloads".into()), + mockito::Matcher::UrlEncoded("limit".into(), "60".into()), + // The widened query expands the gguf block, the file list, and + // the pipeline tag, and (critically) no longer pins + // `pipeline_tag=text-generation`, which had hidden vision repos.
+ mockito::Matcher::Regex("expand%5B%5D=gguf".into()), + mockito::Matcher::Regex("expand%5B%5D=siblings".into()), + mockito::Matcher::Regex("expand%5B%5D=pipeline_tag".into()), + ])) + .with_status(200) + .with_header("content-type", "application/json") + .with_body(search_fixture().to_string()) + .create_async() + .await; + let client = reqwest::Client::new(); + let page = fetch_hf_search(&client, &server.url(), "qwen", 60) + .await + .unwrap(); + mock.assert_async().await; + // Four chat rows survive the allowlist; the multimodal beta row proves + // the widened query surfaces a repo the old filter would have dropped. + assert_eq!(page.rows.len(), 4); + assert_eq!(page.rows[0].id, "org/alpha-GGUF"); + assert!(page.rows.iter().any(|r| r.id == "org/beta-GGUF")); + assert!(!page.has_more); + } + + #[tokio::test] + async fn fetch_hf_search_omits_blank_query() { + let mut server = mockito::Server::new_async().await; + let _m = server + .mock("GET", "/api/models") + .match_query(mockito::Matcher::Any) + .with_status(200) + .with_body("[]") + .create_async() + .await; + let client = reqwest::Client::new(); + // Whitespace-only query trims to empty and the search param is dropped. 
+ let page = fetch_hf_search( + &client, + &server.url(), + " ", + crate::config::defaults::HF_SEARCH_LIMIT, + ) + .await + .unwrap(); + assert!(page.rows.is_empty()); + assert!(!page.has_more); + } + + #[tokio::test] + async fn fetch_hf_search_maps_http_error() { + let mut server = mockito::Server::new_async().await; + let _m = server + .mock("GET", "/api/models") + .match_query(mockito::Matcher::Any) + .with_status(503) + .create_async() + .await; + let client = reqwest::Client::new(); + let err = fetch_hf_search( + &client, + &server.url(), + "q", + crate::config::defaults::HF_SEARCH_LIMIT, + ) + .await + .unwrap_err(); + assert!(err.contains("503"), "got: {err}"); + } + + #[tokio::test] + async fn fetch_hf_search_maps_transport_error() { + let client = reqwest::Client::new(); + let err = fetch_hf_search( + &client, + "http://127.0.0.1:1", + "q", + crate::config::defaults::HF_SEARCH_LIMIT, + ) + .await + .unwrap_err(); + assert!(err.contains("failed to reach Hugging Face"), "got: {err}"); + } + + #[tokio::test] + async fn fetch_hf_search_rejects_overlong_query() { + let client = reqwest::Client::new(); + let long = "x".repeat(crate::config::defaults::MAX_HF_SEARCH_QUERY_LEN + 1); + let err = fetch_hf_search( + &client, + "http://127.0.0.1:9", + &long, + crate::config::defaults::HF_SEARCH_LIMIT, + ) + .await + .unwrap_err(); + assert!(err.contains("maximum length"), "got: {err}"); + } + + #[tokio::test] + async fn fetch_hf_search_inner_rejects_body_over_cap_via_content_length() { + let mut server = mockito::Server::new_async().await; + let _m = server + .mock("GET", "/api/models") + .match_query(mockito::Matcher::Any) + .with_status(200) + .with_body("x".repeat(100)) + .create_async() + .await; + let client = reqwest::Client::new(); + let err = fetch_hf_search_inner( + &client, + &server.url(), + "q", + std::time::Duration::from_secs(5), + 32, + 30, + ) + .await + .unwrap_err(); + assert!(err.contains("exceeded"), "got: {err}"); + } + + #[tokio::test] + async fn 
fetch_hf_search_inner_rejects_body_over_cap_when_chunked() { + // Chunked response (no Content-Length): the incremental cap must reject. + let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap(); + let addr = listener.local_addr().unwrap(); + std::thread::spawn(move || { + let (mut conn, _) = listener.accept().unwrap(); + use std::io::{Read, Write}; + let mut request_buf = [0u8; 1024]; + let _ = conn.read(&mut request_buf); + let _ = conn.write_all( + b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n\ + 0a\r\n0123456789\r\n\ + 0a\r\n0123456789\r\n\ + 0a\r\n0123456789\r\n\ + 0\r\n\r\n", + ); + }); + let client = reqwest::Client::new(); + let base = format!("http://{addr}"); + let err = fetch_hf_search_inner( + &client, + &base, + "q", + std::time::Duration::from_secs(5), + 20, + 30, + ) + .await + .unwrap_err(); + assert!(err.contains("exceeded"), "got: {err}"); + } + + #[tokio::test] + async fn fetch_hf_search_inner_maps_body_read_error() { + // Headers promise 100 body bytes, then the server hangs up. 
+ let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap(); + let addr = listener.local_addr().unwrap(); + std::thread::spawn(move || { + let (mut stream, _) = listener.accept().unwrap(); + use std::io::{Read, Write}; + let mut buf = [0u8; 1024]; + let _ = stream.read(&mut buf); + let _ = stream.write_all( + b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 100\r\nConnection: close\r\n\r\n", + ); + }); + let client = reqwest::Client::new(); + let base = format!("http://{addr}"); + let err = fetch_hf_search_inner( + &client, + &base, + "q", + std::time::Duration::from_secs(5), + 4 * 1024 * 1024, + 30, + ) + .await + .unwrap_err(); + assert!( + err.contains("failed to read Hugging Face search body"), + "got: {err}" + ); + } + + #[tokio::test] + async fn fetch_hf_search_inner_rejects_unparseable_base_url() { + let client = reqwest::Client::new(); + let err = fetch_hf_search_inner( + &client, + "not a url", + "q", + std::time::Duration::from_secs(5), + 4 * 1024 * 1024, + 30, + ) + .await + .unwrap_err(); + assert!(err.contains("failed to build"), "got: {err}"); + } + // ── Model library: repo spec/model mapping ─────────────────────────────── fn sample_resolved(with_mmproj: bool) -> RepoResolved { @@ -4256,6 +5764,9 @@ mod tests { assert_eq!(m.quant, "Q4_K_M"); assert!(m.vision); assert!(!m.thinking); + // Pasted rows record placeholder reasoning flags; the real class is + // resolved from the GGUF in finalize_install. + assert!(!m.reasoning_always); assert_eq!(m.mmproj_file.as_deref(), Some("mmproj-model-f16.gguf")); assert_eq!(m.mmproj_sha256.as_deref(), Some(&*"b".repeat(64))); @@ -4266,6 +5777,120 @@ mod tests { assert_eq!(m.mmproj_sha256, None); } + // ── Capability detection: thinking heuristic ───────────────────────────── + + #[test] + fn detect_thinking_matches_reasoning_self_labels() { + // A repo or file whose own name advertises reasoning. 
+ assert!(detect_thinking("acme/Model-Thinking", "model.gguf")); + assert!(detect_thinking("acme/model", "model-reasoning-Q4_K_M.gguf")); + assert!(detect_thinking("acme/reasoner-7b", "w.gguf")); + } + + #[test] + fn detect_thinking_matches_known_reasoning_families() { + assert!(detect_thinking("deepseek-ai/DeepSeek-R1-GGUF", "x.gguf")); + assert!(detect_thinking("org/QwQ-32B-GGUF", "x.gguf")); + assert!(detect_thinking("ggml-org/gpt-oss-20b-GGUF", "x.gguf")); + assert!(detect_thinking("mistralai/Magistral-Small-GGUF", "x.gguf")); + } + + #[test] + fn detect_thinking_is_case_insensitive() { + assert!(detect_thinking("ORG/GPT-OSS-20B", "MODEL.GGUF")); + } + + #[test] + fn detect_thinking_defaults_false_without_markers() { + assert!(!detect_thinking( + "google/gemma-4-12b-it", + "gemma-4-12b-it-Q4_K_M.gguf" + )); + assert!(!detect_thinking("o/r", "w-Q4_K_M.gguf")); + } + + #[test] + fn repo_installed_model_flags_thinking_from_name() { + let m = repo_installed_model( + "ggml-org/gpt-oss-20b-GGUF", + "gpt-oss-20b-Q4_K_M.gguf", + &sample_resolved(false), + ); + assert!(m.thinking); + } + + // ── Reasoning-flag resolution helpers ──────────────────────────────────── + + #[test] + fn curated_reasoning_flags_match_every_starter() { + for s in registry::STARTERS { + assert_eq!( + curated_reasoning_flags(s.repo, s.file_name), + Some((s.thinking, s.reasoning_always)), + "curated flags must mirror the registry for {}", + s.repo + ); + } + } + + #[test] + fn curated_reasoning_flags_none_for_pasted_repo() { + assert_eq!(curated_reasoning_flags("nope/repo", "x.gguf"), None); + } + + #[test] + fn reasoning_flags_from_metadata_classify_from_template() { + // Optional family: thinking on, no badge. + assert_eq!( + reasoning_flags_from_metadata( + Some("{% if enable_thinking %}"), + Some("qwen3"), + "any/repo", + "x.gguf" + ), + (true, false) + ); + // Always family: thinking on, badge. 
+ assert_eq!( + reasoning_flags_from_metadata(Some("<think>"), None, "any/repo", "x.gguf"), + (true, true) + ); + // Non-reasoning: both off. + assert_eq!( + reasoning_flags_from_metadata(Some("plain instruct"), None, "any/repo", "x.gguf"), + (false, false) + ); + // A readable template wins over a reasoning-y name. + assert_eq!( + reasoning_flags_from_metadata(Some("plain instruct"), None, "org/QwQ-32B", "x.gguf"), + (false, false) + ); + } + + #[test] + fn reasoning_flags_from_metadata_falls_back_to_name_without_template() { + // No template: the name decides thinking; `reasoning_always` stays off + // for the runtime backstop. Marker in the repo, then in the file name. + assert_eq!( + reasoning_flags_from_metadata(None, None, "org/QwQ-32B", "x.gguf"), + (true, false) + ); + assert_eq!( + reasoning_flags_from_metadata(None, None, "org/plain", "model-reasoning.gguf"), + (true, false) + ); + // No template and no marker: both off. + assert_eq!( + reasoning_flags_from_metadata(None, None, "org/plain", "model.gguf"), + (false, false) + ); + // An empty template is treated as no template and falls back to the name. + assert_eq!( + reasoning_flags_from_metadata(Some(""), None, "org/QwQ-32B", "x.gguf"), + (true, false) + ); + } + // ── Model library: delete ──────────────────────────────────────────────── #[test] @@ -4308,15 +5933,16 @@ mod tests { std::fs::write(store.blob_path(&m.sha256), b"w").unwrap(); // A claimed download slot must refuse the delete and leave the row - // and blob untouched. - let _token = claim_download(&state).unwrap(); + // and blob untouched, even though the in-flight download is a different + // model: a finishing download could insert or share refcounted blobs.
+ let _token = claim_download(&state, "other-model", vec![]).unwrap(); let err = delete_installed_model_inner(&state, &conn, &store, &m.id, "").unwrap_err(); assert_eq!(err, "a download is already in progress"); assert!(manifest::get(&conn, &m.id).unwrap().is_some()); assert!(store.blob_path(&m.sha256).exists()); // Releasing the slot lets the delete proceed. - release_download(&state); + release_download(&state, "other-model"); assert!(delete_installed_model_inner(&state, &conn, &store, &m.id, "").is_ok()); } @@ -4340,7 +5966,7 @@ mod tests { // ── Model library: discard partial ─────────────────────────────────────── #[test] - fn discard_partial_validates_hex_and_running_state() { + fn discard_partial_validates_hex_and_scopes_to_the_target_sha() { let (_dir, store) = make_store(); let state = DownloadState::default(); let sha = "a".repeat(64); @@ -4349,14 +5975,25 @@ mod tests { assert!(discard_partial_inner(&state, &store, "short").is_err()); assert!(discard_partial_inner(&state, &store, &"Z".repeat(64)).is_err()); - // Rejected while a download is claimed. - let _token = claim_download(&state).unwrap(); + // A download in flight for a DIFFERENT blob does not block discarding + // this paused partial: parallel downloads each own only their own shas, + // so an unrelated active download never touches this file. + std::fs::write(store.partial_path(&sha), b"bytes").unwrap(); + let _other = claim_download(&state, "other-model", vec!["c".repeat(64)]).unwrap(); + discard_partial_inner(&state, &store, &sha).unwrap(); + assert!(!store.partial_path(&sha).exists()); + release_download(&state, "other-model"); + + // A download in flight that IS writing this blob blocks the discard (it + // would unlink the partial out from under the active writer, failing its + // verification with NotFound). 
+ std::fs::write(store.partial_path(&sha), b"bytes").unwrap(); + let _this = claim_download(&state, "this-model", vec![sha.clone()]).unwrap(); let err = discard_partial_inner(&state, &store, &sha).unwrap_err(); assert!(err.contains("in progress"), "got: {err}"); - release_download(&state); + release_download(&state, "this-model"); // Removes an existing partial; a missing partial is fine (idempotent). - std::fs::write(store.partial_path(&sha), b"bytes").unwrap(); discard_partial_inner(&state, &store, &sha).unwrap(); assert!(!store.partial_path(&sha).exists()); discard_partial_inner(&state, &store, &sha).unwrap(); diff --git a/src-tauri/src/models/reasoning.rs b/src-tauri/src/models/reasoning.rs new file mode 100644 index 00000000..4fabce4c --- /dev/null +++ b/src-tauri/src/models/reasoning.rs @@ -0,0 +1,299 @@ +/*! + * Dynamic reasoning-capability classifier for locally-run GGUF models. + * + * Thuki must behave correctly for ANY model a user downloads, not just the + * three curated starters whose class is baked into the registry. The single + * authoritative signal a GGUF carries about whether (and how) it reasons is + * its embedded chat template ([`tokenizer.chat_template`]); the template's + * markers tell us which reasoning family a model belongs to. + * + * This module is the pure, side-effect-free heart of the classifier: + * [`classify_reasoning`] maps a chat-template string (plus the optional + * `general.architecture`) onto one of three classes. The byte-level template + * extraction lives in [`crate::models::gguf`]; persistence and the runtime + * behavioral backstop live in [`crate::models`] / [`crate::commands`]. + * + * The three classes mirror the convergent industry taxonomy (OpenRouter + * `mandatory`, Ollama `thinking` capability, vLLM per-family parsers): + * + * - [`ReasoningClass::None`] — not a reasoning model. `/think` is a no-op, no + * thinking block, no badge. + * - [`ReasoningClass::Optional`] — reasoning can be turned off. 
Thuki defaults + * it OFF (the OFF blast in [`crate::openai`] suppresses it) and `/think` + * turns it on per-message. No badge. + * - [`ReasoningClass::Always`] — reasoning is structural and cannot be turned + * off. Thuki shows it cleanly and badges the model so the latency is not a + * surprise; `/think` is a harmless no-op. + */ + +/// Marker present in gpt-oss / Harmony templates: reasoning rides the +/// `analysis` channel, which is structural and cannot be disabled. +const MARKER_HARMONY_CHANNEL: &str = "<|channel|>"; + +/// GGUF `general.architecture` value for gpt-oss / Harmony models. Used as a +/// belt-and-suspenders signal alongside [`MARKER_HARMONY_CHANNEL`] so the +/// curated Smartest starter (gpt-oss) classifies as `Always` even if a GGUF +/// variant lays its channel markup out differently than expected. +const ARCH_GPT_OSS: &str = "gpt-oss"; + +/// The literal word that every "reasoning can be disabled" family threads +/// through its template, whether as a kwarg (`enable_thinking`, +/// `thinking_budget`) or a bare Jinja variable (`thinking`). Crucially the +/// always-on tag families spell their tags `<think>` / `<thought>` / +/// `<seed:think>` (no `ing`), so the presence of the whole word `thinking` +/// is what separates "has an off switch" from "always reasons". +const MARKER_THINKING_KWARG: &str = "thinking"; + +/// Mistral Magistral / Ministral reasoning tags. Reasoning is driven by a +/// system-prompt instruction rather than a template kwarg, so without Thuki's +/// (absent) reasoning system prompt these models stay quiet: treated as +/// `Optional` (default off), not `Always`. +const MARKER_MISTRAL_THINK_OPEN: &str = "[THINK]"; +const MARKER_MISTRAL_THINK_CLOSE: &str = "[/THINK]"; + +/// Always-on reasoning tags: a template that hard-opens one of these on the +/// assistant turn and offers no off switch always reasons (DeepSeek-R1 and +/// distills, QwQ, EXAONE-Deep, MiniMax-M2, Phi-4-reasoning, Seed-OSS variants +/// without a budget kwarg).
Checked only AFTER the off-switch word, so a +/// family that ships both a tag and a kwarg (e.g. Seed-OSS `<seed:think>` + +/// `thinking_budget`) is correctly classified `Optional`. +const ALWAYS_TAGS: &[&str] = &[ + "<think>", + "</think>", + "<thought>", + "</thought>", + "<seed:think>", +]; + +/// How a model reasons, derived from its chat template. See the module docs +/// for the behavior each class drives. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum ReasoningClass { + /// Not a reasoning model. + None, + /// Reasoning can be turned off; Thuki defaults it off. + Optional, + /// Reasoning is structural and cannot be turned off. + Always, +} + +impl ReasoningClass { + /// Projects the class onto the two manifest capability flags Thuki + /// persists and surfaces: `(thinking, reasoning_always)`. + /// + /// - `None` -> `(false, false)`: no thinking block, no badge. + /// - `Optional` -> `(true, false)`: thinking available, no badge. + /// - `Always` -> `(true, true )`: thinking shown, badge. + pub fn flags(self) -> (bool, bool) { + match self { + ReasoningClass::None => (false, false), + ReasoningClass::Optional => (true, false), + ReasoningClass::Always => (true, true), + } + } +} + +/// Classifies a model's reasoning capability from its chat template and +/// optional `general.architecture`, applying the family markers most-specific +/// first: +/// +/// 1. gpt-oss / Harmony (`<|channel|>` or `gpt-oss` architecture) -> `Always`. +/// 2. An off-switch word (`enable_thinking` / `thinking` / `thinking_budget`) +/// anywhere in the template -> `Optional` (the OFF blast controls it). +/// 3. Mistral `[THINK]` / `[/THINK]` tags -> `Optional` (system-prompt +/// driven; quiet without Thuki's reasoning prompt). +/// 4. An always-on reasoning tag (`<think>` / `<thought>` / `<seed:think>`) +/// with no off switch -> `Always`. +/// 5. No reasoning markers at all -> `None`. +/// +/// Never panics: any input (empty, binary garbage decoded as text, a template +/// from a future family) resolves to one of the three classes.
When the +/// template scan is wrong for an `Always` model, the runtime behavioral +/// backstop self-corrects from real output, so this fast path only needs to +/// be right for the common families. +pub fn classify_reasoning(chat_template: &str, architecture: Option<&str>) -> ReasoningClass { + let arch_is_gpt_oss = architecture + .map(|a| { + let lower = a.to_ascii_lowercase(); + lower.contains(ARCH_GPT_OSS) || lower.contains("gptoss") + }) + .unwrap_or(false); + + // 1. gpt-oss / Harmony: highest-signal, structural reasoning channel. + if chat_template.contains(MARKER_HARMONY_CHANNEL) || arch_is_gpt_oss { + return ReasoningClass::Always; + } + + // 2. Any "off switch" word means the model reads a disable signal and the + // OFF blast already controls it. Covers `enable_thinking`, + // `thinking_budget`, and a bare `thinking` Jinja variable in one check, + // because the always-on tag families never spell the whole word. + if chat_template.contains(MARKER_THINKING_KWARG) { + return ReasoningClass::Optional; + } + + // 3. Mistral reasoning is system-prompt driven, not template-gated, so it + // is quiet by default under Thuki and treated as optional. + if chat_template.contains(MARKER_MISTRAL_THINK_OPEN) + || chat_template.contains(MARKER_MISTRAL_THINK_CLOSE) + { + return ReasoningClass::Optional; + } + + // 4. A reasoning tag with no off switch: the model always reasons. + if ALWAYS_TAGS.iter().any(|tag| chat_template.contains(tag)) { + return ReasoningClass::Always; + } + + // 5. No markers: not a reasoning model. 
+ ReasoningClass::None +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn flags_map_each_class() { + assert_eq!(ReasoningClass::None.flags(), (false, false)); + assert_eq!(ReasoningClass::Optional.flags(), (true, false)); + assert_eq!(ReasoningClass::Always.flags(), (true, true)); + } + + // ── Always: gpt-oss / Harmony ──────────────────────────────────────────── + + #[test] + fn gpt_oss_channel_marker_is_always() { + let t = "<|start|>system<|message|>...<|channel|>analysis<|message|>..."; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Always); + } + + #[test] + fn gpt_oss_architecture_is_always_even_without_channel_marker() { + // A gpt-oss GGUF whose template the scan does not recognize still + // classifies Always from the architecture tiebreak. + assert_eq!( + classify_reasoning("{{ messages }}", Some("gpt-oss")), + ReasoningClass::Always + ); + assert_eq!( + classify_reasoning("", Some("GptOss")), + ReasoningClass::Always + ); + } + + // ── Always: tag families with no off switch ────────────────────────────── + + #[test] + fn deepseek_r1_hard_open_think_is_always() { + // R1 hard-opens <think> after the assistant marker and reads no kwarg. + let t = "{{'<|Assistant|>'}}<think>\\n"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Always); + } + + #[test] + fn qwq_think_tag_qwen2_is_always() { + let t = "<|im_start|>assistant\\n<think>\\n"; + assert_eq!(classify_reasoning(t, Some("qwen2")), ReasoningClass::Always); + } + + #[test] + fn exaone_deep_thought_tag_is_always() { + let t = "<|assistant|><thought>\\n"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Always); + } + + #[test] + fn closing_think_tag_alone_is_always() { + // Some templates only carry the closing tag in a prefill branch.
+ assert_eq!( + classify_reasoning("...</think>...", None), + ReasoningClass::Always + ); + assert_eq!( + classify_reasoning("...</thought>...", None), + ReasoningClass::Always + ); + } + + // ── Optional: off-switch kwarg / variable families ─────────────────────── + + #[test] + fn qwen3_enable_thinking_is_optional() { + let t = "{%- if enable_thinking %}{% endif %}"; + assert_eq!( + classify_reasoning(t, Some("qwen3")), + ReasoningClass::Optional + ); + } + + #[test] + fn glm_enable_thinking_is_optional() { + let t = "<|assistant|>{% if enable_thinking %}...{% endif %}"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Optional); + } + + #[test] + fn granite_thinking_variable_is_optional() { + let t = "<|start_of_role|>{% if thinking %}...{% endif %}"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Optional); + } + + #[test] + fn deepseek_v31_thinking_branch_is_optional() { + let t = "{{'<|Assistant|>'}}{% if thinking %}{% else %}{% endif %}"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Optional); + } + + #[test] + fn seed_oss_budget_kwarg_wins_over_its_tag() { + // Seed-OSS ships both <seed:think> AND thinking_budget; the budget + // (off switch) must win so it is Optional, not Always. + let t = "<seed:think>{{ thinking_budget }}"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Optional); + } + + #[test] + fn mistral_bracket_think_is_optional() { + // Magistral reasoning is system-prompt driven; quiet by default.
+ assert_eq!( + classify_reasoning("...[THINK]...", None), + ReasoningClass::Optional + ); + assert_eq!( + classify_reasoning("...[/THINK]...", None), + ReasoningClass::Optional + ); + } + + // ── None: plain instruct models ────────────────────────────────────────── + + #[test] + fn gemma_plain_instruct_is_none() { + let t = "user\\n{{ content }}"; + assert_eq!(classify_reasoning(t, Some("gemma3")), ReasoningClass::None); + } + + #[test] + fn empty_template_is_none() { + assert_eq!(classify_reasoning("", None), ReasoningClass::None); + } + + #[test] + fn arch_without_markers_does_not_force_a_class() { + // A non-gpt-oss architecture with no template markers stays None: the + // architecture only tiebreaks the gpt-oss case. + assert_eq!( + classify_reasoning("{{ messages }}", Some("llama")), + ReasoningClass::None + ); + } + + #[test] + fn channel_marker_beats_a_later_thinking_word() { + // Ordering guard: a Harmony template that also happens to mention the + // word "thinking" still classifies Always (channel checked first). + let t = "<|channel|>analysis ... enable_thinking"; + assert_eq!(classify_reasoning(t, None), ReasoningClass::Always); + } +} diff --git a/src-tauri/src/models/registry.rs b/src-tauri/src/models/registry.rs index e591765a..bc5615a2 100644 --- a/src-tauri/src/models/registry.rs +++ b/src-tauri/src/models/registry.rs @@ -1,15 +1,21 @@ /*! - * Curated starter model registry for the built-in llama.cpp engine. + * Curated model registry for the built-in llama.cpp engine. * - * Three tiers (Fast / Balanced / Smartest) cover the RAM spectrum of Apple - * Silicon Macs. Every entry pins a Hugging Face repo at an exact git revision - * and carries the SHA-256 of each blob, so a starter download is reproducible - * and verifiable end to end (the digests feed straight into - * [`crate::models::download::DownloadSpec`] which verifies them on install). 
+ * This is the Staff Picks catalog: a small, deeply-vetted set of models grouped + * into use-case sections (Everyday chat / Compact & fast / Deep reasoning). + * Three of the entries double as the onboarding heroes (one per tier, see + * [`ONBOARDING_HERO_IDS`]); the rest exist only in the catalog. Every entry + * pins a Hugging Face repo at an exact git revision and carries the SHA-256 of + * each blob, so a download is reproducible and verifiable end to end (the + * digests feed straight into [`crate::models::download::DownloadSpec`] which + * verifies them on install). Provenance comes from the pinned revision and a + * trusted GGUF source (the maker's own repo, `unsloth`, `bartowski`, or + * `ggml-org`); the SHA-256 is an integrity check only. * * Hashes and sizes were read from the Hugging Face tree-at-revision API - * (`/api/models//tree/`) on 2026-06-17, so each digest - * matches the pinned commit, not whatever `main` later points to. + * (`/api/models//tree/`): the three heroes on 2026-06-17, the + * rest of the catalog on 2026-06-20, so each digest matches its pinned commit, + * not whatever `main` later points to. */ use crate::config::defaults::HF_BASE_URL; @@ -29,8 +35,23 @@ pub enum Tier { /// need, baked in at compile time. #[derive(Debug, Clone, serde::Serialize, PartialEq)] pub struct Starter { - /// Which speed/quality tier this entry fills. + /// Stable slug, unique across the registry (e.g. `"gemma-4-12b"`). The + /// download key and the React row key for the Staff Picks catalog, where a + /// single category can hold many models. Onboarding keys on `tier` instead + /// and shows only the three [`ONBOARDING_HERO_IDS`] heroes. + pub id: &'static str, + /// Coarse speed/quality dial for the model. Onboarding's 3-up comparison + /// shows one hero per tier; in the Staff Picks catalog several entries can + /// share a tier, so it is a size/speed hint there, not a unique key. pub tier: Tier, + /// Model family this entry belongs to (e.g. 
"Gemma", "Qwen", "gpt-oss"). + /// Several starters can share a family when the catalog offers more than one + /// size of the same model. + pub family: &'static str, + /// Use-case section the Discover staff-picks list groups this entry under + /// (e.g. "Everyday chat", "Compact & fast", "Deep reasoning"). Answers + /// "what is it for?" in plain words so a non-expert can pick by intent. + pub category: &'static str, /// Human-readable label shown in the picker (e.g. "Gemma 4 12B"). pub display_name: &'static str, /// Hugging Face repo slug. @@ -49,6 +70,10 @@ pub struct Starter { pub vision: bool, /// Whether the model emits a thinking/scratchpad token stream. pub thinking: bool, + /// Whether the model's reasoning cannot be turned off (it always reasons). + /// `true` only for structurally-always-on families (e.g. gpt-oss/Harmony); + /// `false` when reasoning is optional (the default-off path) or absent. + pub reasoning_always: bool, /// Vision projection file name, when the model is multimodal. pub mmproj_file: Option<&'static str>, /// Lowercase hex SHA-256 of the mmproj blob, when present. @@ -61,6 +86,12 @@ pub struct Starter { /// sliding-window-aware cache). Sanity-check any new entry against a /// real load before trusting the estimate. pub est_runtime_gb: f64, + /// Maximum context window in tokens the model was trained for: its GGUF + /// `context_length` metadata (llama.cpp's `n_ctx_train`), vetted against the + /// maker's published config. Surfaced in the picker so a user can see how + /// much a model can attend to. Display only: the engine loads the user's + /// separate, clamped `num_ctx`, never this value. + pub context_length: u32, /// Short license label surfaced next to the download button. pub license_note: &'static str, /// Model maker (e.g. "OpenAI"), shown in the picker's Origin row. @@ -74,7 +105,10 @@ pub struct Starter { /// The curated starters, ordered Fast, Balanced, Smartest. 
pub const STARTERS: &[Starter] = &[ Starter { + id: "qwen3.5-9b", tier: Tier::Fast, + family: "Qwen", + category: "Everyday chat", display_name: "Qwen3.5 9B", repo: "unsloth/Qwen3.5-9B-GGUF", revision: "3885219b6810b007914f3a7950a8d1b469d598a5", @@ -83,17 +117,22 @@ pub const STARTERS: &[Starter] = &[ size_bytes: 5_680_522_464, quant: "Q4_K_M", vision: true, - thinking: false, + thinking: true, + reasoning_always: false, mmproj_file: Some("mmproj-BF16.gguf"), mmproj_sha256: Some("853698ce7aa6c7ba732478bad280240969ddf7b0fcbf93900046f63903a83383"), mmproj_bytes: 921_705_024, est_runtime_gb: 8.5, + context_length: 262_144, license_note: "Apache 2.0", origin: "Alibaba", origin_repo: "Qwen/Qwen3.5-9B", }, Starter { + id: "gemma-4-12b", tier: Tier::Balanced, + family: "Gemma", + category: "Everyday chat", display_name: "Gemma 4 12B", repo: "google/gemma-4-12B-it-qat-q4_0-gguf", revision: "f6e7774e6148da3b7f201e42ba37cf084c1db35f", @@ -103,16 +142,21 @@ pub const STARTERS: &[Starter] = &[ quant: "Q4_0", vision: true, thinking: false, + reasoning_always: false, mmproj_file: Some("mmproj-gemma-4-12b-it-qat-q4_0.gguf"), mmproj_sha256: Some("e70b0e5cd80323d5d588b4ed06780356b7b1ba03995a4b8164c6ae9db0ff5989"), mmproj_bytes: 175_115_264, est_runtime_gb: 9.5, + context_length: 262_144, license_note: "Apache 2.0", origin: "Google", origin_repo: "google/gemma-4-12B-it", }, Starter { + id: "gpt-oss-20b", tier: Tier::Smartest, + family: "gpt-oss", + category: "Deep reasoning", display_name: "gpt-oss 20B", repo: "ggml-org/gpt-oss-20b-GGUF", revision: "e1dc459feff949ff451ce107337a2026daa80df8", @@ -121,17 +165,201 @@ pub const STARTERS: &[Starter] = &[ size_bytes: 12_109_566_560, quant: "MXFP4", vision: false, - thinking: false, + thinking: true, + reasoning_always: true, mmproj_file: None, mmproj_sha256: None, mmproj_bytes: 0, est_runtime_gb: 13.3, + context_length: 131_072, license_note: "Apache 2.0", origin: "OpenAI", origin_repo: "openai/gpt-oss-20b", }, + // ── Everyday chat 
────────────────────────────────────────────────────── + Starter { + id: "mistral-nemo-12b", + tier: Tier::Balanced, + family: "Mistral", + category: "Everyday chat", + display_name: "Mistral Nemo 12B", + repo: "bartowski/Mistral-Nemo-Instruct-2407-GGUF", + revision: "a2dd64a0a76ea1bdb2bb6ab6fa5496b003c7c908", + file_name: "Mistral-Nemo-Instruct-2407-Q4_K_M.gguf", + sha256: "7c1a10d202d8788dbe5628dc962254d10654c853cae6aaeca0618f05490d4a46", + size_bytes: 7_477_208_192, + quant: "Q4_K_M", + vision: false, + thinking: false, + reasoning_always: false, + mmproj_file: None, + mmproj_sha256: None, + mmproj_bytes: 0, + est_runtime_gb: 9.9, + context_length: 131_072, + license_note: "Apache 2.0", + origin: "Mistral", + origin_repo: "mistralai/Mistral-Nemo-Instruct-2407", + }, + // ── Compact & fast ───────────────────────────────────────────────────── + Starter { + id: "phi-4-mini-3.8b", + tier: Tier::Fast, + family: "Phi", + category: "Compact & fast", + display_name: "Phi-4 Mini 3.8B", + repo: "unsloth/Phi-4-mini-instruct-GGUF", + revision: "78eb92a46fc37e6b524df991ed9aca9bc6aa7b80", + file_name: "Phi-4-mini-instruct-Q4_K_M.gguf", + sha256: "88c00229914083cd112853aab84ed51b87bdf6b9ce42f532d8c85c7c63b1730a", + size_bytes: 2_491_874_272, + quant: "Q4_K_M", + vision: false, + thinking: false, + reasoning_always: false, + mmproj_file: None, + mmproj_sha256: None, + mmproj_bytes: 0, + est_runtime_gb: 4.7, + context_length: 131_072, + license_note: "MIT", + origin: "Microsoft", + origin_repo: "microsoft/Phi-4-mini-instruct", + }, + Starter { + id: "llama-3.2-3b", + tier: Tier::Fast, + family: "Llama", + category: "Compact & fast", + display_name: "Llama 3.2 3B", + repo: "bartowski/Llama-3.2-3B-Instruct-GGUF", + revision: "5ab33fa94d1d04e903623ae72c95d1696f09f9e8", + file_name: "Llama-3.2-3B-Instruct-Q4_K_M.gguf", + sha256: "6c1a2b41161032677be168d354123594c0e6e67d2b9227c84f296ad037c728ff", + size_bytes: 2_019_377_696, + quant: "Q4_K_M", + vision: false, + thinking: false, + 
reasoning_always: false, + mmproj_file: None, + mmproj_sha256: None, + mmproj_bytes: 0, + est_runtime_gb: 4.0, + context_length: 131_072, + license_note: "Llama 3.2 Community", + origin: "Meta", + origin_repo: "meta-llama/Llama-3.2-3B-Instruct", + }, + Starter { + id: "gemma-4-e4b", + tier: Tier::Fast, + family: "Gemma", + category: "Compact & fast", + display_name: "Gemma 4 E4B", + repo: "google/gemma-4-E4B-it-qat-q4_0-gguf", + revision: "bb3b92e6f031fa438b409f898dd9f14f499a0cb0", + file_name: "gemma-4-E4B_q4_0-it.gguf", + sha256: "e8b6a059ba86947a44ace84d6e5679795bc41862c25c30513142588f0e9dba1d", + size_bytes: 5_154_939_136, + quant: "Q4_0", + vision: true, + thinking: false, + reasoning_always: false, + mmproj_file: Some("gemma-4-E4B-it-mmproj.gguf"), + mmproj_sha256: Some("c6398448d84a4836fdedf58f9775979e69ae0cc4dfdf4d697b5597693a555b12"), + mmproj_bytes: 991_551_904, + est_runtime_gb: 7.4, + context_length: 131_072, + license_note: "Gemma", + origin: "Google", + origin_repo: "google/gemma-4-E4B-it", + }, + // ── Deep reasoning ───────────────────────────────────────────────────── + Starter { + id: "phi-4-reasoning-plus-14b", + tier: Tier::Smartest, + family: "Phi", + category: "Deep reasoning", + display_name: "Phi-4 Reasoning Plus 14B", + repo: "unsloth/Phi-4-reasoning-plus-GGUF", + revision: "80fff8542dc7b88dba725b660beefd80e91e80c9", + file_name: "Phi-4-reasoning-plus-Q4_K_M.gguf", + sha256: "faf720745e20df40f52ee218be14c72b33070f7aacc508b3fbc61d47f32b4ffe", + size_bytes: 9_053_117_120, + quant: "Q4_K_M", + vision: false, + thinking: true, + reasoning_always: true, + mmproj_file: None, + mmproj_sha256: None, + mmproj_bytes: 0, + est_runtime_gb: 12.0, + context_length: 32_768, + license_note: "MIT", + origin: "Microsoft", + origin_repo: "microsoft/Phi-4-reasoning-plus", + }, + Starter { + id: "deepseek-r1-distill-8b", + tier: Tier::Balanced, + family: "DeepSeek", + category: "Deep reasoning", + display_name: "DeepSeek-R1 Distill 8B", + repo: 
"unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF", + revision: "615f8936e16dfde29dcc00be71145d4d5ce8ed53", + file_name: "DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf", + sha256: "0addb1339a82385bcd973186cd80d18dcc71885d45eabd899781a118d03827d9", + size_bytes: 4_920_737_216, + quant: "Q4_K_M", + vision: false, + thinking: true, + reasoning_always: true, + mmproj_file: None, + mmproj_sha256: None, + mmproj_bytes: 0, + est_runtime_gb: 7.0, + context_length: 131_072, + license_note: "MIT", + origin: "DeepSeek", + origin_repo: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", + }, ]; +/// Ids of the three onboarding hero starters, in tier order +/// (Fast, Balanced, Smartest). Onboarding's 3-up comparison selects exactly +/// these by id; the Staff Picks catalog may hold any number of other entries +/// without disturbing the onboarding heroes. +pub const ONBOARDING_HERO_IDS: [&str; 3] = ["qwen3.5-9b", "gemma-4-12b", "gpt-oss-20b"]; + +/// The registry entry with this id, if any. The id-keyed download path and the +/// onboarding-hero lookup both resolve entries through here, so a bad id yields +/// `None` rather than a panic. +pub fn by_id(id: &str) -> Option<&'static Starter> { + STARTERS.iter().find(|s| s.id == id) +} + +/// The registry entry matching this repo + weights file name, if any. An +/// installed model heals its curated facts (capabilities, context window) from +/// the registry through here, so a later flag or pin correction reaches models +/// downloaded before it. A pasted (non-curated) repo has no entry and yields +/// `None`. +pub fn by_repo_file(repo: &str, file_name: &str) -> Option<&'static Starter> { + STARTERS + .iter() + .find(|s| s.repo == repo && s.file_name == file_name) +} + +/// The three onboarding hero starters, resolved from [`ONBOARDING_HERO_IDS`] in +/// tier order. 
Any id that is absent from the registry is skipped, so the +/// result is the heroes that actually exist; a registry test asserts all three +/// resolve, so in practice the list is always length three. +pub fn onboarding_heroes() -> Vec<&'static Starter> { + ONBOARDING_HERO_IDS + .iter() + .filter_map(|id| by_id(id)) + .collect() +} + /// RAM-fit hint rendered as a badge on each starter row. #[derive(Debug, Clone, Copy, PartialEq, serde::Serialize)] #[serde(rename_all = "snake_case")] @@ -177,10 +405,18 @@ pub fn download_specs(s: &Starter) -> Vec<DownloadSpec> { specs } -/// Manifest row for an installed starter. id = `"<repo>:<file_name>"`. +/// The manifest-row id for a starter: `"<repo>:<file_name>"`. The single source +/// of truth for how a curated entry maps onto its installed-manifest key, so the +/// installed-state probe can resolve the id without building a whole +/// [`InstalledModel`] just to read one field. +pub fn installed_model_id(s: &Starter) -> String { + format!("{}:{}", s.repo, s.file_name) +} + +/// Manifest row for an installed starter. id = [`installed_model_id`]. pub fn to_installed_model(s: &Starter) -> InstalledModel { InstalledModel { - id: format!("{}:{}", s.repo, s.file_name), + id: installed_model_id(s), display_name: s.display_name.to_string(), repo: s.repo.to_string(), revision: s.revision.to_string(), @@ -190,6 +426,7 @@ pub fn to_installed_model(s: &Starter) -> InstalledModel { quant: s.quant.to_string(), vision: s.vision, thinking: s.thinking, + reasoning_always: s.reasoning_always, mmproj_file: s.mmproj_file.map(str::to_string), mmproj_sha256: s.mmproj_sha256.map(str::to_string), } @@ -206,17 +443,267 @@ mod tests { s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f')) } + /// Resolves the onboarding hero for a tier by id, not by find-first-of-tier: + /// the catalog can hold several entries of the same tier, so only the hero + /// ids identify the three onboarding models unambiguously.
fn starter(tier: Tier) -> &'static Starter { - STARTERS.iter().find(|s| s.tier == tier).unwrap() + let idx = match tier { + Tier::Fast => 0, + Tier::Balanced => 1, + Tier::Smartest => 2, + }; + by_id(ONBOARDING_HERO_IDS[idx]).unwrap() + } + + #[test] + fn blob_shas_are_unique_across_entries() { + // Parallel downloads rely on no two catalog entries sharing a blob: the + // content-addressed store would otherwise see two concurrent writers to + // the same `tmp/.partial`. If a future entry legitimately shares a + // blob (e.g. a common mmproj companion), add per-sha download + // serialization before relaxing this guard. See `DownloadState` docs. + let mut seen = std::collections::HashSet::new(); + for s in STARTERS { + assert!( + seen.insert(s.sha256), + "duplicate weights sha256: {}", + s.sha256 + ); + if let Some(mmproj) = s.mmproj_sha256 { + assert!(seen.insert(mmproj), "duplicate blob sha256: {mmproj}"); + } + } + } + + #[test] + fn ids_are_present_and_unique() { + // The Staff Picks catalog and the id-keyed download path key on `id`, + // so every entry needs a non-empty slug and no two may collide. + let mut seen = std::collections::HashSet::new(); + for s in STARTERS { + assert!(!s.id.is_empty(), "{}: id is empty", s.repo); + assert!(seen.insert(s.id), "duplicate id: {}", s.id); + } + } + + #[test] + fn by_repo_file_matches_repo_and_weights_file() { + // Heals an installed model's curated facts from the registry: it matches + // on repo + weights file, and misses when either differs. + let s = &STARTERS[0]; + assert_eq!(by_repo_file(s.repo, s.file_name).unwrap().id, s.id); + assert!(by_repo_file(s.repo, "other.gguf").is_none()); + assert!(by_repo_file("other/repo", s.file_name).is_none()); + } + + #[test] + fn by_id_resolves_present_and_misses_unknown() { + // by_id finds a present entry and returns None for an unknown slug, + // so the lookup never panics on a bad id. 
+ assert_eq!(by_id(STARTERS[0].id).unwrap().id, STARTERS[0].id); + assert!(by_id("no-such-model").is_none()); } #[test] - fn three_tiers_present() { - assert_eq!(STARTERS.len(), 3); + fn onboarding_heroes_are_three_in_tier_order() { + // The onboarding picker shows exactly three heroes, one per tier, in + // Fast/Balanced/Smartest order; each id resolves to a real entry. + assert_eq!(ONBOARDING_HERO_IDS.len(), 3); + let heroes = onboarding_heroes(); + assert_eq!(heroes.len(), 3); assert_eq!( - STARTERS.iter().map(|s| s.tier).collect::<Vec<_>>(), + heroes.iter().map(|s| s.tier).collect::<Vec<_>>(), vec![Tier::Fast, Tier::Balanced, Tier::Smartest] ); + for id in ONBOARDING_HERO_IDS { + assert!(by_id(id).is_some(), "hero id missing from registry: {id}"); + } + } + + #[test] + fn catalog_is_the_vetted_models_grouped_by_category() { + // The curated Staff Picks catalog: nine deeply-vetted models, exactly + // three per use-case section the Discover surface renders. The three + // onboarding heroes are among them, so a model downloaded during + // onboarding shows up here as Installed with no duplicate row. Locks the + // exact set so a stray add/remove is a deliberate, reviewed change.
+ use std::collections::BTreeMap; + let mut by_cat: BTreeMap<&str, Vec<&str>> = BTreeMap::new(); + for s in STARTERS { + by_cat.entry(s.category).or_default().push(s.id); + } + for v in by_cat.values_mut() { + v.sort_unstable(); + } + let mut expected: BTreeMap<&str, Vec<&str>> = BTreeMap::new(); + expected.insert( + "Everyday chat", + vec!["gemma-4-12b", "mistral-nemo-12b", "qwen3.5-9b"], + ); + expected.insert( + "Compact & fast", + vec!["gemma-4-e4b", "llama-3.2-3b", "phi-4-mini-3.8b"], + ); + expected.insert( + "Deep reasoning", + vec![ + "deepseek-r1-distill-8b", + "gpt-oss-20b", + "phi-4-reasoning-plus-14b", + ], + ); + for v in expected.values_mut() { + v.sort_unstable(); + } + assert_eq!(by_cat, expected); + } + + #[test] + fn context_windows_match_the_vetted_values() { + // The model's trained max context (GGUF `context_length`), vetted per + // entry against the maker's config; Mistral Nemo is corrected from its + // GGUF's inflated 1,024,000 down to its real 131,072. + let want: &[(&str, u32)] = &[ + ("qwen3.5-9b", 262_144), + ("gemma-4-12b", 262_144), + ("mistral-nemo-12b", 131_072), + ("phi-4-mini-3.8b", 131_072), + ("llama-3.2-3b", 131_072), + ("gemma-4-e4b", 131_072), + ("gpt-oss-20b", 131_072), + ("phi-4-reasoning-plus-14b", 32_768), + ("deepseek-r1-distill-8b", 131_072), + ]; + for (id, ctx) in want { + assert_eq!( + by_id(id).unwrap().context_length, + *ctx, + "{id} context window" + ); + } + } + + #[test] + fn every_entry_has_a_sane_context_window() { + // Display-only trained max; a floor/ceiling guards against a typo and + // documents that the value is bounded. The real KV allocation is the + // user's separate, clamped `num_ctx`, never this number. 
+ for s in STARTERS { + assert!( + (2048..=1_048_576).contains(&s.context_length), + "{}: context_length {} out of sane range", + s.id, + s.context_length + ); + } + } + + #[test] + fn every_category_holds_exactly_three_models() { + // The Discover surface is balanced: nine models, exactly three per + // use-case section, so no section dwarfs the others. + use std::collections::BTreeMap; + let mut counts: BTreeMap<&str, usize> = BTreeMap::new(); + for s in STARTERS { + *counts.entry(s.category).or_default() += 1; + } + assert_eq!(STARTERS.len(), 9, "catalog should hold nine models"); + for (category, n) in counts { + assert_eq!( + n, 3, + "category {category} should hold exactly three, has {n}" + ); + } + } + + #[test] + fn every_entry_carries_origin_and_license() { + for s in STARTERS { + assert!(!s.license_note.is_empty(), "{}: empty license", s.id); + assert!(!s.origin.is_empty(), "{}: empty origin", s.id); + assert!(!s.display_name.is_empty(), "{}: empty display_name", s.id); + assert!(!s.family.is_empty(), "{}: empty family", s.id); + // origin_repo is an "org/name" slug the picker turns into an HF URL. + assert_eq!( + s.origin_repo.split('/').count(), + 2, + "{}: origin_repo is not org/name: {}", + s.id, + s.origin_repo + ); + } + } + + #[test] + fn mmproj_fields_are_internally_consistent() { + for s in STARTERS { + // The mmproj file, its digest, and a non-zero byte count travel + // together, and a vision entry ships a projector while a text entry + // does not (llama.cpp needs the mmproj to see images). 
+ assert_eq!( + s.mmproj_file.is_some(), + s.mmproj_sha256.is_some(), + "{}: mmproj file/sha presence mismatch", + s.id + ); + assert_eq!( + s.mmproj_file.is_some(), + s.mmproj_bytes > 0, + "{}: mmproj file/bytes presence mismatch", + s.id + ); + assert_eq!( + s.vision, + s.mmproj_file.is_some(), + "{}: vision/mmproj mismatch", + s.id + ); + } + } + + #[test] + fn reasoning_always_entries_also_emit_thinking() { + // A model whose reasoning cannot be turned off must also be flagged as + // a thinking model, or the picker badge and `/think` gate disagree. + for s in STARTERS { + if s.reasoning_always { + assert!(s.thinking, "{}: reasoning_always implies thinking", s.id); + } + } + } + + #[test] + fn every_entry_has_a_positive_runtime_estimate() { + for s in STARTERS { + assert!( + s.est_runtime_gb > 0.0, + "{}: non-positive est_runtime_gb", + s.id + ); + } + } + + #[test] + fn family_per_tier() { + // Each entry carries a non-empty family label. + assert_eq!(starter(Tier::Fast).family, "Qwen"); + assert_eq!(starter(Tier::Balanced).family, "Gemma"); + assert_eq!(starter(Tier::Smartest).family, "gpt-oss"); + for s in STARTERS { + assert!(!s.family.is_empty(), "{}: family is empty", s.repo); + } + } + + #[test] + fn category_per_tier() { + // The Discover staff-picks list groups starters into use-case sections, + // so every entry carries a non-empty category label. + assert_eq!(starter(Tier::Fast).category, "Everyday chat"); + assert_eq!(starter(Tier::Balanced).category, "Everyday chat"); + assert_eq!(starter(Tier::Smartest).category, "Deep reasoning"); + for s in STARTERS { + assert!(!s.category.is_empty(), "{}: category is empty", s.repo); + } } #[test] @@ -238,6 +725,37 @@ mod tests { assert_eq!(smartest.mmproj_bytes, 0); } + /// The `thinking` flag is the passive "this model reasons" badge: it drives + /// the picker tag, the `/think` capability gate, and the earlier-turn + /// reasoning strip. 
It must match each curated model's real behavior, or a + /// reasoning model is wrongly told it "does not emit thinking tokens". + /// Qwen3.5 and gpt-oss are reasoning models; Gemma 4 is not. + #[test] + fn thinking_flag_per_tier() { + assert!(starter(Tier::Fast).thinking, "Qwen3.5 reasons"); + assert!(!starter(Tier::Balanced).thinking, "Gemma 4 does not reason"); + assert!(starter(Tier::Smartest).thinking, "gpt-oss reasons"); + } + + /// `reasoning_always` marks models whose reasoning cannot be turned off. + /// Only gpt-oss (Harmony) is structurally always-on; Qwen3.5's reasoning is + /// optional (off by default via the kwarg blast) and Gemma does not reason. + #[test] + fn reasoning_always_flag_per_tier() { + assert!( + starter(Tier::Smartest).reasoning_always, + "gpt-oss always reasons" + ); + assert!( + !starter(Tier::Fast).reasoning_always, + "Qwen3.5 reasoning is optional" + ); + assert!( + !starter(Tier::Balanced).reasoning_always, + "Gemma does not force reasoning" + ); + } + #[test] fn all_revisions_are_40_hex() { for s in STARTERS { @@ -318,7 +836,7 @@ mod tests { (32, [RamFit::Fits, RamFit::Fits, RamFit::Fits]), ]; for (ram_gib, expected) in table { - for (s, want) in STARTERS.iter().zip(expected) { + for (s, want) in onboarding_heroes().iter().zip(expected) { let got = ram_fit(s.est_runtime_gb, ram_gib * GIB); assert_eq!( got, *want, @@ -369,6 +887,16 @@ mod tests { ); } + #[test] + fn installed_model_id_is_repo_colon_file() { + // The manifest-row id is ":"; `installed_model_id` is + // its single source of truth, so `to_installed_model` never drifts from + // the installed-state probe. 
+ let s = &STARTERS[0]; + assert_eq!(installed_model_id(s), format!("{}:{}", s.repo, s.file_name)); + assert_eq!(to_installed_model(s).id, installed_model_id(s)); + } + #[test] fn to_installed_model_maps_fields() { let balanced = starter(Tier::Balanced); @@ -383,6 +911,7 @@ mod tests { assert_eq!(m.quant, balanced.quant); assert_eq!(m.vision, balanced.vision); assert_eq!(m.thinking, balanced.thinking); + assert_eq!(m.reasoning_always, balanced.reasoning_always); assert_eq!(m.mmproj_file.as_deref(), balanced.mmproj_file); assert_eq!(m.mmproj_sha256.as_deref(), balanced.mmproj_sha256); diff --git a/src-tauri/src/openai.rs b/src-tauri/src/openai.rs index 6c6dae50..5d3582d9 100644 --- a/src-tauri/src/openai.rs +++ b/src-tauri/src/openai.rs @@ -39,6 +39,11 @@ pub struct OpenAiChatParams { pub api_key: Option, /// Picks the user-facing error copy for this request. pub flavor: V1Flavor, + /// Whether the model should run a reasoning pass before answering. + /// Reasoning is opt-in (the `/think` command); a plain message answers + /// directly. Honored only on the built-in engine via + /// [`reasoning_template_kwargs`]; remote `/v1` servers ignore it. + pub enable_thinking: bool, } /// Error returned by [`request_openai_json`]. Mirrors the classification the @@ -211,6 +216,62 @@ fn oversize_sse_line_error() -> EngineError { } } +// ─── Reasoning control ─────────────────────────────────────────────────────── + +/// The per-request reasoning switch to merge into a `/v1` body as +/// `chat_template_kwargs`, or `None` when the request must carry no such field. +/// +/// llama.cpp injects these into the model's chat template and a template +/// silently ignores any kwarg it does not read (verified against the pinned +/// `b9590` sidecar with Qwen3.5: the full set below suppresses reasoning with +/// no error). 
So one harmless "blast" covers every reasoning family that +/// exposes a template-level switch, with no per-family detection: +/// `enable_thinking` (Qwen3/3.5, GLM, Hunyuan, Gemma), `thinking` (IBM Granite, +/// DeepSeek-V3.x), and `thinking_budget` (`0` = off / `-1` = unrestricted, for +/// ByteDance Seed-OSS). `false`/`0` answers directly; `true`/`-1` reasons. +/// +/// Families with no template switch (DeepSeek-R1 + distills, QwQ, gpt-oss +/// Harmony, MiniMax, EXAONE, Phi-4-reasoning, ...) reason regardless of this +/// switch: the compute cannot be stopped on this engine. Their reasoning is +/// not suppressed; [`stream_openai_chat`] surfaces any `reasoning_content` in +/// the thinking block (always shown, never hidden), so the chain of thought is +/// presented cleanly rather than running invisibly. +/// +/// Only the bundled engine ([`V1Flavor::Builtin`]) receives the kwargs; the +/// fields are llama.cpp-specific and an arbitrary OpenAI-compatible server may +/// reject an unknown body key, so remote providers get nothing. +fn reasoning_template_kwargs(flavor: V1Flavor, enable_thinking: bool) -> Option<serde_json::Value> { + match flavor { + V1Flavor::Builtin => Some(serde_json::json!({ + "enable_thinking": enable_thinking, + "thinking": enable_thinking, + "thinking_budget": if enable_thinking { -1 } else { 0 }, + })), + V1Flavor::Remote => None, + } +} + +/// Builds the streaming `/v1/chat/completions` request body. Pulled out of +/// [`stream_openai_chat`] so the reasoning-control wiring is unit-tested +/// without a live server. No sampling parameters are sent: the server and +/// model defaults apply.
+pub(crate) fn chat_request_body( + model: &str, + messages: &[ChatMessage], + flavor: V1Flavor, + enable_thinking: bool, +) -> serde_json::Value { + let mut body = serde_json::json!({ + "model": model, + "messages": messages.iter().map(to_openai_message).collect::>(), + "stream": true, + }); + if let Some(kwargs) = reasoning_template_kwargs(flavor, enable_thinking) { + body["chat_template_kwargs"] = kwargs; + } + body +} + // ─── Streaming chat ────────────────────────────────────────────────────────── /// Streams a `/v1/chat/completions` request (`stream: true`) and emits the @@ -235,12 +296,9 @@ pub async fn stream_openai_chat( messages, api_key, flavor, + enable_thinking, } = params; - let body = serde_json::json!({ - "model": model, - "messages": messages.iter().map(to_openai_message).collect::>(), - "stream": true, - }); + let body = chat_request_body(&model, &messages, flavor, enable_thinking); let mut request = client .post(format!("{base_url}/v1/chat/completions")) .json(&body); @@ -309,6 +367,12 @@ pub async fn stream_openai_chat( continue; }; if let Some(choice) = event.choices.first() { + // Whatever reasoning the model emits is always + // shown (never hidden): an `Optional` model that + // honored the OFF blast emits none, while a + // model that always reasons gets its thinking + // surfaced cleanly in the thinking block rather + // than running invisibly. 
if let Some(thinking) = choice .delta .reasoning_content @@ -362,8 +426,9 @@ pub(crate) fn json_request_body( messages: &[ChatMessage], schema: serde_json::Value, max_tokens: i32, + flavor: V1Flavor, ) -> serde_json::Value { - serde_json::json!({ + let mut body = serde_json::json!({ "model": model, "messages": messages.iter().map(to_openai_message).collect::>(), "stream": false, @@ -373,7 +438,14 @@ pub(crate) fn json_request_body( "type": "json_schema", "json_schema": {"name": "out", "strict": true, "schema": schema}, }, - }) + }); + // Structured output must never reason on the built-in engine: a thinking + // pass would consume the `max_tokens` budget before any JSON is emitted, + // yielding empty content. Force the switch off (remote servers get nothing). + if let Some(kwargs) = reasoning_template_kwargs(flavor, false) { + body["chat_template_kwargs"] = kwargs; + } + body } /// Sends a single non-streaming `/v1/chat/completions` request with a strict @@ -391,9 +463,10 @@ pub async fn request_openai_json( api_key: Option<&str>, timeout_secs: u64, max_tokens: i32, + flavor: V1Flavor, cancel_token: &CancellationToken, ) -> Result { - let body = json_request_body(model, &messages, schema, max_tokens); + let body = json_request_body(model, &messages, schema, max_tokens, flavor); let mut request = client .post(format!("{base_url}/v1/chat/completions")) .json(&body) @@ -460,6 +533,7 @@ mod tests { messages: vec![user_message("hi")], api_key: None, flavor: V1Flavor::Remote, + enable_thinking: false, } } @@ -1089,6 +1163,7 @@ mod tests { Some("sk-json"), 5, 256, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1116,6 +1191,7 @@ mod tests { None, 5, 64, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1141,6 +1217,7 @@ mod tests { None, 5, 64, + V1Flavor::Remote, &token, ) .await; @@ -1174,6 +1251,7 @@ mod tests { None, 1, 64, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1213,6 +1291,7 @@ mod tests { None, 5, 64, + 
V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1240,6 +1319,7 @@ mod tests { None, 5, 64, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1269,6 +1349,7 @@ mod tests { None, 5, 64, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1303,6 +1384,7 @@ mod tests { None, 5, 64, + V1Flavor::Remote, &CancellationToken::new(), ) .await; @@ -1312,4 +1394,106 @@ mod tests { assert_eq!(requests.len(), 1); assert!(!requests[0].headers.contains_key("authorization")); } + + // ── reasoning control (chat_template_kwargs.enable_thinking) ───────────── + + /// Built-in chat carries the llama.cpp per-request reasoning switch. With + /// reasoning opted out (the default), the body sets + /// `chat_template_kwargs.enable_thinking = false` so the model answers + /// directly instead of running a thinking pass. + #[test] + fn builtin_chat_body_disables_thinking_by_default() { + let body = chat_request_body("m", &[user_message("hi")], V1Flavor::Builtin, false); + // The OFF blast covers every reasoning family that honors a template + // kwarg, in one harmless payload: `enable_thinking` (Qwen/GLM/Hunyuan/ + // Gemma), `thinking` (Granite/DeepSeek-V3.x), `thinking_budget` 0 + // (Seed-OSS). Templates ignore the kwargs they do not read. + let kwargs = &body["chat_template_kwargs"]; + assert_eq!(kwargs["enable_thinking"], serde_json::json!(false)); + assert_eq!(kwargs["thinking"], serde_json::json!(false)); + assert_eq!(kwargs["thinking_budget"], serde_json::json!(0)); + assert_eq!(body["stream"], serde_json::json!(true)); + } + + /// Built-in chat with `/think` opts in: the ON blast sets every kwarg to + /// the reasoning-enabled value (`thinking_budget` -1 = unrestricted). 
+ #[test] + fn builtin_chat_body_enables_thinking_when_opted_in() { + let body = chat_request_body("m", &[user_message("hi")], V1Flavor::Builtin, true); + let kwargs = &body["chat_template_kwargs"]; + assert_eq!(kwargs["enable_thinking"], serde_json::json!(true)); + assert_eq!(kwargs["thinking"], serde_json::json!(true)); + assert_eq!(kwargs["thinking_budget"], serde_json::json!(-1)); + } + + /// Remote `/v1` servers never receive the llama.cpp-specific + /// `chat_template_kwargs` field: an arbitrary OpenAI-compatible server may + /// reject an unknown body key, and the `/think` opt-in is built-in only. + #[test] + fn remote_chat_body_omits_thinking_kwargs() { + let body = chat_request_body("m", &[user_message("hi")], V1Flavor::Remote, true); + assert!(body.get("chat_template_kwargs").is_none()); + } + + /// Structured-output calls (search judges, title generation) must never + /// reason on the built-in engine: a thinking pass would consume the + /// `max_tokens` budget before any JSON is emitted. The builtin structured + /// body forces `enable_thinking = false`. + #[test] + fn builtin_structured_body_disables_thinking() { + let body = json_request_body( + "m", + &[user_message("q")], + serde_json::json!({}), + 64, + V1Flavor::Builtin, + ); + let kwargs = &body["chat_template_kwargs"]; + assert_eq!(kwargs["enable_thinking"], serde_json::json!(false)); + assert_eq!(kwargs["thinking"], serde_json::json!(false)); + assert_eq!(kwargs["thinking_budget"], serde_json::json!(0)); + assert_eq!(body["stream"], serde_json::json!(false)); + } + + /// Remote structured-output bodies stay clean of the llama.cpp kwarg. 
+ #[test] + fn remote_structured_body_omits_thinking_kwargs() { + let body = json_request_body( + "m", + &[user_message("q")], + serde_json::json!({}), + 64, + V1Flavor::Remote, + ); + assert!(body.get("chat_template_kwargs").is_none()); + } + + /// End to end: a built-in streaming chat actually sends the reasoning + /// switch on the wire, locking `stream_openai_chat` to `chat_request_body`. + #[tokio::test] + async fn builtin_stream_sends_enable_thinking_on_the_wire() { + let server = MockServer::start().await; + mount_sse(&server, b"data: [DONE]\n".to_vec()).await; + + let client = reqwest::Client::new(); + let (_, callback) = collect_chunks(); + stream_openai_chat( + OpenAiChatParams { + flavor: V1Flavor::Builtin, + enable_thinking: false, + ..chat_params(server.uri()) + }, + &client, + CancellationToken::new(), + callback, + ) + .await; + + let requests = server.received_requests().await.unwrap(); + let sent: serde_json::Value = serde_json::from_slice(&requests[0].body).unwrap(); + assert_eq!( + sent["chat_template_kwargs"]["enable_thinking"], + serde_json::json!(false) + ); + } } diff --git a/src-tauri/src/search/llm.rs b/src-tauri/src/search/llm.rs index 52829d10..9c02d138 100644 --- a/src-tauri/src/search/llm.rs +++ b/src-tauri/src/search/llm.rs @@ -386,7 +386,7 @@ async fn request_json_v1( // Build the trace body via the same helper request_openai_json uses so // the recorded body always mirrors the actual wire shape. 
let request_body_value = - crate::openai::json_request_body(model, &messages, format.clone(), num_predict); + crate::openai::json_request_body(model, &messages, format.clone(), num_predict, flavor); let started = std::time::Instant::now(); let result = crate::openai::request_openai_json( base_url, @@ -397,6 +397,7 @@ async fn request_json_v1( api_key, timeout_secs, num_predict, + flavor, cancel_token, ) .await; diff --git a/src-tauri/src/search/pipeline.rs b/src-tauri/src/search/pipeline.rs index 6a557465..635374bb 100644 --- a/src-tauri/src/search/pipeline.rs +++ b/src-tauri/src/search/pipeline.rs @@ -516,6 +516,10 @@ async fn run_streaming_branch( messages, api_key: api_key.clone(), flavor: *flavor, + // Search synthesis answers directly; reasoning is opt-in + // chat only, and on the built-in engine a thinking pass + // would burn the token budget before the answer. + enable_thinking: false, }, client, cancel_token, diff --git a/src-tauri/src/settings_commands.rs b/src-tauri/src/settings_commands.rs index 4591bc04..453503d9 100644 --- a/src-tauri/src/settings_commands.rs +++ b/src-tauri/src/settings_commands.rs @@ -161,6 +161,17 @@ pub(crate) fn builtin_deactivated(prior_kind: &str, resolved: &AppConfig) -> boo != crate::config::defaults::PROVIDER_KIND_BUILTIN } +/// True when a config write moved the ACTIVE provider away from Ollama +/// (ollama -> builtin/openai). The mirror of [`builtin_deactivated`]: switching +/// between non-ollama kinds or onto ollama never matches. Pulled out so the +/// predicate is covered by tests instead of riding inside the coverage-off +/// command bodies that fire the Ollama eviction. 
+pub(crate) fn ollama_deactivated(prior_kind: &str, resolved: &AppConfig) -> bool { + prior_kind == crate::config::defaults::PROVIDER_KIND_OLLAMA + && resolved.inference.active_provider_kind() + != crate::config::defaults::PROVIDER_KIND_OLLAMA +} + /// Fires a best-effort engine unload when a config write switched the active /// provider away from the built-in engine. Without it, a multi-GB /// llama-server stays resident until quit: the eviction UI branches by the @@ -179,6 +190,45 @@ fn unload_engine_if_builtin_deactivated(app: &AppHandle, prior_kind: &str, resol } } +/// Fires a best-effort Ollama eviction when a config write switched the active +/// provider away from Ollama (ollama -> builtin/openai). The mirror of +/// [`unload_engine_if_builtin_deactivated`]: without it the model Thuki loaded +/// into Ollama's VRAM lingers for its `keep_alive` TTL after the user has moved +/// on, holding memory for a provider that is no longer active. Only the model +/// Thuki was chatting with (the Ollama provider's configured `model`) is +/// evicted; models other apps loaded are left alone. Spawned so the switch +/// never blocks on, nor can fail because of, Ollama being unreachable. +#[cfg_attr(coverage_nightly, coverage(off))] +fn evict_ollama_if_deactivated(app: &AppHandle, prior_kind: &str, resolved: &AppConfig) { + if !ollama_deactivated(prior_kind, resolved) { + return; + } + // The provider switch moves only the active_provider pointer; the Ollama + // provider entry still carries the model + endpoint Thuki was using. 
+ let Some(ollama) = resolved + .inference + .providers + .iter() + .find(|p| p.kind == crate::config::defaults::PROVIDER_KIND_OLLAMA) + else { + return; + }; + let model = ollama.model.clone(); + if model.is_empty() { + return; + } + let endpoint = format!("{}/api/generate", ollama.base_url.trim_end_matches('/')); + let client = app.state::().inner().clone(); + // Suppress any in-flight warmup that would re-announce the model as loaded + // after we evict it, matching the explicit Unload-now path. + app.state::().mark_evicted(); + let app_handle = app.clone(); + tauri::async_runtime::spawn(async move { + let _ = crate::warmup::evict_model_request(&endpoint, &model, &client).await; + let _ = app_handle.emit("warmup:model-evicted", ()); + }); +} + // ─── Tauri command surface ────────────────────────────────────────────────── /// Returns the current resolved `AppConfig` snapshot. @@ -350,9 +400,13 @@ pub fn set_active_provider( *guard = mirror; } } - // Switching away from the built-in engine releases its memory; the - // sidecar would otherwise stay resident with no unload affordance. + // Switching away from a local provider releases its memory immediately so + // the now-inactive provider holds no RAM/VRAM: the built-in engine's + // sidecar is killed, and the Ollama model is evicted from VRAM. Exactly one + // fires (the prior kind is builtin, ollama, or openai); openai is remote and + // needs neither. unload_engine_if_builtin_deactivated(&app, &prior_kind, &resolved); + evict_ollama_if_deactivated(&app, &prior_kind, &resolved); emit_config_updated(&app); Ok(resolved) } @@ -885,9 +939,11 @@ pub fn reload_config_from_disk( // Manual edits to `[inference] keep_warm_inactivity_minutes` reach the // engine runner through the same refresh path. forward_keep_warm_idle_minutes(&app, prior_keep_warm_minutes, &resolved); - // A hand-edited `active_provider` that moved away from the built-in - // engine releases the sidecar, mirroring the Settings radio path. 
+ // A hand-edited `active_provider` that moved away from a local provider + // releases its memory (builtin sidecar killed, Ollama model evicted), + // mirroring the Settings radio path. unload_engine_if_builtin_deactivated(&app, &prior_kind, &resolved); + evict_ollama_if_deactivated(&app, &prior_kind, &resolved); emit_config_updated(&app); Ok(resolved) } diff --git a/src-tauri/src/settings_commands/tests.rs b/src-tauri/src/settings_commands/tests.rs index 5c31e544..2eb5d66a 100644 --- a/src-tauri/src/settings_commands/tests.rs +++ b/src-tauri/src/settings_commands/tests.rs @@ -14,8 +14,8 @@ use toml_edit::DocumentMut; use super::{ add_openai_provider_to_disk, builtin_deactivated, cleanup_provider_secrets, coerce_json_to_toml, is_allowed_field, is_allowed_section, is_http_url, json_type_name, - json_value_to_toml_item, keep_warm_idle_minutes_changed, patch_document, read_document, - remove_openai_provider_from_disk, reset_section_on_disk, trace_enabled_changed, + json_value_to_toml_item, keep_warm_idle_minutes_changed, ollama_deactivated, patch_document, + read_document, remove_openai_provider_from_disk, reset_section_on_disk, trace_enabled_changed, validate_provider_value, write_active_provider_to_disk, write_field_to_disk, write_provider_field_to_disk, }; @@ -1655,6 +1655,36 @@ fn builtin_deactivated_ignores_non_builtin_transitions_and_no_ops() { assert!(!builtin_deactivated("", &config_with_active("ollama"))); } +// ─── ollama_deactivated ────────────────────────────────────────────────────── + +#[test] +fn ollama_deactivated_detects_switch_away_from_ollama() { + // ollama -> builtin and ollama -> openai both free the Ollama model. 
+ assert!(ollama_deactivated("ollama", &config_with_active("builtin"))); + assert!(ollama_deactivated("ollama", &config_with_active("openai"))); +} + +#[test] +fn ollama_deactivated_ignores_switch_onto_ollama() { + assert!(!ollama_deactivated( + "builtin", + &config_with_active("ollama") + )); +} + +#[test] +fn ollama_deactivated_ignores_non_ollama_transitions_and_no_ops() { + // ollama -> ollama: nothing changed. + assert!(!ollama_deactivated("ollama", &config_with_active("ollama"))); + // builtin -> builtin: never an Ollama deactivation. + assert!(!ollama_deactivated( + "builtin", + &config_with_active("builtin") + )); + // Unresolved prior kind (empty) never counts as ollama. + assert!(!ollama_deactivated("", &config_with_active("builtin"))); +} + // ─── Helpers ───────────────────────────────────────────────────────────────── fn matches_type_mismatch(err: &ConfigError, section: &str, key: &str) { diff --git a/src-tauri/src/warmup.rs b/src-tauri/src/warmup.rs index a90bf33d..a668913e 100644 --- a/src-tauri/src/warmup.rs +++ b/src-tauri/src/warmup.rs @@ -99,15 +99,12 @@ pub(crate) fn vram_poll_active(kind: &str) -> bool { kind == PROVIDER_KIND_OLLAMA } -/// The engine port to prime, when the built-in engine already serves a model. -/// `None` for every other lifecycle state: summoning the overlay must never -/// load a model implicitly (loads happen on explicit chat or download). -pub(crate) fn builtin_prime_port(status: &crate::engine::runner::EngineStatus) -> Option { - if status.state == "loaded" { - status.port - } else { - None - } +/// Whether the built-in engine should warm-load on the chat-intent signal: +/// only when a model is actually selected. Mirrors the Ollama arm, which also +/// no-ops without a model. An empty id means no built-in model has been picked +/// yet, so there is nothing to load. 
+pub(crate) fn builtin_should_warm(model_id: &str) -> bool { + !model_id.is_empty() } /// Builds the prime request body for the built-in engine: a plain @@ -132,18 +129,128 @@ pub(crate) fn builtin_prime_body(model: &str, system_prompt: &str) -> serde_json /// priming is app-summon activity, not user chat; if it touched, idle-unload /// would never fire for a user who keeps summoning the overlay without /// chatting. +/// Returns `true` when the prime got an HTTP 200 (the model is now warm and +/// the system-prompt prefix is cached); any transport or non-200 outcome +/// returns `false` so the caller leaves the load un-primed and a later warm +/// can retry. pub(crate) async fn prime_builtin( port: u16, model: String, system_prompt: String, client: reqwest::Client, -) { +) -> bool { let body = builtin_prime_body(&model, &system_prompt); - let _ = client + client .post(format!("http://127.0.0.1:{port}/v1/chat/completions")) .json(&body) .send() - .await; + .await + .map(|r| r.status().as_u16()) + .unwrap_or(0) + == 200 +} + +/// Port-keyed dedup + cue state for the built-in engine, owned by the app +/// layer so the engine runner stays a pure process actor. `warm_builtin` +/// consults it after `ensure_loaded` resolves the serving port, so at most one +/// prime runs per engine load and the overlay shows the "warming" cue for +/// exactly that window. Keyed on port, not target: a model or context switch +/// forces a new process and a new port, so a port mismatch correctly allows a +/// fresh prime after any restart. +#[derive(Default)] +pub struct BuiltinWarmState { + inner: std::sync::Mutex, +} + +#[derive(Default)] +struct BuiltinWarm { + /// Port of a prime currently in flight, if any. Armed by `try_begin`, + /// cleared by `finish` regardless of outcome so a failed prime can retry. + in_flight: Option, + /// Port whose prime completed successfully. A new process gets a new port, + /// so a port mismatch allows a fresh prime after a restart. 
+ primed_port: Option<u16>, +} + +impl BuiltinWarmState { + /// Atomically decides whether to prime the engine on `port`. Returns true + /// (and arms the in-flight slot) only when no prime is already running for + /// this port and this port has not already been primed. The two warm + /// callers (summon + first keystroke) both reach this after `ensure_loaded` + /// resolves the same reused port, so the loser dedups to a no-op. + pub fn try_begin(&self, port: u16) -> bool { + let mut g = self.inner.lock().unwrap(); + if g.in_flight == Some(port) || g.primed_port == Some(port) { + return false; + } + g.in_flight = Some(port); + true + } + + /// Clears the in-flight slot for `port` and, on success, records the port + /// as primed so later warm requests for the same load dedup. A `finish` + /// for a port that no longer owns the slot (engine restarted mid-prime) + /// leaves the slot untouched. + pub fn finish(&self, port: u16, success: bool) { + let mut g = self.inner.lock().unwrap(); + if g.in_flight == Some(port) { + g.in_flight = None; + } + if success { + g.primed_port = Some(port); + } + } + + /// Whether a prime is currently in flight. Seeds the Settings keep-warm + /// status when the panel mounts during a cold prime (it otherwise learns + /// the state only from the `warmup:builtin-warming`/`-warmed` events). + pub fn is_warming(&self) -> bool { + self.inner.lock().unwrap().in_flight.is_some() + } + + /// Drops all dedup state so the next warm primes fresh. Called when the + /// engine leaves the `loaded` state (idle-unload, model switch, crash): the + /// primed port belongs to a process that no longer exists, and the OS can + /// hand the next load that exact port again. Without this clear, the cold + /// reload would match the dead port's primed record, dedup to a no-op, and + /// leave the user's first message to eat the full cold prefill.
+ pub fn reset(&self) { + let mut g = self.inner.lock().unwrap(); + g.in_flight = None; + g.primed_port = None; + } +} + +/// Built-in arm of `warm_up_model`: starts (or reuses) the engine so the +/// selected model is resident by the time the user submits, then primes the +/// KV cache for the system-prompt prefix. Dedup via [`BuiltinWarmState`] +/// collapses the summon + keystroke warms (and any double-summon) to a single +/// prime per load, so the user's first message never queues behind redundant +/// cold primes. Emits `warmup:builtin-warming` while the prime runs and +/// `warmup:builtin-warmed` when it ends, so the Settings keep-warm status can +/// read "warming…" until the model is actually ready (not just `/health` OK). +/// Best-effort throughout: a superseded load, a dedup skip, or a failed prime +/// is swallowed. Coverage-off: the dedup logic lives in `BuiltinWarmState` +/// and the prime in `prime_builtin`, both tested; this only sequences them. +#[cfg_attr(coverage_nightly, coverage(off))] +pub(crate) async fn warm_builtin( + app: tauri::AppHandle, + engine: crate::engine::runner::EngineHandle, + target: crate::engine::state::Target, + model_id: String, + system_prompt: String, + client: reqwest::Client, +) { + let Ok(port) = engine.ensure_loaded(target).await else { + return; + }; + if !app.state::<BuiltinWarmState>().try_begin(port) { + return; + } + let _ = app.emit("warmup:builtin-warming", ()); + let ok = prime_builtin(port, model_id, system_prompt, client).await; + app.state::<BuiltinWarmState>().finish(port, ok); + let _ = app.emit("warmup:builtin-warmed", ()); } /// Built-in arm of `evict_model`: stops the engine sidecar and resolves once @@ -153,18 +260,27 @@ pub(crate) async fn evict_builtin(engine: &crate::engine::runner::EngineHandle) engine.unload().await; } -/// Built-in arm of `get_loaded_model`: the provider's configured model id -/// when the engine status watch reports a loaded model, `None` otherwise -/// (including when no model has been picked yet).
+/// Built-in arm of `get_loaded_model`: the display name of the model the engine +/// is *actually* serving, resolved from the live status's `model_path` against +/// `installed` (each entry a `(display_name, weights blob path)` pair), or +/// `None` when the engine is not loaded or the resident blob matches no row. +/// +/// This reads true VRAM residency, never the frontend-selected model: switching +/// the active model rewrites config immediately, but the sidecar keeps serving +/// the previous model until a reload, so the configured id would misreport what +/// occupies memory. pub(crate) fn builtin_loaded_model( status: &crate::engine::runner::EngineStatus, - model_id: &str, + installed: &[(String, std::path::PathBuf)], ) -> Option { - if status.state == "loaded" && !model_id.is_empty() { - Some(model_id.to_string()) - } else { - None - } + if status.state != "loaded" || status.model_path.is_empty() { + return None; + } + let resident = std::path::Path::new(&status.model_path); + installed + .iter() + .find(|(_, path)| path.as_path() == resident) + .map(|(name, _)| name.clone()) } impl Default for WarmupState { @@ -252,12 +368,16 @@ impl WarmupState { #[tauri::command] #[cfg_attr(coverage_nightly, coverage(off))] +#[allow(clippy::too_many_arguments)] pub fn warm_up_model( + app: tauri::AppHandle, warmup: tauri::State, models: tauri::State, config: tauri::State>, client: tauri::State, engine: tauri::State, + db: tauri::State, + store: tauri::State, ) { let kind = config.read().inference.active_provider_kind().to_string(); match kind.as_str() { @@ -292,14 +412,38 @@ pub fn warm_up_model( } } PROVIDER_KIND_BUILTIN => { - let status = engine.status().borrow().clone(); - if let Some(port) = builtin_prime_port(&status) { + let (model_id, num_ctx, system_prompt) = { let cfg = config.read(); - let model = cfg.inference.active_provider_model().to_string(); - let system_prompt = cfg.prompt.resolved_system.clone(); - drop(cfg); - let client = client.inner().clone(); - 
tauri::async_runtime::spawn(prime_builtin(port, model, system_prompt, client)); + ( + cfg.inference.active_provider_model().to_string(), + cfg.inference.num_ctx, + cfg.prompt.resolved_system.clone(), + ) + }; + if !builtin_should_warm(&model_id) { + return; + } + // Resolve the manifest row to an engine Target inside a scope so the + // connection guard drops before the spawned load. A poisoned lock is + // recovered: an unrelated panic does not invalidate the connection. + let target = { + let conn = match db.0.lock() { + Ok(conn) => conn, + Err(poisoned) => poisoned.into_inner(), + }; + crate::commands::builtin_target(&conn, &store, &model_id, num_ctx) + }; + // A missing/uninstalled model yields an Err; warmup is best-effort, + // so just skip rather than surfacing anything. + if let Ok(target) = target { + tauri::async_runtime::spawn(warm_builtin( + app, + engine.inner().clone(), + target, + model_id, + system_prompt, + client.inner().clone(), + )); } } _ => {} @@ -355,6 +499,17 @@ pub fn get_engine_status( engine.current_status() } +/// True while the built-in engine is priming (loaded but the system-prompt +/// prefill has not finished). The Settings keep-warm panel calls this on mount +/// to seed its "warming…" status, since the `warmup:builtin-warming` event it +/// otherwise relies on may have fired before the panel attached its listener. +/// Thin wrapper over [`BuiltinWarmState::is_warming`], which its own tests cover. +#[tauri::command] +#[cfg_attr(coverage_nightly, coverage(off))] +pub fn get_builtin_warm_state(warm: tauri::State<'_, BuiltinWarmState>) -> bool { + warm.is_warming() +} + /// Returns the active model's name if it is currently loaded, `None` if no /// model is selected or nothing is running. 
Branches by the active provider's /// kind: Ollama queries `/api/ps`, the built-in engine reads its own status @@ -367,13 +522,28 @@ pub async fn get_loaded_model( config: tauri::State<'_, parking_lot::RwLock>, client: tauri::State<'_, reqwest::Client>, engine: tauri::State<'_, crate::engine::runner::EngineHandle>, + db: tauri::State<'_, crate::history::Database>, + store: tauri::State<'_, crate::models::storage::ModelStore>, ) -> Result, String> { let kind = config.read().inference.active_provider_kind().to_string(); match kind.as_str() { PROVIDER_KIND_BUILTIN => { - let model_id = config.read().inference.active_provider_model().to_string(); let status = engine.status().borrow().clone(); - Ok(builtin_loaded_model(&status, &model_id)) + // Resolve the engine's resident blob back to its installed name. A + // poisoned lock is recovered: an unrelated panic must not blind the + // residency line. + let installed = { + let conn = match db.0.lock() { + Ok(conn) => conn, + Err(poisoned) => poisoned.into_inner(), + }; + crate::models::manifest::list(&conn) + .unwrap_or_default() + .into_iter() + .map(|m| (m.display_name, store.blob_path(&m.sha256))) + .collect::>() + }; + Ok(builtin_loaded_model(&status, &installed)) } PROVIDER_KIND_OLLAMA => { let model = models.0.lock().ok().and_then(|g| g.clone()); @@ -1485,13 +1655,14 @@ mod tests { } #[test] - fn prime_skipped_when_engine_not_loaded() { - assert_eq!(builtin_prime_port(&engine_status("stopped", None)), None); - assert_eq!(builtin_prime_port(&engine_status("starting", None)), None); - assert_eq!(builtin_prime_port(&engine_status("failed", None)), None); - assert_eq!( - builtin_prime_port(&engine_status("loaded", Some(40123))), - Some(40123) + fn builtin_should_warm_requires_a_selected_model() { + assert!( + !builtin_should_warm(""), + "no picked model means nothing to warm-load" + ); + assert!( + builtin_should_warm("org/repo:m.gguf"), + "a selected model warms the engine on the chat-intent signal" ); } @@ -1515,7 
+1686,7 @@ mod tests { .unwrap() .parse() .expect("mockito url ends in a port"); - prime_builtin( + let ok = prime_builtin( port, "org/repo:m.gguf".to_string(), SYS.to_string(), @@ -1523,23 +1694,154 @@ mod tests { ) .await; + assert!( + ok, + "a 200 prime reports success so the load is marked primed" + ); mock.assert_async().await; } + #[tokio::test] + async fn builtin_prime_swallows_connection_error() { + // Port 1 refuses; prime is best-effort and must not panic, exercising + // the transport-error path of the status capture. + let ok = prime_builtin( + 1, + "org/repo:m.gguf".to_string(), + SYS.to_string(), + reqwest::Client::new(), + ) + .await; + + assert!( + !ok, + "a transport failure reports not-primed so a later warm retries" + ); + } + + // ── BuiltinWarmState (port-keyed dedup) ────────────────────────────────── + #[test] - fn get_loaded_model_builtin_from_status() { - assert_eq!( - builtin_loaded_model(&engine_status("loaded", Some(40123)), "org/repo:m.gguf"), - Some("org/repo:m.gguf".to_string()) + fn warm_state_first_call_begins_then_dedups_in_flight() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000), "first call for a port arms the prime"); + assert!( + !s.try_begin(40000), + "a second call while the prime is in flight dedups to a no-op" + ); + } + + #[test] + fn warm_state_failed_prime_allows_retry() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000)); + s.finish(40000, false); + assert!( + s.try_begin(40000), + "a failed prime leaves the port un-primed so a later warm retries" + ); + } + + #[test] + fn warm_state_successful_prime_dedups_same_port() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000)); + s.finish(40000, true); + assert!( + !s.try_begin(40000), + "a primed port dedups later warms for the same load" ); + } + + #[test] + fn warm_state_new_port_primes_again_after_success() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000)); + s.finish(40000, true); + assert!( 
+ s.try_begin(40001), + "a new process/port (restart or model switch) primes fresh" + ); + } + + #[test] + fn warm_state_finish_for_unowned_port_leaves_slot_armed() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000)); + // The engine restarted mid-prime: a finish for a different port must not + // clear the slot the live prime still owns, but still records its success. + s.finish(40001, true); + assert!( + !s.try_begin(40000), + "the in-flight slot for 40000 is untouched by finish(40001)" + ); + assert!( + !s.try_begin(40001), + "finish(40001, true) still recorded 40001 as primed" + ); + } + + #[test] + fn warm_state_reset_clears_dedup_after_teardown() { + let s = BuiltinWarmState::default(); + assert!(s.try_begin(40000)); + s.finish(40000, true); + assert!(s.try_begin(40001), "a second load primes on its own port"); + assert!(s.is_warming(), "the 40001 prime is in flight"); + // Engine torn down: the next load can reuse either port. reset() drops + // both the primed record and the in-flight slot so a reused port primes + // fresh instead of deduping against the dead process. 
+ s.reset(); + assert!(!s.is_warming(), "reset clears the in-flight slot"); + assert!( + s.try_begin(40000), + "reset clears the primed record so a reused port primes fresh" + ); + } + + #[test] + fn warm_state_is_warming_tracks_in_flight() { + let s = BuiltinWarmState::default(); + assert!(!s.is_warming(), "nothing is in flight at rest"); + assert!(s.try_begin(40000)); + assert!(s.is_warming(), "a begun prime reports warming"); + s.finish(40000, true); + assert!(!s.is_warming(), "a finished prime is no longer warming"); + } + + #[test] + fn builtin_loaded_model_names_the_resident_blob_not_the_selection() { + use std::path::PathBuf; + let resident = PathBuf::from("/blobs/sha_mistral"); + let installed = vec![ + ("Gemma 4 12B".to_string(), PathBuf::from("/blobs/sha_gemma")), + ("Mistral Nemo 12B".to_string(), resident.clone()), + ]; + + // Loaded: the engine is serving the Mistral blob, so the resident model + // is named from the live `model_path`, independent of any selection. + let mut loaded = engine_status("loaded", Some(40123)); + loaded.model_path = resident.display().to_string(); assert_eq!( - builtin_loaded_model(&engine_status("stopped", None), "org/repo:m.gguf"), - None + builtin_loaded_model(&loaded, &installed), + Some("Mistral Nemo 12B".to_string()) ); + + // Not loaded: nothing is resident even if a path lingers in the status. + let mut stopped = engine_status("stopped", None); + stopped.model_path = resident.display().to_string(); + assert_eq!(builtin_loaded_model(&stopped, &installed), None); + + // Loaded but the resident blob matches no installed row: report nothing + // rather than guessing a name. + let mut orphan = engine_status("loaded", Some(40123)); + orphan.model_path = "/blobs/sha_unknown".to_string(); + assert_eq!(builtin_loaded_model(&orphan, &installed), None); + + // Loaded with an empty path (defensive): nothing to name. 
assert_eq!( - builtin_loaded_model(&engine_status("loaded", Some(40123)), ""), - None, - "no picked model means nothing to report even while loaded" + builtin_loaded_model(&engine_status("loaded", Some(40123)), &installed), + None ); } @@ -1563,6 +1865,16 @@ mod tests { async fn kill(&mut self) { let _ = self.exit_tx.send(true); } + fn stderr_tail(&self) -> String { + String::new() + } + } + + #[test] + fn instant_child_has_no_stderr_tail() { + let (exit_tx, exit_rx) = tokio::sync::watch::channel(false); + let child = InstantChild { exit_tx, exit_rx }; + assert_eq!(crate::engine::process::EngineChild::stderr_tail(&child), ""); } #[async_trait::async_trait] diff --git a/src-tauri/tauri.conf.json b/src-tauri/tauri.conf.json index 9cda66e8..f37deb67 100644 --- a/src-tauri/tauri.conf.json +++ b/src-tauri/tauri.conf.json @@ -28,11 +28,11 @@ "label": "settings", "title": "Thuki Settings", "url": "index.html#/settings", - "width": 580, + "width": 760, "height": 520, - "minWidth": 580, + "minWidth": 760, "minHeight": 280, - "maxWidth": 580, + "maxWidth": 760, "maxHeight": 700, "resizable": false, "fullscreen": false, diff --git a/src/__tests__/App.test.tsx b/src/__tests__/App.test.tsx index b6a0db22..58dc56e5 100644 --- a/src/__tests__/App.test.tsx +++ b/src/__tests__/App.test.tsx @@ -63,6 +63,7 @@ function makeDownloadCtx( cancelConfirm: vi.fn(), start: vi.fn(async () => {}), startRepo: vi.fn(async () => {}), + startById: vi.fn(async () => {}), cancel: vi.fn(async () => {}), retry: vi.fn(async () => {}), resume: vi.fn(async () => {}), diff --git a/src/components/DownloadProgress.tsx b/src/components/DownloadProgress.tsx index bf6ae948..509b5b14 100644 --- a/src/components/DownloadProgress.tsx +++ b/src/components/DownloadProgress.tsx @@ -28,6 +28,12 @@ export interface DownloadProgressProps { state: DownloadUiState; progress: DownloadProgressInfo | null; etaSeconds: number | null; + /** Cumulative bytes across weights + companion: the unified numerator. 
*/ + combinedBytes?: number | null; + /** Full on-disk total (weights + companion): the unified denominator. */ + grandTotalBytes?: number | null; + /** Rolling download rate in bytes per second; drives the unified ETA. */ + speedBytesPerSec?: number | null; confirmInfo?: ConfirmInfo; onConfirm: () => void; onCancelConfirm: () => void; @@ -56,6 +62,67 @@ function gb(bytes: number): string { return (bytes / 1e9).toFixed(1); } +/** Inputs for the single download figures line. */ +export interface DownloadLineInput { + /** Per-file byte counts; the fallback when no grand total is known. */ + progress: DownloadProgressInfo | null; + /** Rolling ETA seconds for the per-file fallback path. */ + etaSeconds: number | null; + /** Cumulative bytes across weights + companion: the unified numerator. */ + combinedBytes: number | null; + /** Full on-disk total (weights + companion): the unified denominator. */ + grandTotalBytes: number | null; + /** Rolling rate; drives the unified ETA when present. */ + speedBytesPerSec: number | null; +} + +/** Percent plus a "x / y GB · ~eta" string, or null figures before any bytes. */ +export interface DownloadLine { + percent: number; + figures: string | null; +} + +/** + * One continuous progress reading. Prefers the unified weights + companion + * figure, so a vision download is a single bar to 100% that never resets + * between the two files; falls back to the current file's own byte counts for + * single-file repo downloads where no grand total is known up front. + */ +export function downloadLine({ + progress, + etaSeconds, + combinedBytes, + grandTotalBytes, + speedBytesPerSec, +}: DownloadLineInput): DownloadLine { + let bytes: number; + let total: number; + let eta: number | null; + if ( + grandTotalBytes !== null && + grandTotalBytes > 0 && + combinedBytes !== null + ) { + bytes = combinedBytes; + total = grandTotalBytes; + eta = + speedBytesPerSec !== null + ? 
Math.max(0, Math.round((total - bytes) / speedBytesPerSec)) + : etaSeconds; + } else if (progress !== null && progress.totalBytes > 0) { + bytes = progress.bytes; + total = progress.totalBytes; + eta = etaSeconds; + } else { + return { percent: 0, figures: null }; + } + const percent = Math.min(100, Math.floor((bytes / total) * 100)); + const figures = + `${gb(bytes)} / ${gb(total)} GB` + + (eta !== null ? ` · ~${formatEta(eta)}` : ''); + return { percent, figures }; +} + /** Failure headline per kind. Exact copy; consumed verbatim by tests. */ function failureHeadline(kind: string, message: string): string { switch (kind) { @@ -82,6 +149,9 @@ export function DownloadProgress({ state, progress, etaSeconds, + combinedBytes = null, + grandTotalBytes = null, + speedBytesPerSec = null, confirmInfo, onConfirm, onCancelConfirm, @@ -120,96 +190,84 @@ export function DownloadProgress({ ); case 'downloading': - case 'downloading_mmproj': + case 'downloading_mmproj': { + const { percent, figures } = downloadLine({ + progress, + etaSeconds, + combinedBytes, + grandTotalBytes, + speedBytesPerSec, + }); return ( - - - {state.phase === 'downloading_mmproj' - ? 'Downloading vision companion' - : 'Downloading model'} - - 0 - ? Math.floor((progress.bytes / progress.totalBytes) * 100) - : 0 - } - /> - {progress ? ( - - {gb(progress.bytes)} GB of {gb(progress.totalBytes)} GB - - ) : null} - {etaSeconds !== null ? ( - About {formatEta(etaSeconds)} left - ) : null} - - - - + }> + + + {percent}% + + {figures !== null ? ` · ${figures}` : ''} + {state.phase === 'downloading_mmproj' ? 
' · finishing vision' : ''} + + + + ); + } case 'verifying': return ( - - Verifying download - - + }> + Verifying download + ); case 'installing': return ( - - Installing - - + }> + Installing + ); case 'warming_up': return ( - - Starting the engine - - + }> + Starting the engine + ); case 'ready': return ( - - - - - - - Ready - - - + }> + + + + + Ready + + ); case 'failed': return ( - - {failureHeadline(state.kind, state.message)} - {state.kind === 'http' ? {state.message} : null} - - - {onChooseAnother ? ( - + }> + + + {failureHeadline(state.kind, state.message)} + + {state.kind === 'http' ? ( + {state.message} ) : null} - - + + + + {onChooseAnother ? ( + + ) : null} + ); default: // idle and resume_pending have no progress UI; the picker owns them. @@ -217,7 +275,7 @@ export function DownloadProgress({ } } -function Card({ children }: { children: React.ReactNode }) { +export function Card({ children }: { children: React.ReactNode }) { return (

+ {children} + {edge} +
+ ); } -function ProgressBar({ percent = 0, indeterminate = false }: ProgressBarProps) { +/** + * The 2px progress edge. Determinate fills to `percent`; indeterminate shows a + * fixed segment. `tone` is the warm accent while working and green at ready. + */ +function Edge({ + percent = 0, + indeterminate = false, + tone, +}: { + percent?: number; + indeterminate?: boolean; + tone: 'accent' | 'green' | 'red'; +}) { + const fill = + tone === 'green' + ? '#5fcf86' + : tone === 'red' + ? '#ef6b6b' + : 'linear-gradient(90deg, #ffa06f, #d45a1e)'; return ( -
- {!indeterminate ? ( -
- {percent}% -
- ) : null} -
+ -
-
-
+ /> + + ); +} + +/** A borderless text button for the inline hairline actions (Retry, etc.). */ +function GhostButton({ + label, + tone, + onClick, +}: { + label: string; + tone: 'accent' | 'muted'; + onClick: () => void; +}) { + return ( + + ); +} + +/** The single status line for the post-download steps (and the ready check). */ +function StatusText({ + children, + ready = false, +}: { + children: React.ReactNode; + ready?: boolean; +}) { + return ( +

+ {children} +

+ ); +} + +/** The inline cancel control: a quiet × that warms on hover via the theme. */ +function CancelX({ onClick }: { onClick: () => void }) { + return ( + ); } @@ -328,7 +503,11 @@ interface FlowButtonProps { primary?: boolean; } -function FlowButton({ label, onClick, primary = false }: FlowButtonProps) { +export function FlowButton({ + label, + onClick, + primary = false, +}: FlowButtonProps) { return (
); diff --git a/src/components/ErrorCard.tsx b/src/components/ErrorCard.tsx index 9deba6ad..e99ae7ad 100644 --- a/src/components/ErrorCard.tsx +++ b/src/components/ErrorCard.tsx @@ -9,6 +9,9 @@ const barColors: Record = { EngineUnreachable: '#ef4444', // Same red as EngineUnreachable: a sidecar crash is equally severe. EngineStartFailed: '#ef4444', + // Amber, not red: an unsupported model architecture is a "pick another + // model" nudge, not an engine crash, so it shares the warning hue. + ModelUnsupported: '#f59e0b', ModelNotFound: '#f59e0b', // Same accent as ModelNotFound: this is a configuration/setup nudge, // not a daemon failure, so the warning hue (amber) is the right read. diff --git a/src/components/ModelPickerPanel.tsx b/src/components/ModelPickerPanel.tsx index d163e004..80b2c227 100644 --- a/src/components/ModelPickerPanel.tsx +++ b/src/components/ModelPickerPanel.tsx @@ -22,6 +22,16 @@ export const OLLAMA_LIBRARY_URL = 'https://ollama.com/library'; export const OLLAMA_PILL_TOOLTIP = 'Browse and pull any model on Ollama. Thuki auto-detects it.'; +/** + * Pill shown on models whose reasoning cannot be turned off (capability + * `reasoningAlways`). Positive, non-alarming framing per industry practice + * (Anthropic/OpenAI/Gemini never present reasoning as a caveat): the goal is + * to set expectations, not warn. `/think` is a no-op for these models. + */ +export const ALWAYS_THINKS_LABEL = 'Always thinks'; +export const ALWAYS_THINKS_TOOLTIP = + 'This model reasons before every answer, so expect a brief pause. 
Its thinking shows in a collapsible block above each reply.'; + const CHECK_ICON_PATH = ( - + {labelFor(model)} {capLabel && ( @@ -322,6 +337,18 @@ export function ModelPickerPanel({ )} + {alwaysThinks && ( + // A plain span with a native title: the row is a @@ -164,10 +164,13 @@ describe('Tooltip', () => { '[style*="position: fixed"]', ) as HTMLElement | null; expect(fixedBox).not.toBeNull(); - // The inner content div (under the fixed-positioned outer + motion - // wrapper) carries the explicit 225px width style. - const inner = fixedBox?.querySelector('div[style*="width"]'); + // The inner content div carries a max-width (not a fixed width) so the box + // shrinks to short content instead of always being 225px wide. + const inner = fixedBox?.querySelector( + 'div[style*="max-width"]', + ) as HTMLElement | null; expect(inner).not.toBeNull(); - expect((inner as HTMLElement).style.width).toBe('225px'); + expect(inner!.style.maxWidth).toBe('225px'); + expect(inner!.style.width).toBe(''); }); }); diff --git a/src/components/__tests__/downloadLine.test.ts b/src/components/__tests__/downloadLine.test.ts new file mode 100644 index 00000000..5b4e5187 --- /dev/null +++ b/src/components/__tests__/downloadLine.test.ts @@ -0,0 +1,73 @@ +import { describe, it, expect } from 'vitest'; +import { downloadLine } from '../DownloadProgress'; + +const base = { + progress: null, + etaSeconds: null, + combinedBytes: null, + grandTotalBytes: null, + speedBytesPerSec: null, +}; + +describe('downloadLine', () => { + it('uses the unified combined/grand-total figure with ETA from etaSeconds', () => { + const line = downloadLine({ + ...base, + combinedBytes: 1.2e9, + grandTotalBytes: 2.0e9, + etaSeconds: 240, + }); + expect(line).toEqual({ percent: 60, figures: '1.2 / 2.0 GB · ~4m' }); + }); + + it('derives the unified ETA from the rolling speed when present', () => { + const line = downloadLine({ + ...base, + combinedBytes: 1.0e9, + grandTotalBytes: 2.0e9, + speedBytesPerSec: 1e8, + // 
etaSeconds is ignored on the unified path when a speed is available. + etaSeconds: 9999, + }); + expect(line).toEqual({ percent: 50, figures: '1.0 / 2.0 GB · ~10s' }); + }); + + it('falls back to per-file progress when no grand total is known', () => { + const line = downloadLine({ + ...base, + progress: { file: 'w.gguf', bytes: 2.5e9, totalBytes: 8.2e9 }, + etaSeconds: 300, + }); + expect(line).toEqual({ percent: 30, figures: '2.5 / 8.2 GB · ~5m' }); + }); + + it('clamps the unified percent to 100 and omits ETA when unmeasurable', () => { + const line = downloadLine({ + ...base, + combinedBytes: 2.1e9, + grandTotalBytes: 2.0e9, + }); + expect(line).toEqual({ percent: 100, figures: '2.1 / 2.0 GB' }); + }); + + it('formats a multi-hour ETA on the per-file path', () => { + const line = downloadLine({ + ...base, + progress: { file: 'w.gguf', bytes: 1e9, totalBytes: 10e9 }, + etaSeconds: 7300, + }); + expect(line).toEqual({ percent: 10, figures: '1.0 / 10.0 GB · ~2h 1m' }); + }); + + it('returns 0% and no figures before any bytes are known', () => { + expect(downloadLine(base)).toEqual({ percent: 0, figures: null }); + }); + + it('returns 0% and no figures when the per-file total is zero', () => { + const line = downloadLine({ + ...base, + progress: { file: 'w.gguf', bytes: 10, totalBytes: 0 }, + }); + expect(line).toEqual({ percent: 0, figures: null }); + }); +}); diff --git a/src/contexts/DownloadsContext.tsx b/src/contexts/DownloadsContext.tsx new file mode 100644 index 00000000..43c4cca2 --- /dev/null +++ b/src/contexts/DownloadsContext.tsx @@ -0,0 +1,306 @@ +/** + * Settings-window download registry: many model downloads at once. + * + * Unlike onboarding (one starter at a time, {@link useDownloadModel}), the + * Settings → Discover panes let a user fire off several downloads in parallel. 
+ * This provider holds one live download per key (the backend allows concurrent + * downloads keyed the same way; see `DownloadState` in `models/mod.rs`) and, + * sitting at the Settings window root, keeps every one of them alive across the + * Library / Discover / Providers and Staff picks / Browse all tab switches that + * unmount the panes. + * + * Each entry advances through the shared {@link reduceDownloadEvent} reducer + * (engine handoff off: a Settings download finishes at `ready`). A row looks up + * its own download by {@link downloadKey}; absence means "not downloading". + */ + +import { + createContext, + use, + useCallback, + useMemo, + useRef, + useState, + type ReactNode, +} from 'react'; +import { Channel, invoke } from '@tauri-apps/api/core'; +import { + type DownloadAccumulator, + type DownloadProgressInfo, + type DownloadUiState, + isDownloadInFlight, + reduceDownloadEvent, + startingAccumulator, +} from '../hooks/downloadReducer'; +import { downloadKey, type DownloadIdentity } from '../hooks/downloadKey'; +import type { DownloadEvent } from '../types/starter'; + +/** What the Settings panes start: a Staff Picks id or a Browse-all repo file. */ +type RegistryIdentity = Extract< + DownloadIdentity, + { kind: 'staff' } | { kind: 'repo' } +>; + +/** The render-facing view of one live download. */ +export interface DownloadView { + state: DownloadUiState; + progress: DownloadProgressInfo | null; + etaSeconds: number | null; + combinedBytes: number | null; + speedBytesPerSec: number | null; +} + +/** + * Per-repo roll-up of a family's live downloads, by state, for the collapsed + * Browse-all row pills. Counts only the in-memory registry's active states: + * `downloading` (weights or its mmproj companion), `verifying`, and `failed`. + * Terminal-success (`ready`) is omitted (it clears immediately), and paused + * partials are not registry state at all (they live in the per-file listing + * read on expand), so neither is summarisable here. 
+ */ +export interface RepoDownloadSummary { + downloading: number; + verifying: number; + failed: number; +} + +/** Internal record: the identity (for retry replay) plus its accumulator. */ +interface RegistryEntry { + identity: RegistryIdentity; + acc: DownloadAccumulator; +} + +export interface DownloadsContextValue { + /** The live download for `key` ({@link downloadKey}), or undefined when none. */ + get: (key: string) => DownloadView | undefined; + /** + * Whether any live download belongs to `repo`. Lets a Browse-all repo row + * re-expand itself after a tab switch remounts it collapsed, before its quant + * list (which would reveal the per-file downloads) has been fetched. + */ + hasRepoDownload: (repo: string) => boolean; + /** + * Live download counts for `repo`, by state, for the collapsed-row pills. + * Counts only repo-kind downloads belonging to `repo`; see + * {@link RepoDownloadSummary}. + */ + repoDownloadSummary: (repo: string) => RepoDownloadSummary; + /** Start (or resume) a Staff Picks catalog download by its stable id. */ + startStaffPick: (id: string) => void; + /** Start (or resume) a Browse-all repo download by repo + GGUF file. */ + startRepoDownload: (repo: string, file: string) => void; + /** Cancel the download for `key`; the partial is kept for a later resume. */ + cancel: (key: string) => void; + /** Retry the failed download for `key` (replays its original command). */ + retry: (key: string) => void; + /** Discard a kept partial by blob sha256. */ + discard: (sha256: string) => Promise; + /** Drop a terminal (ready / failed) entry so its row returns to normal. */ + clear: (key: string) => void; +} + +const DownloadsContext = createContext(null); + +/** The download command + args for a registry identity. 
*/ +function commandFor( + identity: RegistryIdentity, +): [string, Record] { + switch (identity.kind) { + case 'staff': + return ['download_staff_pick', { id: identity.id }]; + case 'repo': + return [ + 'download_repo_model', + { repo: identity.repo, file: identity.file }, + ]; + } +} + +export function DownloadsProvider({ children }: { children: ReactNode }) { + const [entries, setEntries] = useState>( + () => new Map(), + ); + // Latest entries for the imperative retry path (reads identity outside React + // state). Mirrored every render so it never lags the rendered map. + const entriesRef = useRef(entries); + entriesRef.current = entries; + + const begin = useCallback((identity: RegistryIdentity) => { + const key = downloadKey(identity); + // A fast double-click, or a click landing before the row re-renders to hide + // its button, would fire a second backend download that claim_download + // rejects, flashing a spurious failure over the live one. Ignore re-entry + // while this key is already downloading; a retry of a terminal + // (failed/ready) entry is not in flight, so it still proceeds. + const existing = entriesRef.current.get(key); + if (existing && isDownloadInFlight(existing.acc.state.phase)) { + return; + } + const [command, args] = commandFor(identity); + setEntries((prev) => { + const next = new Map(prev); + next.set(key, { identity, acc: startingAccumulator() }); + return next; + }); + const channel = new Channel(); + channel.onmessage = (event) => + setEntries((prev) => { + const cur = prev.get(key); + // Entry cleared (Choose another) while a late event was in flight: drop. + if (!cur) return prev; + const acc = reduceDownloadEvent(cur.acc, event, false); + const next = new Map(prev); + // A Cancelled event resets to idle: prune so the row returns to its + // Paused/partial controls instead of lingering as a dead download. 
+ if (acc.state.phase === 'idle') { + next.delete(key); + } else { + next.set(key, { ...cur, acc }); + } + return next; + }); + void invoke(command, { ...args, key, onEvent: channel }).catch((err) => + // A rejected invoke means the command failed before streaming (e.g. the + // repo spec could not be resolved), so no channel event will arrive: mark + // the entry failed from the identity in scope. + setEntries((prev) => { + const next = new Map(prev); + next.set(key, { + identity, + acc: { + ...startingAccumulator(), + state: { phase: 'failed', kind: 'other', message: String(err) }, + }, + }); + return next; + }), + ); + }, []); + + const startStaffPick = useCallback( + (id: string) => begin({ kind: 'staff', id }), + [begin], + ); + + const startRepoDownload = useCallback( + (repo: string, file: string) => begin({ kind: 'repo', repo, file }), + [begin], + ); + + const cancel = useCallback((key: string) => { + void invoke('cancel_model_download', { key }); + }, []); + + const retry = useCallback( + (key: string) => { + const entry = entriesRef.current.get(key); + if (entry) begin(entry.identity); + }, + [begin], + ); + + const discard = useCallback(async (sha256: string) => { + await invoke('discard_partial_download', { sha256 }); + }, []); + + const clear = useCallback((key: string) => { + setEntries((prev) => { + if (!prev.has(key)) return prev; + const next = new Map(prev); + next.delete(key); + return next; + }); + }, []); + + const get = useCallback( + (key: string): DownloadView | undefined => { + const entry = entries.get(key); + if (!entry) return undefined; + const { state, progress, etaSeconds, combinedBytes, speedBytesPerSec } = + entry.acc; + return { state, progress, etaSeconds, combinedBytes, speedBytesPerSec }; + }, + [entries], + ); + + const hasRepoDownload = useCallback( + (repo: string): boolean => { + for (const entry of entries.values()) { + if (entry.identity.kind === 'repo' && entry.identity.repo === repo) { + return true; + } + } + return 
false; + }, + [entries], + ); + + const repoDownloadSummary = useCallback( + (repo: string): RepoDownloadSummary => { + const summary: RepoDownloadSummary = { + downloading: 0, + verifying: 0, + failed: 0, + }; + for (const entry of entries.values()) { + if (entry.identity.kind !== 'repo' || entry.identity.repo !== repo) { + continue; + } + switch (entry.acc.state.phase) { + case 'downloading': + case 'downloading_mmproj': + summary.downloading += 1; + break; + case 'verifying': + summary.verifying += 1; + break; + case 'failed': + summary.failed += 1; + break; + } + } + return summary; + }, + [entries], + ); + + const value = useMemo( + () => ({ + get, + hasRepoDownload, + repoDownloadSummary, + startStaffPick, + startRepoDownload, + cancel, + retry, + discard, + clear, + }), + [ + get, + hasRepoDownload, + repoDownloadSummary, + startStaffPick, + startRepoDownload, + cancel, + retry, + discard, + clear, + ], + ); + + return <DownloadsContext value={value}>{children}</DownloadsContext>; +} + +/** + * Returns the Settings download registry. Throws when no `DownloadsProvider` + * wraps the caller: a live multi-download has no sensible static fallback, so a + * missing provider is a wiring bug. 
+ */ +export function useDownloads(): DownloadsContextValue { + const value = use(DownloadsContext); + if (value === null) { + throw new Error('useDownloads must be used within a DownloadsProvider'); + } + return value; +} diff --git a/src/contexts/__tests__/DownloadContext.test.tsx b/src/contexts/__tests__/DownloadContext.test.tsx index 8f3fe83d..dade3ec9 100644 --- a/src/contexts/__tests__/DownloadContext.test.tsx +++ b/src/contexts/__tests__/DownloadContext.test.tsx @@ -131,6 +131,7 @@ describe('DownloadContext', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenCalledWith('download_starter', { tier: 'balanced', + key: 'tier:balanced', onEvent: expect.anything(), }); }); @@ -154,6 +155,7 @@ describe('DownloadContext', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenCalledWith('download_starter', { tier: 'fast', + key: 'tier:fast', onEvent: expect.anything(), }); }); @@ -186,7 +188,9 @@ describe('DownloadContext', () => { // until the backend Cancelled lands (slot released) so a resume cannot // race; meanwhile `isPausing` is true for instant "Pausing…" feedback. 
expect(result.current.pausedBytes).toBe(60); - expect(invoke).toHaveBeenCalledWith('cancel_model_download'); + expect(invoke).toHaveBeenCalledWith('cancel_model_download', { + key: 'tier:balanced', + }); expect(result.current.isPaused).toBe(false); expect(result.current.isPausing).toBe(true); @@ -275,6 +279,7 @@ describe('DownloadContext', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenCalledWith('download_starter', { tier: 'fast', + key: 'tier:fast', onEvent: expect.anything(), }); }); diff --git a/src/contexts/__tests__/DownloadsContext.test.tsx b/src/contexts/__tests__/DownloadsContext.test.tsx new file mode 100644 index 00000000..919c6b36 --- /dev/null +++ b/src/contexts/__tests__/DownloadsContext.test.tsx @@ -0,0 +1,335 @@ +import { renderHook, act } from '@testing-library/react'; +import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest'; +import type { ReactNode } from 'react'; +import { DownloadsProvider, useDownloads } from '../DownloadsContext'; +import { + invoke, + enableChannelCapture, + getLastChannel, + resetChannelCapture, + type Channel, +} from '../../testUtils/mocks/tauri'; +import { downloadKey } from '../../hooks/downloadKey'; +import type { DownloadEvent } from '../../types/starter'; + +/** The captured download channel, typed for simulateMessage calls. 
*/ +function channel(): Channel<DownloadEvent> { + return getLastChannel() as Channel<DownloadEvent>; +} + +function wrapper({ children }: { children: ReactNode }) { + return <DownloadsProvider>{children}</DownloadsProvider>; +} + +const STAFF_KEY = downloadKey({ kind: 'staff', id: 'gemma-4-12b' }); +const REPO_KEY = downloadKey({ + kind: 'repo', + repo: 'org/repo', + file: 'w.gguf', +}); + +describe('DownloadsContext', () => { + beforeEach(() => { + invoke.mockReset(); + enableChannelCapture(); + }); + + afterEach(() => { + resetChannelCapture(); + vi.restoreAllMocks(); + }); + + it('throws when useDownloads is called outside a provider', () => { + const spy = vi.spyOn(console, 'error').mockImplementation(() => {}); + expect(() => renderHook(() => useDownloads())).toThrow( + 'useDownloads must be used within a DownloadsProvider', + ); + spy.mockRestore(); + }); + + it('has no downloads when idle', () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + expect(result.current.get(STAFF_KEY)).toBeUndefined(); + expect(result.current.hasRepoDownload('org/repo')).toBe(false); + }); + + it('starts a Staff Picks download keyed by its id', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + + expect(result.current.get(STAFF_KEY)?.state).toEqual({ + phase: 'downloading', + }); + expect(invoke).toHaveBeenCalledWith('download_staff_pick', { + id: 'gemma-4-12b', + key: STAFF_KEY, + onEvent: expect.anything(), + }); + }); + + it('advances a download through its channel events to ready', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + + act(() => + channel().simulateMessage({ + type: 'Started', + data: { file: 'w.gguf', total_bytes: 100, resumed_from: 0 }, + }), + ); + act(() => + channel().simulateMessage({ + type: 'Progress', + data: { file: 'w.gguf', bytes: 60, total_bytes: 100 }, + }), + ); + 
expect(result.current.get(STAFF_KEY)?.combinedBytes).toBe(60); + + act(() => channel().simulateMessage({ type: 'AllDone' })); + expect(result.current.get(STAFF_KEY)?.state).toEqual({ phase: 'ready' }); + }); + + it('prunes an entry when its download is cancelled', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + expect(result.current.get(STAFF_KEY)).toBeDefined(); + + act(() => channel().simulateMessage({ type: 'Cancelled' })); + expect(result.current.get(STAFF_KEY)).toBeUndefined(); + }); + + it('marks a download failed when the start invoke rejects', async () => { + invoke.mockImplementation(async (cmd: string) => { + if (cmd === 'download_staff_pick') + throw 'a download is already in progress'; + }); + const { result } = renderHook(() => useDownloads(), { wrapper }); + + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + await Promise.resolve(); + }); + + expect(result.current.get(STAFF_KEY)?.state).toEqual({ + phase: 'failed', + kind: 'other', + message: 'a download is already in progress', + }); + }); + + it('ignores a re-entrant start while the same key is already downloading', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + // A second click before the row hides its button must not fire a second + // backend download (which claim_download would reject as a spurious flash). 
+ await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + expect( + invoke.mock.calls.filter((c) => c[0] === 'download_staff_pick'), + ).toHaveLength(1); + expect(result.current.get(STAFF_KEY)?.state).toEqual({ + phase: 'downloading', + }); + }); + + it('cancel targets the keyed download', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.cancel(STAFF_KEY); + }); + expect(invoke).toHaveBeenCalledWith('cancel_model_download', { + key: STAFF_KEY, + }); + }); + + it('retry replays the failed download, clear forgets it', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + act(() => + channel().simulateMessage({ + type: 'Failed', + data: { kind: 'http', message: 'HTTP 500' }, + }), + ); + + await act(async () => { + result.current.retry(STAFF_KEY); + }); + expect( + invoke.mock.calls.filter((c) => c[0] === 'download_staff_pick'), + ).toHaveLength(2); + + // A retry with no entry for the key is a no-op (nothing to replay). + invoke.mockClear(); + await act(async () => { + result.current.retry('staff:does-not-exist'); + }); + expect(invoke).not.toHaveBeenCalled(); + + act(() => + channel().simulateMessage({ + type: 'Failed', + data: { kind: 'http', message: 'again' }, + }), + ); + act(() => { + result.current.clear(STAFF_KEY); + }); + expect(result.current.get(STAFF_KEY)).toBeUndefined(); + // Clearing a key with no entry is a harmless no-op. 
+ act(() => { + result.current.clear('staff:does-not-exist'); + }); + }); + + it('discard removes a kept partial by sha', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + await result.current.discard('a'.repeat(64)); + }); + expect(invoke).toHaveBeenCalledWith('discard_partial_download', { + sha256: 'a'.repeat(64), + }); + }); + + it('tracks repo downloads for the re-expand check', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startRepoDownload('org/repo', 'w.gguf'); + }); + expect(result.current.get(REPO_KEY)?.state).toEqual({ + phase: 'downloading', + }); + expect(result.current.hasRepoDownload('org/repo')).toBe(true); + expect(result.current.hasRepoDownload('other/repo')).toBe(false); + expect(invoke).toHaveBeenCalledWith('download_repo_model', { + repo: 'org/repo', + file: 'w.gguf', + key: REPO_KEY, + onEvent: expect.anything(), + }); + }); + + it('reports zero counts for a repo with no live downloads', () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + expect(result.current.repoDownloadSummary('org/repo')).toEqual({ + downloading: 0, + verifying: 0, + failed: 0, + }); + }); + + it("counts a repo's live downloads by state, mmproj as downloading, excluding ready and other repos", async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + + // a.gguf: plain downloading (default phase on start). + await act(async () => { + result.current.startRepoDownload('org/repo', 'a.gguf'); + }); + + // b.gguf: a second Started flips it to downloading_mmproj (still downloading). 
+ await act(async () => { + result.current.startRepoDownload('org/repo', 'b.gguf'); + }); + const chB = channel(); + act(() => + chB.simulateMessage({ + type: 'Started', + data: { file: 'b.gguf', total_bytes: 100, resumed_from: 0 }, + }), + ); + act(() => + chB.simulateMessage({ + type: 'Started', + data: { file: 'b.mmproj', total_bytes: 50, resumed_from: 0 }, + }), + ); + + // c.gguf: verifying. + await act(async () => { + result.current.startRepoDownload('org/repo', 'c.gguf'); + }); + act(() => + channel().simulateMessage({ + type: 'Verifying', + data: { file: 'c.gguf' }, + }), + ); + + // d.gguf: failed. + await act(async () => { + result.current.startRepoDownload('org/repo', 'd.gguf'); + }); + act(() => + channel().simulateMessage({ + type: 'Failed', + data: { kind: 'http', message: 'HTTP 500' }, + }), + ); + + // e.gguf: ready is terminal-success and must not appear as a live pill. + await act(async () => { + result.current.startRepoDownload('org/repo', 'e.gguf'); + }); + act(() => channel().simulateMessage({ type: 'AllDone' })); + + // A different repo's download must not leak into org/repo's counts. 
+ await act(async () => { + result.current.startRepoDownload('other/repo', 'z.gguf'); + }); + + expect(result.current.repoDownloadSummary('org/repo')).toEqual({ + downloading: 2, + verifying: 1, + failed: 1, + }); + expect(result.current.repoDownloadSummary('other/repo')).toEqual({ + downloading: 1, + verifying: 0, + failed: 0, + }); + }); + + it('excludes Staff Picks downloads from a repo summary', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + expect(result.current.repoDownloadSummary('org/repo')).toEqual({ + downloading: 0, + verifying: 0, + failed: 0, + }); + }); + + it('ignores a late channel event after its entry is cleared', async () => { + const { result } = renderHook(() => useDownloads(), { wrapper }); + await act(async () => { + result.current.startStaffPick('gemma-4-12b'); + }); + const late = channel(); + act(() => { + result.current.clear(STAFF_KEY); + }); + // The download task may still emit; with no entry the event is dropped. 
+ act(() => + late.simulateMessage({ + type: 'Progress', + data: { file: 'w.gguf', bytes: 10, total_bytes: 100 }, + }), + ); + expect(result.current.get(STAFF_KEY)).toBeUndefined(); + }); +}); diff --git a/src/hooks/__tests__/useDownloadModel.test.tsx b/src/hooks/__tests__/useDownloadModel.test.tsx index 7ba24e82..80244d4d 100644 --- a/src/hooks/__tests__/useDownloadModel.test.tsx +++ b/src/hooks/__tests__/useDownloadModel.test.tsx @@ -63,6 +63,7 @@ describe('useDownloadModel', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenCalledWith('download_starter', { tier: 'balanced', + key: 'tier:balanced', onEvent: expect.anything(), }); @@ -290,7 +291,9 @@ describe('useDownloadModel', () => { expect(result.current.progress?.bytes).toBe(40); await act(() => result.current.cancel()); - expect(invoke).toHaveBeenCalledWith('cancel_model_download'); + expect(invoke).toHaveBeenCalledWith('cancel_model_download', { + key: 'tier:fast', + }); // State waits for the backend's Cancelled event. 
expect(result.current.state).toEqual({ phase: 'downloading' }); @@ -325,6 +328,7 @@ describe('useDownloadModel', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenLastCalledWith('download_starter', { tier: 'smartest', + key: 'tier:smartest', onEvent: expect.anything(), }); }); @@ -343,6 +347,7 @@ describe('useDownloadModel', () => { expect(invoke).toHaveBeenCalledWith('download_repo_model', { repo: 'owner/repo', file: 'w.gguf', + key: 'repo:owner/repo\nw.gguf', onEvent: expect.anything(), }); act(() => channel().simulateMessage({ type: 'AllDone' })); @@ -364,6 +369,7 @@ describe('useDownloadModel', () => { expect(invoke).toHaveBeenLastCalledWith('download_repo_model', { repo: 'owner/repo', file: 'w.gguf', + key: 'repo:owner/repo\nw.gguf', onEvent: expect.anything(), }); }); @@ -379,6 +385,38 @@ describe('useDownloadModel', () => { }); }); + it('starts a Staff Picks download through download_staff_pick', async () => { + const { result } = renderHook(() => useDownloadModel()); + await act(() => result.current.startById('gemma-4-12b')); + expect(result.current.state).toEqual({ phase: 'downloading' }); + expect(invoke).toHaveBeenCalledWith('download_staff_pick', { + id: 'gemma-4-12b', + key: 'staff:gemma-4-12b', + onEvent: expect.anything(), + }); + act(() => channel().simulateMessage({ type: 'AllDone' })); + expect(result.current.state).toEqual({ phase: 'ready' }); + }); + + it('retries the last Staff Picks download after a failure', async () => { + const { result } = renderHook(() => useDownloadModel()); + await act(() => result.current.startById('gpt-oss-20b')); + act(() => + channel().simulateMessage({ + type: 'Failed', + data: { kind: 'http', message: 'HTTP 500' }, + }), + ); + + await act(() => result.current.retry()); + expect(result.current.state).toEqual({ phase: 'downloading' }); + expect(invoke).toHaveBeenLastCalledWith('download_staff_pick', { + id: 'gpt-oss-20b', + key: 'staff:gpt-oss-20b', + onEvent: 
expect.anything(), + }); + }); + it('reset returns failed to idle and clears the stale progress', async () => { const { result } = renderHook(() => useDownloadModel()); await act(() => result.current.start('smartest')); @@ -430,6 +468,7 @@ describe('useDownloadModel', () => { expect(result.current.state).toEqual({ phase: 'downloading' }); expect(invoke).toHaveBeenCalledWith('download_starter', { tier: 'balanced', + key: 'tier:balanced', onEvent: expect.anything(), }); }); diff --git a/src/hooks/__tests__/useModelSelection.test.tsx b/src/hooks/__tests__/useModelSelection.test.tsx index 99ec2a16..28767447 100644 --- a/src/hooks/__tests__/useModelSelection.test.tsx +++ b/src/hooks/__tests__/useModelSelection.test.tsx @@ -1,11 +1,18 @@ import { renderHook, act } from '@testing-library/react'; -import { describe, it, expect, beforeEach } from 'vitest'; +import { describe, it, expect, beforeEach, vi } from 'vitest'; import { useModelSelection } from '../useModelSelection'; -import { invoke } from '../../testUtils/mocks/tauri'; +import { + invoke, + listen, + emitTauriEvent, + clearEventHandlers, +} from '../../testUtils/mocks/tauri'; describe('useModelSelection', () => { beforeEach(() => { invoke.mockReset(); + listen.mockClear(); + clearEventHandlers(); }); it('loads active and installed models from the backend', async () => { @@ -384,4 +391,90 @@ describe('useModelSelection', () => { rejectLate(new Error('late')); }); }); + + it('refreshes the picker when thuki://config-updated fires', async () => { + // A model change made from the other window (the Settings panel) writes + // config and broadcasts config-updated; the picker must re-pull so its + // active model and list match the new backend truth without a remount. 
+ invoke + .mockResolvedValueOnce({ + active: 'gemma4:e2b', + all: ['gemma4:e2b', 'qwen2.5:7b'], + ollamaReachable: true, + }) + .mockResolvedValueOnce({ + active: 'qwen2.5:7b', + all: ['gemma4:e2b', 'qwen2.5:7b'], + ollamaReachable: true, + }); + + const { result } = renderHook(() => useModelSelection()); + await act(async () => {}); + expect(result.current.activeModel).toBe('gemma4:e2b'); + + await act(async () => { + emitTauriEvent('thuki://config-updated', null); + }); + + expect(result.current.activeModel).toBe('qwen2.5:7b'); + }); + + it('stops refreshing on config-updated after unmount', async () => { + invoke.mockResolvedValue({ + active: 'gemma4:e2b', + all: ['gemma4:e2b'], + ollamaReachable: true, + }); + + const { unmount } = renderHook(() => useModelSelection()); + await act(async () => {}); + const callsBeforeUnmount = invoke.mock.calls.length; + + unmount(); + await act(async () => { + emitTauriEvent('thuki://config-updated', null); + }); + + expect(invoke.mock.calls.length).toBe(callsBeforeUnmount); + }); + + it('survives a config-updated listen rejection without crashing', async () => { + listen.mockRejectedValueOnce(new Error('event bridge missing')); + invoke.mockResolvedValueOnce({ + active: 'gemma4:e2b', + all: ['gemma4:e2b'], + ollamaReachable: true, + }); + + const { result } = renderHook(() => useModelSelection()); + await act(async () => {}); + + expect(result.current.activeModel).toBe('gemma4:e2b'); + }); + + it('drops a late-arriving config-updated subscription after unmount', async () => { + let resolveListen!: (fn: () => void) => void; + const unlistenSpy = vi.fn(); + listen.mockImplementationOnce( + () => + new Promise<() => void>((resolve) => { + resolveListen = resolve; + }), + ); + invoke.mockResolvedValueOnce({ + active: 'gemma4:e2b', + all: ['gemma4:e2b'], + ollamaReachable: true, + }); + + const { unmount } = renderHook(() => useModelSelection()); + await act(async () => {}); + unmount(); + + await act(async () => { + 
resolveListen(unlistenSpy); + }); + + expect(unlistenSpy).toHaveBeenCalledTimes(1); + }); }); diff --git a/src/hooks/downloadKey.ts b/src/hooks/downloadKey.ts new file mode 100644 index 00000000..91e8ee65 --- /dev/null +++ b/src/hooks/downloadKey.ts @@ -0,0 +1,32 @@ +/** + * Stable per-download identity and its backend slot key. + * + * The backend keys its concurrent-download slots by an opaque string the + * frontend supplies (see `DownloadState` in `models/mod.rs`). Deriving that key + * in one place keeps the onboarding hook ({@link useDownloadModel}) and the + * Settings download registry ({@link useDownloads}) naming the same download the + * same way, so the backend's per-key dedupe behaves predictably across both. + */ + +/** What a download produces, enough to name it and to replay/display it. */ +export type DownloadIdentity = + | { kind: 'tier'; tier: string } + | { kind: 'staff'; id: string } + | { kind: 'repo'; repo: string; file: string }; + +/** + * The backend slot key for a download. Kind-prefixed so a Staff Picks id can + * never collide with a repo path, and newline-joined for repos (a newline + * cannot appear in a Hugging Face repo id or GGUF filename) so a `repo`/`file` + * pair maps to exactly one key. + */ +export function downloadKey(identity: DownloadIdentity): string { + switch (identity.kind) { + case 'tier': + return `tier:${identity.tier}`; + case 'staff': + return `staff:${identity.id}`; + case 'repo': + return `repo:${identity.repo}\n${identity.file}`; + } +} diff --git a/src/hooks/downloadReducer.ts b/src/hooks/downloadReducer.ts new file mode 100644 index 00000000..bd7378ed --- /dev/null +++ b/src/hooks/downloadReducer.ts @@ -0,0 +1,288 @@ +/** + * Pure state for a single model download, plus the reducer that advances it on + * each backend `DownloadEvent`. + * + * This is the one source of truth for "what a download channel's events mean". 
+ * The single-download onboarding hook ({@link useDownloadModel}) and the + * multi-download Settings registry ({@link useDownloads}) both drive their state + * through {@link reduceDownloadEvent}, so the two never diverge. The reducer is + * pure (no React, no refs, no I/O): the byte accumulators that the old hook kept + * in refs live on the accumulator here, so a registry can hold one per download. + * + * The post-download engine handoff (`installing -> warming_up -> ready`, driven + * by the `engine:status` event when `awaitEngine` is set) is NOT modeled here: + * it is a separate event stream owned by the onboarding hook. This reducer only + * interprets the per-download `DownloadEvent` channel. + */ + +import type { + DownloadEvent, + DownloadFailKind, + StarterTier, +} from '../types/starter'; + +/** Failure kinds the UI can show: the backend's plus the engine handoff's. */ +export type DownloadUiFailKind = DownloadFailKind | 'engine'; + +/** The download UI state machine's discriminated union. */ +export type DownloadUiState = + | { phase: 'idle' } + | { phase: 'confirming'; tier: StarterTier } + | { phase: 'downloading' } + | { phase: 'downloading_mmproj' } + | { phase: 'verifying' } + | { phase: 'installing' } + | { phase: 'warming_up' } + | { phase: 'ready' } + | { phase: 'resume_pending' } + | { phase: 'failed'; kind: DownloadUiFailKind; message: string }; + +/** Last reported byte counts for the file currently downloading. */ +export interface DownloadProgressInfo { + file: string; + bytes: number; + totalBytes: number; +} + +/** One ETA sample: a Progress event's byte count and arrival time. */ +export interface EtaSample { + t: number; + bytes: number; +} + +/** Rolling-rate window: only Progress samples this recent feed the ETA. */ +const ETA_WINDOW_MS = 10_000; + +/** + * Everything needed to render one download and to fold the next event in. 
The + * fields below `speedBytesPerSec` are internal accumulators (the old hook's + * refs); consumers read the render fields and pass the whole accumulator back + * into {@link reduceDownloadEvent}. + */ +export interface DownloadAccumulator { + state: DownloadUiState; + progress: DownloadProgressInfo | null; + etaSeconds: number | null; + /** + * Cumulative bytes downloaded across every file of the current run (weights + + * vision companion), or null when idle. One continuous figure: never resets + * between the two files. + */ + combinedBytes: number | null; + /** Rolling download rate in bytes per second, or null until measurable. */ + speedBytesPerSec: number | null; + /** Recent Progress samples inside the rolling ETA window. */ + samples: EtaSample[]; + /** How many `Started` events have arrived (1 = weights, 2 = mmproj). */ + startedCount: number; + /** Bytes from files that have already fully completed this run. */ + completedBytes: number; + /** Declared total of the file currently downloading. */ + currentFileTotal: number; +} + +/** A fresh accumulator parked at `idle` with empty counters. */ +export function initialAccumulator(): DownloadAccumulator { + return { + state: { phase: 'idle' }, + progress: null, + etaSeconds: null, + combinedBytes: null, + speedBytesPerSec: null, + samples: [], + startedCount: 0, + completedBytes: 0, + currentFileTotal: 0, + }; +} + +/** An accumulator reset to the start of a fresh run (phase `downloading`). */ +export function startingAccumulator(): DownloadAccumulator { + return { ...initialAccumulator(), state: { phase: 'downloading' } }; +} + +/** + * True while a download is active but not yet terminal: bytes still moving + * (`downloading`/`downloading_mmproj`) or the post-download verify/install/warm + * steps running. False for idle, the pre-flight confirm/resume states, and the + * terminal `ready`/`failed`. 
+ */ +export function isDownloadInFlight(phase: DownloadUiState['phase']): boolean { + return ( + phase === 'downloading' || + phase === 'downloading_mmproj' || + phase === 'verifying' || + phase === 'installing' || + phase === 'warming_up' + ); +} + +/** + * A short, jargon-free reason for a failed download, by kind, so the UI tells + * the user what actually went wrong instead of a generic message. + */ +export function downloadFailureMessage(kind: DownloadUiFailKind): string { + switch (kind) { + case 'offline': + return 'You appear to be offline.'; + case 'http': + return 'Hugging Face had an error. Try again.'; + case 'checksum': + return 'The download did not verify. Retrying starts it fresh.'; + case 'disk_full': + return 'Not enough disk space.'; + case 'engine': + return "Thuki's engine could not start."; + case 'other': + return 'Model download failed.'; + } +} + +/** + * Bytes per second from the rolling sample window, or `null` while the rate is + * not yet measurable (fewer than two samples, zero elapsed time, or no forward + * progress between the window's edges). + */ +export function computeSpeedBytesPerSec(samples: EtaSample[]): number | null { + if (samples.length < 2) return null; + const first = samples[0]; + const last = samples[samples.length - 1]; + const elapsedSeconds = (last.t - first.t) / 1000; + const deltaBytes = last.bytes - first.bytes; + if (elapsedSeconds <= 0 || deltaBytes <= 0) return null; + return deltaBytes / elapsedSeconds; +} + +/** + * Remaining seconds from the rolling sample window, or `null` while the rate is + * not yet measurable (fewer than two samples, zero elapsed time, or no forward + * progress between the window's edges). 
+ */ +export function computeEtaSeconds( + samples: EtaSample[], + bytes: number, + totalBytes: number, +): number | null { + const bytesPerSecond = computeSpeedBytesPerSec(samples); + if (bytesPerSecond === null) return null; + return Math.max(0, Math.round((totalBytes - bytes) / bytesPerSecond)); +} + +/** Appends a sample and drops any that have aged out of the rolling window. */ +function pushSample( + samples: EtaSample[], + sample: EtaSample, + now: number, +): EtaSample[] { + const next = [...samples, sample]; + let start = 0; + while (start < next.length && now - next[start].t > ETA_WINDOW_MS) { + start += 1; + } + return start > 0 ? next.slice(start) : next; +} + +/** + * Folds one backend `DownloadEvent` into the accumulator, returning a new + * accumulator (the input is never mutated). `awaitEngine` decides the terminal + * step: when set, `AllDone` parks in `installing` for the `engine:status` + * handoff; otherwise it goes straight to `ready`. + */ +export function reduceDownloadEvent( + acc: DownloadAccumulator, + event: DownloadEvent, + awaitEngine: boolean, +): DownloadAccumulator { + switch (event.type) { + case 'Started': { + const startedCount = acc.startedCount + 1; + return { + ...acc, + startedCount, + samples: [], + etaSeconds: null, + speedBytesPerSec: null, + currentFileTotal: event.data.total_bytes, + progress: { + file: event.data.file, + bytes: event.data.resumed_from, + totalBytes: event.data.total_bytes, + }, + combinedBytes: acc.completedBytes + event.data.resumed_from, + // The second Started is always the mmproj companion: specs are ordered + // weights first, mmproj second. + state: + startedCount >= 2 + ? 
{ phase: 'downloading_mmproj' } + : { phase: 'downloading' }, + }; + } + case 'Progress': { + const now = Date.now(); + const samples = pushSample( + acc.samples, + { t: now, bytes: event.data.bytes }, + now, + ); + // A resume re-hash labels itself `verifying` before the remaining bytes + // stream; the first streamed Progress returns the label to the active + // downloading phase so the transfer is not mislabeled. Any other phase is + // left untouched. + const state: DownloadUiState = + acc.state.phase === 'verifying' + ? acc.startedCount >= 2 + ? { phase: 'downloading_mmproj' } + : { phase: 'downloading' } + : acc.state; + return { + ...acc, + samples, + state, + progress: { + file: event.data.file, + bytes: event.data.bytes, + totalBytes: event.data.total_bytes, + }, + etaSeconds: computeEtaSeconds( + samples, + event.data.bytes, + event.data.total_bytes, + ), + speedBytesPerSec: computeSpeedBytesPerSec(samples), + combinedBytes: acc.completedBytes + event.data.bytes, + }; + } + case 'Verifying': + return { ...acc, state: { phase: 'verifying' } }; + case 'FileDone': { + // Fold this file's bytes into the completed total and snap the cumulative + // figure to the boundary so the bar never dips. The next Started (mmproj) + // or AllDone moves the state. + const completedBytes = acc.completedBytes + acc.currentFileTotal; + return { + ...acc, + completedBytes, + currentFileTotal: 0, + combinedBytes: completedBytes, + }; + } + case 'AllDone': + return { + ...acc, + state: awaitEngine ? { phase: 'installing' } : { phase: 'ready' }, + }; + case 'Cancelled': + return initialAccumulator(); + case 'Failed': + // Terminal from ANY state, including verifying (finalize failure: the + // manifest write failed, so AllDone never arrives). 
+ return { + ...acc, + state: { + phase: 'failed', + kind: event.data.kind, + message: event.data.message, + }, + }; + } +} diff --git a/src/hooks/useDownloadModel.ts b/src/hooks/useDownloadModel.ts index 8dc51454..64e4a856 100644 --- a/src/hooks/useDownloadModel.ts +++ b/src/hooks/useDownloadModel.ts @@ -1,134 +1,57 @@ /** - * Download-state machine for starter model downloads. + * Download-state machine for a single starter model download (onboarding). * - * Drives the shared download UI (StarterPicker + DownloadProgress) through - * one discriminated-union state, fed by the `download_starter` Tauri channel - * and, optionally, the `engine:status` Tauri event. + * Drives the onboarding download UI (StarterPicker + DownloadProgress) through + * one discriminated-union state, fed by the `download_*` Tauri channel and, + * optionally, the `engine:status` Tauri event. Per-event state transitions live + * in the shared {@link reduceDownloadEvent} reducer so this single-download hook + * and the multi-download Settings registry ({@link useDownloads}) never diverge. * - * Engine handoff: by default `AllDone` transitions straight to `ready`, - * because after a Settings-context download nobody starts the engine until - * the first chat, so waiting on `engine:status` would hang forever. A - * consumer that does prime the engine right after the download (onboarding) - * passes `awaitEngine: true`; then `AllDone` parks in `installing` and the - * `engine:status` listener advances `installing -> warming_up -> ready` - * (or `failed` with kind `engine`). + * Engine handoff: by default `AllDone` transitions straight to `ready`, because + * after a Settings-context download nobody starts the engine until the first + * chat, so waiting on `engine:status` would hang forever. 
A consumer that does + * prime the engine right after the download (onboarding) passes + * `awaitEngine: true`; then `AllDone` parks in `installing` and the + * `engine:status` listener advances `installing -> warming_up -> ready` (or + * `failed` with kind `engine`). * * The backend emits `AllDone` only after the install is recorded; a finalize - * failure (the manifest write failed) emits `Failed` instead of `AllDone`. - * `Failed` is terminal from any state. Terminal means no *event* moves the - * machine out of it; the user can still leave through `reset`, an explicit - * action that returns the terminal `failed`/`ready` cards to the picker. + * failure (the manifest write failed) emits `Failed` instead. `Failed` is + * terminal from any state. Terminal means no *event* moves the machine out of + * it; the user can still leave through `reset`. */ import { useCallback, useEffect, useRef, useState } from 'react'; import { Channel, invoke } from '@tauri-apps/api/core'; import { listen } from '@tauri-apps/api/event'; +import { + type DownloadAccumulator, + type DownloadProgressInfo, + type DownloadUiState, + initialAccumulator, + reduceDownloadEvent, + startingAccumulator, +} from './downloadReducer'; +import { downloadKey } from './downloadKey'; import type { DownloadEvent, - DownloadFailKind, EngineStatus, StarterTier, } from '../types/starter'; -/** Failure kinds the UI can show: the backend's plus the engine handoff's. */ -export type DownloadUiFailKind = DownloadFailKind | 'engine'; - -/** The download UI state machine's discriminated union. 
*/ -export type DownloadUiState = - | { phase: 'idle' } - | { phase: 'confirming'; tier: StarterTier } - | { phase: 'downloading' } - | { phase: 'downloading_mmproj' } - | { phase: 'verifying' } - | { phase: 'installing' } - | { phase: 'warming_up' } - | { phase: 'ready' } - | { phase: 'resume_pending' } - | { phase: 'failed'; kind: DownloadUiFailKind; message: string }; - -/** - * True while a download is active but not yet terminal: bytes still moving - * (`downloading`/`downloading_mmproj`) or the post-download verify/install/warm - * steps running. False for idle, the pre-flight confirm/resume states, and the - * terminal `ready`/`failed`. Shared by the picker's "Continue setup" line, the - * ambient strip, and the submit soft-block so all three agree on "in flight". - */ -export function isDownloadInFlight(phase: DownloadUiState['phase']): boolean { - return ( - phase === 'downloading' || - phase === 'downloading_mmproj' || - phase === 'verifying' || - phase === 'installing' || - phase === 'warming_up' - ); -} - -/** - * A short, jargon-free reason for a failed download, by kind, so the ambient - * strip tells the user what actually went wrong instead of a generic message. - */ -export function downloadFailureMessage(kind: DownloadUiFailKind): string { - switch (kind) { - case 'offline': - return 'You appear to be offline.'; - case 'http': - return 'Hugging Face had an error. Try again.'; - case 'checksum': - return 'The download did not verify. Retrying starts it fresh.'; - case 'disk_full': - return 'Not enough disk space.'; - case 'engine': - return "Thuki's engine could not start."; - case 'other': - return 'Model download failed.'; - } -} - -/** Last reported byte counts for the file currently downloading. */ -export interface DownloadProgressInfo { - file: string; - bytes: number; - totalBytes: number; -} - -/** One ETA sample: a Progress event's byte count and arrival time. 
*/ -interface EtaSample { - t: number; - bytes: number; -} - -/** Rolling-rate window: only Progress samples this recent feed the ETA. */ -const ETA_WINDOW_MS = 10_000; - -/** - * Bytes per second from the rolling sample window, or `null` while the rate - * is not yet measurable (fewer than two samples, zero elapsed time, or no - * forward progress between the window's edges). - */ -export function computeSpeedBytesPerSec(samples: EtaSample[]): number | null { - if (samples.length < 2) return null; - const first = samples[0]; - const last = samples[samples.length - 1]; - const elapsedSeconds = (last.t - first.t) / 1000; - const deltaBytes = last.bytes - first.bytes; - if (elapsedSeconds <= 0 || deltaBytes <= 0) return null; - return deltaBytes / elapsedSeconds; -} - -/** - * Remaining seconds from the rolling sample window, or `null` while the - * rate is not yet measurable (fewer than two samples, zero elapsed time, - * or no forward progress between the window's edges). - */ -export function computeEtaSeconds( - samples: EtaSample[], - bytes: number, - totalBytes: number, -): number | null { - const bytesPerSecond = computeSpeedBytesPerSec(samples); - if (bytesPerSecond === null) return null; - return Math.max(0, Math.round((totalBytes - bytes) / bytesPerSecond)); -} +// Re-export the shared download vocabulary so existing consumers keep importing +// it from this hook; the definitions now live in `downloadReducer`. +export { + computeEtaSeconds, + computeSpeedBytesPerSec, + downloadFailureMessage, + isDownloadInFlight, +} from './downloadReducer'; +export type { + DownloadProgressInfo, + DownloadUiFailKind, + DownloadUiState, +} from './downloadReducer'; export interface UseDownloadModel { state: DownloadUiState; @@ -154,15 +77,22 @@ export interface UseDownloadModel { */ startRepo: (repo: string, file: string) => Promise; /** - * Invokes `cancel_model_download`. 
The state flips back to idle when the - * backend's Cancelled event lands; the partial is KEPT, so the caller - * refreshes options to surface resume_pending. + * idle -> downloading for a Staff Picks catalog entry, keyed by its stable + * `id`; invokes `download_staff_pick` with a channel. Same event stream and + * terminal states as `start`; `retry` replays it, and a resume is just + * calling it again (the backend resumes the partial via Range). + */ + startById: (id: string) => Promise; + /** + * Invokes `cancel_model_download` for the run this hook last started. The + * state flips back to idle when the backend's Cancelled event lands; the + * partial is KEPT, so the caller refreshes options to surface resume_pending. */ cancel: () => Promise; /** * failed -> downloading. A checksum failure already deleted the partial * on the backend, so retrying is just starting the same download (starter - * tier or pasted repo, whichever ran last) again. + * tier, staff pick, or pasted repo, whichever ran last) again. */ retry: () => Promise; /** resume_pending -> downloading; the backend resumes via Range. */ @@ -194,131 +124,37 @@ export function useDownloadModel( ): UseDownloadModel { const awaitEngine = options?.awaitEngine === true; - const [state, setState] = useState({ phase: 'idle' }); - const [progress, setProgress] = useState(null); - const [etaSeconds, setEtaSeconds] = useState(null); - const [combinedBytes, setCombinedBytes] = useState(null); - const [speedBytesPerSec, setSpeedBytesPerSec] = useState(null); - - const samplesRef = useRef([]); - const startedCountRef = useRef(0); - /** Bytes from files that have already fully completed this run. */ - const completedBytesRef = useRef(0); - /** Declared total of the file currently downloading. */ - const currentFileTotalRef = useRef(0); - /** Replays the most recent start (tier or repo) for `retry`. 
*/ + const [acc, setAcc] = useState(initialAccumulator); + /** Download key of the run in flight, so `cancel` targets the right slot. */ + const currentKeyRef = useRef(''); + /** Replays the most recent start (tier / repo / id) for `retry`. */ const lastStartRef = useRef<(() => Promise) | null>(null); - const handleEvent = useCallback( - (event: DownloadEvent) => { - switch (event.type) { - case 'Started': { - startedCountRef.current += 1; - samplesRef.current = []; - setEtaSeconds(null); - setSpeedBytesPerSec(null); - currentFileTotalRef.current = event.data.total_bytes; - setProgress({ - file: event.data.file, - bytes: event.data.resumed_from, - totalBytes: event.data.total_bytes, - }); - setCombinedBytes(completedBytesRef.current + event.data.resumed_from); - // The second Started is always the mmproj companion: specs are - // ordered weights first, mmproj second. - setState( - startedCountRef.current >= 2 - ? { phase: 'downloading_mmproj' } - : { phase: 'downloading' }, - ); - break; - } - case 'Progress': { - const now = Date.now(); - const samples = samplesRef.current; - samples.push({ t: now, bytes: event.data.bytes }); - while (samples.length > 0 && now - samples[0].t > ETA_WINDOW_MS) { - samples.shift(); - } - setProgress({ - file: event.data.file, - bytes: event.data.bytes, - totalBytes: event.data.total_bytes, - }); - setEtaSeconds( - computeEtaSeconds( - samples, - event.data.bytes, - event.data.total_bytes, - ), - ); - setSpeedBytesPerSec(computeSpeedBytesPerSec(samples)); - setCombinedBytes(completedBytesRef.current + event.data.bytes); - // A resume re-hash labels itself `verifying` before the remaining - // bytes stream; the first streamed Progress returns the label to the - // active downloading phase so the transfer is not mislabeled. Any - // other phase is left untouched (same reference → no re-render). - setState((prev) => - prev.phase === 'verifying' - ? startedCountRef.current >= 2 - ? 
{ phase: 'downloading_mmproj' } - : { phase: 'downloading' } - : prev, - ); - break; - } - case 'Verifying': - setState({ phase: 'verifying' }); - break; - case 'FileDone': - // Fold this file's bytes into the completed total and snap the - // cumulative figure to the boundary so the bar never dips. The next - // Started (mmproj) or AllDone moves the state. - completedBytesRef.current += currentFileTotalRef.current; - currentFileTotalRef.current = 0; - setCombinedBytes(completedBytesRef.current); - break; - case 'AllDone': - setState(awaitEngine ? { phase: 'installing' } : { phase: 'ready' }); - break; - case 'Cancelled': - setProgress(null); - setEtaSeconds(null); - setSpeedBytesPerSec(null); - setCombinedBytes(null); - completedBytesRef.current = 0; - currentFileTotalRef.current = 0; - setState({ phase: 'idle' }); - break; - case 'Failed': - // Terminal from ANY state, including verifying (finalize failure: - // the manifest write failed, so AllDone never arrives). - setState({ - phase: 'failed', - kind: event.data.kind, - message: event.data.message, - }); - break; - } - }, - [awaitEngine], - ); - useEffect(() => { if (!awaitEngine) return; const unlistenPromise = listen('engine:status', (event) => { const status = event.payload; - setState((prev) => { - if (prev.phase !== 'installing' && prev.phase !== 'warming_up') { + setAcc((prev) => { + if ( + prev.state.phase !== 'installing' && + prev.state.phase !== 'warming_up' + ) { return prev; } - if (status.state === 'starting') return { phase: 'warming_up' }; - if (status.state === 'loaded') return { phase: 'ready' }; + if (status.state === 'starting') { + return { ...prev, state: { phase: 'warming_up' } }; + } + if (status.state === 'loaded') { + return { ...prev, state: { phase: 'ready' } }; + } if (status.state === 'failed') { return { - phase: 'failed', - kind: 'engine', - message: status.error ?? 'the engine could not start', + ...prev, + state: { + phase: 'failed', + kind: 'engine', + message: status.error ?? 
'the engine could not start', + }, }; } return prev; @@ -330,40 +166,38 @@ export function useDownloadModel( }, [awaitEngine]); const beginConfirm = useCallback((tier: StarterTier) => { - setState({ phase: 'confirming', tier }); + setAcc((prev) => ({ ...prev, state: { phase: 'confirming', tier } })); }, []); const cancelConfirm = useCallback(() => { - setState({ phase: 'idle' }); + setAcc(initialAccumulator()); }, []); - /** Shared start path: resets per-run trackers, wires the event channel, - * and invokes the given download command. */ + /** Shared start path: resets the accumulator, wires the event channel, and + * invokes the given download command with its download key. */ const run = useCallback( - async (command: string, args: Record) => { - startedCountRef.current = 0; - samplesRef.current = []; - completedBytesRef.current = 0; - currentFileTotalRef.current = 0; - setProgress(null); - setEtaSeconds(null); - setSpeedBytesPerSec(null); - setCombinedBytes(null); - setState({ phase: 'downloading' }); + async (command: string, args: Record, key: string) => { + currentKeyRef.current = key; + setAcc(startingAccumulator()); const channel = new Channel(); - channel.onmessage = handleEvent; + channel.onmessage = (event) => + setAcc((prev) => reduceDownloadEvent(prev, event, awaitEngine)); try { - await invoke(command, { ...args, onEvent: channel }); + await invoke(command, { ...args, key, onEvent: channel }); } catch (err) { - setState({ phase: 'failed', kind: 'other', message: String(err) }); + setAcc((prev) => ({ + ...prev, + state: { phase: 'failed', kind: 'other', message: String(err) }, + })); } }, - [handleEvent], + [awaitEngine], ); const start = useCallback( async (tier: StarterTier) => { - const replay = () => run('download_starter', { tier }); + const replay = () => + run('download_starter', { tier }, downloadKey({ kind: 'tier', tier })); lastStartRef.current = replay; await replay(); }, @@ -372,7 +206,22 @@ export function useDownloadModel( const 
startRepo = useCallback( async (repo: string, file: string) => { - const replay = () => run('download_repo_model', { repo, file }); + const replay = () => + run( + 'download_repo_model', + { repo, file }, + downloadKey({ kind: 'repo', repo, file }), + ); + lastStartRef.current = replay; + await replay(); + }, + [run], + ); + + const startById = useCallback( + async (id: string) => { + const replay = () => + run('download_staff_pick', { id }, downloadKey({ kind: 'staff', id })); lastStartRef.current = replay; await replay(); }, @@ -380,7 +229,7 @@ export function useDownloadModel( ); const cancel = useCallback(async () => { - await invoke('cancel_model_download'); + await invoke('cancel_model_download', { key: currentKeyRef.current }); }, []); const retry = useCallback(async () => { @@ -393,42 +242,48 @@ export function useDownloadModel( try { await invoke('discard_partial_download', { sha256 }); } catch (err) { - setState({ phase: 'failed', kind: 'other', message: String(err) }); + setAcc((prev) => ({ + ...prev, + state: { phase: 'failed', kind: 'other', message: String(err) }, + })); return; } - setState({ phase: 'idle' }); + setAcc((prev) => ({ ...prev, state: { phase: 'idle' } })); }, []); const enterResumePending = useCallback(() => { - setState({ phase: 'resume_pending' }); + setAcc((prev) => ({ ...prev, state: { phase: 'resume_pending' } })); }, []); const reset = useCallback(() => { - setState((prev) => - prev.phase === 'failed' || prev.phase === 'ready' - ? { phase: 'idle' } - : prev, + setAcc((prev) => + prev.state.phase === 'failed' || prev.state.phase === 'ready' + ? initialAccumulator() + : { + // Stale byte counts from the run that just ended; the next start + // reseeds them. Callers only invoke reset from the terminal cards. 
+ ...prev, + progress: null, + etaSeconds: null, + speedBytesPerSec: null, + combinedBytes: null, + completedBytes: 0, + currentFileTotal: 0, + }, ); - // Stale byte counts from the run that just ended; the next start - // reseeds them. Callers only invoke reset from the terminal cards. - setProgress(null); - setEtaSeconds(null); - setSpeedBytesPerSec(null); - setCombinedBytes(null); - completedBytesRef.current = 0; - currentFileTotalRef.current = 0; }, []); return { - state, - progress, - etaSeconds, - combinedBytes, - speedBytesPerSec, + state: acc.state, + progress: acc.progress, + etaSeconds: acc.etaSeconds, + combinedBytes: acc.combinedBytes, + speedBytesPerSec: acc.speedBytesPerSec, beginConfirm, cancelConfirm, start, startRepo, + startById, cancel, retry, resume: start, diff --git a/src/hooks/useModel.ts b/src/hooks/useModel.ts index 266e9b86..06ccf2ef 100644 --- a/src/hooks/useModel.ts +++ b/src/hooks/useModel.ts @@ -13,6 +13,7 @@ import type { export type EngineErrorKind = | 'EngineUnreachable' | 'EngineStartFailed' + | 'ModelUnsupported' | 'ModelNotFound' | 'NoModelSelected' | 'Other'; diff --git a/src/hooks/useModelSelection.ts b/src/hooks/useModelSelection.ts index fd145aca..8373c616 100644 --- a/src/hooks/useModelSelection.ts +++ b/src/hooks/useModelSelection.ts @@ -1,7 +1,16 @@ import { useCallback, useEffect, useRef, useState } from 'react'; import { invoke } from '@tauri-apps/api/core'; +import { listen, type UnlistenFn } from '@tauri-apps/api/event'; import type { ModelPickerState } from '../types/model'; +/** + * Backend broadcast fired after any in-app config write replaces the in-memory + * `AppConfig` (including a model change made from the other webview, e.g. the + * Settings panel). Mirrors the Rust-side `CONFIG_UPDATED_EVENT`. Kept as a + * string literal to avoid a Rust-codegen dependency in the frontend. + */ +const CONFIG_UPDATED_EVENT = 'thuki://config-updated'; + /** * Runtime guard for the IPC boundary. 
The Rust backend is trusted, but this * keeps the hook robust against shape drift (schema changes, legacy builds, @@ -140,6 +149,32 @@ export function useModelSelection(): UseModelSelectionResult { void refreshModels(); }, [refreshModels]); + // Re-pull when any window writes config (a model change in the Settings + // panel broadcasts this). Without it the active-model chip and list would + // only resync on the next picker-open or summon, so a change made elsewhere + // would look stale until then. `mountedRef` gates a late subscription so an + // unmount before `listen` resolves still tears the handler down. + useEffect(() => { + let unlisten: UnlistenFn | null = null; + void listen(CONFIG_UPDATED_EVENT, () => { + void refreshModels(); + }) + .then((stop) => { + if (!mountedRef.current) { + stop(); + return; + } + unlisten = stop; + }) + .catch(() => { + // Event bridge unavailable (test env / Tauri not ready). The mount + // fetch and explicit refreshes still work; only the live push is lost. 
+ }); + return () => { + unlisten?.(); + }; + }, [refreshModels]); + const setActiveModel = useCallback( async (model: string): Promise => { latestTokenRef.current += 1; diff --git a/src/settings/SettingsWindow.test.tsx b/src/settings/SettingsWindow.test.tsx index 93d2c44a..f2facc8e 100644 --- a/src/settings/SettingsWindow.test.tsx +++ b/src/settings/SettingsWindow.test.tsx @@ -85,6 +85,14 @@ function defaultInvoke(cmd: string): unknown { return true; case 'check_screen_recording_permission': return true; + case 'get_model_picker_state': + return { active: null, all: [], displayNames: {}, ollamaReachable: true }; + case 'list_installed_models': + return []; + case 'get_engine_status': + return { state: 'stopped', model_path: '', port: null, error: null }; + case 'get_loaded_model': + return null; case 'get_updater_state': return { last_check_at_unix: null, @@ -116,7 +124,7 @@ describe('SettingsWindow', () => { it('renders the five tab labels after config loads', async () => { render(); await waitFor(() => - expect(screen.getByRole('tab', { name: /AI/ })).toBeInTheDocument(), + expect(screen.getByRole('tab', { name: /Models/ })).toBeInTheDocument(), ); expect(screen.getByRole('tab', { name: /Behavior/ })).toBeInTheDocument(); expect(screen.getByRole('tab', { name: /Web/ })).toBeInTheDocument(); @@ -141,16 +149,42 @@ describe('SettingsWindow', () => { ).toBeInTheDocument(); }); - it('starts on the AI tab', async () => { + it('starts on the Models tab', async () => { render(); await waitFor(() => - expect(screen.getByRole('tab', { name: /AI/ })).toHaveAttribute( + expect(screen.getByRole('tab', { name: /Models/ })).toHaveAttribute( 'aria-selected', 'true', ), ); }); + // Regression: the Settings window is its own webview root. The Discover panes + // read the app-root download context, so the Settings tree must provide a + // DownloadProvider or opening Discover throws and blanks the window. 
+ it('opens Discover without crashing the Settings window', async () => { + // Built-in active so Discover renders ungated; this test guards the + // DownloadProvider wiring, not the non-built-in gate (covered in ModelTab). + const builtinActive: RawAppConfig = { + ...SAMPLE, + inference: { ...SAMPLE.inference, active_provider: 'builtin' }, + }; + invokeMock.mockImplementation(async (cmd: string) => { + if (cmd === 'get_config') return builtinActive; + if (cmd === 'get_staff_picks') return []; + return defaultInvoke(cmd); + }); + render(); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); + await act(async () => { + fireEvent.click(screen.getByRole('tab', { name: 'Discover' })); + await Promise.resolve(); + }); + expect( + await screen.findByRole('tab', { name: 'Staff picks' }), + ).toBeInTheDocument(); + }); + it('switching tabs swaps the active tab body', async () => { render(); await waitFor(() => screen.getByRole('tab', { name: /Display/ })); @@ -171,7 +205,7 @@ describe('SettingsWindow', () => { .spyOn(globalThis, 'requestAnimationFrame') .mockImplementation(() => 0); const { container } = render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); const body = container.querySelector('[role="tabpanel"]')!; expect(body.className).not.toMatch(/bodyScrollable/); @@ -191,9 +225,9 @@ describe('SettingsWindow', () => { it('ArrowRight rotates focus to the next tab', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); - const modelTab = screen.getByRole('tab', { name: /AI/ }); + const modelTab = screen.getByRole('tab', { name: /Models/ }); fireEvent.keyDown(modelTab, { key: 'ArrowRight' }); expect(screen.getByRole('tab', { name: /Behavior/ })).toHaveAttribute( 'aria-selected', @@ -203,9 +237,9 @@ describe('SettingsWindow', () => { it('ArrowLeft wraps to the last tab 
when starting on the first', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); - const modelTab = screen.getByRole('tab', { name: /AI/ }); + const modelTab = screen.getByRole('tab', { name: /Models/ }); await act(async () => { fireEvent.keyDown(modelTab, { key: 'ArrowLeft' }); await Promise.resolve(); @@ -219,9 +253,9 @@ describe('SettingsWindow', () => { it('non-arrow keys are ignored by the tab key handler', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); - const modelTab = screen.getByRole('tab', { name: /AI/ }); + const modelTab = screen.getByRole('tab', { name: /Models/ }); fireEvent.keyDown(modelTab, { key: 'Enter' }); expect(modelTab).toHaveAttribute('aria-selected', 'true'); }); @@ -276,7 +310,7 @@ describe('SettingsWindow', () => { it('Cmd+, on the document re-focuses the settings window', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.setFocus.mockClear(); fireEvent.keyDown(document, { key: ',', metaKey: true }); @@ -285,7 +319,7 @@ describe('SettingsWindow', () => { it('Other keystrokes do not trigger setFocus', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.setFocus.mockClear(); fireEvent.keyDown(document, { key: ',' }); // no Meta @@ -295,7 +329,7 @@ describe('SettingsWindow', () => { it('Cmd+W on the document hides the settings window', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.hide.mockClear(); fireEvent.keyDown(document, { key: 'w', metaKey: true }); @@ -304,7 +338,7 @@ 
describe('SettingsWindow', () => { it('the close button hides the window instead of quitting', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.hide.mockClear(); fireEvent.click(screen.getByRole('button', { name: /Close/ })); expect(__mockWindow.hide).toHaveBeenCalled(); @@ -312,11 +346,11 @@ describe('SettingsWindow', () => { it('mousedown on the chrome triggers startDragging when not on an interactive element', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.startDragging.mockClear(); // Click on the body container itself (not on a button/input). const root = screen - .getByRole('tab', { name: /AI/ }) + .getByRole('tab', { name: /Models/ }) .closest('[role="tablist"]')!.parentElement!; fireEvent.mouseDown(root, { target: root }); // The root is a div; not in INTERACTIVE_TAGS, so dragging fires. 
@@ -325,9 +359,9 @@ describe('SettingsWindow', () => { it('mousedown that originates from an interactive element does NOT trigger drag', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.startDragging.mockClear(); - fireEvent.mouseDown(screen.getByRole('tab', { name: /AI/ })); + fireEvent.mouseDown(screen.getByRole('tab', { name: /Models/ })); expect(__mockWindow.startDragging).not.toHaveBeenCalled(); }); @@ -349,10 +383,10 @@ describe('SettingsWindow', () => { it('mousedown with a non-primary button is ignored (no drag, lets context menus through)', async () => { render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); __mockWindow.startDragging.mockClear(); const root = screen - .getByRole('tab', { name: /AI/ }) + .getByRole('tab', { name: /Models/ }) .closest('[role="tablist"]')!.parentElement!; fireEvent.mouseDown(root, { target: root, button: 2 }); expect(__mockWindow.startDragging).not.toHaveBeenCalled(); @@ -396,7 +430,7 @@ describe('SettingsWindow', () => { await Promise.resolve(); await Promise.resolve(); }); - expect(screen.getByRole('status')).toHaveTextContent('Saved'); + expect(screen.getByText('✓ Saved')).toHaveTextContent('Saved'); // Second save before pill auto-hides — clearTimeout(savedTimerRef.current) fires. 
fireEvent.click(incBtns()[0]); @@ -406,7 +440,7 @@ describe('SettingsWindow', () => { await Promise.resolve(); await Promise.resolve(); }); - expect(screen.getByRole('status')).toHaveTextContent('Saved'); + expect(screen.getByText('✓ Saved')).toHaveTextContent('Saved'); }); it('unmount with the savedPill timer still pending clears it cleanly', async () => { @@ -460,7 +494,7 @@ describe('SettingsWindow', () => { await Promise.resolve(); }); - expect(screen.getByRole('status')).toHaveTextContent('Saved'); + expect(screen.getByText('✓ Saved')).toHaveTextContent('Saved'); // After SAVED_PILL_DURATION_MS the pill toggles back to invisible. We // don't assert on that visibility here because the underlying class @@ -484,7 +518,7 @@ describe('SettingsWindow', () => { return defaultInvoke(cmd); }); render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); await waitFor(() => expect(screen.getByText(/0\.8\.0 is ready/)).toBeInTheDocument(), ); @@ -545,7 +579,7 @@ describe('SettingsWindow', () => { return defaultInvoke(cmd); }); render(); - await waitFor(() => screen.getByRole('tab', { name: /AI/ })); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); // Allow time for updater state to load await act(async () => { await Promise.resolve(); @@ -554,3 +588,50 @@ describe('SettingsWindow', () => { expect(screen.queryByText(/0\.8\.0 is ready/)).not.toBeInTheDocument(); }); }); + +describe('SettingsWindow left sidebar (Phase 3)', () => { + it('renders the section nav as a vertical sidebar', async () => { + render(); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); + // Scope to the sidebar: the Models pane also renders a (horizontal) + // segmented tablist for Library/Discover/Providers. 
+ expect( + screen.getByRole('tablist', { name: 'Settings sections' }), + ).toHaveAttribute('aria-orientation', 'vertical'); + }); + + it('renders Models as the first section label', async () => { + render(); + await waitFor(() => + expect(screen.getByRole('tab', { name: /Models/ })).toBeInTheDocument(), + ); + }); + + it('ArrowDown rotates focus to the next sidebar section', async () => { + render(); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); + fireEvent.keyDown(screen.getByRole('tab', { name: /Models/ }), { + key: 'ArrowDown', + }); + expect(screen.getByRole('tab', { name: /Behavior/ })).toHaveAttribute( + 'aria-selected', + 'true', + ); + }); + + it('ArrowUp wraps to the last sidebar section from the first', async () => { + render(); + await waitFor(() => screen.getByRole('tab', { name: /Models/ })); + await act(async () => { + fireEvent.keyDown(screen.getByRole('tab', { name: /Models/ }), { + key: 'ArrowUp', + }); + await Promise.resolve(); + await Promise.resolve(); + }); + expect(screen.getByRole('tab', { name: /About/ })).toHaveAttribute( + 'aria-selected', + 'true', + ); + }); +}); diff --git a/src/settings/SettingsWindow.tsx b/src/settings/SettingsWindow.tsx index f33138b7..afe0f58c 100644 --- a/src/settings/SettingsWindow.tsx +++ b/src/settings/SettingsWindow.tsx @@ -23,6 +23,7 @@ import { import { invoke } from '@tauri-apps/api/core'; import { getCurrentWindow } from '@tauri-apps/api/window'; +import { DownloadsProvider } from '../contexts/DownloadsContext'; import { useConfigSync } from './hooks/useConfigSync'; import { useSettingsAutoResize } from './hooks/useSettingsAutoResize'; import { ModelTab } from './tabs/ModelTab'; @@ -44,8 +45,8 @@ const TABS: ReadonlyArray<{ }> = [ { id: 'general', - label: 'AI', - // Brain — visual cue that this tab is for the AI itself. + label: 'Models', + // Grid — the model library / management surface. 
icon: ( - - + + + + ), }, @@ -151,14 +154,17 @@ const SAVED_PILL_DURATION_MS = 1500; /** * Static chrome offset from inner content to total window height: - * window padding-top (8) + WindowControls strip (~28) + tab bar (~70) + * window padding-top (8) + WindowControls strip (~28) * + body padding top+bottom (18 + 24 = 42). + * The section nav now lives in a left sidebar beside the content, so it no + * longer adds vertical chrome (the old top tab bar did). The sidebar's own + * height is seated by the hook's MIN_HEIGHT floor instead. * Empirically measured against the rendered Settings window. If any of * the chrome surfaces change height, update this constant rather than * trying to read `offsetHeight` at runtime — the auto-resize hook fires * before paint settles, so dynamic measurement of chrome would miss. */ -const CHROME_HEIGHT = 148; +const CHROME_HEIGHT = 78; /** Recovery banner height when the corrupt-config marker is shown. */ const BANNER_HEIGHT = 56; @@ -299,132 +305,152 @@ export function SettingsWindow() { if (!config) return null; + // The Settings window is its own webview root (see `main.tsx`), so it hosts + // its own download registry: the Discover panes read their downloads from it, + // and hosting it here (above the section nav and the Models segmented control) + // keeps every in-flight download alive across each in-window tab switch. It is + // independent of the main overlay's onboarding provider; the backend's keyed + // slots are the real cross-window coordinator. return ( -
- + +
+ - {marker && !markerDismissed ? ( -
- - ⚠ - - - Your previous config.toml had a syntax error and was - saved as {baseName(marker.path)}. Defaults are now - active. - - - - - -
- ) : null} + {marker && !markerDismissed ? ( +
+ + ⚠ + + + Your previous config.toml had a syntax error and was + saved as {baseName(marker.path)}. Defaults are now + active. + + + + + +
+ ) : null} - {updater.state.update && !settingsSnoozed ? ( - void updater.openWindow()} - onLater={() => void updater.snoozeSettings(24)} - /> - ) : null} + {updater.state.update && !settingsSnoozed ? ( + void updater.openWindow()} + onLater={() => void updater.snoozeSettings(24)} + /> + ) : null} -
- {TABS.map((tab) => { - const active = tab.id === activeTab; - return ( - - ); - })} -
+ {TABS.map((tab) => { + const active = tab.id === activeTab; + return ( + + ); + })} +
+
+
-
-
- {activeTab === 'general' ? ( - - ) : null} - {activeTab === 'behavior' ? ( - - ) : null} - {activeTab === 'search' ? ( - - ) : null} - {activeTab === 'display' ? ( - - ) : null} - {activeTab === 'about' ? ( - - ) : null} +
+
+
+ {activeTab === 'general' ? ( + + ) : null} + {activeTab === 'behavior' ? ( + + ) : null} + {activeTab === 'search' ? ( + + ) : null} + {activeTab === 'display' ? ( + + ) : null} + {activeTab === 'about' ? ( + + ) : null} +
+
+
-
- -
+ + + ); } diff --git a/src/settings/components/index.tsx b/src/settings/components/index.tsx index 6ca8eedb..6c22d43b 100644 --- a/src/settings/components/index.tsx +++ b/src/settings/components/index.tsx @@ -408,6 +408,7 @@ export function ConfirmDialog({ confirmLabel, cancelLabel = 'Cancel', destructive = false, + primary = false, onConfirm, onCancel, }: { @@ -417,6 +418,9 @@ export function ConfirmDialog({ confirmLabel: string; cancelLabel?: string; destructive?: boolean; + /** Accent-fill the confirm button (the affirmative primary action). Ignored + * when `destructive` is set, which takes visual precedence. */ + primary?: boolean; onConfirm: () => void; onCancel: () => void; }) { @@ -455,7 +459,13 @@ export function ConfirmDialog({ - - -
- - Release after - - { - minFocusedRef.current = true; - }} - onChange={(e) => { - const n = parseInt(e.target.value, 10); - if (Number.isNaN(n)) { - setRawMin(e.target.value); - } else { - const clamped = Math.max(-1, Math.min(1440, n)); - setRawMin(String(clamped)); - setInactivityMin(clamped); - } - }} - onBlur={() => { - minFocusedRef.current = false; - if (Number.isNaN(parseInt(rawMin, 10))) { - setRawMin('0'); - setInactivityMin(0); - } - }} - /> - min -
- - - {/* Row 2: residency status on left | Unload now on right. */} - {activeKind === 'builtin' ? ( -
- - Engine: {engineState} - - -
- ) : ( -
-
- {loadedModel !== null ? ( -
-
- ) : ( - - No model loaded - - )} -
- - -
- )} - - ) : null} - -
-
- {/* Label row: "Context window" left + editable token chip right */} -
- Context window -
- setCtxChip(e.target.value)} - onBlur={() => { - const n = parseInt(ctxChip, 10); - if (!Number.isNaN(n) && n >= CTX_MIN) { - // Clamp upper bound so the UI mirrors the backend - // BOUNDS_NUM_CTX cap and the slider stays in sync. - commitCtx(Math.min(n, CTX_MAX)); - } else { - setCtxChip(String(numCtx)); - } - }} - onKeyDown={(e) => { - if (e.key === 'Enter') (e.target as HTMLInputElement).blur(); - }} - /> - tokens -
-
- - {/* Log-scale slider — fill percentage tracked via CSS custom property */} - { - ctxDraggingRef.current = true; - const pos = Number(e.target.value); - setCtxPos(pos); - setCtxChip(String(posToCtx(pos))); - }} - onMouseUp={() => { - ctxDraggingRef.current = false; - commitCtx(posToCtx(ctxPos)); - }} - onTouchEnd={() => { - ctxDraggingRef.current = false; - commitCtx(posToCtx(ctxPos)); - }} - onKeyUp={() => { - if (!ctxDraggingRef.current) commitCtx(posToCtx(ctxPos)); - }} + + + ) : null} - - - {activeKind === 'builtin' && - (engineState === 'starting' || engineState === 'stopping') ? ( -
- Applying… the engine restarts with the new context on your next - message. -
- ) : null} - -
- ~{ctxTurns.toLocaleString()} turns of context - {' · '} - {activeKind === 'builtin' - ? 'Passed to the engine as --ctx-size at start; changing it restarts the engine.' - : activeKind === 'openai' - ? 'Informational only; your server controls the actual context.' - : "Ollama caps to your model's trained maximum."} -
- -
- - - The KV cache scales linearly with context length, so doubling the - context roughly doubles its memory footprint (model weights stay - the same). Benchmark with your hardware before pushing it high.{' '} - - -
-
-
+ {view === 'discover' ? ( + + + + ) : null} -
- ( - <> -