Every core action in Phoneme is fully accessible from the command line interface via phoneme.exe (the client) and phoneme-daemon.exe (the engine).
These apply to any subcommand:
| Flag | Effect |
|---|---|
| `--json` | JSON-lines output where supported |
| `--no-color` | Disable colored output (or set `NO_COLOR=1`) |
| `-v`, `--verbose` | Verbose tracing to stderr |
The CLI auto-spawns the daemon when needed — but only for commands that
create work. Read-only / inspection commands never start a daemon: a
daemon-is-down state is itself the answer for them, so they print
daemon not reachable and exit with code 3 instead of silently starting one.
| Behavior | Commands |
|---|---|
| Auto-spawn (start the daemon if it's not running, then send) | record, meeting start/stop/toggle/rename, import, retranscribe, cleanup, summarize, digest (generate), suggest-tags, suggest-entities, suggest-tasks, chapters (generate and --show), versions, ask, notes, edit, find-replace, clip, speaker rename/clear/reassign/merge/split, reembed, refire-hook, delete, queue pause/resume/reorder/cancel/cancel-processing/cancel-all/clear-failed/dismiss-failed, tag (every subcommand — list/for/usage/suggestions and the mutating ones), entities add/edit/delete/merge, tasks done/undone/add/edit/delete/reorder, dictation regrab/forget/clear, profile use, hook test, export (zip and --captions), import-backup, config reload, daemon start |
| Observe-only (fail fast with exit 3 when no daemon) | list, show, search, watch, doctor, daemon status, queue list/counts/status, queue skip*, meeting tracks, digest --show, entities (list), tasks (list, --open), dictation history, profile list |
| Purely local (no daemon involved at all) | config (print), config path, config set, profile save, speaker calibrate, completions, version |
* queue skip mutates, but only a live daemon mid-LLM-stage has anything to
skip — spawning one just to skip nothing would mask reality.
daemon stop is its own special case: it stops a running daemon but never
spawns one just to stop it (stopping an already-stopped daemon succeeds).
Exit codes are stable API — scripts can branch on them. Every command maps a daemon error to the same code via one shared table:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic failure (including internal daemon errors) |
| 2 | Usage error (bad flags; clap's own) |
| 3 | Daemon not reachable (also: pipe in use, daemon shutting down) |
| 4 | Whisper backend unreachable or timed out |
| 5 | Hook failed |
| 6 | Invalid config (e.g. a rejected config set value) |
| 7 | Recording / tag / path not found |
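Because the codes are stable, a wrapper script can branch on them. The following is a minimal Python sketch, assuming `phoneme` is on `PATH`. The table mirrors the one above; the helper names `describe_exit` and `run_phoneme` are illustrative and not part of Phoneme.

```python
import subprocess
from typing import Tuple

# Exit codes copied from the table above.
EXIT_MEANINGS = {
    0: "success",
    1: "generic failure",
    2: "usage error",
    3: "daemon not reachable",
    4: "whisper backend unreachable or timed out",
    5: "hook failed",
    6: "invalid config",
    7: "recording / tag / path not found",
}


def describe_exit(code: int) -> str:
    """Return a short label for a phoneme exit code (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_phoneme(*args: str) -> Tuple[int, str]:
    """Run `phoneme <args>` and return (exit code, captured stdout)."""
    proc = subprocess.run(["phoneme", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

A caller might run `code, out = run_phoneme("list", "--limit", "5")` and treat `code == 3` as "the daemon is down" rather than as a failure, since observe-only commands never start one.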
Start, stop, or run a one-shot recording. The non-blocking controls are
subcommands (record start, record stop, …), matching meeting and the
rest of the CLI; bare phoneme record (no subcommand) is the blocking
push-to-talk mode.
# Non-blocking: starts the recording and immediately returns.
phoneme record start
# Non-blocking: stops the current recording and begins transcription/hooks.
phoneme record stop
# Non-blocking: start if idle, otherwise stop the active recording (atomic —
# ideal for a single hotkey binding). Takes --in-place.
phoneme record toggle
# In-Place Mode: the transcript is typed out as simulated keystrokes into the
# currently focused application window. `-i` is the short form of `--in-place`
# (valid on bare `record`, `record start`, and `record toggle`).
phoneme record start --in-place
# Discard the active recording without saving.
phoneme record cancel
# Non-blocking: pause / resume the active recording (or every track of the
# active meeting). Exit 0.
phoneme record pause
phoneme record resume
# Blocking: starts recording, waits for you to press Enter (or timeout),
# then stops, transcribes, and prints the result.
phoneme record --oneshot
# Record exactly 10 seconds (blocking).
phoneme record --duration 10
# Run a specific Playbook recipe (by id or name) instead of the default
# pipeline. Works on the blocking default, `record start`, and `record toggle`.
phoneme record --recipe meeting_notes
phoneme record start --recipe "Meeting notes"
phoneme record toggle --recipe prompt_capture

Each non-blocking subcommand sends a single request (RecordStart,
RecordStop, RecordToggle, RecordCancel, RecordPause, RecordResume) and
exits 0. --oneshot / --duration modify the blocking default.
`--recipe <ID|NAME>`: pick a Playbook recipe like the GUI's recipe picker. The value is matched against your configured recipes (by id first, then case-insensitively by name) and the resolved id is sent to the daemon. Omit it for the default pipeline. A value that matches no recipe is an error (it lists the available recipes); it never silently falls back to default. Available on bare `phoneme record` (blocking / `--oneshot` / `--duration`), `record start`, and `record toggle`.
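The resolution order described above (id first, then case-insensitive name, error on no match) can be sketched as follows. This is an illustration only: the function name, the exception type, and the sample recipe data are assumptions, and treating the id match as exact is an inference from the text rather than confirmed behavior.

```python
from typing import Dict, List


class UnknownRecipe(ValueError):
    """Raised when a value matches no configured recipe (hypothetical type)."""


def resolve_recipe(value: str, recipes: List[Dict[str, str]]) -> str:
    """Return the id of the recipe that `value` names."""
    # Pass 1: an id match wins over any name match.
    for recipe in recipes:
        if recipe["id"] == value:
            return recipe["id"]
    # Pass 2: fall back to a case-insensitive name comparison.
    wanted = value.casefold()
    for recipe in recipes:
        if recipe["name"].casefold() == wanted:
            return recipe["id"]
    # No silent fallback to the default pipeline: list the choices instead.
    available = ", ".join(recipe["id"] for recipe in recipes)
    raise UnknownRecipe(f"no recipe matches {value!r}; available: {available}")
```

With made-up recipes `meeting_notes` ("Meeting notes") and `prompt_capture` ("Prompt capture"), both `resolve_recipe("meeting_notes", ...)` and `resolve_recipe("MEETING NOTES", ...)` resolve to `meeting_notes`.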
Breaking change: the pre-1.8 flag spellings (`record --start`, `--stop`, `--toggle`, `--cancel`, `--pause`, `--resume`) were removed; use the subcommands. Update any hotkey bindings or scripts accordingly.
Capture source: `phoneme record` always records the global `[recording].source` (microphone by default; `system_audio` for WASAPI loopback); there's no `--source` flag. The per-keybind capture-source override (`[[hotkeys]].source`) is a GUI/config-only feature for custom hotkeys, not the CLI; set it in Settings → Hotkeys or in `config.toml`. See the config reference.
Start a dual-track Meeting Mode recording.
# Start capturing mic + system audio
phoneme meeting start
# Stop the meeting and transcribe both tracks
phoneme meeting stop
# Start if no meeting is active, otherwise stop it (atomic, for hotkey bindings)
phoneme meeting toggle
# List every recording (track) belonging to a meeting session
phoneme meeting tracks 20260519T143500823
# Rename a meeting
phoneme meeting rename 20260519T143500823 "Q3 Planning Sync"
# Clear a meeting's name (omit NAME and pass --clear) — reverts to the
# auto-generated label.
phoneme meeting rename 20260519T143500823 --clear
# Generate (or regenerate) the whole-meeting digest — one AI synthesis across
# ALL of a meeting's tracks (mic + system together), distinct from the
# per-recording `phoneme summarize`. Reuses the configured summary provider;
# `--model` overrides the summary model for this run only. The daemon ACKs
# immediately and generates in the background.
phoneme meeting digest 20260519T143500823
phoneme meeting digest 20260519T143500823 --model llama3.2:3b
# Run a specific MEETING TEMPLATE for this digest only (a `scope = meeting`
# recipe — the seeds ship `standup` and `interview`). Overrides the configured
# `meeting_recipe_id` for this run; an unknown id falls back to the built-in
# digest. Templates differ only by prompt; build your own in Settings → Playbook.
phoneme meeting digest 20260519T143500823 --template standup

A digest is also generated automatically when a meeting finishes (after both
tracks transcribe), gated on the same [summary].auto switch as the
per-recording auto-summary — so meetings get a digest with no extra step when
auto-summary is on. Which template the auto-digest uses is set by the top-level
meeting_recipe_id config key (empty = the built-in digest; see the config
reference).
Import an existing audio file (wav/mp3/m4a/flac) and transcribe it — or pass an
http(s) URL (e.g. a YouTube link) to download its audio with yt-dlp and
import that.
phoneme import my_meeting.mp3
# From a URL — downloads audio-only via yt-dlp, then imports it
phoneme import "https://www.youtube.com/watch?v=VIDEO_ID"
# Choose the extracted format for URL imports (default m4a)
phoneme import --format flac "https://youtu.be/VIDEO_ID"
# Transcribe the import through a specific Playbook recipe (by id or name)
# instead of the default pipeline — one pass, no import-then-retranscribe.
phoneme import "https://youtu.be/VIDEO_ID" --recipe lecture-clean
# Idempotent import: tag with your own key; a re-run with the same key is a no-op.
phoneme import "https://youtu.be/VIDEO_ID" --ext-ref "yt:VIDEO_ID"

| Flag | Default | Notes |
|---|---|---|
| `--format <m4a\|mp3\|flac\|wav>` | `m4a` | Audio format yt-dlp extracts to (URL imports only). m4a/mp3 are lossy but transparent for speech; flac/wav avoid a re-encode. |
| `--recipe <ID\|NAME>` | default pipeline | Run this import through a chosen Playbook recipe, the same picker record/retranscribe use. Resolved (id first, then name) and rejected if it names a meeting template, all before any download, so a typo or wrong scope fails fast. Omit for the default pipeline. Use `phoneme recipes` to list the choices. |
| `--ext-ref <KEY>` | (none) | Caller-supplied external-reference key for idempotent import (e.g. a video id). If a recording already carries this key, the import is a no-op that returns it (`already imported … (matched --ext-ref)`; `{"id":…,"reused":true}` with `--json`) instead of importing a duplicate. The key rides `phoneme list --json` as `ext_ref` so a caller can reconcile what's already imported. Omit for a normal import that always creates a new recording. |
URL import requires yt-dlp and ffmpeg on PATH (python -m pip install -U yt-dlp). The download lands in a temp folder and is removed after import;
Phoneme keeps only its own decoded copy. --recipe collapses what used to take
two full transcriptions (a default import followed by retranscribe --recipe)
into a single pass. To change the recipe of an already-imported recording, use
retranscribe --recipe.
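A caller that imports with `--ext-ref` and `--json` can tell a fresh import from a reused one by reading the reply. The sketch below parses one reply line; only the `{"id":…,"reused":true}` shape comes from the table above. Treating a missing `reused` key as "freshly imported" is an assumption, and `parse_import_reply` is a hypothetical helper.

```python
import json
from typing import Tuple


def parse_import_reply(line: str) -> Tuple[str, bool]:
    """Return (recording id, reused?) from one `phoneme --json import` reply line."""
    reply = json.loads(line)
    # Assumption: a fresh import may omit "reused" or set it to false.
    return reply["id"], bool(reply.get("reused", False))
```

A sync job can use the boolean to skip downstream work (tagging, notifications) for recordings it has already handled.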
List the configured Playbook recipes — the same recipes the GUI's recipe picker
and the --recipe flag (record / import / retranscribe) draw from. Reads
the same config the daemon does, so it works without the daemon running.
phoneme recipes # human-readable: id, name, scope, description
phoneme --json recipes # machine-readable JSON array (for scripting / clients)

The `--json` form emits one array of `{id, name, description, builtin, scope, steps}`; a client picking a recipe for import filters to `scope == "recording"`
(meeting templates are `scope == "meeting"`).
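The scope filter a client applies before offering recipes for import might look like this. It is a sketch that takes the JSON array as text; the field names come from the description above, while the function name and the sample values used to exercise it are illustrative.

```python
import json
from typing import List


def recording_recipe_ids(recipes_json: str) -> List[str]:
    """Return the ids of recipes usable for record / import / retranscribe."""
    recipes = json.loads(recipes_json)
    # Meeting templates carry scope == "meeting" and are filtered out here.
    return [r["id"] for r in recipes if r.get("scope") == "recording"]
```

The input would typically be the captured stdout of `phoneme --json recipes`.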
Query the local SQLite recording catalog.
# List all recordings
phoneme list
# List recordings in a date range (ISO 8601, both bounds inclusive)
phoneme list --since 2026-05-19
phoneme list --since 2026-05-01 --until 2026-05-31
# Filter by status: recording, paused, queued, transcribing, cleaning_up,
# summarizing, tagging, hook_running, done, transcribe_failed, hook_failed,
# cleanup_failed, summarize_failed, title_failed, tag_failed, or cancelled.
# - queued: waiting in the transcription queue (flips to transcribing when the
# worker claims it) — so a recording that's only waiting no longer reads as
# "transcribing".
# - *_failed for an optional step (cleanup/summary/title/tag) is terminal like
# hook_failed: the transcript is intact, only that enrichment didn't land, and
# the reason is stored on the row so you can find and re-run it.
# - cancelled: a run the user cancelled — terminal, but not a failure.
phoneme list --status done
phoneme list --status cancelled
phoneme list --status tag_failed # find recordings whose auto-tagging failed
# Limit the number of results returned (with optional offset for pagination)
phoneme list --limit 10
phoneme list --limit 10 --offset 20
# Filter by tag (numeric id or tag name)
phoneme list --tag work
# Full-Text Search via FTS5
phoneme list --search "rust migration"
# Semantic (embedding) search instead of an FTS5/list query — same engine as
# `phoneme search`, reusing --limit (default 20) as the result cap
phoneme list --semantic "database migration plan"
# Filter by recording type: all (default), single (voice notes), or meeting.
# Applied by the daemon in SQL, before --limit/--offset, so pages stay full.
phoneme list --kind meeting
# Run (or list) a saved search by id — the daemon parses the stored filter and
# runs the list query server-side, so a saved search is reproducible from the CLI.
phoneme list --saved # list the saved searches (id + name)
phoneme list --saved ss_a1b2 # run the saved search with that id

Display the details of a single recording by its ID.
phoneme show 20260519T143500823
# Print only the audio path (useful for shell piping)
phoneme show 20260519T143500823 --audio-path-only
# Print the preserved ORIGINAL (machine) transcript, before AI cleanup
phoneme show 20260519T143500823 --original
# Print the unedited pipeline transcript (transcribed + cleaned, before your
# hand edits)
phoneme show 20260519T143500823 --unedited
# Print the machine transcript segments as a timeline: start-end offsets,
# speaker label (when diarized), and text per line. Empty for recordings
# transcribed before segment capture existed -- retranscribe to backfill.
phoneme show 20260519T143500823 --segments

Re-transcribe a saved recording using your current model settings.
phoneme retranscribe 20260519T143500823
# Use a different transcription model for this run only
phoneme retranscribe 20260519T143500823 --model ggml-large-v3.bin
# Force hooks on / off for this run (overrides the configured behavior)
phoneme retranscribe 20260519T143500823 --run-hooks
phoneme retranscribe 20260519T143500823 --no-run-hooks
# Skip the LLM cleanup step for this run only (produces the raw transcript)
phoneme retranscribe 20260519T143500823 --no-post-process
# Re-run through a specific Playbook recipe (by id or name) instead of the
# default pipeline — the CLI face of the GUI ↻ Re-run modal's "Run through" picker.
phoneme retranscribe 20260519T143500823 --recipe meeting_notes
phoneme retranscribe 20260519T143500823 --recipe "Meeting notes"
`--recipe <ID|NAME>`: re-run the recording through a chosen Playbook recipe, matching the Run through picker in the GUI's ↻ Re-run modal ("Just this run" scope). The value is resolved against your configured recipes (by id first, then case-insensitively by name) and the resolved id is sent to the daemon. Omit it for the `default` pipeline. A value matching no recipe is an error that lists the available recipes (no silent fallback to default). The `--model` override still applies independently as a one-time transcription-model override.
Re-run only the LLM cleanup ("post-processing") step on a recording's stored
transcript, without re-transcribing the audio. The preserved original transcript
is always the input, so cleanup is idempotent. Overrides apply to this run only
and are never written to config; passing --provider also forces cleanup on.
| Flag | Effect |
|---|---|
| `--provider <PROVIDER>` | Use this cleanup provider for this run (also forces cleanup on). |
| `--model <MODEL>` | Use this cleanup model for this run. |
| `--prompt <PROMPT>` | Use this cleanup prompt for this run. |
| `--api-url <URL>` | Point cleanup at this endpoint for this run. |
| `--api-key <KEY>` | Authenticate cleanup with this key for this run. |
Passing a key via `--api-key` exposes it to any local process that can read the process table (ps, Task Manager) and leaves it in shell history. Prefer the `PHONEME_CLEANUP_API_KEY` environment variable: `--api-key` reads from it when the flag is omitted, and the env var stays out of the process table and shell history.
phoneme cleanup 20260519T143500823
phoneme cleanup 20260519T143500823 --provider ollama --model llama3.1
phoneme cleanup 20260519T143500823 --prompt "Fix grammar only"
# Point this run at a different OpenAI-compatible endpoint + credentials
phoneme cleanup 20260519T143500823 \
--provider openai \
--api-url https://api.example.com/v1 \
--api-key sk-...

Generate (or regenerate) an LLM summary of a recording's current transcript and
store it. --model / --prompt override the configured summary settings for
this run only.
phoneme summarize 20260519T143500823
phoneme summarize 20260519T143500823 --model llama3.1
phoneme summarize 20260519T143500823 --prompt "Three bullet points, no preamble."

Generate (or view) a period digest: one LLM rollup across every recording
in a date window (what was discussed, decisions reached, open/action items).
Distinct from the per-recording summarize and the meeting-scoped
meeting digest. The daemon selects the window's recordings, concatenates their
transcripts (each prefixed with its date + title, oldest first), and runs the
merged text through the configured summary provider; the result is stored keyed
by the range, so re-running the same window overwrites in place. A very large
window is truncated to a size cap (and the digest notes it) so it can't overflow
the model's context.
The range is one of:
- `--daily` (the default): the current calendar day (local midnight → end of today).
- `--weekly`: the last 7 calendar days (six days ago at midnight → end of today).
- `--since <DATE> --until <DATE>`: an explicit, inclusive range. Each accepts a bare `YYYY-MM-DD` (`--since` at local start-of-day, `--until` extended to end-of-day so the final day is included) or a full RFC 3339 timestamp. The two are required together and are mutually exclusive with `--daily` / `--weekly`.
--daily/--weekly snap to whole calendar days, so the range is stable within
the day — a --show later the same day fetches the digest a prior generate
stored.
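To make the calendar-day snapping concrete, here is a small sketch of how the `--daily` and `--weekly` windows could be computed. It is not Phoneme's code: `digest_window` is a hypothetical name, and representing "end of today" as an exclusive bound at the next local midnight is an assumption.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def digest_window(kind: str, today: date) -> Tuple[datetime, datetime]:
    """Return the half-open window [start, end) as naive local datetimes."""
    if kind == "daily":
        first_day = today
    elif kind == "weekly":
        # Seven calendar days including today: start six days ago.
        first_day = today - timedelta(days=6)
    else:
        raise ValueError(f"unknown window kind: {kind!r}")
    start = datetime.combine(first_day, time.min)
    # Assumption: "end of today" is modelled as the next midnight, exclusive.
    end = datetime.combine(today + timedelta(days=1), time.min)
    return start, end
```

Because both bounds depend only on the date, calling this at 09:00 and again at 17:00 on the same day yields the same window, which is what lets a later `--show` find the digest stored earlier.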
Generating ACKs immediately and rolls up in the background (like summarize),
emitting period_digest_updated / period_digest_failed. --show reads the
stored digest for the resolved window instead, printing "no digest yet" when none
exists for that exact range. --model overrides the summary model for this run
only (ignored with --show).
phoneme digest # today → now
phoneme digest --weekly # last 7 days
phoneme digest --since 2026-06-15 --until 2026-06-20
phoneme digest --weekly --model llama3.2:3b # one-off model override
phoneme digest --show # view today's stored digest
phoneme digest --show --since 2026-06-15 --until 2026-06-20

Re-run the LLM tag-suggestion step on a recording on demand (the CLI face of the
GUI ✨ Suggest button), regardless of the auto_tag.auto gate. The command
awaits the model, then returns; the suggestions land on the recording. Review
them with phoneme tag suggestions <ID>. Errors when the recording has no
transcript yet (exit 6) or the id is unknown (exit 7).
phoneme suggest-tags 20260519T143500823

Re-run the LLM entity-extraction step on a recording on demand (the CLI face of
the GUI 🔎 Extract button), regardless of whether the recipe includes an
entities step. Like suggest-tags, the command awaits the model, then returns;
the structured, typed entities (person / org / topic / term) land on the
recording, replacing any previous set. Review them with phoneme show <ID>.
Errors when the recording has no transcript yet (exit 6) or the id is unknown
(exit 7).
phoneme suggest-entities 20260519T143500823

Generate a recording's topic chapters on demand and print them (the CLI face of
the Chapters detail view). The command sends the recording's timed segments to
the model, awaits it, stores the resulting chapters (each boundary snapped to a
real segment start), then prints the time-coded list. Pass --show to print the
stored chapters without regenerating. A recording with no timing prints an empty
list; errors when there's no transcript to chapter (exit 6) or the id is unknown
(exit 7).
phoneme chapters 20260519T143500823 # generate + print
phoneme chapters 20260519T143500823 --show # print stored chapters only

Print a recording's transcript-version chain for side-by-side comparison: the
raw ASR output, then each pipeline step that rewrote it (cleanup, your hand
edits, …), down to the live transcript. This is the CLI face of the GUI
Compare-versions view, and a cross-platform alternative to the daemon named pipe
(the only other surface that exposes versions): the same chain is also a REST read
at GET /api/recordings/{id}/versions.
Sends ListTranscriptVersions and prints the chain in idx order, one
idx label (model) line per version (the model is shown when the step recorded
one). A recording with nothing beyond the raw transcript prints just that entry;
one with no stored versions prints no transcript versions stored for this recording.
phoneme versions 20260519T143500823
# Machine-readable: the raw version array (each item carries idx/label/model/text)
phoneme --json versions 20260519T143500823

Re-run the LLM task-extraction step on a recording on demand (the CLI face of the
GUI ✅ Extract button), regardless of whether the recipe includes a tasks step.
Like suggest-entities, the command awaits the model, then returns; the
structured action items ({text, due_hint?}) land on the recording, replacing
the previous set — but any task you already checked off is preserved when its
text survives the re-extraction. Review them with phoneme tasks or
phoneme show <ID>. Errors when the recording has no transcript yet (exit 6) or
the id is unknown (exit 7).
phoneme suggest-tasks 20260519T143500823

Edit a recording's transcript and/or metadata. Any combination of the edits below applies in one invocation:
- Transcript: `--text "…"`, or from stdin when no metadata flag and no `--text` is given.
- Title: `--title "…"` sets a user-owned title (the pipeline never overwrites it on a later retranscribe); `--clear-title` (or `--title ""`) reverts to auto-generation (the title empties now and regenerates on the next pipeline run).
- Favorite: `--favorite` / `--unfavorite` star or unstar the recording (the Favorites view).
- Pinned: `--pin` / `--unpin` pin or unpin the recording. Pinned recordings sort to the top of the library, independent of favorites (the Pinned view).
# Transcript edit (the original behavior): --text or stdin
phoneme edit 20260519T143500823 --text "Corrected transcript."
echo "Corrected transcript." | phoneme edit 20260519T143500823
# Set or clear the display title
phoneme edit 20260519T143500823 --title "Q3 Planning Sync"
phoneme edit 20260519T143500823 --clear-title
# Star / unstar
phoneme edit 20260519T143500823 --favorite
phoneme edit 20260519T143500823 --unfavorite
# Pin / unpin (sorts to the top of the library)
phoneme edit 20260519T143500823 --pin
phoneme edit 20260519T143500823 --unpin
# Combine: fix the text and set a title in one call
phoneme edit 20260519T143500823 --text "Fixed." --title "Standup notes"

A metadata-only edit (e.g. just --favorite or --pin) never blocks waiting on stdin.
Find-and-replace literal text (not a regex) across a recording's stored
transcript. Case-insensitive by default; pass --case-sensitive for an exact
match. Only the live transcript is rewritten — the preserved original (machine)
and unedited (pipeline) copies stay intact, so the change is revertible, and the
timing layers are re-flowed onto the result like any hand edit. A no-match is a
no-op (nothing is written). Prints the number of occurrences replaced.
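The "literal, not regex" and "case-insensitive by default" semantics can be modelled in a few lines. This sketch only illustrates the matching rules on a plain string; it does not touch Phoneme's stored versions or timing layers, and `literal_replace` is a hypothetical name.

```python
import re
from typing import Tuple


def literal_replace(text: str, find: str, replace: str,
                    case_sensitive: bool = False) -> Tuple[str, int]:
    """Replace every literal occurrence of `find`; return (new text, count)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape keeps characters such as "." or "*" literal.
    pattern = re.compile(re.escape(find), flags)
    # A callable replacement stops backslashes in `replace` being interpreted.
    return pattern.subn(lambda _match: replace, text)
```

A count of 0 corresponds to the no-op case: the text comes back unchanged and nothing would need to be written.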
Pass --library to run the same literal replacement across every
recording in one shot — the positionals are then FIND REPLACE (no id).
Recordings with no match are left untouched (no version churn, no event), so
only the ones that actually change are rewritten. Prints how many occurrences
were replaced and across how many recordings.
# Fix a recurring misspelling across the whole transcript
phoneme find-replace 20260519T143500823 "teh" "the"
# Exact-case replacement
phoneme find-replace 20260519T143500823 "API" "api" --case-sensitive
# Machine-readable count
phoneme --json find-replace 20260519T143500823 "teh" "the" # → {"replaced":3}
# Library-wide: fix a name everywhere at once
phoneme find-replace --library "Jon" "John"
phoneme --json find-replace --library "Jon" "John"
# → {"recordings_changed":4,"total_replacements":11}

Export a time range of a recording's audio to a new WAV. START and END are
seconds as floats (e.g. 12.5); the range is [start, end), sliced on
sample-frame boundaries, and END is clamped to the recording's duration. The
clip is written in the source's audio format. When OUT is omitted the clip
lands next to the source recording with a _clip_<start>-<end> suffix. Prints
the path written.
# Cut 12.5s–30s into a sibling _clip_ file next to the source audio
phoneme clip 20260519T143500823 12.5 30
# Write to an explicit path
phoneme clip 20260519T143500823 12.5 30 highlight.wav

A non-finite or negative bound, start >= end, or two bounds that round to the
same millisecond are rejected locally (exit 1) before any daemon work. With
--json, prints {"path": "<written-wav>"}.
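The local rejection rules for START and END can be written out as a short check. This is a sketch of the rules as stated above, not the CLI's actual code; `validate_clip_range` is a hypothetical name, and using Python's `round` for the millisecond comparison is an assumption about how the rounding is done.

```python
import math


def validate_clip_range(start: float, end: float) -> None:
    """Raise ValueError if the [start, end) range would be rejected locally."""
    for name, value in (("START", start), ("END", end)):
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name} must be a finite, non-negative number of seconds")
    if start >= end:
        raise ValueError("START must be less than END")
    # Distinct floats can still collapse to the same millisecond.
    if round(start * 1000) == round(end * 1000):
        raise ValueError("START and END round to the same millisecond")
```

For example, `validate_clip_range(12.5, 30)` passes, while `validate_clip_range(1.0001, 1.0002)` is rejected because both bounds round to 1000 ms.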
Get or set a recording's free-form notes (independent of the transcript).
# Print the current notes
phoneme notes 20260519T143500823
# Set the notes
phoneme notes 20260519T143500823 --set "Follow up with Alex."

Name and correct a recording's diarized speaker labels (the CLI face of the GUI
speaker chips and the in-recording speaker correction). Every <LABEL> is the
1-based [Speaker N] index from the transcript; <IDX> values are the 0-based
segment indices from phoneme show --segments.
Naming (rename / clear) never rewrites the transcript text — names are
applied at display/export time, so a rename is reversible:
# Give [Speaker 2] a display name
phoneme speaker rename 20260519T143500823 2 "Sarah"
# Clear a speaker label's custom name (revert to "Speaker N")
phoneme speaker clear 20260519T143500823 2

Correcting assignments (reassign / merge / split) actually changes
which segment belongs to which speaker. The stored transcript_segments stays
authoritative (the timeline / Synced views re-derive from it) and the prose
transcript's [Speaker N]: markers are rebuilt to match, in one transaction —
so the change shows up everywhere the user sees speakers:
# Reassign segment 5 to [Speaker 2] (a brand-new label simply starts existing)
phoneme speaker reassign 20260519T143500823 5 2
# Merge [Speaker 2] into [Speaker 1]: every 2-segment becomes 1, then 2 is gone.
# Speaker 1 keeps its name (adopts 2's only if 1 is unnamed); 2's captured
# voiceprint is dropped (the centroid is per-label — a retranscribe re-captures
# the merged label) and any affected named voice is recomputed.
phoneme speaker merge 20260519T143500823 2 1
# Split segments 4 and 7 off [Speaker 1] onto a fresh [Speaker 3]
# (the new label has no name or voiceprint until you name / re-enroll it)
phoneme speaker split 20260519T143500823 1 3 4 7

A label below 1, a self-merge, a split onto the same label, or a negative index
is rejected locally (exit 1) before any request is sent. An unknown segment
index, or a merge from / split index that doesn't currently carry the named
label, errors with no partial write.
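The local pre-checks mix two index bases (labels are 1-based, segment indices 0-based), so it may help to see them side by side. A minimal sketch of those checks follows; the function names are hypothetical, and the daemon-side checks (unknown segment, segment not carrying the label) are out of scope here.

```python
from typing import Sequence


def check_label(label: int) -> None:
    """Speaker labels are 1-based: [Speaker 1], [Speaker 2], ..."""
    if label < 1:
        raise ValueError(f"speaker label must be >= 1, got {label}")


def check_merge(from_label: int, into_label: int) -> None:
    check_label(from_label)
    check_label(into_label)
    if from_label == into_label:
        raise ValueError("cannot merge a speaker into itself")


def check_split(from_label: int, new_label: int, segments: Sequence[int]) -> None:
    check_label(from_label)
    check_label(new_label)
    if from_label == new_label:
        raise ValueError("cannot split onto the same label")
    for idx in segments:
        # Segment indices are 0-based, so 0 is valid and -1 is not.
        if idx < 0:
            raise ValueError(f"segment index must be >= 0, got {idx}")
```

Here `check_split(1, 3, [4, 7])` corresponds to the `speaker split … 1 3 4 7` example above and passes.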
Calibrating the recognition threshold (calibrate) is read-only — it tunes
nothing on its own, it just tells you what to set. It scores every
same-named-voice pair (genuine) against every different-named-voice pair
(impostor) with the recognizer's own cosine, finds the equal-error-rate (EER)
threshold that best separates the two, and prints that suggested value beside the
current [diarization].voiceprint_match_threshold.
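As a rough illustration of what an equal-error-rate threshold means, the sketch below sweeps the observed scores and picks the threshold where the false-accept and false-reject rates are closest. It takes precomputed similarity scores as input; the function name, the sweep over observed scores, and the "accept when score >= threshold" convention are assumptions, and Phoneme's own calibration may differ in detail.

```python
from typing import Optional, Sequence


def eer_threshold(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Pick the candidate threshold where FAR and FRR are closest."""
    if not genuine or not impostor:
        raise ValueError("not enough labelled data")
    best_t: Optional[float] = None
    best_gap: Optional[float] = None
    for t in sorted(set(genuine) | set(impostor)):
        # FRR: genuine (same-voice) pairs scoring below t would be rejected.
        frr = sum(s < t for s in genuine) / len(genuine)
        # FAR: impostor (different-voice) pairs scoring at or above t pass.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_t, best_gap = t, gap
    assert best_t is not None
    return best_t
```

With cleanly separated scores the result is the lowest genuine score; with overlapping scores it lands inside the overlap, where one error type is traded against the other.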
It never writes — apply the suggestion yourself with
phoneme config set diarization.voiceprint_match_threshold <value> if you agree.
It needs enough labelled data to be meaningful — at least two named voices, each
with two or more captures — and reports not enough labelled data below that.
# Suggest an EER-optimal match threshold from your enrolled voices
phoneme speaker calibrate

Manage the named-voice library: the cross-recording identities a recognized
speaker is matched against (#9). Distinct from phoneme speaker, which names the
diarized labels within one recording; voice manages the library those names
come from (the CLI face of the GUI Speaker Library manager).
phoneme voice list # name · sample count · id (also: bare `phoneme voice`)
phoneme voice rename <ID> "Sarah Chen" # rename a named voice
phoneme voice forget <ID> # reversibly forget a voice (undo below)
phoneme voice restore <ID> # undo a forget
phoneme voice merge <FROM_ID> <INTO_ID> # fold FROM into INTO (FROM removed)

`list` is observe-only (won't spawn a daemon), like `phoneme entities`. `forget`
soft-deletes the library entry (it leaves recognition; the raw per-recording
voiceprints stay) and is undone by restore. forget / restore / merge
exit non-zero when nothing changed (unknown id), so scripts can detect a no-op;
add --json for the raw {removed|restored|merged: bool} reply.
Semantic (embedding) search over transcripts. Requires semantic search to be
enabled and the embedding model present. Prints score id preview per hit.
phoneme search "database migration plan"
phoneme search "database migration plan" --limit 5
# Scope a meaning-search like the Library: --tag (id or name), --status, --kind
# (single|meeting). The scope restricts the candidate set; an unscoped search is
# unchanged. Combinable.
phoneme search "budget" --tag work
phoneme search "budget" --status done --kind meeting
`phoneme list --semantic "<query>"` runs the same search, reusing `--limit` and forwarding any `--tag` / `--status` / `--kind` scope.
--like <RECORDING_ID> — "more like this": instead of embedding a text
query, rank the library by similarity to a stored recording, using its
already-stored vectors. The source recording (and the other track of its own
meeting) never appears in the results. Works even when the embedding model
isn't loaded — only requires that the source recording is indexed; a
recording with no embeddings yet errors with a clear "isn't indexed yet"
message (re-embed or wait for the pipeline). --like and a text query are
mutually exclusive; --limit applies as usual.
phoneme search --like 20260519T143500823
phoneme search --like 20260519T143500823 --limit 5

Ask a natural-language question answered only from your own transcripts:
local RAG with citations. The daemon embeds the question, retrieves the top
grounding chunks via the same hybrid (vector + FTS5/RRF) retriever as
phoneme search, builds a citation-instructed prompt, and streams the answer
through the configured [llm_post_process] provider. So it needs both
semantic search enabled (the embedding model loaded) and an LLM provider
configured; either missing exits 6 (invalid config). Nothing is persisted.
Output: the numbered Sources first ([n] label (relevance%)), then the
answer streamed to stdout — its inline [n] markers map back to those sources.
If nothing in your recordings matches, it says so and never invents an answer.
phoneme ask "what did we decide about the database migration?"
phoneme ask "summarize the open questions from my 1:1s" --top-k 12
# Scope the answer like the Library: --tag (id or name), --status, --kind
# (single|meeting). Combinable; an unscoped ask searches the whole library.
phoneme ask "what are my action items?" --tag work --kind meeting

- `--top-k <N>`: max grounding chunks to retrieve (default `8`, clamped server-side).
- `--json`: collect the whole stream into `{ "answer": "...", "sources": [...] }` instead of printing it live, where each source carries `{ n, recording_id, label, chunk_index, snippet, relevance }` so `[n]` resolves to `sources[n-1].recording_id`.
A provider failure mid-answer (e.g. the model truncated, or the endpoint went away) prints to stderr and exits non-zero; the partial answer printed so far is left intact.
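With `--json`, the inline `[n]` markers can be resolved back to recordings using the `sources[n-1]` rule. The sketch below does that for one reply; the field names follow the shape described above, `cited_recordings` is a hypothetical helper, and the sample answer text in the usage note is invented.

```python
import json
import re
from typing import List


def cited_recordings(reply_json: str) -> List[str]:
    """Return recording ids cited in the answer, in first-citation order."""
    reply = json.loads(reply_json)
    sources = reply["sources"]
    ids: List[str] = []
    for marker in re.findall(r"\[(\d+)\]", reply["answer"]):
        n = int(marker)
        # [n] is 1-based; ignore markers that point outside the source list.
        if 1 <= n <= len(sources):
            recording_id = sources[n - 1]["recording_id"]
            if recording_id not in ids:
                ids.append(recording_id)
    return ids
```

For an answer such as "We chose Postgres [2]. Rollout is in July [1][2]." this returns the second source's recording first, then the first's, each once.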
Clear every stored embedding and re-embed the whole library with the currently-configured embedding model. Run this after changing the embedding model — a different model/dimension makes old vectors unsearchable. Returns immediately; the re-embed runs in the background on the daemon (watch progress in the daemon log).
phoneme reembed

Re-run the post-processing hook against a recording's already-stored transcript,
without re-transcribing. The hook runs in the background; observe the result via
phoneme watch (hook_done / hook_failed events). --command re-fires a
specific hook instead of the configured default — for safety the daemon only
accepts a command already present in the configured hook allowlist.
phoneme refire-hook 20260519T143500823
phoneme refire-hook 20260519T143500823 --command "python notify.py"

Inspect and manage the transcription pipeline queue. With no subcommand,
defaults to list.
# List the in-flight item plus everything still pending (table)
phoneme queue
phoneme queue list
# Inbox depth counts (pending / processing / done / failed)
phoneme queue counts
# Pause / resume the queue, or check whether it's paused
phoneme queue pause
phoneme queue resume
phoneme queue status
# Set the exact pending claim order (worker claims in this order)
phoneme queue reorder 20260519T143500823 20260519T143501999
# Remove one still-pending recording from the queue
phoneme queue cancel 20260519T143500823
# Cancel the item currently being processed (abort the in-flight work)
phoneme queue cancel-processing 20260519T143500823
# Skip the LLM step (cleanup / summary / tagging) currently running for the
# active item — the pipeline continues with whatever comes next. A no-op when
# no LLM stage is streaming (transcription and hooks aren't skippable; use
# cancel-processing for those). Mirrors the queue panel's ⏭ button.
phoneme queue skip
# Remove ALL still-pending items at once
phoneme queue cancel-all
# Empty the inbox failed/ quarantine ("dismiss failed")
phoneme queue clear-failed
# Dismiss ONE quarantined item by id (the per-item counterpart to clear-failed)
phoneme queue dismiss-failed 20260519T143500823
Work with the opt-in dictation re-grab history: a short, bounded record of
recent in-place dictations (the text as typed, no audio) so a past one can be
re-inserted or re-copied. Empty unless [in_place].keep_history is on (Settings →
Dictation, or set it in config). The history is bounded to the newest 50.
# Recent dictations, newest first (id · time · app · chars · text preview)
phoneme dictation history
phoneme dictation history --limit 10
phoneme dictation history --json
# Re-insert a past dictation's text at the CURRENT cursor. It lands wherever the
# caret is NOW (the window you dictated into is long gone). Defaults to the
# configured type_mode; --paste / --type override how it lands. Needs a daemon.
phoneme dictation regrab 12
phoneme dictation regrab 12 --paste
# Forget one dictation by id; clear the whole history
phoneme dictation forget 12
phoneme dictation clear
Privacy: the history retains the exact text that was typed, including ephemeral dictations (`save_to_library = false`) that otherwise leave nothing behind: a dictated password or note is kept until you clear it. It is never logged. Turn it off (or clear it) whenever you don't want it kept.
Delete a recording and its associated audio file.
phoneme delete 20260519T143500823
# Keep the original .wav file on disk, just remove the catalog entry
phoneme delete 20260519T143500823 --keep-audio
Test and manage your post-processing hooks.
# Run the configured hook with a mock payload to test your script
phoneme hook test
Bulk export all audio and metadata into a zip archive, or export a recording's transcript segments as a caption file (SRT or WebVTT).
Library zip export
phoneme export backup.zip
Caption export flags
| Flag | Description |
|---|---|
| `--captions <RECORDING_ID>` | Export captions for this recording instead of zipping the library. |
| `--format <srt\|vtt>` | Caption format: srt (default) or vtt. |
| `-o, --out <FILE>` | Write captions to FILE. Use - for stdout. Defaults to `<recording-id>.srt` / `<recording-id>.vtt` in the current directory. |
Examples
# Export captions as SRT (default) for a recording — writes 20260519T143500823.srt
phoneme export --captions 20260519T143500823
# Export as WebVTT to an explicit path
phoneme export --captions 20260519T143500823 --format vtt -o captions/meeting.vtt
# Pipe SRT directly to another tool
phoneme export --captions 20260519T143500823 -o -
Recordings that have no stored segments (e.g. transcribed before timing data was captured) print a clear message and exit non-zero. Retranscribe the recording to generate segments first.
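To make the two caption formats concrete, here is a minimal Python sketch of how timed segments map to SRT and WebVTT text. It is not Phoneme's exporter: the `(start_ms, end_ms, text)` tuple is an assumed segment shape chosen for illustration, and only the timestamp conventions of the two formats (comma before milliseconds in SRT, dot plus a `WEBVTT` header in WebVTT) are taken as given.

```python
def format_timestamp(ms: int, sep: str) -> str:
    """Render milliseconds as HH:MM:SS<sep>mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def to_srt(segments) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


# Hypothetical segments: (start_ms, end_ms, text).
segments = [(0, 1500, "Hello"), (1500, 4000, "Let's get started.")]
print(to_srt(segments))
print(to_vtt(segments))
```

A recording with an empty segment list would produce no cues at all, which is why the command above refuses to write a caption file in that case.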
Restore a library backup zip — the inverse of phoneme export <FILE>. Each
recording in the archive is re-inserted into the catalog and its audio copied
into the configured audio directory.
phoneme import-backup backup.zip
The daemon holds the catalog database open while it runs, so import-backup
shuts a running daemon down first and waits for it to release the file (like
doctor --rebuild-catalog). Start the daemon again afterwards with
phoneme daemon start.
Restore is idempotent: a recording whose id already exists in the catalog is skipped (counted, never overwritten), so re-running on the same backup never duplicates a row or reverts a hand edit you made since. The command prints how many recordings it imported and how many it skipped. What round-trips is what the export captured — the recording metadata, transcript, and tags, plus the audio; derived data (segments, embeddings, voiceprints) is regenerated by a retranscribe.
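The skip-if-present rule is what makes the restore idempotent. A minimal Python sketch of that rule, using an in-memory dict as a stand-in for the catalog (the real catalog is a database; the `restore` helper and the record fields here are hypothetical):

```python
def restore(catalog: dict, backup: list[dict]) -> tuple[int, int]:
    """Insert records whose id is new; skip (never overwrite) existing ones.

    Returns (imported, skipped), mirroring the counts the command prints.
    """
    imported = skipped = 0
    for record in backup:
        if record["id"] in catalog:
            skipped += 1  # existing row wins, so hand edits survive
            continue
        catalog[record["id"]] = record
        imported += 1
    return imported, skipped


# Hypothetical state: one recording already present and edited by hand.
catalog = {"20260519T143500823": {"id": "20260519T143500823", "transcript": "hand-edited"}}
backup = [
    {"id": "20260519T143500823", "transcript": "original"},
    {"id": "20260519T143501999", "transcript": "second recording"},
]
```

Running `restore(catalog, backup)` twice imports the missing recording once and skips everything on the second pass, which is the behavior described above.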
Manage recording tags. Wherever a <TAG> is taken (attach / detach / merge), it
accepts either a numeric tag id or a tag name.
# List all tags in use; --all also includes orphaned (unused) tags
phoneme tag list
phoneme tag list --all
# Add a new tag with an optional color
phoneme tag add work --color "#ff0000"
# Rename and/or recolor an existing tag (by id)
phoneme tag update 1 work --color "#4caf50"
# Delete a tag by ID
phoneme tag delete 1
# Attach / detach a tag (by name or id) to a recording
phoneme tag attach 20260519T143500823 work
phoneme tag detach 20260519T143500823 work
# List the tags attached to one recording
phoneme tag for 20260519T143500823
# Show how many recordings each tag is attached to
phoneme tag usage
# Review one recording's pending auto-tag suggestions
phoneme tag suggestions 20260519T143500823
# Approve a suggestion (creates + attaches the real tag, drops the proposal)
phoneme tag suggestions 20260519T143500823 --approve work
# Dismiss a suggestion (drops the proposal, attaches nothing)
phoneme tag suggestions 20260519T143500823 --dismiss spam
# Drop every pending auto-tag suggestion across the whole library (approved
# tags stay attached; only not-yet-decided proposals are discarded)
phoneme tag clear-suggestions
# Merge one tag into another: re-point all recordings, then delete the source
phoneme tag merge old-name work
List the cross-recording entity facet: every distinct entity the LLM
entity-extraction step pulled across the whole library (people, organizations,
topics, terms), each with the number of recordings that mention it, grouped by
kind. The entity counterpart of phoneme tag list, and the CLI face of the GUI
sidebar's browse-by-entity section. Extract entities for a recording with
phoneme suggest-entities <ID>.
To then list the recordings for one entity, the GUI sidebar applies the entity
filter on the recording list (ListFilter.entity_value / entity_kind); from
the CLI, narrow by kind and read the values you care about.
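For scripted use, the JSON-lines output of `phoneme entities --json` (one `{kind, value, count}` object per row) is easy to regroup. A small Python sketch, assuming the rows arrive one per line on standard output; the helper name and sample rows are illustrative only:

```python
import json
from collections import defaultdict


def group_entities(jsonl: str) -> dict[str, list[tuple[str, int]]]:
    """Group facet rows by kind, most-mentioned values first."""
    grouped = defaultdict(list)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        grouped[row["kind"]].append((row["value"], row["count"]))
    for rows in grouped.values():
        # Sort by descending count, then alphabetically for stable output.
        rows.sort(key=lambda item: (-item[1], item[0]))
    return dict(grouped)


# Hypothetical rows in the documented {kind, value, count} shape.
sample_output = """\
{"kind": "person", "value": "Grace Hopper", "count": 1}
{"kind": "org", "value": "Acme Corp", "count": 4}
{"kind": "person", "value": "Ada Lovelace", "count": 3}
"""

print(group_entities(sample_output)["person"])
```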
# List every extracted entity across the library, grouped by kind, with counts
phoneme entities
# Show only one kind (person / org / topic / term)
phoneme entities --kind person
# Machine-readable (one JSON object per facet row: {kind, value, count})
phoneme entities --json
The mutating sub-actions mirror the GUI's per-recording chips and the Entity
manager (same IPC; changes show up live in the app). Hand-curated entities are
source='manual' and survive re-extraction.
# Add an entity to a recording by hand (kind ∈ person/org/topic/term)
phoneme entities add 20260519T143500823 person "Ada Lovelace"
# Edit one entity, keyed by its current kind + value; change either field
phoneme entities edit 20260519T143500823 org "acme" --to-value "Acme Corp"
phoneme entities edit 20260519T143500823 topic "ml" --to-kind term --to-value "ML"
# Delete one entity from a recording
phoneme entities delete 20260519T143500823 topic "roadmap"
# Library-wide merge: fold variant values of a kind into one canonical value
phoneme entities merge org "Acme Corp" "acme" "ACME" "acme corp"
List the cross-recording task list: every action item the LLM
task-extraction step pulled across the whole library — open first, then newest
recording first, each line showing [x]/[ ] done state, the text, an optional
due hint, and the recording it came from. The CLI face of the GUI sidebar's Tasks
section. Extract tasks for a recording with phoneme suggest-tasks <ID>. The
done / undone sub-actions toggle one task by its row id (shown in the list
and by phoneme show).
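The listing order (open first, then newest recording first) can be reproduced on the `--json` rows with two stable sorts. This Python sketch assumes that recording ids sort chronologically as strings, which the timestamp-like ids in the examples suggest but the text does not state; the function name and sample rows are hypothetical.

```python
def order_tasks(tasks: list[dict]) -> list[dict]:
    """Open tasks first; within each group, newest recording first."""
    # Assumption: ids such as 20260519T143500823 sort chronologically as text.
    newest_first = sorted(tasks, key=lambda task: task["recording_id"], reverse=True)
    # sorted() is stable and False < True, so open tasks move ahead
    # without disturbing the newest-first order inside each group.
    return sorted(newest_first, key=lambda task: task["done"])


# Hypothetical rows using a subset of the documented task fields.
tasks = [
    {"recording_id": "20260519T143500823", "id": 1, "text": "Send the roadmap", "done": True},
    {"recording_id": "20260519T143501999", "id": 2, "text": "Book the room", "done": False},
    {"recording_id": "20260518T090000000", "id": 3, "text": "Email Ada", "done": False},
]

for task in order_tasks(tasks):
    mark = "[x]" if task["done"] else "[ ]"
    print(mark, task["text"])
```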
# List every extracted task across the library (open first)
phoneme tasks
# Only the still-open tasks
phoneme tasks --open
# Machine-readable (one JSON object per task: {recording_id, title, id, text, due_hint, done})
phoneme tasks --json
# Mark task #3 of a recording done (or undone)
phoneme tasks done 20260519T143500823 3
phoneme tasks undone 20260519T143500823 3
The mutating sub-actions mirror the GUI task manager (same IPC). Hand-added /
edited tasks are source='manual' and survive re-extraction.
# Add a task by hand; --due is an optional free-text hint
phoneme tasks add 20260519T143500823 "Send the roadmap" --due "by Friday"
# Edit a task's text; the due hint is preserved unless changed/cleared
phoneme tasks edit 20260519T143500823 3 "Send the v2 roadmap"
phoneme tasks edit 20260519T143500823 3 "Send it" --due "Monday"
phoneme tasks edit 20260519T143500823 3 "Send it" --clear-due
# Delete a task, or set the task order (ids in the order you want)
phoneme tasks delete 20260519T143500823 3
phoneme tasks reorder 20260519T143500823 5 2 4 1
Manage config profiles (named full-config snapshots).
# List saved profiles
phoneme profile list
# Save the current config as a named profile snapshot
phoneme profile save work_mode
# Switch the active config to a saved profile and reload the daemon
phoneme profile use work_mode
save and list are purely local (they copy/read files under
%APPDATA%\phoneme\profiles\); use overwrites the live config.toml with
the snapshot and sends the daemon a reload. The GUI equivalent is
Settings → Managers → Profiles.
Run a health check on your system.
Checks: config file presence, audio-directory writability, free disk space on the volumes holding the recordings and the app data (catalog/queue/models), hook command resolvability, model-file integrity (the Whisper model — plus the live-preview, semantic-search and diarization models when those features are on: missing, 0-byte and implausibly small files are all caught), Whisper server reachability, the dedicated live-preview server (when configured on its own port), and Ollama (optional).
Every check carries a category describing how severe a failure is:
- critical — recording or transcription is broken (unwritable audio dir, missing/corrupt Whisper model, unreachable Whisper server, under ~500 MB of free disk);
- warning — something is degraded but capture + transcription still work (hook not resolvable, optional model missing, under ~2 GB of free disk);
- info — informational only; never fails the run.
Passing checks print as one line. Failing checks get a colored category badge
plus two indented lines: what the check verifies, and a fix: hint with the
next step. The exit code is non-zero when any warning- or critical-category
check fails.
phoneme doctor
# Attempt repairs for failed checks: when the Whisper / live-preview server
# probe fails, asks the daemon to sweep hung/orphaned whisper-server processes
# and respawn them from config, then re-probes and reports the fresh results.
phoneme doctor --fix
# Force the catalog to rebuild itself from orphan files on disk. Asks a
# running daemon to shut down and WAITS (up to 15s) for it to actually exit
# before deleting catalog.db (plus its -wal/-shm sidecars) — if the daemon
# won't die in time, the command fails and leaves the catalog untouched.
phoneme doctor --rebuild-catalog
# NON-destructive recovery: ask the running daemon to scan the audio folder and
# re-link any .wav files that have no catalog row (re-importing + re-transcribing
# them), leaving every existing recording untouched. Prefer this over
# --rebuild-catalog when you've just lost rows, not the whole catalog.
phoneme doctor --reimport
# Write an opt-in, local-only diagnostics bundle for bug reports and print the
# file path (nothing else). The bundle holds app + OS info, the MASKED config
# (secrets redacted), and a tail of the daemon log — no audio, transcripts or
# catalog. Parity with the GUI Doctor's "Export diagnostics" button; needs a
# running daemon (errors if one isn't up).
phoneme doctor --diagnostics
With --json, each check object keeps the original name/ok/detail keys
and additionally carries category ("critical" | "warning" | "info"),
explanation, and fix_hint (string or null) — additive only, so existing
consumers keep working. The Model storage check reports the total disk used by
downloaded transcription models — usually the largest thing under app-data.
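A script consuming the `--json` checks can apply the same severity rule as the exit code: only failed checks in the warning or critical category count. The Python sketch below is illustrative; the check names and details are invented, and the concrete value `1` is an assumption, since the text only says the exit code is non-zero.

```python
FAILING_CATEGORIES = {"critical", "warning"}


def failing_checks(checks: list[dict]) -> list[dict]:
    """Failed checks that should fail the run; info never does."""
    return [
        check for check in checks
        if not check["ok"] and check["category"] in FAILING_CATEGORIES
    ]


def exit_code(checks: list[dict]) -> int:
    # Assumption: 1 stands in for "non-zero"; the real value is not documented here.
    return 1 if failing_checks(checks) else 0


# Hypothetical check objects using the documented keys.
checks = [
    {"name": "Audio directory", "ok": True, "detail": "writable",
     "category": "critical", "explanation": "Recordings are written here.", "fix_hint": None},
    {"name": "Hook command", "ok": False, "detail": "not found on PATH",
     "category": "warning", "explanation": "The hook runs after transcription.",
     "fix_hint": "Install the command or fix the configured path."},
    {"name": "Ollama", "ok": False, "detail": "not running",
     "category": "info", "explanation": "Optional LLM features.", "fix_hint": None},
]

for check in failing_checks(checks):
    print(f"{check['category']}: {check['name']} ({check['fix_hint']})")
```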
Manage the downloaded local transcription (whisper.cpp) models on disk. They are 75 MB–3 GB each, download on demand, and are never auto-removed, so they're the usual answer to "what's filling up app-data". Operates on the model files directly — no daemon required — and a removed model re-downloads the next time it's selected.
phoneme model # defaults to `ls`
phoneme model ls # list downloaded models with sizes + a running total
phoneme model ls --json # [{name, path, bytes, active}]
# Download a model (the headless equivalent of the desktop "Download" cards):
# fetches from the pinned whisper.cpp source and verifies its SHA-256.
phoneme model get ggml-small.en.bin
# Select a downloaded model for transcription (writes whisper.model_path + reloads
# a running daemon) — the headless equivalent of "Select":
phoneme model use ggml-small.en.bin
# Delete one to reclaim space (name as shown by `model ls`):
phoneme model rm ggml-large-v3.bin
# Refuses a model that's currently configured for transcription / live-preview /
# dictation — pass --force to remove it anyway:
phoneme model rm ggml-small.en.bin --force
get, use, and rm only accept known whisper model filenames (an allow-list),
so they can never download to, select, or delete anything outside the models
directory; get deletes the file and fails on a checksum mismatch. This is full
parity with Settings → Whisper (download cards · Select · the Remove
button). Run phoneme model ls with no models downloaded to see the names.
Manage configuration.
# With no subcommand: print the active config as TOML. Secret values (API keys,
# the webhook HMAC secret) are masked as <redacted> so the dump is safe to paste
# or pipe.
phoneme config
# Print the real secret values instead of <redacted> — pass it only when you
# deliberately need the keys.
phoneme config --show-secrets
# Print the path to the active config file
phoneme config path
# Set a config value (parses bool/int/float, else string)
phoneme config set whisper.mode external
# Hot-reload the configuration file from disk. The daemon immediately applies
# changes (hotkeys, models, …) without restarting.
phoneme config reload
config set semantics:
- It writes the file the daemon actually reads: the `PHONEME_CONFIG` override when that env var is set, otherwise the per-user default (`config path` prints the default; the override wins everywhere).
- The full updated config is validated first. A value with the wrong type for its field, or one that fails the same `validate()` the daemon runs on load (e.g. an out-of-range `recording.sample_rate`), is rejected with the exit code for invalid config and nothing is written, so `config set` can never produce a file the daemon refuses to load.
- The write is atomic: the new content lands in a `.toml.tmp` sibling and is renamed over the real file, so a crash mid-write leaves the previous config intact rather than a truncated half-file.
The config is validated automatically when the daemon loads or reloads it; an invalid file is rejected with an error. There is no separate `config validate` subcommand.
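The value parsing and the write-then-rename step can be sketched in Python as follows. This is a generic illustration of the technique under stated assumptions, not Phoneme's implementation: the exact literals accepted as booleans, the `safe_set` helper, and the `validate` callback are all hypothetical.

```python
import os
from pathlib import Path


def parse_value(raw: str):
    """Try bool, then int, then float; fall back to the raw string."""
    # Assumption: only lowercase true/false are treated as booleans.
    if raw in ("true", "false"):
        return raw == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def atomic_write(path: Path, content: str) -> None:
    """Write to a .tmp sibling, then rename it over the real file."""
    tmp = path.with_name(path.name + ".tmp")  # config.toml -> config.toml.tmp
    tmp.write_text(content, encoding="utf-8")
    # os.replace swaps the file in one step, so readers see old or new, never half.
    os.replace(tmp, path)


def safe_set(path: Path, new_content: str, validate) -> None:
    """Validate the full config first; write only if validation passes."""
    validate(new_content)  # expected to raise on an invalid config
    atomic_write(path, new_content)
```

With this ordering, a validation error leaves the existing file untouched, and a crash during the write leaves at most a stray `.tmp` file next to an intact config.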
Subscribe to live daemon events as a stream of JSON objects. Useful for building your own UI or integration on top of Phoneme.
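As a starting point for such an integration, the sketch below filters a stream of JSON lines for the `hook_done` / `hook_failed` events mentioned for `refire-hook`. The event schema is not documented in this section, so the `"event"` key that carries the event name is an assumption to adjust once you have seen real output.

```python
import json
from typing import Iterable, Iterator


def hook_results(lines: Iterable[str], key: str = "event") -> Iterator[dict]:
    """Yield only hook_done / hook_failed events from a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON object line
        # Assumption: the event name lives under `key`; the real field may differ.
        if isinstance(event, dict) and event.get(key) in ("hook_done", "hook_failed"):
            yield event


# Hypothetical stream; real events will carry different fields.
sample_stream = [
    '{"event": "recording_started", "recording_id": "20260519T143500823"}',
    '{"event": "hook_done", "recording_id": "20260519T143500823"}',
    'not json',
    '{"event": "hook_failed", "recording_id": "20260519T143501999"}',
]

for result in hook_results(sample_stream):
    print(result["event"], result["recording_id"])
```

To read live events, pass the standard output of a `phoneme watch` process (for example from `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)`) as `lines` instead of the sample list.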
phoneme watch
Send daemon control commands.
# Spawn the daemon in a detached background process
phoneme daemon start
# Print the daemon's status
phoneme daemon status
# Graceful shutdown: sends the Shutdown IPC and waits (up to ~5s) for the
# daemon to actually exit
phoneme daemon stop
daemon stop is the full shutdown chain: the daemon acknowledges the request
before exiting, stops and queues any in-flight recording (nothing is
corrupted mid-write; the next daemon run transcribes it), kills the
whisper-server(s) it spawned, and stops an Ollama it auto-launched — an Ollama
you started yourself is never touched. Stopping an already-stopped daemon
prints daemon is not running and succeeds (it never spawns one just to stop
it).
Print version and commit info.
phoneme version
Print a shell-completion script for the chosen shell to stdout. This is pure
local generation — it never contacts the daemon, so it works before the daemon
is even running. Supported shells: bash, zsh, fish, powershell,
elvish.
Install one-liners:
# bash — drop the script where bash-completion looks for it
phoneme completions bash > ~/.local/share/bash-completion/completions/phoneme
# zsh — write into a directory on your $fpath (here a personal completions dir),
# then ensure that dir is on fpath and compinit runs in ~/.zshrc
mkdir -p ~/.zfunc
phoneme completions zsh > ~/.zfunc/_phoneme
# in ~/.zshrc, before `compinit`: fpath=(~/.zfunc $fpath)
# fish
phoneme completions fish > ~/.config/fish/completions/phoneme.fish
# PowerShell: load completions for the current session only
phoneme completions powershell | Out-String | Invoke-Expression
# Persist across sessions: append the same line to your profile
'phoneme completions powershell | Out-String | Invoke-Expression' |
Add-Content $PROFILE
While the daemon is usually auto-spawned by the CLI, the System Tray application, or phoneme daemon start, you can run it directly:
# Run the daemon in the foreground
phoneme-daemon
# Run with explicit debug logging (PowerShell)
$env:RUST_LOG = "debug"; phoneme-daemon