From 3ef9943658cccf0a05dff51a8a90bc8dc9d5d576 Mon Sep 17 00:00:00 2001 From: Fin Griffin Date: Thu, 2 Jul 2026 10:06:35 +0100 Subject: [PATCH 1/3] feat(docs): update README and documentation --- README.md | 159 +++++++++++++++++++++++++++++++++++++-- cspell/library-words.txt | 1 + cspell/project-words.txt | 1 + docs/00_data.md | 39 +++++++++- docs/01_stylometry.md | 13 ++-- docs/02_evals.md | 6 +- docs/03_rl.md | 50 +++++++++++- docs/04_cli.md | 6 +- 8 files changed, 252 insertions(+), 23 deletions(-) diff --git a/README.md b/README.md index f6a441d..a0ecb16 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,53 @@ -# 🗣️ VOICE +[![GPL-3 License](https://img.shields.io/badge/License-GPLv3-brightgreen.svg)](https://opensource.org/licenses/GPL-3.0) +[![Issues](https://img.shields.io/github/issues-raw/acceleratescience/voice.svg?maxAge=25000)](https://github.com/acceleratescience/voice/issues) +[![GitHub contributors](https://img.shields.io/github/contributors/acceleratescience/voice.svg?style=flat)]() +[![GitHub pull requests](https://img.shields.io/github/issues-pr/acceleratescience/voice.svg?style=flat)]() +[![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com) +
+[![GitHub stars](https://img.shields.io/github/stars/acceleratescience/voice.svg?style=social&label=Star)]() +[![GitHub watchers](https://img.shields.io/github/watchers/acceleratescience/voice.svg?style=social&label=Watch)]() +[![GitHub forks](https://img.shields.io/github/forks/acceleratescience/voice.svg?style=social&label=Fork)]() -... +
+
+

🗣️VOICE

+

+ Research software for fine-tuning large language models to match a target author's writing style, + combining a calibrated stylometric evaluation suite with LoRA supervised fine-tuning and GRPO reinforcement learning. +

+

+ Documentation + · + Report Bug + · + Request Feature +

+
+ +
+ Table of Contents +
    +
  1. Overview
+  2. Documentation
+  3. Installation
+  4. Quick Start
+  5. Python API
+  6. Contributing
+  7. License
+
+
+ +--- + +## Overview + +VOICE is an NLP research toolkit for **stylometric style alignment**: fine-tuning a large language model so its outputs are stylistically consistent with a target author. The toolkit provides three integrated components: + +- **Stylometry**: a suite of surface writing style metrics (word length moments, vocabulary richness, function word frequency, character n-gram diversity) organised into four metric groups. +- **Evaluation**: a calibrated alignment score $\mathcal{S}\in[0, 1]$ comparing model completions to a reference corpus using Wasserstein distance, normalised against within-author variation estimated from the training split via bootstrap resampling. Uncertainty estimates are provided via jackknife resampling. +- **Fine-tuning**: a CLI for running LoRA experiments (single runs or hyperparameter sweeps) via [axolotl](https://github.com/axolotl-ai-cloud/axolotl), with style alignment scoring built in. Both supervised fine-tuning and GRPO are made available, with the latter using a custom *typicality reward* function. + +

(back to top)

--- @@ -8,7 +55,109 @@ ``` docs/ -├── 00_data.md — Included datasets -├── 01_evals.md — Evaluation suite: scoring and interpretation -└── 02_stylometry.md — Stylometric metrics: definition and catalogue +├── 00_data.md - Datasets: format and Hugging Face references +├── 01_stylometry.md - Stylometric metrics: definitions and catalogue +├── 02_evals.md - Evaluation suite: scoring methodology and API +├── 03_rl.md - Reward functions for GRPO training +└── 04_cli.md - Fine-tuning CLI: single runs and sweeps +``` + +

(back to top)

+ +--- + +## Installation + +VOICE requires Python 3.12. Training functionality requires Linux (*pinned axolotl version is Linux only*); the evaluation and stylometry components run on all platforms. + +Using [uv](https://github.com/astral-sh/uv) (recommended): + +```bash +git clone https://github.com/acceleratescience/voice +cd voice +uv sync +source .venv/bin/activate +``` + +Authenticate with Hugging Face before running fine-tuning jobs: + +```bash +huggingface-cli login +``` + +

(back to top)

+ +--- + +## Quick Start + +Run a single fine-tuning job: + +```bash +voice finetune single configs/single/example.yaml +``` + +This trains a LoRA adapter on top of Llama-3.1-8B-Instruct and writes per-epoch completions and alignment scores to `runs/{run_name}/`. + +For a hyperparameter sweep: + +```bash +voice finetune sweep configs/sweep/example.yaml +``` + +See [docs/04_cli.md](docs/04_cli.md) for the full CLI reference, config format and output layout. + +

(back to top)

+ +--- + +## Python API + +The evaluation suite can be used independently of the CLI: + +```python +from voice import get_dataset, make_comparison, DatasetSpec +from voice.datasets._schema import Split + +ds = get_dataset( + DatasetSpec( + repo_id="AccelerateScience/bo-press-conference-qa", + splits=(Split.TRAIN, Split.VALIDATION, Split.TEST), + ) +) + +# completions: list[Example] — model outputs on the same prompts as ds.validation +results = make_comparison(completions, ds) + +print(results.score) # overall alignment score in [0, 1] +print(results.group_tails) # per-group breakdown +print(results.score_ci()) # 90% jackknife confidence interval ``` + +See [docs/02_evals.md](docs/02_evals.md) for the full scoring methodology and available diagnostic fields. + +

(back to top)

+ +--- + +## Contributing + +Contributions are welcome. To propose a change: + +1. Fork the repository +2. Create a feature branch (`git checkout -b feature/my-change`) +3. Commit your changes (`git commit -m 'Add my change'`) +4. Push to the branch (`git push origin feature/my-change`) +5. Open a pull request + +Please raise an issue first for substantial changes. + +

(back to top)

+ +--- + +## License + +Distributed under the GNU General Public License. See `LICENSE` for details. + +

(back to top)

diff --git a/cspell/library-words.txt b/cspell/library-words.txt index 730b848..34c8f99 100644 --- a/cspell/library-words.txt +++ b/cspell/library-words.txt @@ -25,3 +25,4 @@ PYTHONPATH venv cuda funcs +pathlib diff --git a/cspell/project-words.txt b/cspell/project-words.txt index 8f88110..5f4cff7 100644 --- a/cspell/project-words.txt +++ b/cspell/project-words.txt @@ -29,3 +29,4 @@ Wegmann embs cdfs unprimed +qwen diff --git a/docs/00_data.md b/docs/00_data.md index e504061..038f68b 100644 --- a/docs/00_data.md +++ b/docs/00_data.md @@ -2,12 +2,14 @@ --- -Two example datasets are included, each containing press conference Q&A transcripts split into train, validation and test sets: +VOICE expects datasets in chat format with train, validation, and test splits. Each example is a `messages` list containing system, user, and assistant turns; the assistant turn is the text against which stylometric alignment is measured. -| President | HuggingFace | +Two example datasets are hosted on Hugging Face: + +| Dataset | HuggingFace | |---|---| -| Barack Obama | [`AccelerateScience/bo-press-conference-qa`](https://huggingface.co/datasets/AccelerateScience/bo-press-conference-qa) | -| George W. Bush | [`AccelerateScience/gwb-press-conference-qa`](https://huggingface.co/datasets/AccelerateScience/gwb-press-conference-qa) | +| Barack Obama press conference Q&A | [`AccelerateScience/bo-press-conference-qa`](https://huggingface.co/datasets/AccelerateScience/bo-press-conference-qa) | +| George W. 
Bush press conference Q&A | [`AccelerateScience/gwb-press-conference-qa`](https://huggingface.co/datasets/AccelerateScience/gwb-press-conference-qa) | Each example is a single JSONL record in chat format: @@ -20,3 +22,32 @@ Each example is a single JSONL record in chat format: ] } ``` + +## Bringing Your Own Data + +Any dataset following the chat format above can be used by specifying its Hugging Face repo id in the axolotl config: + +```yaml +datasets: + - path: your-org/your-dataset + type: chat_template + field_messages: messages + roles_to_train: [assistant] +``` + +For local datasets, use `LocalDatasetSpec` and provide one `.jsonl` file per split: + +```python +from voice import get_dataset, LocalDatasetSpec +from voice.datasets._schema import Split +from pathlib import Path + +ds = get_dataset( + LocalDatasetSpec( + path=Path("data/my-author"), + splits=(Split.TRAIN, Split.VALIDATION, Split.TEST), + ) +) +``` + +VOICE expects files at `{path}/train.jsonl`, `{path}/validation.jsonl`, and `{path}/test.jsonl`. Columns `system`, `question`, and `answer` are also accepted as an alternative to `messages`. diff --git a/docs/01_stylometry.md b/docs/01_stylometry.md index ee053e4..ccb7484 100644 --- a/docs/01_stylometry.md +++ b/docs/01_stylometry.md @@ -8,9 +8,12 @@ $$f : \mathcal{T} \rightarrow \mathbb{R}$$ that maps a text string to a real scalar capturing some surface property of writing style. VOICE treats each metric as a distribution over a corpus: given a set of texts, $f$ is applied to each one to produce a sample from the author's stylometric distribution for that feature. +Metrics are organised into four groups. Because metrics within the same group are highly correlated (they characterise the same underlying linguistic object from different angles), the evaluation suite averages within groups before aggregating across them, preventing any single dimension from dominating the alignment score through sheer metric count. 
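As a rough illustration of the definition above, the sketch below treats a metric as a plain function from a text to a float and applies it to every text in a corpus to obtain a sample. The function names, the whitespace tokenisation, and the toy corpus are assumptions made for this example; they are not taken from VOICE's own implementation.

```python
from statistics import mean
from typing import Callable


def mean_word_length(text: str) -> float:
    """Mean number of characters per whitespace-separated word."""
    words = text.split()
    return mean(len(w) for w in words) if words else 0.0


def type_token_ratio(text: str) -> float:
    """Number of distinct lower-cased words divided by total words."""
    words = [w.lower() for w in text.split()]
    return len(set(words)) / len(words) if words else 0.0


def metric_sample(metric: Callable[[str], float], corpus: list[str]) -> list[float]:
    """Apply one metric f to each text, giving a sample from its distribution."""
    return [metric(text) for text in corpus]


# Toy corpus standing in for an author's texts (illustrative only).
corpus = ["the cat sat on the mat", "a quick brown fox"]
lengths = metric_sample(mean_word_length, corpus)
ttrs = metric_sample(type_token_ratio, corpus)
```

Each list holds one value per text, so two corpora can be compared metric by metric as distributions rather than as single summary numbers.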
+ ## Implemented Metrics ### Word Length Distribution + Moments of the per-word character length distribution. | Metric | Description | @@ -21,6 +24,7 @@ Moments of the per-word character length distribution. | `kurtosis_word_length` | Kurtosis of word length | ### Vocabulary Richness + Type–token and word statistics measuring lexical diversity. | Metric | Description | @@ -32,21 +36,16 @@ Type–token and word statistics measuring lexical diversity. | `tri_legomena_ratio` | Fraction of words appearing exactly three times | ### Function Words + | Metric | Description | |---|---| | `function_word_ratio` | Proportion of tokens drawn from a closed function-word list | ### Character N-gram Diversity + Type–token ratio and MATTR computed over character $n$-grams for $n \in \{3, 4, 5\}$. | Metric | Description | |---|---| | `char_{n}gram_type_token_ratio` | Character $n$-gram TTR | | `char_{n}gram_moving_avg_type_token_ratio` | Character $n$-gram MATTR | - -### Text Length -| Metric | Description | -|---|---| -| `num_words` | Total word count | - -> **Note:** `num_words` may be deprecated in a future release. It does not tend to be used as a signature for authorship attribution in the broader stylometry literature. diff --git a/docs/02_evals.md b/docs/02_evals.md index a5d1d16..054b653 100644 --- a/docs/02_evals.md +++ b/docs/02_evals.md @@ -28,7 +28,7 @@ $t$ is the **tail value**: the fraction of self-distances that *exceed* the obse ### Step 4: Group and Overall Score -Metrics are organised into groups (see [Stylometric Metrics](01_stylometry.md)). Each group captures a distinct linguistic object (word length, vocabulary richness, .etc) and the metrics within it represent different ways of characterising that same object. 
Because they study the same underlying property, within-group metrics are highly correlated; averaging within groups before aggregating across them prevents any single linguistic dimension from dominating the score simply by having more metrics defined for it. +Metrics are organised into groups (see [Stylometric Metrics](01_stylometry.md)). Each group captures a distinct linguistic object (word length, vocabulary richness, etc.) and the metrics within it represent different ways of characterising that same object. Because they study the same underlying property, metrics in the same group are highly correlated; averaging within groups before aggregating across them prevents any single linguistic dimension from dominating the score simply by having more metrics defined for it. Within each group the tail values are averaged: @@ -112,6 +112,6 @@ results.group_tail_cis() # dict[MetricGroup, tuple[float, float]] | None results.score_ci() # tuple[float, float] | None ``` -Per-metric tails are useful for identifying which stylometric dimensions are misaligned; per-group tails aggregate correlated metrics and correspond directly to the terms summed in the overall score. +Per-metric tails are useful for identifying which stylometric dimensions are misaligned; per-group tails aggregate correlated metrics and correspond directly to the terms in the geometric mean. -The confidence interval methods accept a `confidence` keyword (default 0.90, see `UNCERTAINTY_DEFAULTS`). Passing `uncertainty=False` to `make_comparison` skips the jackknife pass, in which case the interval methods return `None`. +The confidence interval methods accept a `confidence` keyword (default `0.90`, see `UNCERTAINTY_DEFAULTS`). Passing `uncertainty=False` to `make_comparison` skips the jackknife pass, in which case the interval methods return `None`. 
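The scoring steps described in this file can be sketched in a few lines: a tail value is the fraction of self-distances that exceed the observed distance, tails are averaged within each group, and the group means are combined with a geometric mean. This is a minimal sketch under stated assumptions, not VOICE's code: the function names, the floor value `EPSILON`, and the choice to apply the floor to each group mean before taking the geometric mean are all hypothetical.

```python
import math

EPSILON = 1e-3  # hypothetical floor; the real value is not given in this excerpt


def tail_value(observed: float, self_distances: list[float]) -> float:
    """Fraction of within-author self-distances that exceed the observed distance."""
    return sum(d > observed for d in self_distances) / len(self_distances)


def overall_score(group_tails: dict[str, list[float]], eps: float = EPSILON) -> float:
    """Average tails within each group, then take the geometric mean across groups."""
    group_means = [sum(tails) / len(tails) for tails in group_tails.values()]
    # Assumed: flooring keeps one zero-valued group from collapsing the score to 0.
    floored = [max(g, eps) for g in group_means]
    return math.exp(sum(math.log(g) for g in floored) / len(floored))


# Toy numbers (illustrative only).
t = tail_value(0.25, [0.1, 0.2, 0.3, 0.4])
score = overall_score({"word_length": [0.5, 0.5], "vocabulary": [0.25, 0.75]})
```

Averaging within a group first means a group with many correlated metrics contributes one term to the geometric mean, the same as a group with a single metric.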
diff --git a/docs/03_rl.md b/docs/03_rl.md index 1163109..6466793 100644 --- a/docs/03_rl.md +++ b/docs/03_rl.md @@ -1,4 +1,10 @@ -# Reward Function +# Reward Functions + +--- + +VOICE provides a GRPO reward function that guides an online model to produce completions more typical of the target author's style than a reference (base) model. It can be used as the primary training signal or in combination with other rewards. + +The same `voice finetune` CLI is used for GRPO runs - the axolotl config simply adds `rl: grpo` and a few extra keys. See [Fine-Tuning CLI](04_cli.md) for CLI options and output layout. --- @@ -10,7 +16,7 @@ Let $\pi_{\theta}$ and $\pi_{\text{base}}$ denote the online and reference (base The typicality reward reuses the stylometric metrics directly (see [Stylometric Metrics](01_stylometry.md)), and asks how typical a single completion is of the author's general style. -For each metric $f$, let $\hat{F}_f$ be its empirical CDF over the author's training split, the same split used to construct $\mathcal{W}_0$ in the eval suite. For a text $a$, define its percentile $u_f(a) = \hat{F}_f(f(a)) \in [0, 1]$ and +For each metric $f$, let $\hat{F}_f$ be its empirical CDF over the author's training split - the same split used to construct $\mathcal{W}_0$ in the eval suite. For a text $a$, define its percentile $u_f(a) = \hat{F}_f(f(a)) \in [0, 1]$ and $$\tau_f(a) = 1 - 2\left|u_f(a) - \frac{1}{2}\right| \in [0, 1]$$ @@ -21,3 +27,43 @@ $$\bar{\tau}_g(a) = \frac{1}{|g|} \sum_{f \in g} \tau_f(a), \qquad T(a) = \left( using the same floor $\varepsilon$ as the eval suite. The reward is then: $$r_{\text{typicality}}(d_i, d_i^{\text{base}}) = \max\big(T(d_i) - T(d_i^{\text{base}}),\ 0\big)$$ + +This is a relative reward: a completion only receives a positive signal if it is more typical of the author's style than what the base model would have produced for the same prompt. 
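The reward defined above can be sketched as follows. The full expression for $T(a)$ is cut off in this excerpt, so the sketch assumes it is the geometric mean of the floored group means, mirroring the eval suite. The helper names, the `<=` convention for the empirical CDF, and the floor value are likewise assumptions, and `relative_typicality` is a hypothetical stand-in rather than the signature of `voice.rl.rewards.typicality_reward`.

```python
import math
from bisect import bisect_right
from typing import Callable

EPS = 1e-3  # hypothetical floor, assumed to match the eval suite's epsilon


def empirical_cdf(sample: list[float]) -> Callable[[float], float]:
    """Return F_hat(x): the fraction of training values less than or equal to x."""
    ordered = sorted(sample)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def tau(u: float) -> float:
    """1 - 2|u - 1/2|: equals 1 at the median and 0 at either extreme."""
    return 1.0 - 2.0 * abs(u - 0.5)


def typicality(features, groups, cdfs, eps: float = EPS) -> float:
    """Average tau within each group, then take a floored geometric mean (assumed)."""
    means = []
    for metrics in groups.values():
        taus = [tau(cdfs[m](features[m])) for m in metrics]
        means.append(sum(taus) / len(taus))
    floored = [max(g, eps) for g in means]
    return math.exp(sum(math.log(g) for g in floored) / len(floored))


def relative_typicality(completion, base, groups, cdfs) -> float:
    """max(T(d) - T(d_base), 0): positive only if the completion beats the base."""
    return max(typicality(completion, groups, cdfs) - typicality(base, groups, cdfs), 0.0)


# Toy setup: one group with one metric (illustrative only).
cdfs = {"m": empirical_cdf([1.0, 2.0, 3.0, 4.0])}
groups = {"g": ["m"]}
better = relative_typicality({"m": 2.0}, {"m": 4.0}, groups, cdfs)
worse = relative_typicality({"m": 4.0}, {"m": 2.0}, groups, cdfs)
```

Here `features` maps each metric name to the already-computed value $f(a)$ for one text; clipping at zero means a completion that is less typical than the base model's output earns no reward rather than a negative one.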
+ +--- + +## Configuration + +### Dataset + +GRPO training requires a `ref_completion` column in the dataset containing pre-generated base model completions for each prompt. The `voice._prompt_transform` axolotl dataset type extracts the prompt, the ground-truth assistant turn, and the reference completion and makes them available to TRL: + +```yaml +datasets: + - path: AccelerateScience/bo-press-conference-qa + type: voice._prompt_transform +``` + +### Axolotl Config + +A minimal GRPO config adds four keys to the standard axolotl setup: + +```yaml +rl: grpo + +trl: + num_generations: 8 + max_completion_length: 1024 + reward_funcs: + - voice.rl.rewards.typicality_reward + reward_weights: + - 1.0 + +plugins: + - voice.finetune.callbacks.EvalCompletionsPlugin + - voice.finetune.callbacks.TypicalityRewardPlugin +``` + +`TypicalityRewardPlugin` pre-computes and caches the per-metric empirical CDFs from the training split before training begins. The reward function reads from this cache at each step; calling `typicality_reward` without the plugin registered raises a `RuntimeError`. + +A full sweep example is at `configs/sweep/obama/bo_qwen3_14b_grpo.yaml`. diff --git a/docs/04_cli.md b/docs/04_cli.md index a5662a9..17be2e2 100644 --- a/docs/04_cli.md +++ b/docs/04_cli.md @@ -2,7 +2,7 @@ --- -The `voice finetune` command group runs LoRA fine-tuning jobs via [axolotl](https://github.com/axolotl-ai-cloud/axolotl). Two subcommands are available: `single` for a single training run and `sweep` for a hyperparameter grid search. +The `voice finetune` command group runs LoRA fine-tuning jobs via [axolotl](https://github.com/axolotl-ai-cloud/axolotl). Two subcommands are available: `single` for a single training run and `sweep` for a hyperparameter grid search. Both work for SFT and GRPO runs - the training mode is controlled by the config. --- @@ -87,7 +87,9 @@ axolotl: All five `sweep:` axes are required. 
The CLI expands them into a Cartesian product; `lora_alpha` is set equal to `lora_r` for each run and does not need to be listed. Per-run values for `learning_rate`, `lora_r`, `lora_alpha`, `lora_target_modules`, `micro_batch_size`, and `gradient_accumulation_steps` are merged on top of the shared `axolotl:` block before training starts. -A full example is at `configs/sweep/example.yaml`. +Additional axes (such as `trl.beta` for GRPO sweeps) can be added freely and are passed through to the axolotl config using dot notation. + +A full SFT example is at `configs/sweep/example.yaml`. A full GRPO example is at `configs/sweep/obama/bo_qwen3_14b_grpo.yaml`. ### Resume From b53a8ae732822be9f16e2802e0db26eac71305de Mon Sep 17 00:00:00 2001 From: Fin Griffin Date: Thu, 2 Jul 2026 10:12:29 +0100 Subject: [PATCH 2/3] chore: add example configs from history paper --- .gitignore | 8 ++- configs/sweep/bush/gwb_qwen3_14b_grpo.yaml | 71 +++++++++++++++++++ configs/sweep/bush/gwb_qwen3_14b_sft.yaml | 79 ++++++++++++++++++++++ configs/sweep/obama/bo_qwen3_14b_grpo.yaml | 71 +++++++++++++++++++ configs/sweep/obama/bo_qwen3_14b_sft.yaml | 76 +++++++++++++++++++++ cspell/library-words.txt | 1 + 6 files changed, 305 insertions(+), 1 deletion(-) create mode 100644 configs/sweep/bush/gwb_qwen3_14b_grpo.yaml create mode 100644 configs/sweep/bush/gwb_qwen3_14b_sft.yaml create mode 100644 configs/sweep/obama/bo_qwen3_14b_grpo.yaml create mode 100644 configs/sweep/obama/bo_qwen3_14b_sft.yaml diff --git a/.gitignore b/.gitignore index a111069..26d182a 100644 --- a/.gitignore +++ b/.gitignore @@ -75,6 +75,12 @@ configs/sweep/** !configs/single/example.yaml !configs/single/test.yaml !configs/sweep/example.yaml - +!configs/sweep/obama/ +!configs/sweep/obama/bo_qwen3_14b_grpo.yaml +!configs/sweep/obama/bo_qwen3_14b_sft.yaml +!configs/sweep/bush/ +!configs/sweep/bush/gwb_qwen3_14b_grpo.yaml +!configs/sweep/bush/gwb_qwen3_14b_sft.yaml + # Paper write up paper/** diff --git
a/configs/sweep/bush/gwb_qwen3_14b_grpo.yaml b/configs/sweep/bush/gwb_qwen3_14b_grpo.yaml new file mode 100644 index 0000000..037177b --- /dev/null +++ b/configs/sweep/bush/gwb_qwen3_14b_grpo.yaml @@ -0,0 +1,71 @@ +sweep: + learning_rate: [1.0e-5, 5.0e-6, 1.0e-6] + trl.beta: [0.01, 0.005, 0.015] + + lora_r: [32] + micro_batch_size: [2] + gradient_accumulation_steps: [8] + + target_layers: + - name: mlp_attention + modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj] + +axolotl: + base_model: AccelerateScience/Qwen3-14B-gwb-press-conference-sft-02-merged + rl: grpo + chat_template: tokenizer_default + chat_template_kwargs: + enable_thinking: false + + datasets: + - path: AccelerateScience/gwb-press-conference-qa + type: voice._prompt_transform + + skip_prepare_dataset: true + val_set_size: 0.0 + hf_use_auth_token: true + + trl: + num_generations: 8 + max_completion_length: 1024 + temperature: 0.7 + top_p: 1.0 + top_k: 0 + repetition_penalty: 1.0 + reward_funcs: + - voice.rl.rewards.typicality_reward + reward_weights: + - 1.0 + + adapter: lora + lora_dropout: 0.0 + + sequence_len: 2048 + + num_epochs: 4 + + bf16: true + tf32: true + flash_attention: true + gradient_checkpointing: true + + optimizer: adamw_torch_fused + lr_scheduler: cosine + warmup_ratio: 0.03 + weight_decay: 0.0 + + eval_strategy: "no" + save_strategy: epoch + save_total_limit: 4 + + special_tokens: + pad_token: "<|eot_id|>" + + plugins: + - voice.finetune.callbacks.EvalCompletionsPlugin + - voice.finetune.callbacks.TypicalityRewardPlugin + + use_wandb: true + wandb_project: VOICE + wandb_entity: accelerate-science + logging_steps: 1 diff --git a/configs/sweep/bush/gwb_qwen3_14b_sft.yaml b/configs/sweep/bush/gwb_qwen3_14b_sft.yaml new file mode 100644 index 0000000..8f7a780 --- /dev/null +++ b/configs/sweep/bush/gwb_qwen3_14b_sft.yaml @@ -0,0 +1,79 @@ +sweep: + learning_rate: [5.0e-4, 3.0e-4, 2.0e-4, 1e-4] + lora_r: [2, 4, 8, 16, 32] + + micro_batch_size: [2, 4, 8] + 
gradient_accumulation_steps: [1] + + target_layers: + - name: mlp + modules: [gate_proj, up_proj, down_proj] + - name: attention + modules: [q_proj, k_proj, v_proj, o_proj] + - name: mlp_attention + modules: [gate_proj, up_proj, down_proj, q_proj, k_proj, v_proj, o_proj] + +axolotl: + + base_model: Qwen/Qwen3-14B + + datasets: + - path: AccelerateScience/gwb-press-conference-qa + type: chat_template + field_messages: messages + roles_to_train: + - assistant + train_on_eos: last + eot_tokens: ["<|eot_id|>"] + + special_tokens: + pad_token: "<|eot_id|>" + + hf_use_auth_token: true + + chat_template: tokenizer_default + val_set_size: 0.0 + + test_datasets: + - path: AccelerateScience/gwb-press-conference-qa + split: validation + type: chat_template + field_messages: messages + + sequence_len: 2048 + sample_packing: true + + adapter: lora + lora_dropout: 0.05 + + num_epochs: 3 + + bf16: true + tf32: true + flash_attention: true + gradient_checkpointing: true + + lora_mlp_kernel: false + lora_qkv_kernel: false + lora_o_kernel: false + + optimizer: adamw_torch_fused + lr_scheduler: cosine + warmup_ratio: 0.03 + weight_decay: 0.0 + + eval_strategy: "epoch" + + save_strategy: epoch + save_total_limit: 3 + + plugins: + - voice.finetune.callbacks.EvalCompletionsPlugin + + use_wandb: true + wandb_project: VOICE + wandb_entity: accelerate-science + + # Disable thinking + chat_template_kwargs: + enable_thinking: false diff --git a/configs/sweep/obama/bo_qwen3_14b_grpo.yaml b/configs/sweep/obama/bo_qwen3_14b_grpo.yaml new file mode 100644 index 0000000..536588c --- /dev/null +++ b/configs/sweep/obama/bo_qwen3_14b_grpo.yaml @@ -0,0 +1,71 @@ +sweep: + learning_rate: [1.0e-5, 5.0e-6] + trl.beta: [0.005, 0.01, 0.015] + + lora_r: [64] + micro_batch_size: [2] + gradient_accumulation_steps: [8] + + target_layers: + - name: mlp_attention + modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj] + +axolotl: + base_model: 
AccelerateScience/Qwen3-14B-bo-press-conference-sft-merged + rl: grpo + chat_template: tokenizer_default + chat_template_kwargs: + enable_thinking: false + + datasets: + - path: AccelerateScience/bo-press-conference-qa + type: voice._prompt_transform + + skip_prepare_dataset: true + val_set_size: 0.0 + hf_use_auth_token: true + + trl: + num_generations: 8 + max_completion_length: 1024 + temperature: 0.7 + top_p: 1.0 + top_k: 0 + repetition_penalty: 1.0 + reward_funcs: + - voice.rl.rewards.typicality_reward + reward_weights: + - 1.0 + + adapter: lora + lora_dropout: 0.0 + + sequence_len: 2048 + + num_epochs: 4 + + bf16: true + tf32: true + flash_attention: true + gradient_checkpointing: true + + optimizer: adamw_torch_fused + lr_scheduler: cosine + warmup_ratio: 0.03 + weight_decay: 0.0 + + eval_strategy: "no" + save_strategy: epoch + save_total_limit: 4 + + special_tokens: + pad_token: "<|eot_id|>" + + plugins: + - voice.finetune.callbacks.EvalCompletionsPlugin + - voice.finetune.callbacks.TypicalityRewardPlugin + + use_wandb: true + wandb_project: VOICE + wandb_entity: accelerate-science + logging_steps: 1 diff --git a/configs/sweep/obama/bo_qwen3_14b_sft.yaml b/configs/sweep/obama/bo_qwen3_14b_sft.yaml new file mode 100644 index 0000000..7875f27 --- /dev/null +++ b/configs/sweep/obama/bo_qwen3_14b_sft.yaml @@ -0,0 +1,76 @@ +sweep: + learning_rate: [1.0e-4, 2.0e-4, 3.0e-4, 5.0e-4, 7.0e-4, 1.03e-3] + lora_r: [16, 32, 64] + + micro_batch_size: [1,2,4] + gradient_accumulation_steps: [2] + + target_layers: + - name: mlp_attention + modules: [gate_proj, up_proj, down_proj, q_proj, k_proj, v_proj, o_proj] + - name: mlp + modules: [gate_proj, up_proj, down_proj] + +axolotl: + + base_model: Qwen/Qwen3-14B + + datasets: + - path: AccelerateScience/bo-press-conference-qa + type: chat_template + field_messages: messages + roles_to_train: + - assistant + train_on_eos: last + eot_tokens: ["<|eot_id|>"] + + special_tokens: + pad_token: "<|eot_id|>" + + hf_use_auth_token: true + 
+ chat_template: tokenizer_default + val_set_size: 0.0 + + test_datasets: + - path: AccelerateScience/bo-press-conference-qa + split: validation + type: chat_template + field_messages: messages + + sequence_len: 2048 + sample_packing: true + + adapter: lora + lora_dropout: 0.05 + + num_epochs: 3 + + bf16: true + tf32: true + flash_attention: true + gradient_checkpointing: true + + lora_mlp_kernel: false + lora_qkv_kernel: false + lora_o_kernel: false + + optimizer: adamw_torch_fused + lr_scheduler: cosine + warmup_ratio: 0.03 + weight_decay: 0.0 + + eval_strategy: "epoch" + + save_strategy: epoch + save_total_limit: 3 + + plugins: + - voice.finetune.callbacks.EvalCompletionsPlugin + + use_wandb: true + wandb_project: VOICE + wandb_entity: accelerate-science + + chat_template_kwargs: + enable_thinking: false diff --git a/cspell/library-words.txt b/cspell/library-words.txt index 34c8f99..687eecf 100644 --- a/cspell/library-words.txt +++ b/cspell/library-words.txt @@ -26,3 +26,4 @@ venv cuda funcs pathlib +kwargs From c6374734d618fff31bcaa21f75ee7c5af0bf839e Mon Sep 17 00:00:00 2001 From: Fin Griffin Date: Thu, 2 Jul 2026 10:26:45 +0100 Subject: [PATCH 3/3] chore(docs): add example plot to README.md --- README.md | 5 + cspell/project-words.txt | 1 + docs/imgs/function_word_ratio.svg | 3385 +++++++++++++++++++++++++++++ 3 files changed, 3391 insertions(+) create mode 100644 docs/imgs/function_word_ratio.svg diff --git a/README.md b/README.md index a0ecb16..102fb61 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,11 @@ VOICE is an NLP research toolkit for **stylometric style alignment**: fine-tunin - **Evaluation**: a calibrated alignment score $\mathcal{S}\in[0, 1]$ comparing model completions to a reference corpus using Wasserstein distance, normalised against within-author variation estimated from the training split via bootstrap resampling. Uncertainty estimates are provided via jackknife resampling. 
- **Fine-tuning**: a CLI for running LoRA experiments (single runs or hyperparameter sweeps) via [axolotl](https://github.com/axolotl-ai-cloud/axolotl), with style alignment scoring built in. Both supervised fine-tuning and GRPO are made available, with the latter using a custom *typicality reward* function. +
+ Function word ratio distribution comparison between model completions and reference corpus +

Example: Function word ratio distributions for base model and VOICE fine-tuned model completions vs. the reference corpus with the Wasserstein distance annotated.

+
+

(back to top)

--- diff --git a/cspell/project-words.txt b/cspell/project-words.txt index 5f4cff7..0b0c804 100644 --- a/cspell/project-words.txt +++ b/cspell/project-words.txt @@ -30,3 +30,4 @@ embs cdfs unprimed qwen +imgs diff --git a/docs/imgs/function_word_ratio.svg b/docs/imgs/function_word_ratio.svg new file mode 100644 index 0000000..21164a9 --- /dev/null +++ b/docs/imgs/function_word_ratio.svg @@ -0,0 +1,3385 @@ [SVG body omitted: 3385 added lines of image/svg+xml markup for the function word ratio plot, generated 2026-07-01T17:22:41.475658 by Matplotlib v3.10.8, https://matplotlib.org/]