acceleratescience · fingriffin · Jul 2, 2026
diff --git a/.gitignore b/.gitignore
@@ -75,6 +75,12 @@ configs/sweep/**
 !configs/single/example.yaml
 !configs/single/test.yaml
 !configs/sweep/example.yaml
-
+!configs/sweep/obama/
+!configs/sweep/obama/bo_qwen3_14b_grpo.yaml
+!configs/sweep/obama/bo_qwen3_14b_sft.yaml
+!configs/sweep/bush/
+!configs/sweep/bush/gwb_qwen3_14b_grpo.yaml
+!configs/sweep/bush/gwb_qwen3_14b_sft.yaml
+
 # Paper write up
 paper/**
diff --git a/README.md b/README.md
@@ -1,14 +1,168 @@
-# 🗣️ VOICE
+[![GPL-3 License](https://img.shields.io/badge/License-GPLv3-brightgreen.svg)](https://opensource.org/licenses/GPL-3.0)
+[![Issues](https://img.shields.io/github/issues-raw/acceleratescience/voice.svg?maxAge=25000)](https://github.com/acceleratescience/voice/issues)
+[![GitHub contributors](https://img.shields.io/github/contributors/acceleratescience/voice.svg?style=flat)]()
+[![GitHub pull requests](https://img.shields.io/github/issues-pr/acceleratescience/voice.svg?style=flat)]()
+[![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com)
+<br>
+[![GitHub stars](https://img.shields.io/github/stars/acceleratescience/voice.svg?style=social&label=Star)]()
+[![GitHub watchers](https://img.shields.io/github/watchers/acceleratescience/voice.svg?style=social&label=Watch)]()
+[![GitHub forks](https://img.shields.io/github/forks/acceleratescience/voice.svg?style=social&label=Fork)]()
 
-...
+<br />
+<div align="center">
+  <h2>🗣️VOICE</h2>
+  <p align="justify">
+    Research software for fine-tuning large language models to match a target author's writing style,
+    combining a calibrated stylometric evaluation suite with LoRA supervised fine-tuning and GRPO reinforcement learning.
+  </p>
+  <p align="center">
+    <a href="docs/">Documentation</a>
+    ·
+    <a href="https://github.com/acceleratescience/voice/issues">Report Bug</a>
+    ·
+    <a href="https://github.com/acceleratescience/voice/issues">Request Feature</a>
+  </p>
+</div>
+
+<details>
+  <summary>Table of Contents</summary>
+  <ol>
+    <li><a href="#overview">Overview</a></li>
+    <li><a href="#documentation">Documentation</a></li>
+    <li><a href="#installation">Installation</a></li>
+    <li><a href="#quick-start">Quick Start</a></li>
+    <li><a href="#python-api">Python API</a></li>
+    <li><a href="#contributing">Contributing</a></li>
+    <li><a href="#license">License</a></li>
+  </ol>
+</details>
+
+---
+
+## Overview
+
+VOICE is an NLP research toolkit for **stylometric style alignment**: fine-tuning a large language model so its outputs are stylistically consistent with a target author. The toolkit provides three integrated components:
+
+- **Stylometry**: a suite of surface writing style metrics (word length moments, vocabulary richness, function word frequency, character n-gram diversity) organised into four metric groups.
+- **Evaluation**: a calibrated alignment score $\mathcal{S}\in[0, 1]$ comparing model completions to a reference corpus using Wasserstein distance, normalised against within-author variation estimated from the training split via bootstrap resampling. Uncertainty estimates are provided via jackknife resampling.
+- **Fine-tuning**: a CLI for running LoRA experiments (single runs or hyperparameter sweeps) via [axolotl](https://github.com/axolotl-ai-cloud/axolotl), with style alignment scoring built in. Both supervised fine-tuning (SFT) and GRPO are available; GRPO uses a custom *typicality reward* function.
+
+<div align="center">
+  <img src="docs/imgs/function_word_ratio.svg" alt="Function word ratio distribution comparison between model completions and reference corpus" width="50%">
+  <p><em>Example: Function word ratio distributions for base model and VOICE fine-tuned model completions vs. the reference corpus, with the Wasserstein distance annotated.</em></p>
+</div>
+
+<p align="right">(<a href="#top">back to top</a>)</p>
 
 ---
 
 ## Documentation
 
 ```
 docs/
-├── 00_data.md         — Included datasets
-├── 01_evals.md        — Evaluation suite: scoring and interpretation
-└── 02_stylometry.md   — Stylometric metrics: definition and catalogue
+├── 00_data.md          - Datasets: format and Hugging Face references
+├── 01_stylometry.md    - Stylometric metrics: definitions and catalogue
+├── 02_evals.md         - Evaluation suite: scoring methodology and API
+├── 03_rl.md            - Reward functions for GRPO training
+└── 04_cli.md           - Fine-tuning CLI: single runs and sweeps
+```
+
+<p align="right">(<a href="#top">back to top</a>)</p>
+
+---
+
+## Installation
+
+VOICE requires Python 3.12. Training requires Linux because the pinned axolotl version is Linux-only; the evaluation and stylometry components run on all platforms.
+
+Using [uv](https://github.com/astral-sh/uv) (recommended):
+
+```bash
+git clone https://github.com/acceleratescience/voice
+cd voice
+uv sync
+source .venv/bin/activate
+```
+
+Authenticate with Hugging Face before running fine-tuning jobs:
+
+```bash
+huggingface-cli login
+```
+
+<p align="right">(<a href="#top">back to top</a>)</p>
+
+---
+
+## Quick Start
+
+Run a single fine-tuning job:
+
+```bash
+voice finetune single configs/single/example.yaml
 ```
+
+This trains a LoRA adapter on top of Llama-3.1-8B-Instruct and writes per-epoch completions and alignment scores to `runs/{run_name}/`.
+
+For a hyperparameter sweep:
+
+```bash
+voice finetune sweep configs/sweep/example.yaml
+```
+
+See [docs/04_cli.md](docs/04_cli.md) for the full CLI reference, config format and output layout.
+
+<p align="right">(<a href="#top">back to top</a>)</p>
+
+---
+
+## Python API
+
+The evaluation suite can be used independently of the CLI:
+
+```python
+from voice import get_dataset, make_comparison, DatasetSpec
+from voice.datasets._schema import Split
+
+ds = get_dataset(
+    DatasetSpec(
+        repo_id="AccelerateScience/bo-press-conference-qa",
+        splits=(Split.TRAIN, Split.VALIDATION, Split.TEST),
+    )
+)
+
+# completions: list[Example] — model outputs on the same prompts as ds.validation
+results = make_comparison(completions, ds)
+
+print(results.score)           # overall alignment score in [0, 1]
+print(results.group_tails)     # per-group breakdown
+print(results.score_ci())      # 90% jackknife confidence interval
+```
+
+See [docs/02_evals.md](docs/02_evals.md) for the full scoring methodology and available diagnostic fields.
+
+<p align="right">(<a href="#top">back to top</a>)</p>
+
+---
+
+## Contributing
+
+Contributions are welcome. To propose a change:
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/my-change`)
+3. Commit your changes (`git commit -m 'Add my change'`)
+4. Push to the branch (`git push origin feature/my-change`)
+5. Open a pull request
+
+Please raise an issue first for substantial changes.
+
+<p align="right">(<a href="#top">back to top</a>)</p>
+
+---
+
+## License
+
+Distributed under the GNU General Public License v3.0. See `LICENSE` for details.
+
+<p align="right">(<a href="#top">back to top</a>)</p>
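
The Evaluation bullet in the README describes a Wasserstein distance between completions and a reference corpus, calibrated against within-author variation by bootstrap resampling, with jackknife uncertainty. A minimal sketch of that idea for a single metric follows. The calibration rule here (the share of bootstrap distances at least as large as the observed one) is an assumption made for illustration, as are the names `wasserstein_1d`, `alignment_score` and `jackknife_se`; the scoring actually used by VOICE is documented in `docs/02_evals.md` and may differ.

```python
import math
import random
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Exact 1-D Wasserstein-1 distance between two empirical samples."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right), so the
        # integral of |F_u - F_v| over that interval is a rectangle.
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def alignment_score(sample, reference, n_boot=200, seed=0):
    """Assumed calibration: fraction of within-reference bootstrap distances
    that are at least as large as the observed sample-to-reference distance."""
    rng = random.Random(seed)
    observed = wasserstein_1d(sample, reference)
    null = [
        wasserstein_1d(rng.choices(reference, k=len(sample)), reference)
        for _ in range(n_boot)
    ]
    return sum(d >= observed for d in null) / n_boot


def jackknife_se(sample, reference, **kwargs):
    """Leave-one-out standard error of the score over the completions."""
    n = len(sample)
    loo = [
        alignment_score(sample[:i] + sample[i + 1:], reference, **kwargs)
        for i in range(n)
    ]
    mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((s - mean) ** 2 for s in loo))


# Synthetic per-text values of one stylometric metric (not real data).
rng = random.Random(1)
reference = [rng.gauss(5.0, 1.0) for _ in range(200)]
on_style = [rng.gauss(5.0, 1.0) for _ in range(30)]
off_style = [x + 10.0 for x in on_style]  # same shape, shifted far away

print(alignment_score(on_style, reference))
print(alignment_score(off_style, reference))
print(jackknife_se(off_style, reference, n_boot=50))
```

Under this assumed rule the score lands in $[0, 1]$ by construction: a sample no further from the reference than the reference is from its own resamples scores near 1, and a sample far outside that range scores 0.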
diff --git a/configs/sweep/bush/gwb_qwen3_14b_grpo.yaml b/configs/sweep/bush/gwb_qwen3_14b_grpo.yaml
@@ -0,0 +1,71 @@
+sweep:
+  learning_rate: [1.0e-5, 5.0e-6, 1.0e-6]
+  trl.beta: [0.01, 0.005, 0.015]
+
+  lora_r: [32]
+  micro_batch_size: [2]
+  gradient_accumulation_steps: [8]
+
+  target_layers:
+    - name: mlp_attention
+      modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
+
+axolotl:
+  base_model: AccelerateScience/Qwen3-14B-gwb-press-conference-sft-02-merged
+  rl: grpo
+  chat_template: tokenizer_default
+  chat_template_kwargs:
+    enable_thinking: false
+
+  datasets:
+    - path: AccelerateScience/gwb-press-conference-qa
+      type: voice._prompt_transform
+
+  skip_prepare_dataset: true
+  val_set_size: 0.0
+  hf_use_auth_token: true
+
+  trl:
+    num_generations: 8
+    max_completion_length: 1024
+    temperature: 0.7
+    top_p: 1.0
+    top_k: 0
+    repetition_penalty: 1.0
+    reward_funcs:
+      - voice.rl.rewards.typicality_reward
+    reward_weights:
+      - 1.0
+
+  adapter: lora
+  lora_dropout: 0.0
+
+  sequence_len: 2048
+
+  num_epochs: 4
+
+  bf16: true
+  tf32: true
+  flash_attention: true
+  gradient_checkpointing: true
+
+  optimizer: adamw_torch_fused
+  lr_scheduler: cosine
+  warmup_ratio: 0.03
+  weight_decay: 0.0
+
+  eval_strategy: "no"
+  save_strategy: epoch
+  save_total_limit: 4
+
+  special_tokens:
+    pad_token: "<|eot_id|>"
+
+  plugins:
+    - voice.finetune.callbacks.EvalCompletionsPlugin
+    - voice.finetune.callbacks.TypicalityRewardPlugin
+
+  use_wandb: true
+  wandb_project: VOICE
+  wandb_entity: accelerate-science
+  logging_steps: 1
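
The `sweep:` block in the GRPO config above mixes top-level keys with a dotted key (`trl.beta`). A sketch of how such a block could be expanded into per-run configs is shown here, assuming a full Cartesian grid and assuming that dotted keys address nested axolotl options. Both are assumptions about the CLI rather than confirmed behaviour (see `docs/04_cli.md`), and `set_dotted` and `expand_grid` are hypothetical helpers, not part of `voice`.

```python
import copy
import itertools


def set_dotted(config, dotted_key, value):
    """Write value into a nested dict, creating levels for keys like 'trl.beta'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def expand_grid(sweep, base):
    """Yield one config per combination of swept values (full Cartesian grid)."""
    keys = list(sweep)
    for combo in itertools.product(*(sweep[k] for k in keys)):
        run = copy.deepcopy(base)  # never mutate the shared base config
        for key, value in zip(keys, combo):
            set_dotted(run, key, value)
        yield run


# Scalar axes copied from the GRPO sweep above; target_layers is left out for brevity.
sweep = {
    "learning_rate": [1.0e-5, 5.0e-6, 1.0e-6],
    "trl.beta": [0.01, 0.005, 0.015],
    "lora_r": [32],
    "micro_batch_size": [2],
    "gradient_accumulation_steps": [8],
}
# A small excerpt of the axolotl block, enough to show the nested override.
base = {"rl": "grpo", "trl": {"num_generations": 8, "temperature": 0.7}}

runs = list(expand_grid(sweep, base))
print(len(runs))
print(runs[0])
```

With three learning rates, three `trl.beta` values and single-valued remaining axes, a full grid would give 3 × 3 = 9 runs.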
diff --git a/configs/sweep/bush/gwb_qwen3_14b_sft.yaml b/configs/sweep/bush/gwb_qwen3_14b_sft.yaml
@@ -0,0 +1,79 @@
+sweep:
+  learning_rate: [5.0e-4, 3.0e-4, 2.0e-4, 1.0e-4]
+  lora_r: [2, 4, 8, 16, 32]
+
+  micro_batch_size: [2, 4, 8]
+  gradient_accumulation_steps: [1]
+
+  target_layers:
+    - name: mlp
+      modules: [gate_proj, up_proj, down_proj]
+    - name: attention
+      modules: [q_proj, k_proj, v_proj, o_proj]
+    - name: mlp_attention
+      modules: [gate_proj, up_proj, down_proj, q_proj, k_proj, v_proj, o_proj]
+
+axolotl:
+
+  base_model: Qwen/Qwen3-14B
+
+  datasets:
+    - path: AccelerateScience/gwb-press-conference-qa
+      type: chat_template
+      field_messages: messages
+      roles_to_train:
+        - assistant
+      train_on_eos: last
+      eot_tokens: ["<|eot_id|>"]
+
+  special_tokens:
+    pad_token: "<|eot_id|>"
+
+  hf_use_auth_token: true
+
+  chat_template: tokenizer_default
+  val_set_size: 0.0
+
+  test_datasets:
+    - path: AccelerateScience/gwb-press-conference-qa
+      split: validation
+      type: chat_template
+      field_messages: messages
+
+  sequence_len: 2048
+  sample_packing: true
+
+  adapter: lora
+  lora_dropout: 0.05
+
+  num_epochs: 3
+
+  bf16: true
+  tf32: true
+  flash_attention: true
+  gradient_checkpointing: true
+
+  lora_mlp_kernel: false
+  lora_qkv_kernel: false
+  lora_o_kernel: false
+
+  optimizer: adamw_torch_fused
+  lr_scheduler: cosine
+  warmup_ratio: 0.03
+  weight_decay: 0.0
+
+  eval_strategy: "epoch"
+
+  save_strategy: epoch
+  save_total_limit: 3
+
+  plugins:
+    - voice.finetune.callbacks.EvalCompletionsPlugin
+
+  use_wandb: true
+  wandb_project: VOICE
+  wandb_entity: accelerate-science
+
+  # Disable thinking
+  chat_template_kwargs:
+    enable_thinking: false
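
The SFT sweep above is much wider than the GRPO one. As a rough cost check, the sketch below counts the runs it would produce if every axis were crossed with every other (an assumption; the CLI may sample or prune the grid), lists the per-device batch sizes implied by `micro_batch_size × gradient_accumulation_steps`, and confirms that the `mlp_attention` group is the union of the `mlp` and `attention` groups.

```python
import math

# Axes copied from the SFT sweep block above; target_layers is listed by group name.
sweep = {
    "learning_rate": [5.0e-4, 3.0e-4, 2.0e-4, 1.0e-4],
    "lora_r": [2, 4, 8, 16, 32],
    "micro_batch_size": [2, 4, 8],
    "gradient_accumulation_steps": [1],
    "target_layers": ["mlp", "attention", "mlp_attention"],
}

# Size of a full Cartesian grid over all axes.
n_runs = math.prod(len(values) for values in sweep.values())

# Micro-batch size times accumulation steps, per device and optimizer step.
per_device_batch = sorted(
    {
        m * g
        for m in sweep["micro_batch_size"]
        for g in sweep["gradient_accumulation_steps"]
    }
)

# Module lists copied from the three target_layers entries.
mlp = {"gate_proj", "up_proj", "down_proj"}
attention = {"q_proj", "k_proj", "v_proj", "o_proj"}
mlp_attention = {
    "gate_proj", "up_proj", "down_proj", "q_proj", "k_proj", "v_proj", "o_proj",
}

print(n_runs)
print(per_device_batch)
print(mlp_attention == mlp | attention)
```

A full grid here is 4 × 5 × 3 × 1 × 3 = 180 runs of three epochs each, which is worth knowing before launching the sweep on a 14B model.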