Skip to content

add time to first token in OnnxDiscrepancyCheck#2535

Open
xadupre wants to merge 6 commits into
mainfrom
xadupre/tts
Open

add time to first token in OnnxDiscrepancyCheck#2535
xadupre wants to merge 6 commits into
mainfrom
xadupre/tts

Conversation

@xadupre

@xadupre xadupre commented Jun 22, 2026

Copy link
Copy Markdown
Member

Describe your changes

  • add metrics time to first token, time to first 5 tokens in OnnxDiscrepancyCheck

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.

Adds time-to-first-token and time-to-first-N-tokens latency metrics to OnnxDiscrepancyCheck’s generation comparison output, and updates tests accordingly.

Changes:

  • Introduces a new time_to_first_n_tokens config option and surfaces latency metrics for both Transformers and ORT GenAI generation.
  • Changes compare_generation to return a results dictionary (instead of an int) and updates result aggregation/logging.
  • Updates ONNX discrepancy check tests to validate the new return shape and metric keys.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
test/passes/onnx/test_discrepancy_check.py Adjusts assertions for the new dict return type and validates presence of new latency metric fields.
olive/passes/onnx/discrepancy_check.py Adds latency measurement/reporting for TTFT/TTFN and changes generation comparison to return structured results.

Comment thread olive/passes/onnx/discrepancy_check.py Outdated
Comment thread olive/passes/onnx/discrepancy_check.py
Comment thread olive/passes/onnx/discrepancy_check.py Outdated
Comment thread test/passes/onnx/test_discrepancy_check.py
@xadupre xadupre marked this pull request as ready for review June 22, 2026 14:36
xadupre pushed a commit that referenced this pull request Jul 1, 2026
… (with dedicated GGUF conversion pass) (#2548)

## Describe your changes

Merges #2536, #2535, #2534.

Additionally adds llama.cpp integration and other improvements to
`OnnxDiscrepancyCheck` and test-mode workflow handling:

- **New `llama_cpp` flag** (`bool`, default `False`) on
`OnnxDiscrepancyCheck` — when enabled, compares inference with
llama.cpp.
- **New `llama_cpp_env_path` parameter** (`Optional[str]`) — path to the
`llama_env` virtual environment where `llama-cpp-python` and
`convert_hf_to_gguf.py` are installed (defaults to `"llama_env"`
relative to cwd).
- **New `--test_llama_path` CLI option** — specifies the path to the
`llama_env` virtual environment when running with `--test`. Using
`--test_llama_path` without `--test` emits a warning.
- **New `ConvertHfToGGUF` pass**
(`olive/passes/pytorch/convert_hf_to_gguf.py`) — injected when
`--test_llama_path` is provided. This pass converts the test HF model to
GGUF ahead of discrepancy checking and stores the GGUF path in model
attributes for downstream reuse.
- **`compare_llama_cpp()` updates** — now reuses a preconverted GGUF
when available; otherwise it falls back to in-method HF→GGUF conversion.
llama.cpp comparison failures are captured in discrepancy results
(status/failures) instead of aborting the whole run, so ONNX generation
can still complete.
- **Improved `--test_metrics` parsing** — now accepts both
space-separated (`--test_metrics mae speedup`) and comma-separated
(`--test_metrics mae,speedup`) forms.
- **Fixed `add_discrepancy_check_pass` update-in-place** — existing
discrepancy-pass config generated by dry-run is updated in-place so
current `--test_metrics`, `--output_path`, and llama settings are
applied.
- **Fixed test model persistence across engine cache hits** —
`ModelBuilder` stores a reference HF copy (`reference_hf_model/`)
alongside cached ONNX outputs; discrepancy check falls back to this copy
if the original test model path is missing.
- **New `SaveTestModelConfig` pass**
(`olive/passes/pytorch/save_test_model_config.py`) — injected at the
start of passes for `--test`; ensures test model config/marker (and
random test model persistence path usage) is set up before downstream
passes.
- **CI workflow** (`test-model-fast.yml`) — includes setup of a llama
environment and llama.cpp conversion script dependencies.
- **Updated documentation** (`cli-fast-test.md`) — clarifies where layer
reduction happens, when test-model directories are created, cache
fallback behavior, and llama.cpp test flow including the dedicated GGUF
conversion pass.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants