Skip to content

expose latencies with the speedup in OnnxDiscrepancyCheck#2534

Open
xadupre wants to merge 5 commits into
mainfrom
xadupre/lat
Open

expose latencies with the speedup in OnnxDiscrepancyCheck#2534
xadupre wants to merge 5 commits into
mainfrom
xadupre/lat

Conversation

@xadupre

@xadupre xadupre commented Jun 22, 2026

Copy link
Copy Markdown
Member

Describe your changes

Speedup is not enough as a metric, latency must be added to understand which side is changing.

@xadupre

xadupre commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

@copilot extend the existing unit test to cover the changes

Copilot AI commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@copilot extend the existing unit test to cover the changes

Done in 72c5d4e: I extended the existing discrepancy-check unit tests to cover the new latency-and-speedup return path from _measure_speedup.

@xadupre xadupre requested a review from Copilot June 22, 2026 12:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.

This PR enhances OnnxDiscrepancyCheck speed measurements by exposing the underlying per-iteration latencies (PyTorch and ONNX) in addition to the speedup ratio, making it easier to interpret which side changed.

Changes:

  • Updates _measure_speedup to return (pytorch_latency_s, onnx_latency_s, speedup) instead of only speedup.
  • Stores pytorch_latency_s, onnx_latency_s, and speedup into the results dict when timing is measured.
  • Adds a unit test verifying the new _measure_speedup return tuple.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
olive/passes/onnx/discrepancy_check.py Returns and records both latencies alongside speedup in results.
test/passes/onnx/test_discrepancy_check.py Adds coverage for the new tuple return contract of _measure_speedup.

Comment thread olive/passes/onnx/discrepancy_check.py
Comment thread olive/passes/onnx/discrepancy_check.py
Comment thread test/passes/onnx/test_discrepancy_check.py
@xadupre xadupre marked this pull request as ready for review June 24, 2026 17:14
xadupre pushed a commit that referenced this pull request Jul 1, 2026
… (with dedicated GGUF conversion pass) (#2548)

## Describe your changes

Merges #2536, #2535, #2534.

Additionally adds llama.cpp integration and other improvements to
`OnnxDiscrepancyCheck` and test-mode workflow handling:

- **New `llama_cpp` flag** (`bool`, default `False`) on
`OnnxDiscrepancyCheck` — when enabled, compares inference with
llama.cpp.
- **New `llama_cpp_env_path` parameter** (`Optional[str]`) — path to the
`llama_env` virtual environment where `llama-cpp-python` and
`convert_hf_to_gguf.py` are installed (defaults to `"llama_env"`
relative to cwd).
- **New `--test_llama_path` CLI option** — specifies the path to the
`llama_env` virtual environment when running with `--test`. Using
`--test_llama_path` without `--test` emits a warning.
- **New `ConvertHfToGGUF` pass**
(`olive/passes/pytorch/convert_hf_to_gguf.py`) — injected when
`--test_llama_path` is provided. This pass converts the test HF model to
GGUF ahead of discrepancy checking and stores the GGUF path in model
attributes for downstream reuse.
- **`compare_llama_cpp()` updates** — now reuses a preconverted GGUF
when available; otherwise it falls back to in-method HF→GGUF conversion.
llama.cpp comparison failures are captured in discrepancy results
(status/failures) instead of aborting the whole run, so ONNX generation
can still complete.
- **Improved `--test_metrics` parsing** — now accepts both
space-separated (`--test_metrics mae speedup`) and comma-separated
(`--test_metrics mae,speedup`) forms.
- **Fixed `add_discrepancy_check_pass` update-in-place** — existing
discrepancy-pass config generated by dry-run is updated in-place so
current `--test_metrics`, `--output_path`, and llama settings are
applied.
- **Fixed test model persistence across engine cache hits** —
`ModelBuilder` stores a reference HF copy (`reference_hf_model/`)
alongside cached ONNX outputs; discrepancy check falls back to this copy
if the original test model path is missing.
- **New `SaveTestModelConfig` pass**
(`olive/passes/pytorch/save_test_model_config.py`) — injected at the
start of passes for `--test`; ensures test model config/marker (and
random test model persistence path usage) is set up before downstream
passes.
- **CI workflow** (`test-model-fast.yml`) — includes setup of a llama
environment and llama.cpp conversion script dependencies.
- **Updated documentation** (`cli-fast-test.md`) — clarifies where layer
reduction happens, when test-model directories are created, cache
fallback behavior, and llama.cpp test flow including the dedicated GGUF
conversion pass.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants