expose latencies with the speedup in OnnxDiscrepancyCheck by xadupre · Pull Request #2534 · microsoft/Olive

xadupre · 2026-06-22T11:47:08Z

Describe your changes

Speedup is not enough as a metric, latency must be added to understand which side is changing.

xadupre · 2026-06-22T11:47:47Z

@copilot extend the existing unit test to cover the changes

Copilot · 2026-06-22T11:58:49Z

@copilot extend the existing unit test to cover the changes

Done in 72c5d4e: I extended the existing discrepancy-check unit tests to cover the new latency-and-speedup return path from _measure_speedup.

Copilot

Pull request overview

Note

Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.

This PR enhances OnnxDiscrepancyCheck speed measurements by exposing the underlying per-iteration latencies (PyTorch and ONNX) in addition to the speedup ratio, making it easier to interpret which side changed.

Changes:

Updates _measure_speedup to return (pytorch_latency_s, onnx_latency_s, speedup) instead of only speedup.
Stores pytorch_latency_s, onnx_latency_s, and speedup into the results dict when timing is measured.
Adds a unit test verifying the new _measure_speedup return tuple.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`olive/passes/onnx/discrepancy_check.py`	Returns and records both latencies alongside speedup in results.
`test/passes/onnx/test_discrepancy_check.py`	Adds coverage for the new tuple return contract of `_measure_speedup`.

… (with dedicated GGUF conversion pass) (#2548) ## Describe your changes Merges #2536, #2535, #2534. Additionally adds llama.cpp integration and other improvements to `OnnxDiscrepancyCheck` and test-mode workflow handling: - **New `llama_cpp` flag** (`bool`, default `False`) on `OnnxDiscrepancyCheck` — when enabled, compares inference with llama.cpp. - **New `llama_cpp_env_path` parameter** (`Optional[str]`) — path to the `llama_env` virtual environment where `llama-cpp-python` and `convert_hf_to_gguf.py` are installed (defaults to `"llama_env"` relative to cwd). - **New `--test_llama_path` CLI option** — specifies the path to the `llama_env` virtual environment when running with `--test`. Using `--test_llama_path` without `--test` emits a warning. - **New `ConvertHfToGGUF` pass** (`olive/passes/pytorch/convert_hf_to_gguf.py`) — injected when `--test_llama_path` is provided. This pass converts the test HF model to GGUF ahead of discrepancy checking and stores the GGUF path in model attributes for downstream reuse. - **`compare_llama_cpp()` updates** — now reuses a preconverted GGUF when available; otherwise it falls back to in-method HF→GGUF conversion. llama.cpp comparison failures are captured in discrepancy results (status/failures) instead of aborting the whole run, so ONNX generation can still complete. - **Improved `--test_metrics` parsing** — now accepts both space-separated (`--test_metrics mae speedup`) and comma-separated (`--test_metrics mae,speedup`) forms. - **Fixed `add_discrepancy_check_pass` update-in-place** — existing discrepancy-pass config generated by dry-run is updated in-place so current `--test_metrics`, `--output_path`, and llama settings are applied. - **Fixed test model persistence across engine cache hits** — `ModelBuilder` stores a reference HF copy (`reference_hf_model/`) alongside cached ONNX outputs; discrepancy check falls back to this copy if the original test model path is missing. - **New `SaveTestModelConfig` pass** (`olive/passes/pytorch/save_test_model_config.py`) — injected at the start of passes for `--test`; ensures test model config/marker (and random test model persistence path usage) is set up before downstream passes. - **CI workflow** (`test-model-fast.yml`) — includes setup of a llama environment and llama.cpp conversion script dependencies. - **Updated documentation** (`cli-fast-test.md`) — clarifies where layer reduction happens, when test-model directories are created, cache fallback behavior, and llama.cpp test flow including the dedicated GGUF conversion pass. ## Checklist before requesting a review - [ ] Add unit tests for this change. - [ ] Make sure all tests can pass. - [ ] Update documents if necessary. - [ ] Lint and apply fixes to your code by running `lintrunner -a` - [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes. ## (Optional) Issue link --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

expose latencies with the speedup in OnnxDiscrepancyCheck

b21b4f6

Copilot started work on behalf of xadupre June 22, 2026 11:51 View session

Extend discrepancy check unit test for latency tuple

72c5d4e

Copilot finished work on behalf of xadupre June 22, 2026 11:59

xadupre requested a review from Copilot June 22, 2026 12:27

Copilot AI reviewed Jun 22, 2026

View reviewed changes

Comment thread olive/passes/onnx/discrepancy_check.py

Comment thread olive/passes/onnx/discrepancy_check.py

Comment thread test/passes/onnx/test_discrepancy_check.py

Copilot started reviewing on behalf of xadupre June 22, 2026 12:34 View session

Copilot started work on behalf of xadupre June 22, 2026 12:42 View session

Add return type annotation to _measure_speedup

804bb92

Copilot finished work on behalf of xadupre June 22, 2026 12:43

xadupre marked this pull request as ready for review June 24, 2026 17:14

xadupre added 2 commits June 25, 2026 09:45

Merge branch 'main' into xadupre/lat

d34ecd5

Merge branch 'main' into xadupre/lat

922089a

xadupre mentioned this pull request Jun 29, 2026

Merge 3 existing PR related to OnnxDiscrepancyCheck + llama.cpp integration #2546

Open

5 tasks

Copilot AI mentioned this pull request Jul 1, 2026

Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass) #2548

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

expose latencies with the speedup in OnnxDiscrepancyCheck#2534

expose latencies with the speedup in OnnxDiscrepancyCheck#2534
xadupre wants to merge 5 commits into
mainfrom
xadupre/lat

xadupre commented Jun 22, 2026

Uh oh!

xadupre commented Jun 22, 2026

Uh oh!

Copilot AI commented Jun 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

xadupre commented Jun 22, 2026

Describe your changes

Uh oh!

xadupre commented Jun 22, 2026

Uh oh!

Copilot AI commented Jun 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants