Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass) by Copilot · Pull Request #2548 · microsoft/Olive

Copilot · 2026-07-01T10:13:24Z

Describe your changes

This PR adds llama.cpp integration and other improvements to OnnxDiscrepancyCheck and test-mode workflow handling:

New llama_cpp flag (bool, default False) on OnnxDiscrepancyCheck — when enabled, compares inference with llama.cpp.
New llama_cpp_env_path parameter (Optional[str]) — path to the llama_env virtual environment where llama-cpp-python and convert_hf_to_gguf.py are installed (defaults to "llama_env" relative to cwd).
New --test_llama_path CLI option — specifies the path to the llama_env virtual environment when running with --test. Using --test_llama_path without --test emits a warning.
New ConvertHfToGGUF pass (olive/passes/pytorch/convert_hf_to_gguf.py) — injected when --test_llama_path is provided. This pass converts the test HF model to GGUF ahead of discrepancy checking and stores the GGUF path in model attributes for downstream reuse.
compare_llama_cpp() updates — now reuses a preconverted GGUF when available; otherwise it falls back to in-method HF→GGUF conversion. llama.cpp comparison failures are captured in discrepancy results (status/failures) instead of aborting the whole run, so ONNX generation can still complete.
Improved --test_metrics parsing — now accepts both space-separated (--test_metrics mae speedup) and comma-separated (--test_metrics mae,speedup) forms.
Fixed add_discrepancy_check_pass update-in-place — existing discrepancy-pass config generated by dry-run is updated in-place so current --test_metrics, --output_path, and llama settings are applied.
Fixed test model persistence across engine cache hits — ModelBuilder stores a reference HF copy (reference_hf_model/) alongside cached ONNX outputs; discrepancy check falls back to this copy if the original test model path is missing.
New SaveTestModelConfig pass (olive/passes/pytorch/save_test_model_config.py) — injected at the start of the pass list for --test; ensures the test model config/marker (and the persistence path used for the random test model) is set up before downstream passes run.
CI workflow (test-model-fast.yml) — includes setup of a llama environment and llama.cpp conversion script dependencies.
Updated documentation (cli-fast-test.md) — clarifies where layer reduction happens, when test-model directories are created, cache fallback behavior, and llama.cpp test flow including the dedicated GGUF conversion pass.
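
The --test_metrics change above can be illustrated with a minimal sketch. It is not the code from this PR: the helper name parse_test_metrics and the use of argparse with nargs="+" are assumptions made for illustration. It only shows how space-separated and comma-separated forms can be normalized into the same list.

```python
import argparse
from typing import Iterable, List


def parse_test_metrics(values: Iterable[str]) -> List[str]:
    """Flatten space- and comma-separated metric names into one list.

    Hypothetical helper for illustration; not taken from the Olive code base.
    """
    metrics: List[str] = []
    for value in values:
        # Each argv token may itself hold several comma-separated names.
        for name in value.split(","):
            name = name.strip()
            if name:  # skip empty fragments such as a trailing comma
                metrics.append(name)
    return metrics


def build_parser() -> argparse.ArgumentParser:
    # Assumption: the option collects one or more tokens via nargs="+".
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_metrics", nargs="+", default=[])
    return parser


if __name__ == "__main__":
    for argv in (
        ["--test_metrics", "mae", "speedup"],
        ["--test_metrics", "mae,speedup"],
    ):
        args = build_parser().parse_args(argv)
        print(parse_test_metrics(args.test_metrics))
```

Under these assumptions, both invocations produce the same list of metric names, so downstream code does not need to care which form the user typed.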

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Initial plan

4403ed3

Copilot AI assigned Copilot and xadupre Jul 1, 2026

Copilot AI mentioned this pull request Jul 1, 2026

Merge 3 existing PR related to OnnxDiscrepancyCheck + llama.cpp integration #2546

Open

5 tasks

Copilot started work on behalf of xadupre July 1, 2026 10:14 View session

Add ConvertHfToGGUF pass and integrate llama test flow

7b0afa6

Copilot AI changed the title ~~[WIP] Merge 3 existing PRs related to OnnxDiscrepancyCheck and llama.cpp integration~~ Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass) Jul 1, 2026

Copilot AI requested a review from xadupre July 1, 2026 10:37

Copilot finished work on behalf of xadupre July 1, 2026 10:37

xadupre marked this pull request as ready for review July 1, 2026 10:39

xadupre merged commit 1e8a0d1 into xadupre/merged Jul 1, 2026
4 checks passed

xadupre deleted the copilot/merge-existing-prs-onnxdiscrepancycheck-llama-inte branch July 1, 2026 10:40

xadupre merged 2 commits into xadupre/merged from copilot/merge-existing-prs-onnxdiscrepancycheck-llama-inte

2 participants
