
Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass) #2548

Merged
xadupre merged 2 commits into xadupre/merged from copilot/merge-existing-prs-onnxdiscrepancycheck-llama-inte on Jul 1, 2026

Conversation

Copilot AI commented Jul 1, 2026

Contributor

Describe your changes

Merges #2536, #2535, #2534.

Additionally adds llama.cpp integration and other improvements to OnnxDiscrepancyCheck and test-mode workflow handling:

  • New llama_cpp flag (bool, default False) on OnnxDiscrepancyCheck — when enabled, compares inference with llama.cpp.
  • New llama_cpp_env_path parameter (Optional[str]) — path to the llama_env virtual environment where llama-cpp-python and convert_hf_to_gguf.py are installed (defaults to "llama_env" relative to cwd).
  • New --test_llama_path CLI option — specifies the path to the llama_env virtual environment when running with --test. Using --test_llama_path without --test emits a warning.
  • New ConvertHfToGGUF pass (olive/passes/pytorch/convert_hf_to_gguf.py) — injected when --test_llama_path is provided. This pass converts the test HF model to GGUF ahead of discrepancy checking and stores the GGUF path in model attributes for downstream reuse.
  • compare_llama_cpp() updates — now reuses a preconverted GGUF when available; otherwise it falls back to in-method HF→GGUF conversion. llama.cpp comparison failures are captured in discrepancy results (status/failures) instead of aborting the whole run, so ONNX generation can still complete.
  • Improved --test_metrics parsing — now accepts both space-separated (--test_metrics mae speedup) and comma-separated (--test_metrics mae,speedup) forms.
  • Fixed add_discrepancy_check_pass update-in-place — existing discrepancy-pass config generated by dry-run is updated in-place so current --test_metrics, --output_path, and llama settings are applied.
  • Fixed test model persistence across engine cache hits: ModelBuilder stores a reference HF copy (reference_hf_model/) alongside cached ONNX outputs; discrepancy check falls back to this copy if the original test model path is missing.
  • New SaveTestModelConfig pass (olive/passes/pytorch/save_test_model_config.py) — injected at the start of passes for --test; ensures test model config/marker (and random test model persistence path usage) is set up before downstream passes.
  • CI workflow (test-model-fast.yml) — includes setup of a llama environment and llama.cpp conversion script dependencies.
  • Updated documentation (cli-fast-test.md) — clarifies where layer reduction happens, when test-model directories are created, cache fallback behavior, and llama.cpp test flow including the dedicated GGUF conversion pass.
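
The --test_metrics change above can be illustrated with a minimal sketch. This is not the parser code from this PR: the helper name `parse_metrics` and the use of `nargs="+"` are assumptions made for illustration. It only shows one way a single option can accept both the space-separated and the comma-separated form.

```python
import argparse


def parse_metrics(values):
    """Flatten space- and comma-separated metric names into one list.

    Hypothetical helper, not the implementation used in this PR.
    """
    metrics = []
    for value in values:
        # "mae,speedup" arrives as one token; "mae speedup" arrives as two.
        metrics.extend(name.strip() for name in value.split(",") if name.strip())
    return metrics


parser = argparse.ArgumentParser()
# nargs="+" collects every token that follows the flag into a list.
parser.add_argument("--test_metrics", nargs="+", default=[])

spaced = parse_metrics(parser.parse_args(["--test_metrics", "mae", "speedup"]).test_metrics)
comma = parse_metrics(parser.parse_args(["--test_metrics", "mae,speedup"]).test_metrics)
```

Under these assumptions both invocations yield the same list of metric names, so downstream code does not need to know which form the user typed.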

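The compare_llama_cpp() behavior described in the list above (reuse a preconverted GGUF, otherwise convert in-method, and record failures instead of aborting) could be sketched roughly as follows. Everything here is hypothetical: the function name `compare_llama_cpp_sketch`, the `convert` and `run_comparison` callables, and the exact shape of the result dictionary are assumptions for illustration, not the code merged in this PR.

```python
from pathlib import Path
from typing import Callable, Optional


def compare_llama_cpp_sketch(
    hf_model_path: str,
    preconverted_gguf: Optional[str],
    convert: Callable[[str], str],
    run_comparison: Callable[[str], dict],
) -> dict:
    """Return a result dict; never raise, so the caller's run can continue."""
    # Assumed result layout: a status string plus a list of failure messages.
    result = {"status": "ok", "failures": []}
    try:
        if preconverted_gguf and Path(preconverted_gguf).exists():
            # Reuse the GGUF produced ahead of time by a conversion pass.
            gguf_path = preconverted_gguf
        else:
            # Fall back to converting the HF model inside this call.
            gguf_path = convert(hf_model_path)
        result["metrics"] = run_comparison(gguf_path)
    except Exception as exc:
        # Record the failure instead of aborting the whole run.
        result["status"] = "failed"
        result["failures"].append(f"llama.cpp comparison failed: {exc}")
    return result


def failing_convert(path: str) -> str:
    # Stand-in for a conversion step whose dependencies are missing.
    raise RuntimeError("convert_hf_to_gguf.py not found")


outcome = compare_llama_cpp_sketch(
    "hf_model", None, failing_convert, lambda gguf: {"mae": 0.0}
)
```

In this sketch a failed conversion produces a result with status "failed" and one recorded failure message, while the surrounding workflow (for example, ONNX generation) is free to carry on.
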
Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Copilot AI changed the title from "[WIP] Merge 3 existing PRs related to OnnxDiscrepancyCheck and llama.cpp integration" to "Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass)" on Jul 1, 2026
Copilot AI requested a review from xadupre July 1, 2026 10:37
@xadupre xadupre marked this pull request as ready for review July 1, 2026 10:39
@xadupre xadupre merged commit 1e8a0d1 into xadupre/merged Jul 1, 2026
4 checks passed
@xadupre xadupre deleted the copilot/merge-existing-prs-onnxdiscrepancycheck-llama-inte branch July 1, 2026 10:40