Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass) #2548
Merged
xadupre merged 2 commits on Jul 1, 2026
Copilot AI changed the title from "[WIP] Merge 3 existing PRs related to OnnxDiscrepancyCheck and llama.cpp integration" to "Merge 3 existing PRs for OnnxDiscrepancyCheck + llama.cpp integration (with dedicated GGUF conversion pass)" on Jul 1, 2026.
Describe your changes
Merges #2536, #2535, #2534.
Additionally adds llama.cpp integration and other improvements to `OnnxDiscrepancyCheck` and test-mode workflow handling:

- `llama_cpp` flag (`bool`, default `False`) on `OnnxDiscrepancyCheck`: when enabled, compares inference with llama.cpp.
- `llama_cpp_env_path` parameter (`Optional[str]`): path to the `llama_env` virtual environment where `llama-cpp-python` and `convert_hf_to_gguf.py` are installed (defaults to `"llama_env"` relative to cwd).
- `--test_llama_path` CLI option: specifies the path to the `llama_env` virtual environment when running with `--test`. Using `--test_llama_path` without `--test` emits a warning.
- `ConvertHfToGGUF` pass (`olive/passes/pytorch/convert_hf_to_gguf.py`): injected when `--test_llama_path` is provided. This pass converts the test HF model to GGUF ahead of discrepancy checking and stores the GGUF path in model attributes for downstream reuse.
- `compare_llama_cpp()` updates: now reuses a preconverted GGUF when available; otherwise it falls back to in-method HF→GGUF conversion. llama.cpp comparison failures are captured in discrepancy results (status/failures) instead of aborting the whole run, so ONNX generation can still complete.
- `--test_metrics` parsing: now accepts both space-separated (`--test_metrics mae speedup`) and comma-separated (`--test_metrics mae,speedup`) forms.
- `add_discrepancy_check_pass` update-in-place: an existing discrepancy-pass config generated by dry-run is updated in place so the current `--test_metrics`, `--output_path`, and llama settings are applied.
- `ModelBuilder` stores a reference HF copy (`reference_hf_model/`) alongside cached ONNX outputs; the discrepancy check falls back to this copy if the original test model path is missing.
- `SaveTestModelConfig` pass (`olive/passes/pytorch/save_test_model_config.py`): injected at the start of passes for `--test`; ensures the test model config/marker (and random test model persistence path usage) is set up before downstream passes.
- `test-model-fast.yml`: includes setup of a llama environment and llama.cpp conversion script dependencies.
- `cli-fast-test.md`: clarifies where layer reduction happens, when test-model directories are created, cache fallback behavior, and the llama.cpp test flow including the dedicated GGUF conversion pass.

Checklist before requesting a review

- `lintrunner -a`

(Optional) Issue link
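
Note on the `--test_metrics` parsing change described above: the snippet below is a minimal sketch of how both accepted forms could be normalized into one list. It is not the code from this PR; `parse_metrics` is a hypothetical helper, and the use of `argparse` with `nargs="*"` is an assumption about how the option is declared.

```python
import argparse


def parse_metrics(values):
    """Flatten space- and comma-separated metric names into one list.

    Hypothetical helper for illustration; not taken from the Olive code base.
    """
    metrics = []
    for value in values or []:
        # Each argv token may itself hold several comma-separated names.
        metrics.extend(part.strip() for part in value.split(",") if part.strip())
    return metrics


# Assumed declaration: zero or more tokens after --test_metrics.
parser = argparse.ArgumentParser()
parser.add_argument("--test_metrics", nargs="*", default=[])

spaced = parse_metrics(
    parser.parse_args(["--test_metrics", "mae", "speedup"]).test_metrics
)
comma = parse_metrics(
    parser.parse_args(["--test_metrics", "mae,speedup"]).test_metrics
)
```

Under these assumptions both invocations yield the same list of metric names, so downstream code does not need to know which form the user typed.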