extend command line --test to trigger speedup measure #2536
@copilot can you document the option somewhere?
Added documentation for the new |
Pull request overview
This PR extends Olive’s CLI --test workflow by allowing callers to choose which discrepancy-check metrics are evaluated (accuracy via MAE and/or speedup via latency timing), and adds coverage + documentation for the new option.
Changes:
- Add `--test_metrics` CLI option (choices: `mae`, `speedup`) and plumb it through `olive run` and the base CLI workflow runner.
- Update discrepancy-check pass injection to enable/disable MAE threshold enforcement and speedup timing based on selected metrics.
- Add unit tests for metric selection behavior and warning when `--test_metrics` is provided without `--test`, plus docs describing usage.
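
As a rough illustration of the option and the unused-flag warning listed above, here is a minimal `argparse` sketch. Only the flag names `--test` and `--test_metrics` and the choices `mae` and `speedup` come from this PR; the parser layout, the helper name `warn_if_test_metrics_unused`, the warning text, and treating `--test` as a plain boolean switch are assumptions made for the example, not Olive's actual code.

```python
import argparse
import logging

logger = logging.getLogger(__name__)

# Metric names come from the PR description; the constant's shape is an assumption.
TEST_METRICS = ("mae", "speedup")


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser (hypothetical, not the real Olive CLI parser)."""
    parser = argparse.ArgumentParser(prog="olive-sketch")
    # Assumption: --test is modelled here as a simple on/off switch.
    parser.add_argument("--test", action="store_true", help="Run the fast test workflow.")
    parser.add_argument(
        "--test_metrics",
        nargs="+",
        choices=TEST_METRICS,
        default=None,
        help="Discrepancy-check metrics to evaluate.",
    )
    return parser


def warn_if_test_metrics_unused(args: argparse.Namespace) -> bool:
    """Log a warning and return True when --test_metrics is given without --test."""
    if args.test_metrics and not args.test:
        logger.warning("--test_metrics is ignored because --test was not provided.")
        return True
    return False
```

Returning a boolean from the warning helper is a choice made here so the behavior is easy to assert in a unit test; the real helper may only log.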
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/cli/test_base.py | Adds tests for metric-selection behavior and unused-flag warning behavior. |
| olive/cli/run.py | Passes test_metrics into discrepancy-check injection and warns when --test_metrics is unused. |
| olive/cli/base.py | Introduces TEST_METRICS, --test_metrics arg, warning helper, and metric-aware discrepancy pass injection. |
| docs/source/how-to/cli/cli-fast-test.md | Documents --test_metrics and provides an example for speedup-only runs. |
@copilot test/cli/test_base.py:360: in test_add_discrepancy_check_pass_default_enables_all_metrics |
Fixed in the latest commit. The test was asserting the old behavior (default = all metrics, so |
@copilot fix test/cli/test_cli.py:178: in test_workflow_run_command_with_test_override |
Fixed in the latest commit. Added |
… (with dedicated GGUF conversion pass) (#2548)

## Describe your changes

Merges #2536, #2535, #2534. Additionally adds llama.cpp integration and other improvements to `OnnxDiscrepancyCheck` and test-mode workflow handling:

- **New `llama_cpp` flag** (`bool`, default `False`) on `OnnxDiscrepancyCheck`: when enabled, compares inference with llama.cpp.
- **New `llama_cpp_env_path` parameter** (`Optional[str]`): path to the `llama_env` virtual environment where `llama-cpp-python` and `convert_hf_to_gguf.py` are installed (defaults to `"llama_env"` relative to cwd).
- **New `--test_llama_path` CLI option**: specifies the path to the `llama_env` virtual environment when running with `--test`. Using `--test_llama_path` without `--test` emits a warning.
- **New `ConvertHfToGGUF` pass** (`olive/passes/pytorch/convert_hf_to_gguf.py`): injected when `--test_llama_path` is provided. This pass converts the test HF model to GGUF ahead of discrepancy checking and stores the GGUF path in model attributes for downstream reuse.
- **`compare_llama_cpp()` updates**: now reuses a preconverted GGUF when available; otherwise it falls back to in-method HF→GGUF conversion. llama.cpp comparison failures are captured in discrepancy results (status/failures) instead of aborting the whole run, so ONNX generation can still complete.
- **Improved `--test_metrics` parsing**: now accepts both space-separated (`--test_metrics mae speedup`) and comma-separated (`--test_metrics mae,speedup`) forms.
- **Fixed `add_discrepancy_check_pass` update-in-place**: existing discrepancy-pass config generated by dry-run is updated in-place so current `--test_metrics`, `--output_path`, and llama settings are applied.
- **Fixed test model persistence across engine cache hits**: `ModelBuilder` stores a reference HF copy (`reference_hf_model/`) alongside cached ONNX outputs; discrepancy check falls back to this copy if the original test model path is missing.
- **New `SaveTestModelConfig` pass** (`olive/passes/pytorch/save_test_model_config.py`): injected at the start of passes for `--test`; ensures test model config/marker (and random test model persistence path usage) is set up before downstream passes.
- **CI workflow** (`test-model-fast.yml`): includes setup of a llama environment and llama.cpp conversion script dependencies.
- **Updated documentation** (`cli-fast-test.md`): clarifies where layer reduction happens, when test-model directories are created, cache fallback behavior, and llama.cpp test flow including the dedicated GGUF conversion pass.

## Checklist before requesting a review

- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
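
The commit message above says `--test_metrics` accepts both space-separated and comma-separated values. One way such normalization could work is sketched below. The function name `normalize_test_metrics`, the de-duplication, and the error raised on unknown names are assumptions for illustration; only the metric names and the two accepted input forms come from the commit message.

```python
from typing import Iterable, List, Optional

# Metric names taken from the PR; the tuple itself is a hypothetical constant.
ALLOWED_METRICS = ("mae", "speedup")


def normalize_test_metrics(raw: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Flatten tokens such as ["mae,speedup"] or ["mae", "speedup"] into one list.

    Returns None when the option was not supplied, so callers can tell
    "not given" apart from "given but empty".
    """
    if raw is None:
        return None
    metrics: List[str] = []
    for token in raw:
        # Each shell token may itself carry a comma-separated list.
        for name in token.split(","):
            name = name.strip()
            if not name:
                continue
            if name not in ALLOWED_METRICS:
                raise ValueError(f"Unknown test metric: {name!r}")
            if name not in metrics:  # keep first occurrence, drop repeats
                metrics.append(name)
    return metrics
```

With this approach the option would be declared with `nargs="+"` and without `choices`, because `argparse` validates each raw token and would reject `mae,speedup` before the split happens.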
Describe your changes
`--test` only enables the discrepancy measure by default; it needs to be extended to let the user add speedup measures.
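
To make the two measures concrete, here is a small, library-free sketch of a mean absolute error and a latency-based speedup ratio. It is not the `OnnxDiscrepancyCheck` implementation: the function names, the warmup and repeat counts, and the definition of speedup as baseline latency divided by candidate latency are all assumptions chosen for illustration.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def mean_absolute_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Average of |reference[i] - candidate[i]| over two equal-length sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("Sequences must be non-empty and of equal length.")
    return mean(abs(r - c) for r, c in zip(reference, candidate))


def average_latency(run: Callable[[], object], warmup: int = 2, repeat: int = 10) -> float:
    """Time `run` with a monotonic clock and return the mean latency in seconds."""
    for _ in range(warmup):
        run()  # warmup calls are executed but not timed
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def speedup(baseline_latency: float, candidate_latency: float) -> float:
    """Assumed definition: values above 1.0 mean the candidate is faster."""
    if candidate_latency <= 0:
        raise ValueError("candidate_latency must be positive.")
    return baseline_latency / candidate_latency
```

A caller would time the reference model and the converted model with `average_latency` and pass both results to `speedup`.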