extend command line --test to trigger speedup measure#2536

Open
xadupre wants to merge 7 commits into main from xadupre/cmd

xadupre commented Jun 22, 2026

Member

Describe your changes

By default, --test only enables the discrepancy measure. It needs to be extended so that the user can also add speedup measures.

xadupre commented Jun 22, 2026

Member Author

@copilot can you document the option somewhere

Copilot AI commented Jun 22, 2026

Contributor

@copilot can you document the option somewhere

Added documentation for the new --test_metrics option (including speedup) in docs/source/how-to/cli/cli-fast-test.md in commit 15287d8.

Copilot AI left a comment

Contributor

Pull request overview

This PR extends Olive’s CLI --test workflow by allowing callers to choose which discrepancy-check metrics are evaluated (accuracy via MAE and/or speedup via latency timing), and adds coverage + documentation for the new option.

Changes:

  • Add --test_metrics CLI option (choices: mae, speedup) and plumb it through olive run and the base CLI workflow runner.
  • Update discrepancy-check pass injection to enable/disable MAE threshold enforcement and speedup timing based on selected metrics.
  • Add unit tests for metric selection behavior and warning when --test_metrics is provided without --test, plus docs describing usage.
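The option described in the list above could be wired roughly as follows. This is a minimal sketch, not the code in olive/cli/base.py: the parser name, the `--test` flag being a plain boolean, and the helper `resolve_test_metrics` are assumptions made for illustration. Only the metric names (mae, speedup), the mae-only default, and the warning when --test_metrics is given without --test come from this PR.

```python
import argparse
import logging

logger = logging.getLogger("olive_cli_sketch")

# Metric names come from the PR overview; the rest of this module is illustrative.
TEST_METRICS = ("mae", "speedup")


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser exposing --test and --test_metrics."""
    parser = argparse.ArgumentParser(prog="olive-sketch")
    # Assumption: --test is modelled here as a simple on/off flag.
    parser.add_argument("--test", action="store_true", help="Run the fast test workflow.")
    parser.add_argument(
        "--test_metrics",
        nargs="+",
        choices=TEST_METRICS,
        default=None,
        help="Metrics evaluated by the discrepancy check when --test is set.",
    )
    return parser


def resolve_test_metrics(args: argparse.Namespace) -> set:
    """Return the metrics to evaluate, warning if --test_metrics is unused."""
    if not args.test:
        if args.test_metrics:
            logger.warning("--test_metrics is ignored because --test was not provided.")
        return set()
    # Without an explicit selection, only the MAE check is enabled.
    return set(args.test_metrics or ["mae"])
```

With this sketch, `--test --test_metrics speedup` selects only the speedup measurement, while `--test` alone keeps the MAE check.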

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/cli/test_base.py | Adds tests for metric-selection behavior and unused-flag warning behavior. |
| olive/cli/run.py | Passes test_metrics into discrepancy-check injection and warns when --test_metrics is unused. |
| olive/cli/base.py | Introduces TEST_METRICS, --test_metrics arg, warning helper, and metric-aware discrepancy pass injection. |
| docs/source/how-to/cli/cli-fast-test.md | Documents --test_metrics and provides an example for speedup-only runs. |

Comment thread olive/cli/base.py
xadupre marked this pull request as ready for review June 22, 2026 15:29

xadupre commented Jun 22, 2026

Member Author

@copilot test/cli/test_base.py:360: in test_add_discrepancy_check_pass_default_enables_all_metrics
assert "timing_iterations" not in pass_config
E AssertionError: assert 'timing_iterations' not in {'type': 'OnnxDiscrepancyCheck', 'reference_model_path': 'ref_model', 'report_output_dir': 'out_dir', 'max_mae': 0.1, ...}

Copilot AI commented Jun 22, 2026

Contributor

@copilot test/cli/test_base.py:360: in test_add_discrepancy_check_pass_default_enables_all_metrics
assert "timing_iterations" not in pass_config
E AssertionError: assert 'timing_iterations' no...

Fixed in the latest commit. The test was asserting the old behavior (default = all metrics, so timing_iterations absent). After the default changed to {"mae"} only, timing_iterations is set to 0. Updated the test name to test_add_discrepancy_check_pass_default_enables_mae_only and changed the assertion to assert pass_config["timing_iterations"] == 0.
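The behavior this reply describes can be sketched as a small function that maps the selected metrics onto the pass config. The function name `build_discrepancy_pass_config` and its signature are hypothetical, and omitting `max_mae` when mae is not selected is an assumption; the thread only confirms that the mae-only default yields `max_mae` of 0.1 and `timing_iterations` of 0, and that `timing_iterations` was absent when speedup was enabled.

```python
from typing import Any, Dict, Iterable, Optional

# The mae-only default is taken from the reply above.
DEFAULT_TEST_METRICS = frozenset({"mae"})


def build_discrepancy_pass_config(
    reference_model_path: str,
    report_output_dir: str,
    test_metrics: Optional[Iterable[str]] = None,
    max_mae: float = 0.1,
) -> Dict[str, Any]:
    """Hypothetical helper: translate selected metrics into a pass config."""
    metrics = set(test_metrics) if test_metrics is not None else set(DEFAULT_TEST_METRICS)
    config: Dict[str, Any] = {
        "type": "OnnxDiscrepancyCheck",
        "reference_model_path": reference_model_path,
        "report_output_dir": report_output_dir,
    }
    if "mae" in metrics:
        # Enforce the MAE threshold only when the mae metric is selected.
        # Assumption: the key is simply left out otherwise.
        config["max_mae"] = max_mae
    if "speedup" not in metrics:
        # Zero timing iterations turns the speedup measurement off.
        config["timing_iterations"] = 0
    return config
```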

xadupre commented Jun 22, 2026

Member Author

@copilot fix test/cli/test_cli.py:178: in test_workflow_run_command_with_test_override
mock_run.assert_called_once_with(
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/unittest/mock.py:961: in assert_called_once_with
return self.assert_called_with(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/unittest/mock.py:949: in assert_called_with
raise AssertionError(_error_message()) from cause
E AssertionError: expected call not found.
E Expected: run({'input_model': {'type': 'HfModel', 'model_path': 'hf-internal-testing/tiny-random-LlamaForCausalLM', 'load_kwargs': {'attn_implementation': 'eager', 'trust_remote_code': False}, 'test_model_config': {'hidden_layers': 2}, 'test_model_path': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output/test_model'}, 'output_dir': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output', 'passes': {'discrepancy_check': {'type': 'OnnxDiscrepancyCheck', 'reference_model_path': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output/test_model', 'max_mae': 0.1, 'report_output_dir': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output'}}}, list_required_packages=False, package_config=None, tempdir=None)
E Actual: run({'input_model': {'type': 'HfModel', 'model_path': 'hf-internal-testing/tiny-random-LlamaForCausalLM', 'load_kwargs': {'attn_implementation': 'eager', 'trust_remote_code': False}, 'test_model_config': {'hidden_layers': 2}, 'test_model_path': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output/test_model'}, 'output_dir': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output', 'passes': {'discrepancy_check': {'type': 'OnnxDiscrepancyCheck', 'reference_model_path': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output/test_model', 'report_output_dir': '/mnt/vss/_work/1/.pytest_basetemp/test_workflow_run_command_with1/output', 'max_mae': 0.1, 'timing_iterations': 0}}}, list_required_packages=False, tempdir=None, package_config=None)

Copilot AI commented Jun 22, 2026

Contributor

@copilot fix test/cli/test_cli.py:178: in test_workflow_run_command_with_test_override
mock_run.assert_called_once_with(
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/unittest/mock.py:9...

Fixed in the latest commit. Added "timing_iterations": 0 to the expected discrepancy_check pass config in test_workflow_run_command_with_test_override, matching the new default mae-only behavior.
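For readers comparing the Expected and Actual calls in the traceback: the reordered dict keys and keyword arguments are not what made the assertion fail, since `unittest.mock` compares them by equality. The extra `timing_iterations` key is the only real difference. The following standalone illustration uses a shortened, made-up config rather than the real test fixture.

```python
from unittest import mock

run = mock.Mock()

# Simulate the call made by the CLI (config shortened for illustration).
run(
    {"passes": {"discrepancy_check": {"report_output_dir": "out", "max_mae": 0.1, "timing_iterations": 0}}},
    tempdir=None,
    package_config=None,
)

# Dict key order and keyword-argument order do not matter: this passes.
run.assert_called_once_with(
    {"passes": {"discrepancy_check": {"max_mae": 0.1, "timing_iterations": 0, "report_output_dir": "out"}}},
    package_config=None,
    tempdir=None,
)

# A stale expectation without "timing_iterations" fails, as in the CI log.
try:
    run.assert_called_once_with(
        {"passes": {"discrepancy_check": {"report_output_dir": "out", "max_mae": 0.1}}},
        tempdir=None,
        package_config=None,
    )
except AssertionError:
    stale_expectation_failed = True
else:
    stale_expectation_failed = False
```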

xadupre pushed a commit that referenced this pull request Jul 1, 2026
… (with dedicated GGUF conversion pass) (#2548)

## Describe your changes

Merges #2536, #2535, #2534.

Additionally adds llama.cpp integration and other improvements to
`OnnxDiscrepancyCheck` and test-mode workflow handling:

- **New `llama_cpp` flag** (`bool`, default `False`) on
`OnnxDiscrepancyCheck` — when enabled, compares inference with
llama.cpp.
- **New `llama_cpp_env_path` parameter** (`Optional[str]`) — path to the
`llama_env` virtual environment where `llama-cpp-python` and
`convert_hf_to_gguf.py` are installed (defaults to `"llama_env"`
relative to cwd).
- **New `--test_llama_path` CLI option** — specifies the path to the
`llama_env` virtual environment when running with `--test`. Using
`--test_llama_path` without `--test` emits a warning.
- **New `ConvertHfToGGUF` pass**
(`olive/passes/pytorch/convert_hf_to_gguf.py`) — injected when
`--test_llama_path` is provided. This pass converts the test HF model to
GGUF ahead of discrepancy checking and stores the GGUF path in model
attributes for downstream reuse.
- **`compare_llama_cpp()` updates** — now reuses a preconverted GGUF
when available; otherwise it falls back to in-method HF→GGUF conversion.
llama.cpp comparison failures are captured in discrepancy results
(status/failures) instead of aborting the whole run, so ONNX generation
can still complete.
- **Improved `--test_metrics` parsing** — now accepts both
space-separated (`--test_metrics mae speedup`) and comma-separated
(`--test_metrics mae,speedup`) forms.
- **Fixed `add_discrepancy_check_pass` update-in-place** — existing
discrepancy-pass config generated by dry-run is updated in-place so
current `--test_metrics`, `--output_path`, and llama settings are
applied.
- **Fixed test model persistence across engine cache hits** —
`ModelBuilder` stores a reference HF copy (`reference_hf_model/`)
alongside cached ONNX outputs; discrepancy check falls back to this copy
if the original test model path is missing.
- **New `SaveTestModelConfig` pass**
(`olive/passes/pytorch/save_test_model_config.py`) — injected at the
start of passes for `--test`; ensures test model config/marker (and
random test model persistence path usage) is set up before downstream
passes.
- **CI workflow** (`test-model-fast.yml`) — includes setup of a llama
environment and llama.cpp conversion script dependencies.
- **Updated documentation** (`cli-fast-test.md`) — clarifies where layer
reduction happens, when test-model directories are created, cache
fallback behavior, and llama.cpp test flow including the dedicated GGUF
conversion pass.
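
As an illustration of the `--test_metrics` parsing item above, the two accepted forms can be normalized by splitting every token on commas. This is a sketch under stated assumptions: `normalize_test_metrics` is a hypothetical name, and rejecting unknown names with `ValueError` is a choice made here, not something the commit message specifies.

```python
from typing import Iterable, List, Optional

ALLOWED_TEST_METRICS = ("mae", "speedup")


def normalize_test_metrics(values: Optional[Iterable[str]]) -> List[str]:
    """Flatten space- and comma-separated metric tokens into one ordered list."""
    metrics: List[str] = []
    for value in values or []:
        # "mae,speedup" arrives as one token; "mae speedup" arrives as two.
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue
            if name not in ALLOWED_TEST_METRICS:
                raise ValueError(f"Unknown test metric: {name!r}")
            if name not in metrics:
                metrics.append(name)
    return metrics
```

Both `normalize_test_metrics(["mae", "speedup"])` and `normalize_test_metrics(["mae,speedup"])` produce the same list, so downstream code does not need to know which form was typed.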

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
3 participants