[torchlib] Implement quantize_per_channel and dequantize_per_channel #2390
Merged
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [torchlib] Implement quantize_per_channel and dequantize_per_channel" to "[torchlib] Implement quantize_per_channel and dequantize_per_channel" on Jun 14, 2025
justinchuby reviewed on Jun 14, 2025
Codecov Report
❌ Patch coverage is

Additional details and impacted files

Coverage Diff (main vs. #2390)

| Metric   | main   | #2390  | +/-    |
|----------|--------|--------|--------|
| Coverage | 72.66% | 72.64% | -0.03% |
| Files    | 259    | 259    |        |
| Lines    | 31748  | 31766  | +18    |
| Branches | 3005   | 3007   | +2     |
| Hits     | 23069  | 23075  | +6     |
| Misses   | 7660   | 7672   | +12    |
| Partials | 1019   | 1019   |        |
Collaborator
Please format the code using lintrunner. You can do then
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Contributor (Author)
Done! Code has been formatted using lintrunner. The formatting fixed import sorting, whitespace issues, and line wrapping. Commit: 82c8f9e
justinchuby approved these changes on Dec 30, 2025
Contributor
How do we know this is correct?
Adds e2e numerical-parity tests (eager torch vs ORT) for quantize_per_channel and dequantize_per_channel covering int8 and uint8, axis 0 and non-zero axes, distinct per-channel scales/zero_points, narrow quant_min/quant_max clamping, and a quantize->dequantize round trip.

Fixes the lowering to make parity hold:
- Per-axis QuantizeLinear/DequantizeLinear (opset 13+ axis attribute) is used instead of opset23-only attributes (output_dtype/precision/block_size), which were emitted into an opset-18 graph and rejected by ONNX Runtime.
- zero_points (int64 from torch) is now cast to the quantized dtype so it matches the int8/uint8 tensor, and scales (float64 from torch) is cast to the input/float32 type as required by Q/DQ.
- quantize_per_channel now clamps to quant_min/quant_max (via Clip) to match torch's reference semantics; dequantize defaults to float32.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
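The clamping fix in the last bullet can be illustrated with a small pure-Python sketch. It assumes a simplified scalar model in which a QuantizeLinear-style op only saturates to the full dtype range, so a narrower quant_min/quant_max needs an extra Clip-style step. The helper names `quantize_linear` and `clip` are hypothetical stand-ins, not the ONNX operators or the torchlib code.

```python
def quantize_linear(x, scale, zero_point, dtype_min=-128, dtype_max=127):
    """Hypothetical scalar model: round, shift, saturate to the int8 range."""
    q = round(x / scale) + zero_point  # Python's round() is round-half-to-even
    return max(dtype_min, min(dtype_max, q))


def clip(q, quant_min, quant_max):
    """Hypothetical stand-in for the extra clamp to quant_min/quant_max."""
    return max(quant_min, min(quant_max, q))


values = [-100.0, -3.0, 0.0, 5.0, 100.0]
scale, zero_point = 0.5, 0
quant_min, quant_max = -8, 7  # narrower than the int8 range

# Saturating to the dtype range alone leaves values outside [quant_min, quant_max].
saturated_only = [quantize_linear(v, scale, zero_point) for v in values]

# The additional clamp brings every value into the requested range.
clamped = [clip(q, quant_min, quant_max) for q in saturated_only]

print(saturated_only)
print(clamped)
```

In this toy model the input 5.0 quantizes to 10, which is valid int8 but outside the requested range, so only the clamped path would agree with a reference that honors quant_min/quant_max.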
titaiwangms approved these changes on Jun 19, 2026
# Conflicts:
#   tests/function_libs/torch_lib/e2e_ops_tests.py

Resolve merge conflict in e2e_ops_tests.py by keeping both main's torchvision import (for deform_conv2d tests) and the quantized_decomposed registration import for the new per_channel parity tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
titaiwangms approved these changes on Jun 19, 2026
This PR implements the missing quantize_per_channel and dequantize_per_channel operations in the torchlib quantized_decomposed module.

Changes

Added two new functions to onnxscript/function_libs/torch_lib/ops/quantized_decomposed.py:

quantized_decomposed_quantize_per_channel
- Takes scales and zero_points tensors (one value per channel)
- Takes an axis parameter to specify the quantization dimension

quantized_decomposed_dequantize_per_channel
- Takes scales and optional zero_points
- The zero_points parameter is Optional[TensorType], matching the PyTorch reference
- Takes an output_dtype parameter

Implementation Details

Both functions:
- Use the @torch_op decorator with trace_only=True
- PyTorch reference: torch.ao.quantization.fx._decomposed
- Use the axis and output_dtype parameters for per-axis quantization

The implementation leverages ONNX's native per-axis quantization support rather than implementing the tensor manipulation logic from the PyTorch reference, making it more efficient and aligned with ONNX best practices.
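As a rough illustration of the semantics these two ops are expected to have, here is a minimal pure-Python sketch of per-channel affine quantization on a 2-D nested list. It assumes the usual formulas q = clamp(round(x / scale) + zero_point, quant_min, quant_max) and x' = (q - zero_point) * scale; the function names mirror the op names for readability only and are hypothetical stand-ins, not the torchlib functions or the PyTorch reference code.

```python
def quantize_per_channel(x, scales, zero_points, axis, quant_min, quant_max):
    """Quantize a 2-D list; `axis` picks which dimension indexes the channel."""
    out = []
    for i, row in enumerate(x):
        out_row = []
        for j, value in enumerate(row):
            c = (i, j)[axis]  # channel index along the chosen axis
            q = round(value / scales[c]) + zero_points[c]
            out_row.append(max(quant_min, min(quant_max, q)))
        out.append(out_row)
    return out


def dequantize_per_channel(q, scales, zero_points, axis):
    """Dequantize a 2-D list; zero_points=None is treated as all zeros."""
    if zero_points is None:
        zero_points = [0] * len(scales)
    return [
        [(value - zero_points[(i, j)[axis]]) * scales[(i, j)[axis]]
         for j, value in enumerate(row)]
        for i, row in enumerate(q)
    ]


x = [[1.0, -2.0, 0.5],
     [4.0, 8.0, -16.0]]

# axis=0: one (scale, zero_point) pair per row.
q_rows = quantize_per_channel(x, [0.5, 2.0], [0, 10], axis=0,
                              quant_min=-128, quant_max=127)
x_rows = dequantize_per_channel(q_rows, [0.5, 2.0], [0, 10], axis=0)

# axis=1: one scale per column, with zero_points omitted on dequantize.
q_cols = quantize_per_channel(x, [0.5, 1.0, 0.25], [0, 0, 0], axis=1,
                              quant_min=-128, quant_max=127)
x_cols = dequantize_per_channel(q_cols, [0.5, 1.0, 0.25], None, axis=1)

print(q_rows, x_rows)
print(q_cols, x_cols)
```

The scales here are powers of two so the round trip is exact; with arbitrary scales the dequantized values would only approximate the input. The per-element channel lookup is what a per-axis attribute on the ONNX side would replace.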
Testing
Validated that:
Fixes #2389.