Fix aten::any.dim truthiness for numeric tensors #4353
@fallintoplace Thanks for the contribution. TorchScript is being removed soon (around PyTorch 2.14). Please use Dynamo instead.
@zewenli98 What about this one? Thank you for your attention.
zewenli98 left a comment
@fallintoplace Thanks for the contribution. TorchScript is being removed soon (around PyTorch 2.14). Please use Dynamo instead.
PyTorch decided not to remove TorchScript support in 2.14.
This PR fixes the correctness bug in aten::any on numeric tensors (cases like [1, -1] that cancel to 0). Can you fix the comments and rebase? Thanks!
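To illustrate the cancellation case mentioned above, here is a minimal pure-Python sketch contrasting a sum-based reduction with a nonzero-mask reduction. It uses neither PyTorch nor TensorRT, and the helper names `any_via_sum` and `any_via_nonzero_mask` are hypothetical; this is not the converter's actual code.

```python
def any_via_sum(values):
    # Problematic approach: reduce the raw values, then test the sum.
    # Positive and negative elements can cancel to zero.
    return sum(values) != 0


def any_via_nonzero_mask(values):
    # Build an explicit 0/1 mask of nonzero elements, then reduce the mask.
    mask = [1 if v != 0 else 0 for v in values]
    return sum(mask) > 0


cancelling = [1, -1]
print(any_via_sum(cancelling))           # False (1 + -1 == 0)
print(any_via_nonzero_mask(cancelling))  # True  (both elements are nonzero)
```

Because the mask contains only 0s and 1s, its sum cannot cancel, which is why reducing the mask rather than the raw values gives the expected truthiness.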
  in_tensor = castITensor(ctx, in_tensor, nvinfer1::DataType::kINT32, (util::node_info(n) + "_in").c_str());
} else {
  // Numeric truthiness is based on nonzero elements, not the reduced sum of raw values.
  auto zero_tensor = tensor_to_const(ctx, torch::tensor({0}, torch::kInt32), util::node_info(n) + "_zero");
zero_tensor with torch::kInt32 is created here and then compared with in_tensor. If in_tensor is float/half or lives on a different device, the elementwise EQUAL may fail at network construction or behave incorrectly.
Can you create the zero constant to match in_tensor's dtype and device?
}
Can you test different dtypes like fp32 and fp16?
Can you also test if NaN works? e.g.:
>>> torch.any(torch.tensor([0, torch.nan]))
tensor(True)
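As a rough reference for what such tests might check, the sketch below mimics the "not equal to zero" rule on Python floats. It is plain Python, not PyTorch or TensorRT, and the helper name `any_nonzero` is hypothetical. Whether a TensorRT elementwise comparison treats NaN the same way for every dtype is an assumption that the requested fp32/fp16 tests would need to confirm.

```python
import math


def any_nonzero(values):
    # An element counts as truthy when it does not compare equal to zero.
    # Under IEEE 754, NaN == 0 is False, so NaN is treated as nonzero here.
    return any(not (v == 0) for v in values)


print(any_nonzero([0.0, math.nan]))  # True: NaN is not equal to zero
print(any_nonzero([0.0, -0.0]))      # False: negative zero equals zero
print(any_nonzero([0.5, -0.5]))      # True: values that would cancel in a sum
```

Expressing truthiness as the negation of an equality with zero, rather than as a greater-than or sum test, is what lets NaN come out truthy in this sketch, in line with the `torch.any` result quoted above.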
910ffd0 to e5ecc75
@fallintoplace Can you fix the lint issue? Please refer to https://github.com/pytorch/TensorRT/blob/main/CONTRIBUTING.md#coding-guidelines. Thanks.
e5ecc75 to 1254a03
1254a03 to 7e03ae2
Summary
Convert aten::any.dim inputs into an explicit nonzero mask before reducing.
Testing
git diff --check
bazelisk test //tests/core/conversion/converters:test_reduce was attempted locally. The first run was blocked by missing PyTorch. After installing PyTorch in a local venv and setting TORCH_PATH, Bazel stayed in analysis for several minutes on this local machine and was interrupted before running tests.