I used pytorch-quantization to perform PTQ INT8 quantization on ResNet50, exported the model to ONNX, and then built a TensorRT engine from it. When running inference, the speed did not increase; it actually got slower. What went wrong?
#4304
Open
jishenghuang opened this issue on Dec 30, 2024 · 2 comments
I exported both the quantized and the unquantized models with fixed and dynamic batch sizes, and found that the inference speed did not increase. Here are the ONNX models I exported.
1. Fixed batch:
Quantized:
Unquantized:
2. Dynamic batch:
Quantized:
Unquantized:
I was unable to upload the ONNX models, so I used screenshots instead. If necessary, I can get in touch and send you these ONNX models.
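Since I cannot attach the export script either, here is a minimal sketch of the PTQ flow I followed. It mirrors the standard pytorch-quantization recipe; the calibration data, batch size, and file name below are placeholders, not my exact code:

```python
import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace supported torch.nn layers with their quantized counterparts
# before the model is constructed.
quant_modules.initialize()

model = torchvision.models.resnet50(weights="DEFAULT").cuda().eval()

# Placeholder calibration loader: in practice this should iterate over a
# few hundred representative images with the usual ImageNet preprocessing.
calib_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 3, 224, 224), torch.zeros(64)),
    batch_size=16,
)

with torch.no_grad():
    # Switch all TensorQuantizer modules into calibration mode.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    # Collect activation statistics.
    for images, _ in calib_loader:
        model(images.cuda())

    # Load the collected amax values (default max calibrator assumed)
    # and re-enable quantization.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

# Export with explicit QuantizeLinear/DequantizeLinear (Q/DQ) nodes.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "resnet50_qdq.onnx", opset_version=13)
```

The resulting Q/DQ ONNX file is what I then built into the TensorRT engine.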
Description
Environment
TensorRT Version: 10.7
NVIDIA GPU: RTX 3090
NVIDIA Driver Version:
CUDA Version: 11.7
CUDNN Version:
Operating System:
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
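For the ONNX Runtime cross-check, something along these lines should work; the model path, batch size, and execution provider are assumptions, not what I actually ran:

```python
import time
import numpy as np
import onnxruntime as ort

# Path to the exported Q/DQ ONNX model is a placeholder.
sess = ort.InferenceSession(
    "resnet50_qdq.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up, then time an average over repeated runs.
for _ in range(10):
    sess.run(None, {input_name: x})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, {input_name: x})
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print(f"average latency: {elapsed_ms:.2f} ms")
```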