
I used pytorch-quantization to perform PTQ INT8 quantization on ResNet50, exported it to ONNX, and then built a TensorRT engine from it. During inference the speed did not increase; instead it slowed down. What went wrong? #4304

Open
jishenghuang opened this issue Dec 30, 2024 · 2 comments

Comments

@jishenghuang

Description
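Roughly, the quantization and export followed the usual pytorch-quantization PTQ recipe: swap in quantized modules, calibrate on a few batches, then export to ONNX with fake-quant enabled. A minimal sketch of that kind of flow is below; the calibration data, batch count, file name, and opset are placeholders rather than my exact script.

```python
import torch
import torchvision

from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch layers with their quantized counterparts before building the model
quant_modules.initialize()
model = torchvision.models.resnet50(pretrained=True).cuda().eval()

# Placeholder calibration data; a real calibration set would come from training/validation images
calib_loader = [(torch.randn(8, 3, 224, 224), None) for _ in range(16)]

with torch.no_grad():
    # 1) Collect activation statistics with quantization disabled
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    for images, _ in calib_loader:
        model(images.cuda())
    # 2) Load the calibrated amax values and re-enable quantization
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

# 3) Export with explicit QuantizeLinear/DequantizeLinear nodes in the graph
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "resnet50_int8.onnx", opset_version=13,
                  input_names=["input"], output_names=["output"])
```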

Environment

TensorRT Version: 10.7

NVIDIA GPU: RTX 3090

NVIDIA Driver Version:

CUDA Version: 11.7

CUDNN Version:

Operating System:

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
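For that check, an ONNX Runtime smoke test equivalent to the polygraphy command can also be run directly from Python; a sketch, assuming the exported file is named resnet50_int8.onnx and takes a 1x3x224x224 float input:

```python
import numpy as np
import onnxruntime as ort

# File name and input shape are assumptions for this sketch
sess = ort.InferenceSession("resnet50_int8.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)  # expect (1, 1000) for ResNet50
```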

@lix19937

lix19937 commented Jan 1, 2025

Can you upload the ONNX?

@jishenghuang
Author

I exported both quantized and unquantized models with fixed and dynamic batches, and found that the inference speed did not increase. Here are the ONNX models I exported.
1. Fixed batch:
Quantized: [screenshot of the exported ONNX graph]

Unquantized: [screenshot of the exported ONNX graph]

2. Dynamic batch:
Quantized: [screenshot of the exported ONNX graph]

Unquantized: [screenshot of the exported ONNX graph]

I am unable to upload the ONNX models, so I am using screenshots instead. If necessary, I can take your contact details and send you these ONNX models.
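Without the files, one way to summarize what the screenshots show is to count the QuantizeLinear/DequantizeLinear nodes in each export; a sketch (the file names below are placeholders):

```python
from collections import Counter
import onnx

# Placeholder paths standing in for the quantized and unquantized exports above
for path in ["resnet50_int8.onnx", "resnet50_fp32.onnx"]:
    graph = onnx.load(path).graph
    ops = Counter(node.op_type for node in graph.node)
    print(path,
          "QuantizeLinear:", ops["QuantizeLinear"],
          "DequantizeLinear:", ops["DequantizeLinear"])
```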
