
I used pytorch-quantization to perform PTQ INT8 quantization on ResNet50, exported it to ONNX, and then built a TensorRT engine from it. During inference the speed did not increase; instead it slowed down. What went wrong? #4304

Open
jishenghuang opened this issue Dec 30, 2024 · 2 comments

Comments

@jishenghuang

Description
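Roughly, the quantization and export followed the usual pytorch-quantization PTQ recipe: swap in quantized modules, calibrate on a few batches, then export to ONNX with fake-quant enabled. A minimal sketch of that kind of flow is below; the calibration data, batch count, file name, and opset are placeholders rather than my exact script.

```python
import torch
import torchvision

from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch layers with their quantized counterparts before building the model
quant_modules.initialize()
model = torchvision.models.resnet50(pretrained=True).cuda().eval()

# Placeholder calibration data; a real calibration set would come from training/validation images
calib_loader = [(torch.randn(8, 3, 224, 224), None) for _ in range(16)]

with torch.no_grad():
    # 1) Collect activation statistics with quantization disabled
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    for images, _ in calib_loader:
        model(images.cuda())
    # 2) Load the calibrated amax values and re-enable quantization
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

# 3) Export with explicit QuantizeLinear/DequantizeLinear nodes in the graph
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "resnet50_int8.onnx", opset_version=13,
                  input_names=["input"], output_names=["output"])
```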

Environment

TensorRT Version: 10.7

NVIDIA GPU: RTX 3090

NVIDIA Driver Version:

CUDA Version: 11.7

CUDNN Version:

Operating System:

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
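For that check, an ONNX Runtime smoke test equivalent to the polygraphy command can also be run directly from Python; a sketch, assuming the exported file is named resnet50_int8.onnx and takes a 1x3x224x224 float input:

```python
import numpy as np
import onnxruntime as ort

# File name and input shape are assumptions for this sketch
sess = ort.InferenceSession("resnet50_int8.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)  # expect (1, 1000) for ResNet50
```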

@lix19937

lix19937 commented Jan 1, 2025

Can you upload the ONNX?

@jishenghuang
Author

I exported both quantized and unquantized models with fixed and dynamic batches, and found that the inference speed did not increase. Here are the ONNX models I exported.
1. Fixed batch:
Quantized: [screenshot of the exported ONNX graph]

Unquantized: [screenshot of the exported ONNX graph]

2. Dynamic batch:
Quantized: [screenshot of the exported ONNX graph]

Unquantized: [screenshot of the exported ONNX graph]

I am unable to upload the ONNX models, so I am using screenshots instead. If necessary, I can take your contact details and send you these ONNX models.
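Without the files, one way to summarize what the screenshots show is to count the QuantizeLinear/DequantizeLinear nodes in each export; a sketch (the file names below are placeholders):

```python
from collections import Counter
import onnx

# Placeholder paths standing in for the quantized and unquantized exports above
for path in ["resnet50_int8.onnx", "resnet50_fp32.onnx"]:
    graph = onnx.load(path).graph
    ops = Counter(node.op_type for node in graph.node)
    print(path,
          "QuantizeLinear:", ops["QuantizeLinear"],
          "DequantizeLinear:", ops["DequantizeLinear"])
```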
