I have modified the inference part of the code from the Stable Diffusion demo and generated a TensorRT engine with dynamic shapes. However, every time I change the input shape, the inference time is very long, which makes it difficult to perform inference with user-defined image sizes. Even small changes in input shape lead to significant variations in inference time.
I am running inference on the UNet + ControlNet part of the Stable Diffusion demo, where both the input and output have dynamic shapes. The code I modified is based on the following implementation, and I am not using CUDA Graph during inference.
The range of shapes I am working with is quite large, from a minimum of [1, 3, 1, 1] to a maximum of [8, 3, 1280, 1280]. However, when I change the input shape from [2, 3, 1024, 1024] to [2, 3, 1024, 960], the first run after the shape change takes very long, and only on the second run does the inference time meet expectations. As a result, I cannot implement support for custom image sizes.
Could you help me understand why this behavior occurs and how to optimize the inference time for changing input and output shapes?
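For reference, here is a minimal sketch of the kind of shape-changing inference loop described above. This is not the actual modified demo code; the plan file name and the tensor names `sample` and `out_sample` are placeholders, and it assumes the TensorRT 10 Python API together with PyTorch for device buffers.

```python
import tensorrt as trt
import torch

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize a pre-built dynamic-shape engine (path is a placeholder).
runtime = trt.Runtime(logger)
with open("unet_controlnet.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = torch.cuda.current_stream().cuda_stream

def infer(latents: torch.Tensor) -> torch.Tensor:
    # Tell TensorRT the actual input shape for this run.
    context.set_input_shape("sample", tuple(latents.shape))
    # Allocate the output for the shape TensorRT infers from the new input shape.
    out_shape = tuple(context.get_tensor_shape("out_sample"))
    output = torch.empty(out_shape, dtype=latents.dtype, device="cuda")
    context.set_tensor_address("sample", latents.data_ptr())
    context.set_tensor_address("out_sample", output.data_ptr())
    # The first call after a shape change is where the extra latency shows up.
    context.execute_async_v3(stream)
    torch.cuda.synchronize()
    return output
```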
@renne444 Thanks for raising the issue. To run the Stable Diffusion demo with dynamic shapes, you do not need to modify the code: we provide a --build-dynamic-shape flag that you can add to the demo command.
If the range of your input shapes is [1, 3, 1, 1] to [8, 3, 1280, 1280], you will need to update min_image_shape here to 1.
Coming to perf issues - here is the detailed doc for Dynamic Shapes and related behavior.
Note that with Dynamic Shapes enabled, the demo will perform the best (lowest latency) for the image shapes provided in the demo command using --height and --width at the time of engine build. With the above recommendations, if you're still seeing a large difference in latency between the first run after changing the input shape and the subsequent runs, please share the numbers here.
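To make the build-time recommendation concrete: with dynamic shapes enabled, the engine is built with an optimization profile (min/opt/max shapes), and kernels are tuned for the opt shape, which is why the --height/--width used at build time run fastest. Other shapes in the profile range are still valid, but the first execution at a new shape appears to trigger some one-time, shape-specific setup, which matches the slow first run described above. A common mitigation is to warm the engine up once per expected shape at startup. The sketch below illustrates both ideas with the TensorRT Python API; the ONNX path, tensor names, dtype, and the exact min/opt/max values are placeholders, not the demo's actual settings.

```python
import tensorrt as trt
import torch

logger = trt.Logger(trt.Logger.INFO)

def build_dynamic_engine(onnx_path: str, plan_path: str) -> None:
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # explicit-batch network
    parser = trt.OnnxParser(network, logger)
    if not parser.parse_from_file(onnx_path):
        raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    # Kernels are selected for the opt shape, so set opt to the resolution you
    # expect to serve most often; min/max bound the allowed range. A real
    # UNet + ControlNet network has several dynamic inputs, each of which
    # needs its own set_shape call.
    profile.set_shape("sample",
                      min=(1, 4, 8, 8),
                      opt=(2, 4, 128, 128),
                      max=(8, 4, 160, 160))
    config.add_optimization_profile(profile)

    with open(plan_path, "wb") as f:
        f.write(builder.build_serialized_network(network, config))

def warm_up(context: trt.IExecutionContext, shapes) -> None:
    # Pay the per-shape setup cost once at startup instead of on the first
    # user request at that shape (assumes an FP16 engine).
    stream = torch.cuda.current_stream().cuda_stream
    for shape in shapes:
        context.set_input_shape("sample", shape)
        dummy_in = torch.zeros(shape, dtype=torch.float16, device="cuda")
        out_shape = tuple(context.get_tensor_shape("out_sample"))
        dummy_out = torch.empty(out_shape, dtype=torch.float16, device="cuda")
        context.set_tensor_address("sample", dummy_in.data_ptr())
        context.set_tensor_address("out_sample", dummy_out.data_ptr())
        context.execute_async_v3(stream)
        torch.cuda.synchronize()
```

If the set of user-selectable resolutions is open-ended, warming up a small grid of common shapes (or building separate engines per resolution bucket) trades startup time for predictable per-request latency.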
Environment
TensorRT Version: 10.7.0
NVIDIA GPU: NVIDIA L20
NVIDIA Driver Version: 535.216.01
CUDA Version: 12.2
CUDNN Version: