
Long Inference Time on First Run After Changing Input Shape in Dynamic Shape TensorRT Engine #4289

Open
renne444 opened this issue Dec 19, 2024 · 4 comments

renne444 commented Dec 19, 2024

Description

I have modified the inference part of the Stable Diffusion demo code and generated a TensorRT engine with dynamic shapes. However, every time I change the input shape, the next inference takes very long, which makes it difficult to support user-defined image sizes. Even small changes in input shape lead to significant variations in inference time.

I am running inference on the Unet+Controlnet part of the Stable Diffusion demo, where both the input and output have dynamic shapes. The code I modified is based on the following implementation, and I am not using CUDA Graph during inference.

The range of shapes I am working with is quite large, from a minimum of [1, 3, 1, 1] to a maximum of [8, 3, 1280, 1280]. However, when I change the input shape from [2, 3, 1024, 1024] to [2, 3, 1024, 960], the first inference after the shape change is very slow, and only on the second run does the inference time meet expectations. As a result, a custom image size feature cannot be implemented.

Could you help me understand why this behavior occurs and how to optimize the inference time for changing input and output shapes?

Code

    def infer_with_dynamic_shape(self, feed_dict, shape_dict, stream, use_cuda_graph=False):
        for binding in range(self.engine.num_io_tensors):
            tensor_name = self.engine.get_tensor_name(binding)
            if self.engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
                input_tensor = feed_dict[tensor_name]
                # Reuse the existing buffer when shape and device already match;
                # otherwise reallocate it and tell the context about the new input shape.
                if self.tensors[tensor_name].shape == input_tensor.shape and self.tensors[tensor_name].device == input_tensor.device:
                    self.tensors[tensor_name].copy_(input_tensor)
                else:
                    device = self.tensors[tensor_name].device
                    dtype = self.tensors[tensor_name].dtype
                    self.tensors[tensor_name] = torch.empty(shape_dict[tensor_name], dtype=dtype, device=device)
                    self.tensors[tensor_name].copy_(input_tensor)
                    self.context.set_input_shape(tensor_name, shape_dict[tensor_name])
            else:
                # Reallocate output buffers whose expected shape changed.
                if shape_dict[tensor_name] and shape_dict[tensor_name] != self.tensors[tensor_name].shape:
                    device = self.tensors[tensor_name].device
                    dtype = self.tensors[tensor_name].dtype
                    self.tensors[tensor_name] = torch.empty(shape_dict[tensor_name], dtype=dtype, device=device)

        # Rebind all tensor addresses, since buffers may have been reallocated.
        for name, tensor in self.tensors.items():
            self.context.set_tensor_address(name, tensor.data_ptr())

        if use_cuda_graph:
            if self.cuda_graph_instance is not None:
                CUASSERT(cudart.cudaGraphLaunch(self.cuda_graph_instance, stream))
                CUASSERT(cudart.cudaStreamSynchronize(stream))
            else:
                # Run inference once before CUDA graph capture.
                noerror = self.context.execute_async_v3(stream)
                if not noerror:
                    raise ValueError("ERROR: inference failed.")
                # Capture the CUDA graph.
                CUASSERT(
                    cudart.cudaStreamBeginCapture(stream, cudart.cudaStreamCaptureMode.cudaStreamCaptureModeGlobal))
                self.context.execute_async_v3(stream)
                self.graph = CUASSERT(cudart.cudaStreamEndCapture(stream))
                self.cuda_graph_instance = CUASSERT(cudart.cudaGraphInstantiate(self.graph, 0))
        else:
            noerror = self.context.execute_async_v3(stream)
            if not noerror:
                raise ValueError("ERROR: inference failed.")
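
For reference, a minimal timing sketch (not from the original report) that isolates the symptom: the first call after a shape change versus an immediate repeat at the same shape. The tensor names "sample" and "out_sample" and the `engine` wrapper object (which owns `infer_with_dynamic_shape` above) are illustrative placeholders, not names from the demo.

    import time

    import torch
    from cuda import cudart

    # Hypothetical names: `engine` wraps the method above; "sample"/"out_sample"
    # stand in for the real dynamic I/O tensor names.
    _, stream = cudart.cudaStreamCreate()

    def timed_run(shape):
        feed_dict = {"sample": torch.randn(*shape, dtype=torch.float16, device="cuda")}
        shape_dict = {"sample": shape, "out_sample": shape}
        torch.cuda.synchronize()
        start = time.perf_counter()
        engine.infer_with_dynamic_shape(feed_dict, shape_dict, stream)
        torch.cuda.synchronize()
        return time.perf_counter() - start

    timed_run((2, 3, 1024, 1024))  # warm up at the initial shape
    print("first run after shape change:", timed_run((2, 3, 1024, 960)))
    print("repeat at the same shape:    ", timed_run((2, 3, 1024, 960)))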

Environment

TensorRT Version: 10.7.0

NVIDIA GPU: Nvidia L20

NVIDIA Driver Version: 535.216.01

CUDA Version: 12.2

CUDNN Version:

@asfiyab-nvidia asfiyab-nvidia added triaged Issue has been triaged by maintainers Demo: Diffusion Issues regarding demoDiffusion labels Dec 19, 2024
@asfiyab-nvidia asfiyab-nvidia self-assigned this Dec 19, 2024
asfiyab-nvidia (Collaborator) commented:

@renne444 Thanks for raising the issue. To run the Stable Diffusion demo with dynamic shapes, you do not need to modify the code. We provide a flag --build-dynamic-shape that you can specify in addition to the demo command.

If the range of your input shapes is [1, 3, 1, 1] to [8, 3, 1280, 1280], you will need to update the min_image_shape here to 1.

Coming to the perf issues - here is the detailed doc for Dynamic Shapes and related behavior.
Note that with Dynamic Shapes enabled, the demo performs best (lowest latency) for the image shapes provided via --height and --width in the demo command at engine-build time. With the above recommendations, if you're still seeing a large difference in latency between the first run after an input shape change and subsequent runs, please share the numbers here.
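
For illustration (a sketch, not part of the reply above): one way to narrow that gap in the TensorRT Python API is to build the engine with more than one optimization profile, so that several resolution ranges each get their own tuned kernels instead of one very wide [1, 3, 1, 1]..[8, 3, 1280, 1280] range. Here `builder`, `network`, and `config` are assumed to come from the usual engine-build code, and "sample" is a placeholder for the real dynamic input name.

    import tensorrt as trt

    # Assumed to exist already: builder, network, config from the standard
    # build flow. Each (min, opt, max) triple below defines one profile.
    ranges = [
        ((1, 3, 1, 1),     (2, 3, 512, 512),   (8, 3, 640, 640)),
        ((1, 3, 640, 640), (2, 3, 1024, 1024), (8, 3, 1280, 1280)),
    ]
    for min_shape, opt_shape, max_shape in ranges:
        profile = builder.create_optimization_profile()
        profile.set_shape("sample", min=min_shape, opt=opt_shape, max=max_shape)
        config.add_optimization_profile(profile)

    engine_bytes = builder.build_serialized_network(network, config)

At runtime, the profile covering the requested shape must be selected on the execution context (set_optimization_profile_async) before set_input_shape; running each expected shape once at startup can also move the one-time cost out of the user-facing path.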


songh11 commented Dec 30, 2024

@renne444 I've encountered this issue as well. Have you managed to resolve it? Any help or insights would be greatly appreciated.

renne444 (Author) commented:

@songh11 No, I haven't found a way yet. I plan to use PyTorch instead.


songh11 commented Dec 31, 2024

> @songh11 No, I haven't found a way yet. I plan to use PyTorch instead.

I see. Thank you for your reply.
