
Long Inference Time on First Run After Changing Input Shape in Dynamic Shape TensorRT Engine #4289

Open
renne444 opened this issue Dec 19, 2024 · 4 comments

renne444 commented Dec 19, 2024

Description

I have modified the inference part of the Stable Diffusion demo code and generated a TensorRT engine with dynamic shapes. However, every time I change the input shape, the next inference takes very long, which makes it difficult to support user-defined image sizes. Even small changes in input shape lead to significant variations in inference time.

I am running inference on the Unet+Controlnet part of the Stable Diffusion demo, where both the input and output have dynamic shapes. The code I modified is based on the following implementation, and I am not using CUDA Graph during inference.

The range of shapes I am working with is quite large, from a minimum of [1, 3, 1, 1] to a maximum of [8, 3, 1280, 1280]. However, when I change the input shape from [2, 3, 1024, 1024] to [2, 3, 1024, 960], the first inference after the shape change is very slow, and only on the second run does the inference time meet expectations. As a result, a custom image size feature cannot be implemented.

Could you help me understand why this behavior occurs and how to optimize the inference time for changing input and output shapes?

Code

    def infer_with_dynamic_shape(self, feed_dict, shape_dict, stream, use_cuda_graph=False):
        for binding in range(self.engine.num_io_tensors):
            tensor_name = self.engine.get_tensor_name(binding)
            if self.engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
                input_tensor = feed_dict[tensor_name]
                # Reuse the existing buffer when shape and device already match;
                # otherwise reallocate it and tell the context about the new input shape.
                if self.tensors[tensor_name].shape == input_tensor.shape and self.tensors[tensor_name].device == input_tensor.device:
                    self.tensors[tensor_name].copy_(input_tensor)
                else:
                    device = self.tensors[tensor_name].device
                    dtype = self.tensors[tensor_name].dtype
                    self.tensors[tensor_name] = torch.empty(shape_dict[tensor_name], dtype=dtype, device=device)
                    self.tensors[tensor_name].copy_(input_tensor)
                    self.context.set_input_shape(tensor_name, shape_dict[tensor_name])
            else:
                # Reallocate output buffers whose expected shape changed.
                if shape_dict[tensor_name] and shape_dict[tensor_name] != self.tensors[tensor_name].shape:
                    device = self.tensors[tensor_name].device
                    dtype = self.tensors[tensor_name].dtype
                    self.tensors[tensor_name] = torch.empty(shape_dict[tensor_name], dtype=dtype, device=device)

        # Rebind all tensor addresses, since buffers may have been reallocated.
        for name, tensor in self.tensors.items():
            self.context.set_tensor_address(name, tensor.data_ptr())

        if use_cuda_graph:
            if self.cuda_graph_instance is not None:
                CUASSERT(cudart.cudaGraphLaunch(self.cuda_graph_instance, stream))
                CUASSERT(cudart.cudaStreamSynchronize(stream))
            else:
                # Run inference once before CUDA graph capture.
                noerror = self.context.execute_async_v3(stream)
                if not noerror:
                    raise ValueError("ERROR: inference failed.")
                # Capture the CUDA graph.
                CUASSERT(
                    cudart.cudaStreamBeginCapture(stream, cudart.cudaStreamCaptureMode.cudaStreamCaptureModeGlobal))
                self.context.execute_async_v3(stream)
                self.graph = CUASSERT(cudart.cudaStreamEndCapture(stream))
                self.cuda_graph_instance = CUASSERT(cudart.cudaGraphInstantiate(self.graph, 0))
        else:
            noerror = self.context.execute_async_v3(stream)
            if not noerror:
                raise ValueError("ERROR: inference failed.")
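
For reference, a minimal timing sketch (not from the original report) that isolates the symptom: the first call after a shape change versus an immediate repeat at the same shape. The tensor names "sample" and "out_sample" and the `engine` wrapper object (which owns `infer_with_dynamic_shape` above) are illustrative placeholders, not names from the demo.

    import time

    import torch
    from cuda import cudart

    # Hypothetical names: `engine` wraps the method above; "sample"/"out_sample"
    # stand in for the real dynamic I/O tensor names.
    _, stream = cudart.cudaStreamCreate()

    def timed_run(shape):
        feed_dict = {"sample": torch.randn(*shape, dtype=torch.float16, device="cuda")}
        shape_dict = {"sample": shape, "out_sample": shape}
        torch.cuda.synchronize()
        start = time.perf_counter()
        engine.infer_with_dynamic_shape(feed_dict, shape_dict, stream)
        torch.cuda.synchronize()
        return time.perf_counter() - start

    timed_run((2, 3, 1024, 1024))  # warm up at the initial shape
    print("first run after shape change:", timed_run((2, 3, 1024, 960)))
    print("repeat at the same shape:    ", timed_run((2, 3, 1024, 960)))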

Environment

TensorRT Version: 10.7.0

NVIDIA GPU: Nvidia L20

NVIDIA Driver Version: 535.216.01

CUDA Version: 12.2

CUDNN Version:

@asfiyab-nvidia asfiyab-nvidia added triaged Issue has been triaged by maintainers Demo: Diffusion Issues regarding demoDiffusion labels Dec 19, 2024
@asfiyab-nvidia asfiyab-nvidia self-assigned this Dec 19, 2024
asfiyab-nvidia (Collaborator) commented:

@renne444 Thanks for raising the issue. To run the Stable Diffusion demo with dynamic shapes, you do not need to modify the code. We provide a flag --build-dynamic-shape that you can specify in addition to the demo command.

If the range of your input shapes is [1, 3, 1, 1] to [8, 3, 1280, 1280], you will need to update the min_image_shape here to 1.

Coming to the perf issues - here is the detailed doc for Dynamic Shapes and related behavior.
Note that with Dynamic Shapes enabled, the demo performs best (lowest latency) for the image shapes provided via --height and --width in the demo command at engine-build time. With the above recommendations, if you're still seeing a large difference in latency between the first run after an input shape change and subsequent runs, please share the numbers here.
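
For illustration (a sketch, not part of the reply above): one way to narrow that gap in the TensorRT Python API is to build the engine with more than one optimization profile, so that several resolution ranges each get their own tuned kernels instead of one very wide [1, 3, 1, 1]..[8, 3, 1280, 1280] range. Here `builder`, `network`, and `config` are assumed to come from the usual engine-build code, and "sample" is a placeholder for the real dynamic input name.

    import tensorrt as trt

    # Assumed to exist already: builder, network, config from the standard
    # build flow. Each (min, opt, max) triple below defines one profile.
    ranges = [
        ((1, 3, 1, 1),     (2, 3, 512, 512),   (8, 3, 640, 640)),
        ((1, 3, 640, 640), (2, 3, 1024, 1024), (8, 3, 1280, 1280)),
    ]
    for min_shape, opt_shape, max_shape in ranges:
        profile = builder.create_optimization_profile()
        profile.set_shape("sample", min=min_shape, opt=opt_shape, max=max_shape)
        config.add_optimization_profile(profile)

    engine_bytes = builder.build_serialized_network(network, config)

At runtime, the profile covering the requested shape must be selected on the execution context (set_optimization_profile_async) before set_input_shape; running each expected shape once at startup can also move the one-time cost out of the user-facing path.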


songh11 commented Dec 30, 2024

@renne444 I've encountered this issue as well. Have you managed to resolve it? Any help or insights would be greatly appreciated.

renne444 (Author) commented:

@songh11 No, I haven't found a way yet. I plan to use PyTorch instead.


songh11 commented Dec 31, 2024

> @songh11 No, I haven't found a way yet. I plan to use PyTorch instead.

I see. Thank you for your reply.
