Problem
I am trying to run inference on a Qualcomm QCM6490 device, which requires specific dependencies to utilize its NPU. To meet these requirements, I use the Qualcomm-provided SDK image and container that includes the necessary binaries for running inference on an aarch64 architecture.
However, using Triton Inference Server on this device means installing those Qualcomm dependencies inside the Triton container, which is not practical due to compatibility issues, dependency conflicts, and the added overhead of customizing the Triton container image.
Describe the solution you'd like
To address this issue, I propose enabling Triton Inference Server to run within the Qualcomm SDK container. Specifically:
The Triton server should be able to run as a service inside the SDK container, leveraging the pre-installed binaries and dependencies provided by Qualcomm.
This setup would allow users to clone and build the Triton server inside the SDK container and run it as an inference service.
The inference service should support API calls (see the client sketch below) where:
-> Users can send images and other necessary inputs.
-> The service returns the output tensors, enabling seamless integration.
Ideally, the inference should utilize the device's NPU for optimal performance.
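For illustration, here is a minimal Python sketch of the kind of API call intended, assuming Triton is already running inside the SDK container and exposing its standard HTTP endpoint on port 8000. The model name (qcm6490_model), the tensor names (input, output), and the shape/datatype are placeholders; the real values would come from the deployed model's configuration.

```python
# Minimal client sketch: send an image to a Triton server running inside the
# Qualcomm SDK container and read back the output tensors.
# Model name, tensor names, shape, and datatype below are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy tensor standing in for a real preprocessed image (NCHW, FP32).
image = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Describe the input tensor and attach the image data.
inputs = [httpclient.InferInput("input", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)

# Request the output tensor(s) by name.
outputs = [httpclient.InferRequestedOutput("output")]

result = client.infer(model_name="qcm6490_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("output"))
```

The same flow would work over Triton's gRPC endpoint via tritonclient.grpc if that is preferable on the device.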
Use Case
This feature would benefit users running Triton Inference Server on specialized devices like the Qualcomm QCM6490, where dependency management and NPU optimization are critical. It would streamline workflows and reduce the complexity of configuring Triton containers for such devices.
Expected Outcome
By enabling Triton Inference Server to operate within the SDK container:
-> Users can leverage the Qualcomm binaries directly without modifying the Triton container.
-> The solution becomes more scalable and user-friendly.
-> Inference tasks can take full advantage of the NPU's capabilities on aarch64 devices.