Translation on Multiple GPUs with device_index in v.2.15.0+ #786
Comments
I'm not reproducing this error. Do you confirm the error is raised when creating the Can you also describe the model you are using (original framework, model type, quantization, etc.)? Reporting the output with
Ok, I got the same error when loading a newly converted model. I will check. Thanks for the report!
It seems the model can now use different GPUs, but under load it tends to hit a waitress problem with "Task queue depth", which stops the REST server. The REST server has to be restarted to restore service. Switching back to a single GPU (0) doesn't have this problem under the same load.
Do you mean the performance is reduced when using multiple GPUs? Can you be more specific? Consider opening a separate issue if you can isolate the issue with CTranslate2.
The error actually comes from waitress, which the OpenNMT-py REST server uses. It happens under very intense load (when API calls are made repeatedly), which is understandable. However, having the REST server load different CTranslate2 models on different GPUs seems to make the "task queue depth" error happen more frequently (trying to fix it on the waitress side by increasing the number of threads from the default of 4 to 32 doesn't help at all). Letting the REST server load and serve all the models on one GPU doesn't have this problem. I noticed this when switching back and forth between the multi-GPU and single-GPU setups under fairly similar load.
As far as I know, the OpenNMT-py server cannot process multiple translations in parallel, so a model running on multiple GPUs will only use one GPU at a time. See this issue: OpenNMT/OpenNMT-py#2001 (comment).
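To illustrate the point about serialized translation, here is a minimal stdlib-only sketch. It is not OpenNMT-py, waitress, or CTranslate2 code: `peak_concurrency`, the lock, and the `time.sleep` stand-in for GPU work are all hypothetical names and assumptions made for the simulation. It only shows that when request handling is serialized, adding worker threads (for example 4 versus 32) does not raise the number of translations in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_requests, num_threads, serialized, work_seconds=0.02):
    """Run simulated requests and return the peak number of translations in flight."""
    # Assumption: a single lock models a server that translates one request at a time.
    translate_lock = threading.Lock()
    counter_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def translate():
        with counter_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(work_seconds)  # hypothetical stand-in for the GPU work
        with counter_lock:
            state["active"] -= 1

    def handle(_):
        if serialized:
            with translate_lock:
                translate()
        else:
            translate()

    # The thread pool plays the role of the web server's worker threads.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(handle, range(num_requests)))
    return state["peak"]


if __name__ == "__main__":
    print("serialized, 4 threads: ", peak_concurrency(16, 4, True))
    print("serialized, 32 threads:", peak_concurrency(16, 32, True))
    print("parallel, 32 threads:  ", peak_concurrency(16, 32, False))
```

In this simulation the serialized runs never have more than one translation active, whatever the thread count, so pending requests would pile up in the queue at the same rate. Whether this is what actually happens in the REST server would need to be checked against its code.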
The issue occurs during translation in versions 2.15.0 and 2.15.1; it works fine in 2.14.0.
Code:
Error: