Is batch streaming possible with the Text Generation functions? #1423
Comments
What do you mean by batch streaming exactly?
Something like the following:
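(A minimal sketch of the intended, but unsupported, call; the helper names, the tokenizer interface, and the prompts are hypothetical, since the original snippet was not preserved.)

```python
from typing import Callable, Iterator, List

def make_batch(tokenize: Callable[[str], List[str]],
               prompts: List[str]) -> List[List[str]]:
    # Tokenize each prompt into its own token list, giving a
    # List[List[str]] batch.
    return [tokenize(p) for p in prompts]

def stream_batch(generator, tokenize, prompts) -> Iterator[str]:
    # NOT supported: generate_tokens expects a single List[str],
    # so passing a whole batch (List[List[str]]) crashes.
    for step in generator.generate_tokens(make_batch(tokenize, prompts)):
        yield step.token
```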
By itself it is definitely not working and, as far as I can see from the `generator.generate_tokens` function, currently not implemented. In extensions.py (CTranslate2/python/ctranslate2/extensions.py, line 299 at commit 61d3450), the generator gets a list. When providing a list of lists to `generate_tokens`, it crashes. By the way, I am not sure the type hint is correct then; I think it should be `str`. It should, however, be possible with limited adjustments to the code. Adjusting the code above leads to the following output:
The type hint is correct. This method takes a list of tokens, but not a batch of token lists. For batch mode, see the "Tip" note in the related documentation: https://opennmt.net/CTranslate2/generation.html#token-streaming
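Following that documentation tip, batch streaming can be sketched with the `callback` option of `generate_batch` (greedy search only, i.e. `beam_size=1`). The routing helper below is my own; the `GenerationStepResult` fields used (`batch_id`, `token`) are as described in the docs:

```python
from collections import defaultdict

def route_token(streams, step_result):
    # Append the streamed token to the buffer of the prompt it belongs
    # to, identified by batch_id. The callback fires once per generated
    # token per batch entry.
    streams[step_result.batch_id].append(step_result.token)
    return False  # returning True would stop decoding early

def batch_stream(generator, batch_tokens, max_length=64):
    # generator is assumed to be a ctranslate2.Generator and
    # batch_tokens a List[List[str]]. Token streaming via callback
    # requires beam_size=1.
    streams = defaultdict(list)
    generator.generate_batch(
        batch_tokens,
        max_length=max_length,
        beam_size=1,
        callback=lambda r: route_token(streams, r),
    )
    return streams
```

Each entry of `streams` then holds the tokens of one prompt in generation order, so tokens from different prompts can be forwarded to different consumers as they arrive.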
My bad, the tokenizer converts it to a list 😐. Thanks for the hint about the callback 👍
I'm closing this issue, even though the term "batch streaming" was not clarified by OP.
There doesn't seem to be good documentation on using `generate_iterable`. From the name alone, I get the sense that it could potentially be used for batch streaming. I'm looking to improve on the performance of TGI/vLLM, and streaming is a crucial piece of functionality I would like to support, but it's unclear whether it's possible with CTranslate2.
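For what it's worth, my reading of the API reference is that `generate_iterable` handles batching internally (up to `max_batch_size` examples per forward pass) but yields one complete `GenerationResult` per input example, so it streams per example rather than per token. A sketch, with the wrapper name being my own:

```python
def iter_generations(generator, token_lists, max_batch_size=8):
    # generator is assumed to be a ctranslate2.Generator.
    # token_lists is an iterable of List[str]; results are yielded in
    # input order as each example finishes decoding, without waiting
    # for the whole input stream to be consumed first.
    for result in generator.generate_iterable(
        token_lists, max_batch_size=max_batch_size
    ):
        yield result
```

For per-token streaming across a batch, the `callback` mechanism mentioned earlier in the thread still seems to be the intended route.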