Strange output of Flan-T5-Large when using float16 converted model #1074
Thanks again for reporting the issue. I found that the model produces NaN values after the 5th layer. Apparently this is a known issue when running T5 models in FP16: huggingface/transformers#10956. As far as I understand, the issue is that the model is trained with bfloat16, whose numerical range is much wider than float16's, so some activations overflow when the model is run in FP16. We can try the change that is proposed in this PR, but it does not seem efficient. Another solution would be to support bfloat16 execution. Maybe you can go one step further and use the bfloat16 compute type throughout.
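For context on why bfloat16-trained weights misbehave in float16, here is a generic illustration (not CTranslate2-specific): float16 tops out at 65504, while bfloat16 shares float32's exponent range, so an activation that was routine during training overflows to inf in FP16 and then propagates as NaN.

```python
import numpy as np

# float16's maximum finite value is 65504, while bfloat16 shares float32's
# exponent range (max ~3.4e38). An activation that was routine during
# bfloat16 training therefore overflows to inf when replayed in float16,
# and the inf becomes NaN at the next subtraction or normalization.
activation = np.float32(70000.0)
print(np.float16(activation))  # inf
print(activation)              # 70000.0 -- still finite in float32/bfloat16
```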
Ok right 👍
Hey there! I discovered some similar quirks evaluating CTranslate2 with Flan-T5. I compared the outputs of flan-t5-xl and flan-t5-xxl on GPU using float32, int8_float16, float16, and int8. The results for "Who is Barrack Obama?" were as follows:
Conclusion: the model loses a lot of precision when running in int8_float16 or int8. The XXL model seems to be more affected, as it generates nonsensical text in int8. Here is the code being used:
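A minimal sketch of such a comparison, assuming the models were converted with ct2-transformers-converter (the model path and generation parameters below are assumptions, not the exact script from this comment):

```python
import ctranslate2
import transformers

# Assumed model location, e.g. converted with:
#   ct2-transformers-converter --model google/flan-t5-xl --output_dir flan-t5-xl-ct2
MODEL_DIR = "flan-t5-xl-ct2"

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xl")
prompt = "Who is Barrack Obama?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Run the same prompt through every compute type and compare the outputs.
for compute_type in ("float32", "float16", "int8_float16", "int8"):
    translator = ctranslate2.Translator(MODEL_DIR, device="cuda", compute_type=compute_type)
    result = translator.translate_batch([tokens], beam_size=1, max_decoding_length=256)
    output_ids = tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])
    print(compute_type, "->", tokenizer.decode(output_ids, skip_special_tokens=True))
    del translator  # release GPU memory before loading the next variant
```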
Update: after investigating, it seems part of the answer is here: https://github.com/huggingface/transformers/pull/22095/commits
This also seems related to huggingface/transformers#20287.
Yeah, I get it returning its own input many times. No real answers.
So from reading those issues, it doesn't quantize properly to either fp16 or int8, but supposedly works fine as bf16 or fp32. Also, it looks like this is the actual underlying issue: triton-inference-server/fastertransformer_backend#95 (comment)
Here are more details: huggingface/transformers#20287 (comment)
@guillaumekln This appears to be the necessary patch: larsmennen/transformers@f90b269
However, it looks like the patch has already been upstreamed for a while. Any ideas?
#1239 keeps the FFN output layer in FP32, as suggested in the different issues mentioned above. Can you help test this development? The model must be converted again with this version.
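For illustration, the idea behind that fix looks roughly like the following PyTorch sketch (a conceptual reconstruction, not the actual CTranslate2 kernel code; the module and dimensions are hypothetical):

```python
import torch

class PatchedT5FFN(torch.nn.Module):
    """Sketch of the workaround: keep the FFN output projection (wo) in
    float32 while the rest of the layer runs in float16, since that matmul
    is where bfloat16-trained activations overflow fp16 into inf/NaN."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi = torch.nn.Linear(d_model, d_ff, bias=False).half()  # fp16 is fine here
        self.wo = torch.nn.Linear(d_ff, d_model, bias=False)         # pinned to fp32

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        h = torch.nn.functional.relu(self.wi(hidden))  # fp16 activations
        # Upcast before the risky projection, then cast back so downstream
        # layers keep running in fp16.
        return self.wo(h.float()).to(hidden.dtype)

# Usage sketch:
#   x = torch.randn(1, 8, 512, dtype=torch.float16)
#   y = PatchedT5FFN(512, 2048)(x)
```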
@guillaumekln Thank you for the quick fix! This appears to solve the problem for me.
We are running the tests as we speak on a Flan-T5 XXL. It seems there is the same problem in fp8. I will try with fp16. Maybe related to this? https://github.com/OpenNMT/CTranslate2/pull/1239/files#diff-c2632813f7cc9ff44bda20479812ebefd15be7f7fe1bd099e553111bcb6750acR171
You meant int8, right? More specifically, did you try int8 or int8_float16?
Sorry about the typo. int8 indeed. So here are the tests with Flan-T5 XXL:

With int8:

I may be missing something.
Thanks for the test. The error should be fixed in this build: https://github.com/OpenNMT/CTranslate2/actions/runs/5068556228 You can check whether it improves the output for "float16", but "int8_float16" will probably behave like "int8". I'm not sure what else is needed to improve the output with 8-bit quantization.
Results are looking good. I also experienced garbled output previously with CTranslate2 and the Flan-T5 series. Running a domain-specific t5-xl based on flan-xl and trying @baptistejamin's prompt with the latest build (https://github.com/OpenNMT/CTranslate2/actions/runs/5068556228):

Who is barrack obama? -> Barack Obama is the 44th President of the United States. He was elected in 2008 and has been in office since January 20, 2009.

With: ctranslate2.Translator(...)

Equally good responses on:
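For reference, a typical instantiation along those lines (the model path and options here are assumptions, since the original arguments are not shown):

```python
import ctranslate2

# Hypothetical settings matching the setup described above.
translator = ctranslate2.Translator(
    "t5-xl-domain-ct2",   # assumed path to the converted model
    device="cuda",
    compute_type="float16",
)
```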
Fixed for me as well, thank you
I can confirm it works fine with Flan XL int8 with the latest build. However, Flan XXL int8 still returns nonsensical text, and for the same prompt the output is worse than the XL model's. Finally, float16 mode is still broken: it behaves exactly as before the initial patch was made.
Hi, has this change been merged yet?
No, the change is not merged yet. It does not seem complete, since generation for Flan XXL int8 is still broken.
I've been working with XXL in int8 with 3.15.1 and it appears to be emitting valid responses, both CUDA and CPU. @baptistejamin FYI
What's your prompt and parameters?
@baptistejamin Any prompt works just fine. However, I had a recently converted model fail in the same way. In that case it was how the tokenizer was converted to the config.json that CT2 creates: the bos and decoder start sequence tokens were wrong. I haven't spent much time troubleshooting, but it comes from where the Seq2Seq model inits in model_spec.py:386. Maybe this is also your issue?
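If the same failure mode is suspected, one quick check is to inspect the config.json written next to the converted model. A sketch (the directory name is an assumption, and the key names are what a typical T5 conversion produces):

```python
import json

# Inspect the config.json that the converter writes into the output directory.
with open("flan-t5-xl-ct2/config.json") as f:  # assumed model directory
    config = json.load(f)

# For T5 conversions the decoder start token is normally "<pad>"; if these
# fields come out empty or wrong, generation degrades in exactly this way.
for key in ("bos_token", "eos_token", "unk_token", "decoder_start_token"):
    print(key, "=", config.get(key))
```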
FYI, CTranslate2 3.17 supports the "bfloat16" compute type, which should be used instead of "float16" for these models.
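Switching to the new compute type is a one-line change (the model path below is an assumption; bfloat16 also needs hardware support, e.g. NVIDIA Ampere or newer):

```python
import ctranslate2

# bfloat16 keeps float32's exponent range, so the activations that overflow
# in float16 stay finite.
translator = ctranslate2.Translator(
    "flan-t5-xl-ct2",      # assumed path to the converted model
    device="cuda",
    compute_type="bfloat16",
)
```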
Hi,
I've installed the "python-wheels" artifact from #1066.
Then I use the following code to answer a question based on a context:
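A minimal sketch of such a call, assuming a flan-t5-large model converted with float16 quantization (the exact prompt, paths, and parameters are assumptions):

```python
import ctranslate2
import transformers

# Assumed conversion:
#   ct2-transformers-converter --model google/flan-t5-large \
#       --output_dir flan-t5-large-ct2 --quantization float16
tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-large")
translator = ctranslate2.Translator("flan-t5-large-ct2", device="cuda", compute_type="float16")

context = "..."   # placeholder: the original context is not shown
question = "..."  # placeholder: the original question is not shown
prompt = f"Answer the question based on the context.\n\nContext: {context}\n\nQuestion: {question}"

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
result = translator.translate_batch([tokens], beam_size=1, max_decoding_length=256)
output_ids = tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```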
which generates the following text:
<pad><pad><pad><pad> … (the <pad> token repeated for the entire decoding length)
Any suggestion regarding this result?