Motivation
A lot of the time when you're sampling with a constrained `max_tokens` budget, you end up with cut-off responses. If a penalizer rewarded the EOS token more and more strongly as generation approaches `max_tokens`, it would help LLMs produce answers that are both complete and within the token limit.
The reward would start at 0% at the 0th token and reach 100% at the `max_tokens`-th token. We would presumably want to scale this exponentially, so the reward stays near zero for most of the budget and short responses well under `max_tokens` are not encouraged.
Unfortunately my Python skills are not sufficient to implement this, so I'm creating a feature request instead.
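For illustration, a minimal sketch of the ramp described above. Everything here is an assumption rather than an existing sglang API: the function names, the `max_bias` and `sharpness` parameters, and the choice to add a bias in logit space rather than literally rewarding a probability.

```python
import math

def eos_ramp_bias(step: int, max_tokens: int,
                  max_bias: float = 10.0, sharpness: float = 5.0) -> float:
    """Bias to add to the EOS-token logit at decoding step `step`.

    0.0 at step 0, rising exponentially to `max_bias` at step
    `max_tokens`, so the pull toward EOS is negligible for most of
    the budget and only kicks in near the limit.
    """
    progress = min(step / max_tokens, 1.0)  # fraction of the budget used
    return max_bias * (math.exp(sharpness * progress) - 1.0) / (math.exp(sharpness) - 1.0)

def apply_eos_ramp(logits, eos_token_id: int, step: int, max_tokens: int):
    """Apply the ramp to one step's next-token logits.

    `logits` is the 1-D logit vector for the next token (e.g. a
    torch.Tensor or numpy array); modified in place and returned.
    """
    logits[eos_token_id] = logits[eos_token_id] + eos_ramp_bias(step, max_tokens)
    return logits
```

Biasing the logit rather than multiplying probabilities matches how samplers typically apply penalties, and the `sharpness` knob would control how abruptly the ramp kicks in near the limit.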
Related resources
No response