regex stopping condition #2035
- It is better to implement it here:
  def check_finished(self):
- Can you add a new argument for this stop condition, so that we do not make the old simple string match slower? You can add a new field to SamplingParams:
  class SamplingParams:
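A minimal sketch of what the reviewer is asking for, assuming a separate `stop_regex_strs` field next to the existing `stop_strs`. The class `SamplingParamsSketch` and the helper `find_stop` are hypothetical names for illustration only, not SGLang's actual code. The plain substring check runs first and does not touch the regex engine, so requests that set no regex patterns pay nothing extra.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SamplingParamsSketch:
    """Hypothetical stand-in for SamplingParams (illustration only)."""

    stop_strs: List[str] = field(default_factory=list)
    # Assumed new field: regex patterns are kept apart from plain stop strings.
    stop_regex_strs: List[str] = field(default_factory=list)


def find_stop(params: SamplingParamsSketch, text: str) -> Optional[str]:
    """Return the stop string or regex pattern that matched, or None."""
    # Cheap path: plain substring search, unchanged by the new field.
    for stop_str in params.stop_strs:
        if stop_str in text:
            return stop_str
    # Regex path: only runs when the caller supplied patterns.
    for pattern in params.stop_regex_strs:
        if re.search(pattern, text):
            return pattern
    return None
```

With this layout an empty `stop_regex_strs` list makes the second loop a no-op, which is the property the review comment is after.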
thank you @merrymercy. In |
yes |
hi @merrymercy I was testing my change and I don't think using just We could reuse

    for stop_regex_str in self.sampling_params.stop_regex_strs:
        logger.debug(f"stop_regex='{stop_regex_str}' decoded_text_length={len(self.decoded_text)}")
        if re.search(stop_regex_str, self.decoded_text):
            self.finished_reason = FINISH_MATCHED_STR(matched=stop_regex_str)
            return
Hi @merrymercy, I tried to solve the problem by decoding one token at a time. Thank you for any feedback!

        len(self.sampling_params.stop_strs) > 0
        or len(self.sampling_params.stop_regex_strs) > 0
    ):
        self.stop_check_text += self.tokenizer.decode(last_token_id)
You cannot decode text token by token and concatenate the string. This will lead to wrong outputs.
Oh right, I didn't know there are tokenizers where this doesn't work. Is it then ok to decode the entire output each time? Or decode a fixed window and accept that it's not 100% reliable?
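The problem with concatenating per-token decodes can be reproduced with a toy byte-level tokenizer. In the sketch below, `toy_encode` and `toy_decode` are hypothetical helpers that treat every UTF-8 byte as one token. Real tokenizers are more elaborate, but byte-level and byte-fallback vocabularies can likewise split one character across several tokens.

```python
from typing import List


def toy_encode(text: str) -> List[int]:
    """Toy tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def toy_decode(token_ids: List[int]) -> str:
    """Decode byte tokens; incomplete byte sequences become U+FFFD."""
    return bytes(token_ids).decode("utf-8", errors="replace")


ids = toy_encode("café!")  # "é" occupies two byte tokens

# Decoding one token at a time and concatenating breaks the split character.
per_token = "".join(toy_decode([token_id]) for token_id in ids)

# Decoding the whole id sequence in a single call keeps the character intact.
all_at_once = toy_decode(ids)

print(repr(per_token))
print(repr(all_at_once))
```

Here `per_token` contains replacement characters where "é" should be, while `all_at_once` round-trips correctly, so a stop string or regex that depends on such a character would only match the text decoded in one call.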
Moved to #2699.
Motivation
Hi @merrymercy! I am interested in contributing to the SGLang project, so I gave this issue a shot: #2007. Is this a sensible approach? I am very new to the project, so any pointers are welcome.
I will add tests and update the docs once the change looks OK to you. I am having trouble getting the project running on my machine, so I haven't been able to test it yet.