
regex stopping condition #2035

Closed

Conversation

@jancervenka commented Nov 14, 2024

Motivation

Hi @merrymercy! I am interested in contributing to the SGLang project, so I gave this issue a shot: #2007. Is this a sensible approach? I am very new to the project, so any pointers are welcome.

I will add tests and update the docs once the change looks OK to you. I am actually having trouble getting the project running on my machine, so I haven't been able to test it yet.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy (Contributor) left a comment

  1. It is better to implement it here:

         def check_finished(self):

  2. Can you add a new argument for this stop condition, so we do not make the old simple string match slower? You can add a new field to SamplingParameters. (A sketch follows below.)
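
A minimal sketch of what this could look like (all class shapes below are hypothetical stand-ins rather than SGLang's actual code; FINISH_MATCHED_STR mirrors the name used later in this thread):

    import re
    from dataclasses import dataclass, field

    @dataclass
    class SamplingParams:
        stop_strs: list = field(default_factory=list)
        stop_regex_strs: list = field(default_factory=list)  # new, separate field

    @dataclass
    class FINISH_MATCHED_STR:
        matched: str

    class Req:
        def __init__(self, sampling_params):
            self.sampling_params = sampling_params
            self.decoded_text = ""
            self.finished_reason = None

        def check_finished(self):
            # Fast path: the existing plain-substring matching is unchanged.
            for stop_str in self.sampling_params.stop_strs:
                if stop_str in self.decoded_text:
                    self.finished_reason = FINISH_MATCHED_STR(matched=stop_str)
                    return
            # The regex path runs only when regex stop conditions were
            # requested, so requests without them pay no extra cost.
            for stop_regex_str in self.sampling_params.stop_regex_strs:
                if re.search(stop_regex_str, self.decoded_text):
                    self.finished_reason = FINISH_MATCHED_STR(matched=stop_regex_str)
                    return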

@jancervenka (Author) commented Nov 15, 2024

Thank you @merrymercy. In Req.check_finished, is it OK to ignore tail_str and just check decoded_text? We don't know how many tokens we would need to decode to get a regex match.

@merrymercy (Contributor) commented:

yes

@jancervenka marked this pull request as draft November 16, 2024 11:39
@jancervenka (Author) commented Nov 16, 2024

Hi @merrymercy, I was testing my change, and I don't think using just decoded_text works. With the debug logging below, decoded_text shows as empty throughout the entire inference.

Could we reuse tail_str and, instead of overwriting it every time, append each tail_str to an accumulator string and run the regex check against that? And if there is no stop_str_max_len to produce a tail_str, we could decode and append just the last token.

    # Inside Req.check_finished: search the full decoded text for each regex.
    for stop_regex_str in self.sampling_params.stop_regex_strs:
        logger.debug(
            f"stop_regex='{stop_regex_str}' "
            f"decoded_text_length={len(self.decoded_text)}"
        )
        if re.search(stop_regex_str, self.decoded_text):
            self.finished_reason = FINISH_MATCHED_STR(matched=stop_regex_str)
            return
[Screenshot of the debug log output, 2024-11-16]

@jancervenka marked this pull request as ready for review November 18, 2024 18:57
@jancervenka (Author) commented:

Hi @merrymercy, I tried to solve the problem by decoding one token at a time. Thank you for any feedback!

    # Decode and accumulate text only when some stop condition needs string matching.
    if (
        len(self.sampling_params.stop_strs) > 0
        or len(self.sampling_params.stop_regex_strs) > 0
    ):
        self.stop_check_text += self.tokenizer.decode(last_token_id)
@merrymercy (Contributor) commented:

You cannot decode text token by token and concatenate the string. This will lead to wrong outputs.
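
For illustration, a minimal sketch of the failure mode (this assumes the Hugging Face transformers package and its GPT-2 tokenizer; byte-level BPE can split one multi-byte character across several token IDs):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # The robot emoji is 4 UTF-8 bytes and encodes to more than one token ID.
    token_ids = tokenizer.encode("🤖")

    # Decoding token by token and concatenating yields replacement characters,
    # because each token on its own is an incomplete UTF-8 byte sequence.
    per_token = "".join(tokenizer.decode([tid]) for tid in token_ids)

    # Decoding the full sequence at once reassembles the bytes correctly.
    full = tokenizer.decode(token_ids)

    print(repr(per_token))  # expected: replacement characters such as '���'
    print(repr(full))       # expected: '🤖'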

@jancervenka (Author) commented Nov 24, 2024

Oh right, I didn't know there are tokenizers where this doesn't work. Is it then OK to decode the entire output each time? Or to decode a fixed window and accept that it's not 100% reliable?
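
If the fixed-window route were taken, it could look like this minimal sketch (WINDOW and the helper function are hypothetical; any match that starts further back than the window is missed):

    import re

    WINDOW = 32  # hypothetical window size, in tokens

    def check_regex_stop(tokenizer, output_ids, stop_regex_strs):
        # Re-decode only the last WINDOW tokens each step so the per-step cost
        # stays bounded. Decoding restarts from the window boundary each time,
        # which avoids the token-by-token concatenation problem above (though
        # a character split exactly at the boundary can still come out garbled).
        tail_text = tokenizer.decode(output_ids[-WINDOW:])
        for stop_regex_str in stop_regex_strs:
            if re.search(stop_regex_str, tail_text):
                return stop_regex_str
        return None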

@merrymercy (Contributor) commented:

move to #2699

@merrymercy closed this Jan 2, 2025