
regex stopping condition #2035

Closed

Conversation

@jancervenka commented Nov 14, 2024

Motivation

Hi @merrymercy! I am interested in contributing to the SGLang project, so I gave this issue a shot: #2007. Is this a sensible approach? I am very new to the project, so any pointers are welcome.

I will add tests and update the docs once the change looks OK to you. I am actually having trouble getting the project running on my machine, so I haven't been able to test it yet.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy (Contributor) left a comment

  1. It is better to implement it here:

         def check_finished(self):

  2. Can you add a new argument for this stop condition, so we do not make the old simple string match slower? You can add a new field to SamplingParameters. (A sketch follows below.)
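
A minimal sketch of what this could look like (all class shapes below are hypothetical stand-ins rather than SGLang's actual code; FINISH_MATCHED_STR mirrors the name used later in this thread):

    import re
    from dataclasses import dataclass, field

    @dataclass
    class SamplingParams:
        stop_strs: list = field(default_factory=list)
        stop_regex_strs: list = field(default_factory=list)  # new, separate field

    @dataclass
    class FINISH_MATCHED_STR:
        matched: str

    class Req:
        def __init__(self, sampling_params):
            self.sampling_params = sampling_params
            self.decoded_text = ""
            self.finished_reason = None

        def check_finished(self):
            # Fast path: the existing plain-substring matching is unchanged.
            for stop_str in self.sampling_params.stop_strs:
                if stop_str in self.decoded_text:
                    self.finished_reason = FINISH_MATCHED_STR(matched=stop_str)
                    return
            # The regex path runs only when regex stop conditions were
            # requested, so requests without them pay no extra cost.
            for stop_regex_str in self.sampling_params.stop_regex_strs:
                if re.search(stop_regex_str, self.decoded_text):
                    self.finished_reason = FINISH_MATCHED_STR(matched=stop_regex_str)
                    return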

@jancervenka (Author) commented Nov 15, 2024

Thank you @merrymercy. In Req.check_finished, is it OK to ignore tail_str and just check decoded_text? We don't know how many tokens we would need to decode to get a regex match.

@merrymercy (Contributor) commented:

yes

@jancervenka marked this pull request as draft November 16, 2024 11:39
@jancervenka (Author) commented Nov 16, 2024

Hi @merrymercy, I was testing my change, and I don't think using just decoded_text works. With the debug logging below, decoded_text shows as empty throughout the entire inference.

Could we reuse tail_str and, instead of overwriting it every time, append each tail_str to an accumulator string and run the regex check against that? And if there is no stop_str_max_len to produce a tail_str, we could decode and append just the last token.

    # Inside Req.check_finished: search the full decoded text for each regex.
    for stop_regex_str in self.sampling_params.stop_regex_strs:
        logger.debug(
            f"stop_regex='{stop_regex_str}' "
            f"decoded_text_length={len(self.decoded_text)}"
        )
        if re.search(stop_regex_str, self.decoded_text):
            self.finished_reason = FINISH_MATCHED_STR(matched=stop_regex_str)
            return
[Screenshot of the debug log output, 2024-11-16]

@jancervenka marked this pull request as ready for review November 18, 2024 18:57
@jancervenka (Author) commented:

Hi @merrymercy, I tried to solve the problem by decoding one token at a time. Thank you for any feedback!

    # Decode and accumulate text only when some stop condition needs string matching.
    if (
        len(self.sampling_params.stop_strs) > 0
        or len(self.sampling_params.stop_regex_strs) > 0
    ):
        self.stop_check_text += self.tokenizer.decode(last_token_id)
@merrymercy (Contributor) commented:

You cannot decode text token by token and concatenate the string. This will lead to wrong outputs.
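
For illustration, a minimal sketch of the failure mode (this assumes the Hugging Face transformers package and its GPT-2 tokenizer; byte-level BPE can split one multi-byte character across several token IDs):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # The robot emoji is 4 UTF-8 bytes and encodes to more than one token ID.
    token_ids = tokenizer.encode("🤖")

    # Decoding token by token and concatenating yields replacement characters,
    # because each token on its own is an incomplete UTF-8 byte sequence.
    per_token = "".join(tokenizer.decode([tid]) for tid in token_ids)

    # Decoding the full sequence at once reassembles the bytes correctly.
    full = tokenizer.decode(token_ids)

    print(repr(per_token))  # expected: replacement characters such as '���'
    print(repr(full))       # expected: '🤖'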

@jancervenka (Author) commented Nov 24, 2024

Oh right, I didn't know there are tokenizers where this doesn't work. Is it then OK to decode the entire output each time? Or to decode a fixed window and accept that it's not 100% reliable?
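
If the fixed-window route were taken, it could look like this minimal sketch (WINDOW and the helper function are hypothetical; any match that starts further back than the window is missed):

    import re

    WINDOW = 32  # hypothetical window size, in tokens

    def check_regex_stop(tokenizer, output_ids, stop_regex_strs):
        # Re-decode only the last WINDOW tokens each step so the per-step cost
        # stays bounded. Decoding restarts from the window boundary each time,
        # which avoids the token-by-token concatenation problem above (though
        # a character split exactly at the boundary can still come out garbled).
        tail_text = tokenizer.decode(output_ids[-WINDOW:])
        for stop_regex_str in stop_regex_strs:
            if re.search(stop_regex_str, tail_text):
                return stop_regex_str
        return None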

@merrymercy (Contributor) commented:

move to #2699

@merrymercy closed this Jan 2, 2025