stream the output #52

Merged: 2 commits into fedora-copr:main on Aug 19, 2024

Conversation

@xsuchy (Member) commented Aug 6, 2024

so it is more interactive

@xsuchy marked this pull request as draft on August 6, 2024, 17:05
@xsuchy (Member, Author) commented Aug 6, 2024

Currently, logdetective waits a dozen minutes and then prints the whole result en bloc. I wanted to turn streaming on, but with streaming enabled the model keeps answering and never finishes.
I tried everything I could think of, but it still does not work.
Does anyone have an idea how to make it stop at the right time?
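
For context, a minimal sketch of what token streaming looks like with llama-cpp-python (the library behind the GGUF models in the README); the prompt, stop string, and token limit are illustrative assumptions, not the PR's actual settings:

```python
from llama_cpp import Llama

# Illustrative model path; logdetective resolves the real one from --model.
llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q5_K_S.gguf")

stream = llm.create_completion(
    "Explain this build failure: ...",
    max_tokens=512,       # hard upper bound so generation cannot run forever
    stop=["<|eot_id|>"],  # assumed stop string; Llama-3 models often need it
    stream=True,
)

for chunk in stream:
    choice = chunk["choices"][0]
    print(choice["text"], end="", flush=True)
    # On the final chunk, finish_reason is "stop" or "length".
    if choice.get("finish_reason") is not None:
        break
print()
```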

@xsuchy marked this pull request as ready for review on August 7, 2024, 11:03
@xsuchy changed the title from "Draft: stream the output" to "stream the output" on Aug 7, 2024
@xsuchy (Member, Author) commented Aug 7, 2024

I made streaming the default, but added a `--no-stream` option to work around broken models like llama3. Rather than adding heuristics for when to turn streaming off, I chose to document the broken model in the README.

Ready for re-review.
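
A minimal sketch of how such a flag can be wired up with argparse; the option name matches the PR, but the surrounding CLI setup is an assumption, not logdetective's actual code:

```python
import argparse

parser = argparse.ArgumentParser(prog="logdetective")
parser.add_argument("url", help="path or URL of the log to analyze")
parser.add_argument(
    "--no-stream",
    dest="stream",
    action="store_false",
    help="print the explanation at once instead of streaming it "
         "(workaround for models whose streamed output never stops)",
)
parser.set_defaults(stream=True)

args = parser.parse_args()
# args.stream is True by default and False when --no-stream is given.
```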

@TomasTomecek (Collaborator) left a comment


Tested locally and works really well!

However, there is a big gap (~30 seconds) between the 'Explanation:' text being printed and the first output appearing. We may want to investigate what is happening there (is the model being loaded into memory and initialized?) and print some kind of progress indicator.

LGTM, very nice!
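
One way the suggested progress indicator could look, as a rough sketch (this is only the reviewer's idea, not something the PR implements): a background spinner thread that runs while the model loads.

```python
import itertools
import sys
import threading
import time

def spinner(stop_event: threading.Event) -> None:
    """Spin on stderr until the event is set."""
    for frame in itertools.cycle("|/-\\"):
        if stop_event.is_set():
            break
        sys.stderr.write(f"\rLoading model... {frame}")
        sys.stderr.flush()
        time.sleep(0.1)
    sys.stderr.write("\r" + " " * 30 + "\r")  # clear the spinner line

stop = threading.Event()
thread = threading.Thread(target=spinner, args=(stop,), daemon=True)
thread.start()

time.sleep(3)  # stand-in for the slow model load and initialization

stop.set()
thread.join()
```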

README.md Outdated
@@ -55,6 +55,10 @@ Example you want to use a different model:
logdetective https://example.com/logs.txt --model https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_S.gguf?download=true
logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF

Note that streaming with some models (notably Meta-Llama-3 is broken) is brokend and can be workarounded by `no-stream` option:
Collaborator:

s/brokend/broken/

Member (Author):

Fixed.

xsuchy added 2 commits on August 19, 2024, 08:32:

- so it is more interactive
- addressing: logdetective/logdetective.py:12: R0915[too-many-statements]: main: Too many statements (55/50)
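
The second commit resolves pylint's R0915 by moving statements out of `main` into helpers. A generic sketch of that refactor pattern; the helper names are hypothetical, not logdetective's actual functions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI definition extracted from main to shrink its statement count."""
    parser = argparse.ArgumentParser(prog="logdetective")
    parser.add_argument("url", help="path or URL of the log to analyze")
    return parser

def run_analysis(url: str) -> str:
    """Placeholder for the actual log download and model inference."""
    return f"analysis of {url}"

def main() -> None:
    # main now only orchestrates; the heavy lifting lives in helpers,
    # which keeps it under pylint's 50-statement limit (R0915).
    args = build_parser().parse_args()
    print(run_analysis(args.url))

if __name__ == "__main__":
    main()
```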
@jpodivin merged commit 525b456 into fedora-copr:main on Aug 19, 2024
2 checks passed