Update setup.py #662
Open
anagnoko23 wants to merge 425 commits into scaleapi:main from anagnoko23:patch-1
* Forward HTTP status code for sync requests * don't return json response for celery forwarding results * fix unit tests * forward for all sync requests
* TRT-LLM WIP * Integrate TensorRT-LLM * fix * revert * formatting * fix * comments
* make test work * add status checking * fix * test * wget fix * final fixes * move namespace
* Found a bug in the codellama vllm model_len logic. Also, let's just avoid the vLLM error by making sure max_num_batched_tokens >= max_model_len * nevermind I realized that if statement will never happen here.
* count prompt tokens, use tokenizer if needed * docstrings * fix tests and code cov * add download files from s3 fn * use same helpers and add docstring * change to namedtuple * add s3 repo locations * fallback read from s3 * refactor tokenizer load * edit tests * refactor _SUPPORTED_MODELS_BY_FRAMEWORK * updates for tests * move to utils file * move some fns over * use lru cache * move model info * root to opt * add log and adjust integration test * refocus logs * change empty string to optional * mock count tokens for unit tests * change 1 mock * add unit tests * config change * comments pt 1 * move internal logic to plugins file * replace usage of utils file * rearrange test mock * only return prompt tokens count on last token in stream * fix mock * reorganize imports * inject in external interfaces * make changes to tests * fix tests * adjust test * oops test * add more tests
* emit metrics on token counts * remove print
)" (scaleapi#386) This reverts commit 5b6aeff.
* Some updates to integration tests * fix * comment * better env var
* adding zephyr 7b * update tokenizer repo
* update tensor-rt llm in enum * fix to be the same as in the egp and spellbook-backend
* time use case * name * update fake
* update docs to show model len / context windows * make title clearer * make title clearer pt2
* change code-llama to codellama * use both code-llama and codellama temporarily
* fix completions request id
* 4x sqlalchemy pool size * don't update nullpool
need to deduplicate some arguments
* Add hardware spec to client * fix import and update version * fix import and update version
…leapi#635) * Add *.py files to model weights if trust_remote_code is provided * Add to azure * add test * Add additional tests
* Refactor client data types + add vllm arg passthrough * Bump client version * fix dict assignment * add test
…mpatibility + additional flags to set through API (scaleapi#638)
* quick todo * add prometheus metric for the 1-N scaling part * todo prometheus server addr * values sample * autogen tpl * add in a thing to the model endpoint service to say whether we can autoscale from zero * pass through validation * untested to get concurrency value * fix some tests, add some tests * clean up some things * fix a few bugs * autogen tpl * cleanup * comment dependency * rename * rename
This reverts commit 7cb43cd.
* Revert "Remove model name override (scaleapi#641)" This reverts commit 7cb43cd. * Remove 'Annotated' type usage - python 3.8/pydantic doesn't like it * Add request id on batch completions error
…pi#646) * work on image cache gateway * some service stuff * rename * upd test * autogen tpl and add h100 * quick test * quick test * fix test * black * fix some config misnames * rename * another rename
* up storage limit + test * actually bump it again * bump recHardware also * oops
* Bearer auth for oai compatibility * fix test
* Updates to helm charts to sync with SGP * bump version
…caleapi#655) * balloons can take up more than 1 gpu * setting to make balloons only for high priority * values.yaml default * bump helm chart version
…outs (scaleapi#660) * add this new trace dimension * try bumping ddtrace to the newest 1.x.y version * reset reqs to main * again * remove thing that doesn't work (rip) * emit sync call timeout metrics in monitoring metrics gateway * initialize the sync/streaming inference gateways to use the monitoring metrics gateway * Revert "initialize the sync/streaming inference gateways to use the monitoring metrics gateway" Let's just emit in the use case instead This reverts commit 0bf2a54. * wip try emitting from use cases, will probably abandon it * Revert "wip try emitting from use cases, will probably abandon it" This reverts commit 6b599bd. * Revert "Revert "initialize the sync/streaming inference gateways to use the monitoring metrics gateway"" ok let's actually just emit from the sync/streaming gateways This reverts commit 432c0b5. * small refactor * thread the readable endpoint name through everywhere * actually emit the metrics * rename * rename * comment + small type thing
…#661) Pass said parameter to vLLM engine if requested by user
I removed an empty line.
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.