Update setup.py #662

Open · wants to merge 425 commits into base: main
Conversation

anagnoko23

I removed an empty line.

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

song-william and others added 30 commits November 14, 2023 15:32
* Forward HTTP status code for sync requests

* don't return json response for celery forwarding results

* fix unit tests

* forward for all sync requests
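
A minimal sketch of what "forward HTTP status code for sync requests" could look like, assuming a FastAPI route that proxies to the model container with httpx; the route, URL, and payload handling here are illustrative, not the actual llm-engine forwarder code.

```python
# Illustrative only: propagate the upstream status code instead of always returning 200.
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.post("/sync")
async def forward_sync(request: Request) -> Response:
    payload = await request.body()
    async with httpx.AsyncClient() as client:
        upstream = await client.post("http://model-container:5005/predict", content=payload)
    # Return the model container's status code and body verbatim,
    # rather than wrapping errors in a 200 JSON envelope.
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type", "application/json"),
    )
```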
* TRT-LLM WIP

* Integrate TensorRT-LLM

* fix

* revert

* formatting

* fix

* comments
* make test work

* add status checking

* fix

* test

* wget fix

* final fixes

* move namespace
* Found a bug in the codellama vllm model_len logic.

Also, let's just avoid the vLLM error by making sure max_num_batched_tokens >= max_model_len

* nevermind I realized that if statement will never happen here.
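
A hedged sketch of the constraint described in this commit: when building vLLM engine arguments, make sure max_num_batched_tokens is at least max_model_len so vLLM does not error at startup. The function and defaults below are illustrative, not the actual codellama config path.

```python
from typing import Optional

def resolve_vllm_lengths(max_model_len: int, max_num_batched_tokens: Optional[int]) -> dict:
    """Sketch: clamp max_num_batched_tokens so it never falls below max_model_len."""
    if max_num_batched_tokens is None:
        max_num_batched_tokens = max_model_len
    # vLLM refuses to batch fewer tokens than a single full-length sequence
    # can contain, so enforce the invariant here.
    max_num_batched_tokens = max(max_num_batched_tokens, max_model_len)
    return {
        "max_model_len": max_model_len,
        "max_num_batched_tokens": max_num_batched_tokens,
    }
```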
* count prompt tokens, use tokenizer if needed

* docstrings

* fix tests and code cov

* add download files from s3 fn

* use same helpers and add docstring

* change to namedtuple

* add s3 repo locations

* fallback read from s3

* refactor tokenizer load

* edit tests

* refactor _SUPPORTED_MODELS_BY_FRAMEWORK

* updates for tests

* move to utils file

* move some fns over

* use lru cache

* move model info

* root to opt

* add log and adjust integration test

* refocus logs

* change empty string to optional

* mock count tokens for unit tests

* change 1 mock

* add unit tests

* config change

* comments pt 1

* move internal logic to plugins file

* replace usage of utils file

* rearrange test mock

* only return prompt tokens count on last token in stream

* fix mock

* reorganize imports

* inject in external interfaces

* make changes to tests

* fix tests

* adjust test

* oops test

* add more tests
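
The commits above describe counting prompt tokens with a tokenizer that is loaded lazily, cached via an LRU cache, and fetched from S3 as a fallback when the Hugging Face repo is unavailable, with the count only returned on the last token of a stream. A rough sketch under those assumptions; the S3 prefix and download helper are hypothetical:

```python
from functools import lru_cache

from transformers import AutoTokenizer  # assumed dependency

S3_TOKENIZER_PREFIX = "s3://my-bucket/tokenizers"  # hypothetical location

def download_tokenizer_from_s3(model_name: str, s3_prefix: str) -> str:
    """Hypothetical helper: download tokenizer files from S3, return the local dir."""
    raise NotImplementedError("sketch only")

@lru_cache(maxsize=32)
def load_tokenizer(model_name: str):
    """Load a tokenizer once per model; fall back to an S3 copy if the hub fails."""
    try:
        return AutoTokenizer.from_pretrained(model_name)
    except OSError:
        local_dir = download_tokenizer_from_s3(model_name, S3_TOKENIZER_PREFIX)
        return AutoTokenizer.from_pretrained(local_dir)

def count_prompt_tokens(prompt: str, model_name: str) -> int:
    """Count prompt tokens; callers emit this only on the final token of a stream."""
    tokenizer = load_tokenizer(model_name)
    return len(tokenizer.encode(prompt))
```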
* emit metrics on token counts

* remove print
* Some updates to integration tests

* fix

* comment

* better env var
* adding zephyr 7b

* update tokenizer repo
* update tensor-rt llm in enum

* fix to be the same as in the egp and spellbook-backend
* time use case

* name

* update fake
* update docs to show model len / context windows

* make title clearer

* make title clearer pt2
* change code-llama to codellama

* use both code-llama and codellama temporarily
* fix completions request id
* 4x sqlalchemy pool size

* don't update nullpool
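
A minimal sketch of quadrupling the SQLAlchemy pool size while leaving NullPool-based engines untouched; the 4x factor comes from the commit message, everything else (DSN, defaults) is illustrative.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

DEFAULT_POOL_SIZE = 5  # illustrative previous default

# Pooled engine: 4x the pool size to absorb more concurrent requests.
pooled_engine = create_engine(
    "postgresql+psycopg2://user:pass@db:5432/app",  # placeholder DSN
    pool_size=DEFAULT_POOL_SIZE * 4,
    max_overflow=10,
)

# NullPool engine is intentionally left as-is: NullPool opens a fresh
# connection per checkout, so pool_size does not apply to it.
nullpool_engine = create_engine(
    "postgresql+psycopg2://user:pass@db:5432/app",
    poolclass=NullPool,
)
```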
seanshi-scale and others added 29 commits October 10, 2024 12:54
need to deduplicate some arguments
* Add hardware spec to client

* fix import and update version

* fix import and update version
…leapi#635)

* Add *.py files to model weights if trust_remote_code is provided

* Add to azure

* add test

* Add additional tests
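
A sketch of the "*.py files to model weights" idea: when trust_remote_code is set, include the repo's Python files (custom modeling code) alongside the weight files being uploaded. Paths, suffixes, and the helper name are assumptions, not the actual scaleapi#635 implementation.

```python
from pathlib import Path
from typing import List

WEIGHT_SUFFIXES = (".safetensors", ".bin", ".json", ".model")  # illustrative

def files_to_upload(model_dir: str, trust_remote_code: bool) -> List[Path]:
    """Collect weight files, plus *.py modeling code when trust_remote_code is enabled."""
    root = Path(model_dir)
    files = [p for p in root.rglob("*") if p.suffix in WEIGHT_SUFFIXES]
    if trust_remote_code:
        # Custom architectures ship their modeling/config code as .py files in the repo;
        # without them, from_pretrained(..., trust_remote_code=True) cannot rebuild the model.
        files.extend(root.rglob("*.py"))
    return files
```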
* Refactor client data types + add vllm arg passthrough

* Bump client version

* fix dict assignment

* add test
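
A sketch of what a vLLM argument passthrough on the client data types could look like, where extra engine kwargs are collected in one field and forwarded rather than enumerated field by field; the field and class names are guesses, not the published client API.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel

class CreateCompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128
    temperature: float = 0.2
    # Hypothetical passthrough: extra vLLM sampling kwargs (e.g. top_k,
    # presence_penalty) are forwarded to the engine untouched.
    vllm_extra_args: Optional[Dict[str, Any]] = None

def build_sampling_params(req: CreateCompletionRequest) -> Dict[str, Any]:
    params: Dict[str, Any] = {"max_tokens": req.max_new_tokens, "temperature": req.temperature}
    params.update(req.vllm_extra_args or {})
    return params
```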
* quick todo

* add prometheus metric for the 1-N scaling part

* todo prometheus server addr

* values sample

* autogen tpl

* add in a thing to the model endpoint service to say whether we can autoscale from zero

* pass through validation

* untested to get concurrency value

* fix some tests, add some tests

* clean up some things

* fix a few bugs

* autogen tpl

* cleanup

* comment dependency

* rename

* rename
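
A hedged sketch of the two pieces this commit series describes: a Prometheus metric feeding the 1-to-N scaling path, and an endpoint-level flag saying whether scale-from-zero is supported. Metric and function names are illustrative, not taken from the repo.

```python
from prometheus_client import Gauge

# Illustrative metric driving the 1-N autoscaling decision.
concurrency_per_endpoint = Gauge(
    "llm_engine_endpoint_concurrency",
    "Observed concurrent requests per endpoint, used to drive 1-N autoscaling.",
    ["endpoint_name"],
)

def can_scale_from_zero(min_workers: int, supports_zero: bool) -> bool:
    """Endpoint-level capability check: only endpoints that opt in and allow
    min_workers == 0 can be scaled to zero and back up on demand."""
    return supports_zero and min_workers == 0

def record_concurrency(endpoint_name: str, value: float) -> None:
    concurrency_per_endpoint.labels(endpoint_name=endpoint_name).set(value)
```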
* Revert "Remove model name override (scaleapi#641)"

This reverts commit 7cb43cd.

* Remove 'Annotated' type usage - python 3.8/pydantic doesn't like it

* Add request id on batch completions error
…pi#646)

* work on image cache gateway

* some service stuff

* rename

* upd test

* autogen tpl and add h100

* quick test

* quick test

* fix test

* black

* fix some config misnames

* rename

* another rename
* up storage limit + test

* actually bump it again

* bump recHardware also

* oops
* Bearer auth for oai compatibility

* fix test
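
A minimal sketch of accepting an "Authorization: Bearer <key>" header for OpenAI-client compatibility, alongside whatever auth flow already exists; this function is a stand-in, not the repo's actual auth dependency.

```python
from fastapi import HTTPException, Request

def get_api_key(request: Request) -> str:
    """Accept an OpenAI-style bearer token in addition to the existing auth flow."""
    auth = request.headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    # The pre-existing basic-auth path would be checked here (omitted in this sketch).
    raise HTTPException(status_code=401, detail="Missing or malformed credentials")
```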
* Updates to helm charts to sync with SGP

* bump version
…caleapi#655)

* balloons can take up more than 1 gpu

* setting to make balloons only for high priority

* values.yaml default

* bump helm chart version
…outs (scaleapi#660)

* add this new trace dimension

* try bumping ddtrace to the newest 1.x.y version

* reset reqs to main

* again

* remove thing that doesn't work (rip)

* emit sync call timeout metrics in monitoring metrics gateway

* initialize the sync/streaming inference gateways to use the monitoring metrics gateway

* Revert "initialize the sync/streaming inference gateways to use the monitoring metrics gateway"
Let's just emit in the use case instead

This reverts commit 0bf2a54.

* wip try emitting from use cases, will probably abandon it

* Revert "wip try emitting from use cases, will probably abandon it"

This reverts commit 6b599bd.

* Revert "Revert "initialize the sync/streaming inference gateways to use the monitoring metrics gateway""
ok let's actually just emit from the sync/streaming gateways

This reverts commit 432c0b5.

* small refactor

* thread the readable endpoint name through everywhere

* actually emit the metrics

* rename

* rename

* comment + small type thing
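
A sketch of the design these commits converge on: the sync/streaming inference gateways hold a reference to the monitoring metrics gateway and emit the timeout metric themselves (rather than the use cases doing it), with a readable endpoint name threaded through. Class and method names are illustrative.

```python
import asyncio
from abc import ABC, abstractmethod

class MonitoringMetricsGateway(ABC):
    @abstractmethod
    def emit_sync_call_timeout(self, endpoint_name: str) -> None:
        """Record that a synchronous inference call timed out for this endpoint."""

class SyncInferenceGateway:
    def __init__(self, monitoring_metrics_gateway: MonitoringMetricsGateway):
        # The gateway, not the use case, owns timeout emission (see the reverts above).
        self.monitoring_metrics_gateway = monitoring_metrics_gateway

    async def predict(self, endpoint_name: str, request: dict) -> dict:
        try:
            return await self._call_endpoint(endpoint_name, request)
        except asyncio.TimeoutError:
            self.monitoring_metrics_gateway.emit_sync_call_timeout(endpoint_name)
            raise

    async def _call_endpoint(self, endpoint_name: str, request: dict) -> dict:
        ...  # actual HTTP forwarding omitted in this sketch
```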
…#661)

Pass said parameter to vLLM engine if requested by user
I removed an empty line.