Update setup.py #662

Open: wants to merge 425 commits into base: main

Changes from all commits (425 commits)
0e47fc8
use modelEngine fullname (#374)
song-william Nov 14, 2023
b319397
Forward HTTP status code for sync requests (#375)
yunfeng-scale Nov 15, 2023
4e2ea6c
Integrate TensorRT-LLM (#358)
yunfeng-scale Nov 15, 2023
5e4d662
Fine-tuning e2e integration test (#372)
tiffzhao5 Nov 15, 2023
5b6aeff
Found a bug in the codellama vllm model_len logic. (#380)
sam-scale Nov 15, 2023
043f83a
Fix sample.yaml (#381)
yunfeng-scale Nov 15, 2023
257ea6c
count prompt tokens (#366)
saiatmakuri Nov 15, 2023
2221de0
Fix integration test (#383)
yunfeng-scale Nov 16, 2023
df3738c
emit metrics on token counts (#382)
saiatmakuri Nov 16, 2023
df26b0a
Increase llama-2 max_input_tokens (#384)
sam-scale Nov 16, 2023
d478ee5
Revert "Found a bug in the codellama vllm model_len logic. (#380)" (#…
yunfeng-scale Nov 17, 2023
e71326d
Some updates to integration tests (#385)
yunfeng-scale Nov 17, 2023
4888ecf
Celery autoscaler (#378)
squeakymouse Nov 17, 2023
4d72b23
Don't install Celery autoscaler for test deployments (#388)
squeakymouse Nov 21, 2023
3c0f168
LLM update API route (#387)
squeakymouse Nov 27, 2023
37814ee
adding zephyr 7b (#389)
ian-scale Nov 27, 2023
4483dff
update tensor-rt llm in enum (#390)
ian-scale Nov 27, 2023
de7a493
pypi version bump (#391)
ian-scale Nov 27, 2023
cccbd3e
Change middleware format (#393)
squeakymouse Nov 29, 2023
1ee6fbe
Fix custom framework Dockerfile (#395)
squeakymouse Nov 29, 2023
3adbd59
fixing enum value (#396)
ian-scale Nov 29, 2023
8501db0
overriding model length for zephyr 7b alpha (#398)
ian-scale Dec 1, 2023
9944483
time completions use case (#397)
saiatmakuri Dec 1, 2023
8f657c7
update docs to show model len / context windows (#401)
ian-scale Dec 6, 2023
69e07ff
Add MultiprocessingConcurrencyLimiter to gateway (#399)
squeakymouse Dec 7, 2023
b349a0d
change code-llama to codellama (#400)
ian-scale Dec 7, 2023
cefef80
fix completions request id (#402)
saiatmakuri Dec 8, 2023
04a5908
Allow latest inference framework tag (#403)
squeakymouse Dec 8, 2023
c9ceab9
Bump helm chart version 0.1.0 to 0.1.1 (#406)
seanshi-scale Dec 11, 2023
5ec6ada
4x sqlalchemy pool size (#405)
yunfeng-scale Dec 11, 2023
353e246
bump datadog module to 0.47.0 for ipv6 support for dogstatsd (#407)
saiatmakuri Dec 12, 2023
74cc915
Fix autoscaler node selector (#409)
seanshi-scale Dec 12, 2023
5b64972
Log request sizes (#410)
yunfeng-scale Dec 13, 2023
474155e
add support for mixtral-8x7b and mixtral-8x7b-instruct (#408)
saiatmakuri Dec 14, 2023
d915f5b
Make sure metadata is not incorrectly wiped during endpoint update (#…
yunfeng-scale Dec 22, 2023
6bbcb6c
Always return output for completions sync response (#412)
yunfeng-scale Dec 22, 2023
d0061f2
handle update endpoint errors (#414)
saiatmakuri Dec 28, 2023
1f9c461
[bug-fix] LLM Artifact Gateway .list_files() (#416)
saiatmakuri Jan 3, 2024
756682f
enable sensitive log mode (#415)
song-william Jan 4, 2024
13fa6eb
Throughput benchmark script (#411)
yunfeng-scale Jan 8, 2024
e8bb27c
Upgrade vllm to 0.2.7 (#417)
yunfeng-scale Jan 10, 2024
a5bfdb7
LLM batch completions API (#418)
yunfeng-scale Jan 17, 2024
db11cd7
Small update to vllm batch (#419)
yunfeng-scale Jan 17, 2024
53a1918
sensitive content flag (#421)
yunfeng-scale Jan 19, 2024
fc9a503
Revert a broken refactoring (#423)
yunfeng-scale Jan 22, 2024
112513c
[Logging I/O] Post inference hooks as background tasks (#422)
tiffzhao5 Jan 23, 2024
d130660
Batch inference client / doc (#424)
yunfeng-scale Jan 26, 2024
a9843a1
Minor fixes for batch inference (#426)
yunfeng-scale Jan 26, 2024
1213b4c
LLM benchmark script improvements (#427)
seanshi-scale Jan 31, 2024
8d8774c
Allow using pydantic v2 (#429)
seanshi-scale Feb 3, 2024
a2a6563
Fix helm chart nodeSelector for GPU endpoints (#430)
squeakymouse Feb 6, 2024
ea38f1e
Allow pydantic 2 in python client requested requirements (#433)
seanshi-scale Feb 7, 2024
7028575
Fix permissions (#431)
yunfeng-scale Feb 7, 2024
e07fc7a
[Client] Add Auth headers to the python async routes (#434)
seanshi-scale Feb 7, 2024
847317e
pin boto3 and urllib3 version (#432)
edgan8 Feb 8, 2024
5bff345
include stop string in output (#435)
saiatmakuri Feb 13, 2024
c427d0b
Logging post inference hook implementation (#428)
tiffzhao5 Feb 15, 2024
0541e49
add codellama-70b models (#436)
saiatmakuri Feb 16, 2024
da86a9d
Add hook validation and support logging for python client (#437)
tiffzhao5 Feb 16, 2024
4d0cd26
Azure refactor for async endpoints (#425)
squeakymouse Feb 20, 2024
d88511b
remove handling (#438)
tiffzhao5 Feb 20, 2024
b4e7a5c
Clean up logs for logging hook (#439)
tiffzhao5 Feb 21, 2024
9a892cf
Fix Infra Task Gateway (#443)
saiatmakuri Feb 22, 2024
a636421
support gemma models (#444)
saiatmakuri Feb 22, 2024
31c7c5a
Fix infra config dependency (#449)
squeakymouse Feb 22, 2024
b3a0036
Add emitted timestamp for logging (#450)
tiffzhao5 Feb 23, 2024
c4db5e4
change cache update time (#451)
tiffzhao5 Feb 23, 2024
b4aef83
Bump aiohttp from 3.9.1 to 3.9.2 in /model-engine (#446)
dependabot[bot] Feb 23, 2024
dc03fd4
Bump python-multipart from 0.0.6 to 0.0.7 in /model-engine (#447)
dependabot[bot] Feb 23, 2024
be330c2
Bump gitpython from 3.1.32 to 3.1.41 in /model-engine (#453)
dependabot[bot] Feb 23, 2024
37d38d4
Log endpoint in sensitive_log_mode (#455)
squeakymouse Feb 26, 2024
06bc25e
Bump orjson from 3.8.6 to 3.9.15 in /model-engine (#456)
dependabot[bot] Feb 27, 2024
9a4e2e5
Allow the load test script to use a csv of inputs (#440)
seanshi-scale Feb 27, 2024
38c59e2
add some debugging to vllm docker (#454)
yunfeng-scale Feb 27, 2024
468bcbe
Add product label validation (#442)
edgan8 Feb 27, 2024
f9a3ff5
Add log statement for gateway sending async task (#459)
tiffzhao5 Feb 28, 2024
39ef7c4
Some batch inference improvements (#460)
yunfeng-scale Mar 2, 2024
036b1a9
Fix cacher (#462)
yunfeng-scale Mar 6, 2024
575eaa6
Fix vllm batch docker image (#463)
yunfeng-scale Mar 7, 2024
0528b52
Add tool completion to batch inference (#461)
yunfeng-scale Mar 7, 2024
659d08d
fix llm-engine finetune.create failures (#464)
ian-scale Mar 8, 2024
bfcfbba
Change back batch infer GPU util and add tool completion client chang…
yunfeng-scale Mar 8, 2024
4b012f0
Try to fix async requests getting stuck (#466)
squeakymouse Mar 11, 2024
b09c106
[Client] Add num_prompt_tokens to the client's CompletionOutputs (#467)
seanshi-scale Mar 12, 2024
80a2d3e
Tool completion respect num new tokens (#469)
yunfeng-scale Mar 13, 2024
24314f5
Azure fixes + additional asks (#468)
squeakymouse Mar 15, 2024
1d33b27
Metrics for stuck async requests (#471)
squeakymouse Mar 15, 2024
98e1f43
Fix cacher (#472)
yunfeng-scale Mar 15, 2024
6db2d48
Add retries to deflake integration tests (#473)
squeakymouse Mar 19, 2024
9904091
add suffix to integration tests (#474)
saiatmakuri Mar 19, 2024
2e5eec2
fix docs tests gateway endpoint (#475)
saiatmakuri Mar 20, 2024
5f6cd32
Guided decoding (#476)
yunfeng-scale Mar 21, 2024
b785d25
Add emitting token count metrics to datadog statsd (#458)
seanshi-scale Mar 27, 2024
bdf4a25
Downgrade sse-starlette version (#478)
squeakymouse Mar 28, 2024
5524f80
Return 400 for botocore client errors (#479)
yunfeng-scale Apr 1, 2024
f187c00
Increase Kaniko Memory (#481)
saiatmakuri Apr 2, 2024
3d9ea75
Batch job metrics (#480)
yunfeng-scale Apr 2, 2024
e924ffa
Use base model name as metric tag (#483)
yunfeng-scale Apr 3, 2024
2b4466b
Change LLM Engine base path from global var (#482)
squeakymouse Apr 4, 2024
077c5a5
Remove fine-tune limit for internal users (#484)
squeakymouse Apr 4, 2024
c46162a
Parallel Python execution for tool completion (#470)
yunfeng-scale Apr 5, 2024
8523141
Allow JSONL for fine-tuning datasets
squeakymouse Apr 9, 2024
38d94de
Fix throughput_benchmarks ITL calculation, add option to use a json f…
seanshi-scale Apr 10, 2024
3c7d40b
Add Model.update() to Python client (#490)
squeakymouse Apr 11, 2024
740c12a
Bump idna from 3.4 to 3.7 in /clients/python (#491)
dependabot[bot] Apr 12, 2024
795d624
Bump idna from 3.4 to 3.7 in /model-engine (#492)
dependabot[bot] Apr 12, 2024
ee3a367
Properly add mixtral 8x22b (#493)
yunfeng-scale Apr 16, 2024
040622a
support mixtral 8x22b instruct (#495)
saiatmakuri Apr 17, 2024
10d84ca
fix return_token_log_probs on vLLM > 0.3.3 endpoints (#498)
saiatmakuri Apr 23, 2024
9673b3f
Package update + more docs on dev setup (#500)
dmchoiboi Apr 24, 2024
edecf56
Add Llama 3 models (#501)
yunfeng-scale Apr 24, 2024
0079f7e
Enforce model checkpoints existing for endpoint/bundle creation (#503)
dmchoiboi Apr 26, 2024
866bcd1
guided decoding with grammar (#488)
saiatmakuri Apr 29, 2024
9d0e433
adding asyncenginedead error catch (#504)
ian-scale Apr 30, 2024
6f8870c
Default include_stop_str_in_output to None (#506)
squeakymouse May 2, 2024
a2bf698
get latest inference framework tag from configmap (#505)
saiatmakuri May 3, 2024
70d0e77
integration tests for completions (#507)
saiatmakuri May 3, 2024
13da4c1
patch service config identifier (#509)
saiatmakuri May 4, 2024
a87e5aa
require safetensors (#510)
saiatmakuri May 6, 2024
e1da243
Add py.typed for proper typechecking support on clients (#513)
dmchoiboi May 7, 2024
1106435
Fix package name mapping (#514)
dmchoiboi May 7, 2024
c019a6a
Necessary Changes for long context llama-3-8b (#516)
sam-scale May 14, 2024
fbe7417
Increase max gpu utilization for 70b models (#517)
dmchoiboi May 15, 2024
ba68b8d
Infer hardware from model name (#515)
yunfeng-scale May 15, 2024
1470aac
Improve TensorRT-LLM Functionality (#487)
seanshi-scale May 15, 2024
80e5276
Upgrade vLLM version for batch completion (#518)
dmchoiboi May 15, 2024
a36f7a2
Revert "Upgrade vLLM version for batch completion (#518)" (#520)
dmchoiboi May 16, 2024
110833b
Allow H100 to be used (#522)
yunfeng-scale May 17, 2024
e207936
vLLM version 0.4.2 Docker image (#521)
squeakymouse May 20, 2024
2f71b89
Image cache and balloon on H100s, also temporarily stop people from u…
yunfeng-scale May 20, 2024
8993b18
Hardcode llama 3 70b endpoint param (#524)
yunfeng-scale May 21, 2024
028d415
Don't fail checking GPU memory (#525)
yunfeng-scale May 22, 2024
275f495
Option to read Redis URL from AWS Secret (#526)
seanshi-scale May 28, 2024
8a4c745
Fix formatting on completions documentation guide (#527)
saiatmakuri May 28, 2024
5bb8797
Higher priority for gateway (#529)
yunfeng-scale Jun 3, 2024
bd192cb
Non-interactive installation during docker build (#533)
yunfeng-scale Jun 4, 2024
ad24f65
[Client] Add guided_grammar and other missing fields (#532)
seanshi-scale Jun 4, 2024
f84adbb
Make balloon creation flexible (#531)
yunfeng-scale Jun 6, 2024
6447c5f
Bump kv cache min memory for batch jobs (#536)
dmchoiboi Jun 10, 2024
4c6b176
DEBUG: Add additional logging for authz errors (#539)
dmchoiboi Jun 14, 2024
69163b2
Add debug log for authz errors (#540)
dmchoiboi Jun 15, 2024
dfb7b15
Mitigation for AsyncEngineDeadError (#545)
dmchoiboi Jun 20, 2024
f0fee2a
Infer hardware specs from config (#543)
yunfeng-scale Jun 20, 2024
2756aed
Add special token param to completions + batch completions apis (#544)
seanshi-scale Jun 21, 2024
51b38a9
Fix integration test (#546)
dmchoiboi Jun 22, 2024
d8b5efe
Bump vllm to v0.5.0.post1 (#547)
dmchoiboi Jun 24, 2024
4471d19
Fix integration tests for streaming case (#548)
dmchoiboi Jun 24, 2024
f92830b
Update vllm batch job to work with vllm > 0.5.0 (#550)
dmchoiboi Jun 26, 2024
c1b521d
Modify v1 completions_stream logic to raise most exceptions before as…
anant-marur Jun 26, 2024
20c15af
Increase default concurrency to 100 for http forwarder (#552)
seanshi-scale Jul 3, 2024
8860ee3
Use circleci AWS IAM role (#553)
yunfeng-scale Jul 3, 2024
1f474ba
Allow hardware infer from client (#555)
yunfeng-scale Jul 5, 2024
137f88d
Fix AWS IAM role access (#556)
yunfeng-scale Jul 5, 2024
d5d9193
More rigorous endpoint update handling (#558)
dmchoiboi Jul 8, 2024
0bacaa5
Update vllm server to be openai compatible (#560)
dmchoiboi Jul 9, 2024
72a2b5a
Fix healthcheck_route and predict_route for async endpoints (#554)
squeakymouse Jul 10, 2024
3ff1196
Fix some oddities in the client (#562)
seanshi-scale Jul 10, 2024
c0cea60
Bump pydantic to 2.8.2 (#561)
dmchoiboi Jul 11, 2024
b5e4daf
fix: Use env AWS_REGION in sqs_client or default to us-west-2 (#563)
nicolastomeo Jul 11, 2024
8acb52f
Add support for phi 3 models (#564)
dmchoiboi Jul 16, 2024
6132a3e
Parse wrapped sync endpoint error (#566)
yunfeng-scale Jul 16, 2024
8baaefb
Allow request deserialization using alias (#567)
dmchoiboi Jul 16, 2024
b3d9200
Add earliest log (#568)
yunfeng-scale Jul 16, 2024
7670d7b
Log info instead of debug (#569)
yunfeng-scale Jul 17, 2024
adc6c37
Disable data parallelism for batch completions (#570)
dmchoiboi Jul 22, 2024
2558f7d
bump vllm batch version (#571)
dmchoiboi Jul 22, 2024
758f7bb
Add deepseek models (#572)
dmchoiboi Jul 22, 2024
04e5818
Reduce hardware requirement for deepseek coder lite (#573)
dmchoiboi Jul 22, 2024
7d4ac86
Bump vllm version to 0.5.3post1 (#576)
seanshi-scale Jul 23, 2024
9818676
Azure compatibility work for LLM engine (#551)
squeakymouse Jul 23, 2024
afbb98a
Add Llama 3.1 models (#577)
seanshi-scale Jul 24, 2024
42f1de1
Shared pydantic configs (#578)
dmchoiboi Jul 26, 2024
87d816e
Add autogenerated openai spec (#579)
dmchoiboi Jul 26, 2024
dcc5ff8
Bump istio proxy memory for gateway (#580)
yunfeng-scale Jul 29, 2024
6e35c71
Make configs backwards-compatible (#581)
squeakymouse Jul 29, 2024
d033638
Reduce connection pool size (#582)
yunfeng-scale Jul 29, 2024
353c472
Up storage limit (#575)
dmchoiboi Jul 30, 2024
e4f0854
Use session for sts boto3 client for logging hook (#583)
tiffzhao5 Jul 30, 2024
a6e2eda
Add env label (#584)
yunfeng-scale Aug 1, 2024
3174f50
Various Db configuration improvements (#585)
dmchoiboi Aug 1, 2024
44bbba1
Enable passing in headers through the client (#586)
dmchoiboi Aug 2, 2024
0d39f29
Re-add auth header (#588)
dmchoiboi Aug 2, 2024
7e7f3bf
Make storage required for endpoint creation requests (#587)
squeakymouse Aug 5, 2024
554d30d
More Batch Inference Options (#590)
seanshi-scale Aug 10, 2024
5c815e3
Allow support for vllm batch with checkpoints (#591)
dmchoiboi Aug 12, 2024
37a0bd9
MLI-2510 Validate json logs to test hypothesis on no records in Snowf…
tiffzhao5 Aug 12, 2024
3d9a770
[Batch Completions V2] DTO models + Batch completions service (#593)
dmchoiboi Aug 13, 2024
b58cf41
Add Qwen2 72b instruct (#594)
yunfeng-scale Aug 14, 2024
065fb9d
Get Gemma2 working (#595)
dmchoiboi Aug 15, 2024
6de5b7c
Allow setting max context length for batch jobs (#598)
dmchoiboi Aug 15, 2024
b84018f
Fix dto for batch completion (#599)
dmchoiboi Aug 15, 2024
092d9f4
Update client with new max_context_length (#600)
dmchoiboi Aug 15, 2024
439e001
Batch completions V2 job (#602)
dmchoiboi Aug 21, 2024
1309815
Some cleanups (#604)
dmchoiboi Aug 21, 2024
9684586
More batch job cleanup (#605)
dmchoiboi Aug 22, 2024
47eefb1
Relax pydantic constraint for client (#606)
dmchoiboi Aug 22, 2024
49af089
Fix list initialization (#607)
dmchoiboi Aug 26, 2024
f425d1f
Docs for qwen2 72b instruct (#601)
yunfeng-scale Aug 29, 2024
cff524c
MLI-2847 Replace instead of patch PDB (#603)
yunfeng-scale Aug 29, 2024
0600c10
Use maxUnavailale for endpoint PDB (#596)
yunfeng-scale Aug 29, 2024
370b111
Http Forwarder updates (#608)
dmchoiboi Sep 5, 2024
62ebf4d
Introduce alembic to repo (#610)
dmchoiboi Sep 9, 2024
6bbacf0
Chat completion API (#609)
dmchoiboi Sep 11, 2024
624b91e
Fix passing in vllm args options (#611)
dmchoiboi Sep 12, 2024
65bbb63
Option to skip AWS profile set (#613)
seanshi-scale Sep 17, 2024
86e3589
MLI-2949 Upgrade vllm to 0.6.1.post2 (#614)
yunfeng-scale Sep 18, 2024
80fa44d
add skipping aws profile code to v2 batch (#615)
seanshi-scale Sep 18, 2024
2c389ff
Enable users to force redeploy endpoints (#617)
dmchoiboi Sep 23, 2024
1f03d44
Remove print statement (#618)
dmchoiboi Sep 23, 2024
01c9387
Fix batch compeltion v2 for oai completion (#621)
dmchoiboi Sep 24, 2024
ba06540
Multinode bundle db migration + orm class + entity (#620)
seanshi-scale Sep 25, 2024
72fd1b8
Make Redis endpoint cache read service identifier (#622)
seanshi-scale Sep 26, 2024
c1fc1c6
set default storage request/limit for batch jobs (#624)
dmchoiboi Sep 27, 2024
41639da
Bump server python version to 3.10 (#623)
dmchoiboi Sep 27, 2024
1e35c17
Upgrade vLLM to 0.6.2 (#626)
dmchoiboi Sep 30, 2024
515ab65
Update docs to sunset free demo (#625)
yixu34 Oct 3, 2024
2061eff
Add OpenAI compatible v2 completion (#627)
dmchoiboi Oct 4, 2024
1b8ee43
Add completion routes to main router (#628)
dmchoiboi Oct 4, 2024
9c5579f
Add chat template override to client (#629)
dmchoiboi Oct 8, 2024
8830f89
Multinode serving (#574)
seanshi-scale Oct 9, 2024
8dc74c8
Enable more vllm args to be passed through for batch completions (#630)
dmchoiboi Oct 10, 2024
2f62171
add rec hardware to the configmap yaml (#631)
seanshi-scale Oct 10, 2024
258862a
fix bug in batch completions v2 (#633)
seanshi-scale Oct 11, 2024
5a69175
Add hardware spec to client (#632)
dmchoiboi Oct 11, 2024
89b9ddd
Add *.py files to model weights if trust_remote_code is provided (#635)
dmchoiboi Oct 14, 2024
74a40e7
vllm 0.6.3 (#636)
dmchoiboi Oct 14, 2024
4adc3f2
Refactor client data types + add vllm arg passthrough (#637)
dmchoiboi Oct 15, 2024
ff971ea
Update oai spec to remove strict flag default to workaround vllm inco…
dmchoiboi Oct 16, 2024
c6f87b8
0-N scaling for sync/streaming endpoints (#634)
seanshi-scale Oct 16, 2024
9c07cad
Bump commit in integration tests (#640)
seanshi-scale Oct 16, 2024
841b4d4
Add served_model_name (#639)
dmchoiboi Oct 16, 2024
7cb43cd
Remove model name override (#641)
dmchoiboi Oct 16, 2024
33ce5ab
Add 1b 3b to model zoo (#642)
dmchoiboi Oct 17, 2024
3ce747f
Fix guided decoding logit setup (#643)
dmchoiboi Oct 17, 2024
18457ab
Revert "Remove model name override (#641)" (#644)
dmchoiboi Oct 17, 2024
1d855ca
Miscellaneous improvments (#645)
dmchoiboi Oct 17, 2024
36e088f
Fix up the image caching functionality so it works with h100s (#646)
seanshi-scale Oct 23, 2024
8f9a672
increase storage limit for h100s (#648)
seanshi-scale Oct 23, 2024
9233b9a
Bearer auth for oai compatibility (#649)
dmchoiboi Oct 24, 2024
785e0fa
Updates to helm charts to sync with SGP (#651)
dmchoiboi Oct 30, 2024
05f2ecc
Add script to stamp initial schema (#653)
dmchoiboi Oct 30, 2024
0024b0c
Remove ENV requirement for db migration (#654)
dmchoiboi Oct 31, 2024
84f31a8
Remove restricte model name check (#656)
dmchoiboi Nov 4, 2024
cb699e8
Safe handle model param (#657)
dmchoiboi Nov 4, 2024
c2692c4
More vllm args passthrough (#658)
dmchoiboi Nov 5, 2024
b6eac17
Changes to balloons to support a less "on-demand" style of compute (#…
seanshi-scale Nov 5, 2024
3609e08
More vllm args passthrough (#659)
dmchoiboi Nov 8, 2024
f2be2a9
emit model name in dd traces, also emit error dd metrics on http time…
seanshi-scale Nov 15, 2024
bd77a0a
Add max_model_len as Optional Argument for Model.create API (#661)
sandeshghanta Nov 19, 2024
36b8240
Update setup.py
anagnoko23 Nov 20, 2024
2 changes: 2 additions & 0 deletions .black.toml
@@ -16,6 +16,8 @@ exclude = '''
| buck-out
| build
| dist
| alembic
| gen
)/
)
'''
162 changes: 130 additions & 32 deletions .circleci/config.yml
@@ -1,6 +1,7 @@
version: 2.1
orbs:
python: circleci/[email protected]
aws-cli: circleci/[email protected]

workflows:
ci:
@@ -10,11 +11,16 @@ workflows:
- integration_tests
- build_image
- build_docs
- deploy_docs:
filters:
branches:
only:
- main

jobs:
run_unit_tests_python_client:
docker:
- image: python:3.8-bookworm
- image: python:3.10-bookworm
resource_class: small
parallelism: 1
steps:
@@ -28,7 +34,7 @@ jobs:
- run_unit_tests_python_client
run_unit_tests_server:
docker:
- image: python:3.8-bookworm
- image: python:3.10-bookworm
environment:
ML_INFRA_DATABASE_URL: postgresql://postgres@localhost/circle_test
- image: circleci/postgres:12.9-postgis-ram
@@ -48,7 +54,7 @@
- run_unit_tests_server
build_docs:
docker:
- image: python:3.8-bookworm
- image: python:3.10-bookworm
resource_class: small
parallelism: 1
steps:
@@ -62,41 +68,131 @@ jobs:
name: Build Docs
command: |
mkdocs build --strict
deploy_docs:
docker:
- image: python:3.10-bookworm
resource_class: small
parallelism: 1
steps:
- add_ssh_keys: # gives write access to CircleCI worker
fingerprints:
- "76:0c:1b:9e:e3:6a:c3:5c:6f:24:91:ef:7c:54:d2:7a"
- checkout # checkout source code to working directory
- environment_setup
- install_client
- python/install-packages:
pkg-manager: pip
pip-dependency-file: requirements-docs.txt
- run:
name: Deploy Docs
command: |
mkdocs gh-deploy
build_image:
executor: ubuntu-large
steps:
- checkout
- run:
name: Build Docker Image
command: |
docker build . -f server/Dockerfile -t llm-engine:$CIRCLE_SHA1
docker build . -f model-engine/Dockerfile -t model-engine:$CIRCLE_SHA1
integration_tests:
executor: ubuntu-large
steps:
- checkout
- aws-cli/setup:
role-arn: ${CIRCLECI_ROLE_ARN}
aws-region: AWS_REGION
- run:
name: Build Docker Image
command: |
docker build . -f model-engine/Dockerfile -t model-engine:$CIRCLE_SHA1
- run:
name: Install minikube
command: |
cd $HOME
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
minikube start --vm-driver=docker --kubernetes-version=v1.23.0 --memory=14336 --cpus=8
minikube start --vm-driver=docker --kubernetes-version=v1.23.0 --memory=49152 --cpus=14
- run:
name: Install helm
name: Install kubectl, helm
command: |
cd $HOME
cd $HOME/bin
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
curl -LO "https://dl.k8s.io/release/v1.23.0/bin/linux/amd64/kubectl"
chmod +x kubectl
- run:
name: Install helm chart dependencies (Redis, Postgres, Istio)
command: |
sudo apt-get update && sudo apt-get install -y expect
pushd $HOME/project/.circleci/resources
kubectl create namespace model-engine
kubectl apply -f redis-k8s.yaml
kubectl apply -f postgres-k8s.yaml
kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:[email protected]:5432/circle_test
kubectl create secret generic model-engine-postgres-credentials --from-literal=database_url=postgresql://postgres:[email protected]:5432/circle_test -n model-engine
export ISTIO_VERSION=1.15.0
popd
curl -L https://istio.io/downloadIstio | TARGET_ARCH=x86_64 sh -
install istio-${ISTIO_VERSION}/bin/istioctl $HOME/bin
$HOME/bin/istioctl install --set profile=demo -y
kubectl create configmap default-config --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)"
kubectl create configmap default-config --namespace model-engine --from-literal=config="$(cat $HOME/project/.circleci/resources/.minikube-config-map | envsubst)"
cat $HOME/project/.circleci/resources/.minikube-registry-creds | envsubst | expect
minikube addons enable registry-creds
- run:
name: Pre-load model-engine image to minikube
command: |
minikube --logtostderr -v 1 image load model-engine:$CIRCLE_SHA1
- run:
name: Pre-load integration test images to minikube
command: |
docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
--build-arg BASE_IMAGE=python:3.8-slim \
--build-arg REQUIREMENTS_FILE="$CIRCLE_SHA1-base-requirements.txt" \
-t temp:1.11.0-cuda11.3-cudnn8-runtime-$CIRCLE_SHA1 .

touch $CIRCLE_SHA1-requirements.txt
echo -e "cloudpickle==2.1.0\npyyaml==6.0" > $CIRCLE_SHA1-requirements.txt

DOCKER_BUILDKIT=1 docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.user.Dockerfile \
--build-arg BASE_IMAGE=temp:1.11.0-cuda11.3-cudnn8-runtime-$CIRCLE_SHA1 \
--build-arg REQUIREMENTS_FILE="$CIRCLE_SHA1-requirements.txt" \
-t $CIRCLECI_AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/hosted-model-inference/async-pytorch:1.11.0-cuda11.3-cudnn8-runtime-$CIRCLE_SHA1-b8c25b .
rm $CIRCLE_SHA1-requirements.txt

minikube --logtostderr -v 1 image load $CIRCLECI_AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/hosted-model-inference/async-pytorch:1.11.0-cuda11.3-cudnn8-runtime-$CIRCLE_SHA1-b8c25b
- run:
name: Install helm chart
command: |
cd $HOME/project/charts
helm install llm-engine llm-engine --values llm-engine/values_sample.yaml
pushd $HOME/project/charts
cat model-engine/values_circleci.yaml | envsubst > model-engine/values_circleci_subst.yaml
helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug
- run:
name: Change python version to 3.10.14
command: |
pyenv install 3.10.14
pyenv global 3.10.14
- run:
name: Install integration test dependencies
command: |
export DEBIAN_FRONTEND=noninteractive
sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev libssl-dev python3-dev
pip install -r model-engine/requirements.txt
- install_client
- install_server
- run:
name: Run integration tests
command: |
pushd $HOME/project
kubectl port-forward svc/model-engine 5001:80 &
export GIT_TAG=$CIRCLE_SHA1
pytest integration_tests

executors:
ubuntu-large:
machine:
image: "ubuntu-2004:202201-02"
resource_class: xlarge
image: default
resource_class: 2xlarge

commands:
environment_setup:
@@ -112,29 +208,30 @@ commands:
install_server:
description: Installs LLM Engine server
steps:
- python/install-packages:
pkg-manager: pip
app-dir: server
- python/install-packages:
pkg-manager: pip
app-dir: server
pip-dependency-file: requirements-test.txt
- python/install-packages:
pkg-manager: pip
app-dir: server
pip-dependency-file: requirements_override.txt
- run:
name: Install Server
command: |
pushd server
pip install -e .
popd
- python/install-packages:
pkg-manager: pip
app-dir: model-engine
- python/install-packages:
pkg-manager: pip
app-dir: model-engine
pip-dependency-file: requirements-test.txt
- python/install-packages:
pkg-manager: pip
app-dir: model-engine
pip-dependency-file: requirements_override.txt
- run:
name: Install Server
command: |
pushd model-engine
pip install -e .
popd
install_client:
description: Install LLM Engine client
steps:
- run:
name: Install LLM Engine client
command: |
pip install --upgrade pip
pip install -e $HOME/project/clients/python
run_unit_tests_python_client:
description: Unit tests of the python client
@@ -159,16 +256,17 @@
- run:
name: Ruff Lint Check
command: |
ruff .
ruff check .
- run:
name: Type Check
command: |
pushd server
pushd model-engine
mypy . --install-types --non-interactive
popd
- run:
name: Unit Tests
command: |
pushd server
WORKSPACE=.. pytest
pushd model-engine
GIT_TAG=$(git rev-parse HEAD) WORKSPACE=.. pytest --cov --cov-report=xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=80
popd
5 changes: 5 additions & 0 deletions .circleci/resources/.minikube-config-map
@@ -0,0 +1,5 @@
# Configmap for AWS credentials inside minikube.
[default]
aws_access_key_id = $AWS_ACCESS_KEY_ID
aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
aws_session_token = $AWS_SESSION_TOKEN
15 changes: 15 additions & 0 deletions .circleci/resources/.minikube-registry-creds
@@ -0,0 +1,15 @@
# Script to send the registry-creds addon configuration to minikube
# Source: https://github.com/kubernetes/minikube/issues/8283
# See expect syntax here: https://manpages.ubuntu.com/manpages/trusty/man1/expect.1.html
spawn minikube addons configure registry-creds
expect "Do you want to enable AWS Elastic Container Registry?" { send "y\r" }
expect "Enter AWS Access Key ID:" { send "$AWS_ACCESS_KEY_ID\r" }
expect "Enter AWS Secret Access Key:" { send "$AWS_SECRET_ACCESS_KEY\r" }
expect "Enter AWS Session Token:" { send "$AWS_SESSION_TOKEN\r" }
expect "Enter AWS Region:" { send "us-west-2\r" }
expect "Enter 12 digit AWS Account ID (Comma separated list):" { send "$CIRCLECI_AWS_ACCOUNT_ID\r" }
expect "Enter ARN of AWS role to assume:" { send "\r" }
expect "Do you want to enable Google Container Registry?" { send "n\r" }
expect "Do you want to enable Docker Registry?" { send "n\r" }
expect "Do you want to enable Azure Container Registry?" { send "n\r" }
expect eof
50 changes: 50 additions & 0 deletions .circleci/resources/postgres-k8s.yaml
@@ -0,0 +1,50 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres
labels:
app: postgres
spec:
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: main
image: "cimg/postgres:12.8-postgis"
imagePullPolicy: IfNotPresent
resources:
requests:
memory: 1Gi
cpu: 1
ports:
- containerPort: 5432
env:
- name: POSTGRES_USER
value: postgres
- name: POSTGRES_DB
value: circle_test
- name: POSTGRES_PASSWORD
value: circle_test

---

kind: Service
apiVersion: v1
metadata:
name: postgres
labels:
app: postgres
spec:
type: ClusterIP
selector:
app: postgres
ports:
- name: redis
port: 5432
targetPort: 5432
43 changes: 43 additions & 0 deletions .circleci/resources/redis-k8s.yaml
@@ -0,0 +1,43 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-message-broker-master
labels:
app: redis-message-broker-master
spec:
replicas: 1
selector:
matchLabels:
app: redis-message-broker-master
template:
metadata:
labels:
app: redis-message-broker-master
spec:
containers:
- name: main
image: redis
imagePullPolicy: IfNotPresent
resources:
requests:
memory: 1Gi
cpu: 1
ports:
- containerPort: 6379

---

kind: Service
apiVersion: v1
metadata:
name: redis-message-broker-master
labels:
app: redis-message-broker-master
spec:
type: ClusterIP
selector:
app: redis-message-broker-master
ports:
- name: redis
port: 6379
targetPort: 6379