Custom tensor parallelism #5438
Unanswered · sarathkondeti asked this question in Q&A · Replies: 0 comments
Hello,
I'm deploying Vicuna-33B over 2× A6000s (48 GB each) with TP = 2; it consumes about 37 GB on each card.
I now have a scenario where another workload on GPU 0 already consumes 24 GB of VRAM. Does DeepSpeed currently do anything smart, such as a 25:75 tensor slicing, to adapt to the resulting 24 GB + 48 GB of available GPU memory?
I see that tp_shard.py doesn't expose any parameters for this, but I wanted to confirm the behavior.
Thank you very much.
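For concreteness, here is a minimal sketch of what such an uneven split could look like. This is not DeepSpeed's API (its autotp sharding splits evenly across ranks); `shard_columns` and the ratio-based sizing are hypothetical, shown only to illustrate the 25:75 idea with NumPy standing in for real device tensors.

```python
import numpy as np

def shard_columns(weight, ratios):
    """Hypothetical helper: split a weight matrix's output (column)
    dimension across GPUs in proportion to per-GPU memory ratios,
    instead of the usual even split."""
    total = sum(ratios)
    cols = weight.shape[1]
    # Per-shard column counts proportional to the ratios.
    sizes = [cols * r // total for r in ratios]
    sizes[-1] += cols - sum(sizes)  # hand any rounding remainder to the last shard
    split_points = np.cumsum(sizes)[:-1]
    return np.split(weight, split_points, axis=1)

# Example: a 4096x1024 linear layer split 25:75 across two GPUs.
w = np.zeros((4096, 1024))
shards = shard_columns(w, [25, 75])
print([s.shape[1] for s in shards])  # -> [256, 768]
```

In a real TP setup the matching row-parallel layers and the all-reduce/all-gather collectives would also need to handle the unequal shard sizes, which is presumably why even splits are the default.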