Hi, I'm using `lm_eval.simple_evaluate` with `HFLM` to evaluate deepseek-ai/DeepSeek-V2-Lite on 8 H100 GPUs, and with a fixed batch size of 16 I see skewed memory overhead across the GPUs.
It seems that both compute and memory are underutilized.
I also tried setting `batch_size=auto`, but the profiling runs at the maximum sequence length (16k) and returns a batch size of 1, which is clearly not optimal.
Do you have any suggestions for improving evaluation efficiency? Would vLLM be a better choice?
Attached is my script.
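A minimal sketch of the setup described above (the task choice and the `parallelize`/`trust_remote_code` flags are assumptions for illustration, not the original attachment):

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Minimal sketch of the setup described in this issue; the task list
# is a hypothetical choice, and the original attachment is not reproduced.
lm = HFLM(
    pretrained="deepseek-ai/DeepSeek-V2-Lite",
    parallelize=True,        # shard the model across all 8 H100s
    batch_size=16,           # the fixed batch size mentioned above
    trust_remote_code=True,  # DeepSeek-V2 ships custom modeling code
)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["gsm8k"],         # assumed task, for illustration only
)
print(results["results"])
```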