Finer-grained ZeRO-Offload strategy #4741
Unanswered
chizhang118
asked this question in
Q&A
Replies: 0 comments
Currently, DeepSpeed ZeRO-Offload uses a fixed strategy for splitting computation and memory: optimizer states, gradients, and the optimizer computation are offloaded to the CPU, while everything else stays on the GPU. Do we need a finer-grained strategy that adapts to different hardware and parameter settings?
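For context, here is a rough sketch of how the offload choice is expressed in a DeepSpeed config today, as far as I understand it. The keys under `zero_optimization` (`stage`, `offload_optimizer`, `offload_param`, `device`, `pin_memory`) follow the DeepSpeed configuration docs. The helper `make_zero_offload_config` and the batch-size value are my own assumptions for illustration, not part of DeepSpeed.

```python
import json


def make_zero_offload_config(stage=2, offload_optimizer=True, offload_params=False):
    """Hypothetical helper (not a DeepSpeed API) that builds a config dict.

    The choice is coarse: each kind of state is either offloaded to the
    CPU as a whole or kept on the GPU.
    """
    if offload_params and stage != 3:
        # Parameter offload is only available with ZeRO stage 3.
        raise ValueError("parameter offload requires ZeRO stage 3")

    zero = {"stage": stage}
    if offload_optimizer:
        # Optimizer states and the optimizer step are placed on the CPU.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    if offload_params:
        # Model parameters are also kept in CPU memory (stage 3 only).
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}

    # Batch size of 1 is an arbitrary placeholder for this sketch.
    return {"train_micro_batch_size_per_gpu": 1, "zero_optimization": zero}


default_cfg = make_zero_offload_config()
stage3_cfg = make_zero_offload_config(stage=3, offload_params=True)
print(json.dumps(default_cfg, indent=2))
```

The resulting dict could be written to a JSON file and passed to DeepSpeed as its config. The point of the sketch is that the switches are per state type and mostly all-or-nothing, which is what prompts the question about finer granularity.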