Hi,
I'm trying to benchmark multi-node all-gather performance using the param tests with buffers up to 2 GB, but the test OOMs at a buffer size of around 1 GB, while the same configuration works with nccl-tests. The all-reduce (AR) and reduce-scatter (RS) tests are fine, and their results are very similar to nccl-tests. You can reproduce this on A100-40G or H100 clusters (p4d or p5 on AWS). Any ideas or insight would be helpful. Thank you!

Environment: PyTorch nightly with CUDA 12.1, or PyTorch 2.0.1 with CUDA 11.8.
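One back-of-the-envelope calculation that might be relevant (my assumption, not verified against the param source): an all-gather's output tensor is world_size times the per-rank input, so the per-GPU footprint grows with rank count, while all-reduce and reduce-scatter buffers stay at or below the input size. A hypothetical sizing sketch (the 16-rank / 1 GiB numbers are illustrative, not measured):

```python
# Rough all-gather memory footprint per GPU: the input buffer plus the
# gathered output buffer, which is world_size times the input.
# (Assumption about why all-gather OOMs earlier than AR/RS; illustrative only.)

GIB = 1024 ** 3

def allgather_bytes_per_gpu(input_bytes: int, world_size: int) -> int:
    """Input buffer plus the full gathered output resident on each GPU."""
    return input_bytes + world_size * input_bytes

# Hypothetical 16-rank job (e.g. 2 nodes x 8 GPUs) with a 1 GiB per-rank buffer:
per_gpu = allgather_bytes_per_gpu(1 * GIB, world_size=16)
print(per_gpu / GIB)  # 17.0 GiB per GPU just for the collective's buffers
```

On a 40 GB A100 that alone would put a ~1 GiB per-rank buffer close to the limit once model state and allocator overhead are included, which could explain why the OOM appears near 1 GB only for all-gather.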
For param, I'm launching the following way:
For nccl-tests, I'm using NCCL 2.18.3 + CUDA 12.1, but older versions also work.
And in the bash file: