Train batch size errors #6914

Open
lckkkk02 opened this issue Dec 25, 2024 · 2 comments
Comments

@lckkkk02

[rank0]: AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 8 != 2 * 1 * 1
Printing dist.world_size earlier returns 4, but it still fails.
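The assertion compares train_batch_size against a plain product. Below is a minimal sketch of that check, assuming it behaves exactly as the error message describes; the function name `check_batch_params` is hypothetical and is not DeepSpeed's own code. It shows that the numbers in the traceback fail only because world_size resolved to 1 at that point; had 4 ranks been counted, 2 * 1 * 4 = 8 would pass.

```python
def check_batch_params(train_batch_size: int, micro_batch_per_gpu: int,
                       gradient_acc_step: int, world_size: int) -> None:
    """Hypothetical re-creation of the batch-size consistency check."""
    expected = micro_batch_per_gpu * gradient_acc_step * world_size
    if train_batch_size != expected:
        # Same wording and operand order as the reported AssertionError.
        raise AssertionError(
            "Check batch related parameters. train_batch_size is not equal to "
            "micro_batch_per_gpu * gradient_acc_step * world_size "
            f"{train_batch_size} != {micro_batch_per_gpu} * "
            f"{gradient_acc_step} * {world_size}"
        )


# Values from the traceback: world_size was seen as 1.
try:
    check_batch_params(8, 2, 1, 1)
except AssertionError as err:
    print(err)

# Same settings with 4 ranks: 2 * 1 * 4 == 8, so no error is raised.
check_batch_params(8, 2, 1, 4)
```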

@lx-Meteors

Apparently, even though world size prints as 4, the system is still picking up 8? You could change the parameters on the right-hand side, e.g. 2 * 4 * 1.
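Adjusting the right-hand side amounts to solving for one factor given the other three. A hypothetical helper (`gradient_accumulation_steps_for` is an illustrative name, not a library function), assuming train_batch_size stays at 8 and the micro batch per GPU at 2:

```python
def gradient_accumulation_steps_for(train_batch_size: int,
                                    micro_batch_per_gpu: int,
                                    world_size: int) -> int:
    """Solve train_batch_size == micro * grad_acc * world_size for grad_acc."""
    per_step = micro_batch_per_gpu * world_size
    if train_batch_size % per_step != 0:
        raise ValueError(
            f"train_batch_size {train_batch_size} is not divisible by "
            f"micro_batch_per_gpu * world_size = {per_step}"
        )
    return train_batch_size // per_step


print(gradient_accumulation_steps_for(8, 2, 1))  # world_size from the traceback
print(gradient_accumulation_steps_for(8, 2, 4))  # world_size as printed by the reporter
```

With the world_size of 1 reported in the traceback this yields 4, i.e. the suggested 2 * 4 * 1; if all 4 ranks are actually counted it yields 1, which is the reporter's original setting.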

@loadams
Contributor

loadams commented Jan 2, 2025

@lckkkk02 - did changing the parameters work for you? Or can you share your ds_config and code that reproduces the failure and more information about your setup?
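Assuming the ds_config in question is a DeepSpeed JSON config, one option is to leave train_batch_size out entirely: the DeepSpeed configuration docs state that only two of train_batch_size, train_micro_batch_size_per_gpu and gradient_accumulation_steps need to be given, and the third is inferred. A sketch with illustrative values (the reporter's actual config was not shared):

```python
import json

# Illustrative values only. train_batch_size is deliberately omitted; DeepSpeed
# is expected to infer it as micro batch * gradient_accumulation_steps * number
# of GPUs, so it cannot disagree with the world size detected at startup.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
}

print(json.dumps(ds_config, indent=2))
```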

@loadams loadams changed the title from "Initialization issue" to "Train batch size errors" on Jan 2, 2025
@loadams loadams self-assigned this Jan 2, 2025