
When mixed precision input contains grad tensor, FSDP casts it with no grad #1191

Open
kamwoh opened this issue Dec 8, 2024 · 0 comments
kamwoh commented Dec 8, 2024

args, kwargs = cast_floats_to_right_precision(True, True, *args, **kwargs)

From this line, notice that FSDP casts the input to the appropriate precision under a no_grad operation when mixed_precision is enabled.

In this case, if the input contains a tensor that requires grad, for instance when the input to this FSDP module is the output of another learnable module, the gradient cannot be backpropagated into that learnable module, so it receives no gradient and is not updated during optimizer.step().

I'm wondering why we cast this with no_grad. Is turning no_grad off (setting it to False) safe?

At the moment, I set no_grad to False to work around the problem.
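
A minimal sketch of the failure mode and the workaround (the `upstream` module here is hypothetical, not from this repo): casting a grad-carrying tensor under `torch.no_grad()` detaches it from the autograd graph, while the same cast outside no_grad keeps the graph intact.

```python
import torch
import torch.nn as nn

upstream = nn.Linear(4, 4)                   # hypothetical learnable module feeding the FSDP module
x = upstream(torch.randn(2, 4))              # x.requires_grad is True, attached to the graph

# Cast under no_grad, the way the current code does it:
with torch.no_grad():
    x_detached = x.half()
print(x_detached.requires_grad)              # False -> backward stops here, upstream gets no grad

# Same cast without no_grad (what setting no_grad to False amounts to):
x_cast = x.half()
x_cast.float().sum().backward()
print(upstream.weight.grad is not None)      # True -> gradient reaches the upstream module
```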
