Non-trainable parameters? #921

Open
hovinen opened this issue Mar 1, 2024 · 2 comments

Comments

@hovinen

hovinen commented Mar 1, 2024

I would like to set up a network in which all of the parameters of one of the linear layers are hard-coded and do not change during training. In other libraries such as PyTorch, one can do this by clearing the requires_grad flag on the parameters one wishes to hold fixed. I can't find an equivalent in the dfdx documentation, nor any mention of "non-trainable" or similar terms.

Does dfdx support this at all? If so, how does one set this up?
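To make the requested behaviour concrete, here is a minimal std-only Rust sketch of the effect of a requires_grad-style flag: parameters marked as frozen are skipped by the update step. Param and sgd_step are hypothetical names, not dfdx (or PyTorch) APIs, and the sketch only models the outcome (frozen weights never change), not how an autograd engine avoids computing their gradients in the first place.

```rust
/// Hypothetical toy parameter; not a dfdx type.
#[derive(Debug, Clone)]
pub struct Param {
    pub value: Vec<f32>,
    pub grad: Vec<f32>,
    /// Analogue of PyTorch's requires_grad: when false, updates are skipped.
    pub requires_grad: bool,
}

/// Plain SGD step (value -= lr * grad) that leaves frozen parameters untouched.
pub fn sgd_step(params: &mut [Param], lr: f32) {
    for p in params.iter_mut().filter(|p| p.requires_grad) {
        for (v, g) in p.value.iter_mut().zip(&p.grad) {
            *v -= lr * g;
        }
    }
}
```

A frozen Param keeps its values even when it carries a nonzero gradient.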

@swfsql
Contributor

swfsql commented Mar 1, 2024

I'm not entirely sure, but I believe you can create a wrapper struct that defines how its forward_mut method behaves (assuming you want to implement a Module). In that method, for the linear layers you don't intend to train, you'd call their forward method instead of forward_mut. I'm not sure how you'd need to handle the tapes on the input data, though; maybe they can be kept the same.
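A rough, library-free illustration of the wrapper idea, assuming a toy 1-D layer rather than dfdx's Linear and Module: the wrapper's training step uses the frozen layer only as a plain function, and computes and applies gradients for the trainable head alone. Lin, Model and train_step are hypothetical names. In dfdx itself, whether gradients also flow into the frozen layer depends on how the tape is handled, which this comment leaves open.

```rust
/// Hypothetical 1-D linear layer y = w * x + b (not dfdx's Linear).
#[derive(Debug, Clone, Copy)]
pub struct Lin {
    pub w: f32,
    pub b: f32,
}

impl Lin {
    pub fn forward(&self, x: f32) -> f32 {
        self.w * x + self.b
    }
}

/// Hypothetical wrapper: `fixed` is never updated, `head` is trained.
pub struct Model {
    pub fixed: Lin,
    pub head: Lin,
}

impl Model {
    /// One SGD step on squared error; returns the loss before the update.
    /// Only `head` receives gradients; `fixed` is just evaluated.
    pub fn train_step(&mut self, x: f32, target: f32, lr: f32) -> f32 {
        let h = self.fixed.forward(x); // frozen layer used as a plain function
        let y = self.head.forward(h);
        let err = y - target;
        // d(err^2)/dw = 2 * err * h, d(err^2)/db = 2 * err
        self.head.w -= lr * 2.0 * err * h;
        self.head.b -= lr * 2.0 * err;
        err * err
    }
}
```

Because fixed is only read, its weights stay at their initial values however many steps are taken, while head moves toward the target.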

@opfromthestart
Contributor

If it's possible to isolate the trainable parts of the model, you can just make the optimizer take only the trainable parts as input. E.g., if you are only training the last layer, you can make the optimizer take only the last layer.
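A minimal sketch of scoping the optimizer this way, using hypothetical std-only types rather than dfdx's optimizers: the optimizer is handed only the last layer, so the first layer stays fixed even though it has a nonzero gradient. Layer, Net and Sgd here are illustrative names; the corresponding dfdx calls are not shown and may look different.

```rust
/// Hypothetical layer: a flat weight vector and its gradient.
#[derive(Debug, Clone, PartialEq)]
pub struct Layer {
    pub w: Vec<f32>,
    pub grad: Vec<f32>,
}

/// Hypothetical two-layer model.
pub struct Net {
    pub first: Layer,
    pub last: Layer,
}

/// Hypothetical SGD optimizer: it can only modify the layer it is handed.
pub struct Sgd {
    pub lr: f32,
}

impl Sgd {
    pub fn update(&self, layer: &mut Layer) {
        for (w, g) in layer.w.iter_mut().zip(&layer.grad) {
            *w -= self.lr * g;
        }
    }
}
```

Calling opt.update(&mut net.last) trains only the last layer; net.first is never passed in, so it cannot change.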

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants