Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor] Generalized SwiGLU python code #1160

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

warpuv
Copy link
Contributor

@warpuv warpuv commented Nov 21, 2024

Created base classes and generalized functions from the SwiGLU implementation to be reused by SwiGLU and other GLU-like activation functions which may be implemented in the future.

What does this PR do?

Fixes #1158.

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 21, 2024
@warpuv
Copy link
Contributor Author

warpuv commented Nov 27, 2024

Dear @danthe3rd have you had a chance to look at the PR?

Created base classes and generalized functions in common_glu.py
to reuse by SwiGLU and other GLU-like activation functions that may be implemented in the future.
@warpuv
Copy link
Contributor Author

warpuv commented Dec 3, 2024

Hi @lw if it’s tedious to check all these changes at once, I can delete this PR and create several trivial PRs out of this one, easer to check. What do you think?

@danthe3rd
Copy link
Contributor

Hi,
Is it still something you want to merge here?
As stated before, this approach most likely will be hard to implement on H100s, and probably it would be more efficient to pack the weights in a different scheme

@warpuv
Copy link
Contributor Author

warpuv commented Dec 19, 2024

Hi, Is it still something you want to merge here? As stated before, this approach most likely will be hard to implement on H100s, and probably it would be more efficient to pack the weights in a different scheme

Hello @danthe3rd,
This commit is only about making it easier to add new activation functions. I may try to implement optimization for H100 later, not in this MR. No other commits were planned to push in this MR.

@danthe3rd
Copy link
Contributor

What do you mean by "MR"?

@warpuv
Copy link
Contributor Author

warpuv commented Dec 19, 2024

What do you mean by "MR"?

MR - Merge Request or Pull Request (PR)

@danthe3rd
Copy link
Contributor

So this PR is not implementing anything new, but if you want to add support for other activation functions, probably we should discuss it. I believe this SwiGLU implementation has poor design choices, and it might be better to start from scratch for another implementation.
Would you be available for a VC? Are you on the PyTorch slack channel by any chance? Or is there another way I can reach out to you to set this up?

@warpuv
Copy link
Contributor Author

warpuv commented Jan 5, 2025

So this PR is not implementing anything new, but if you want to add support for other activation functions, probably we should discuss it. I believe this SwiGLU implementation has poor design choices, and it might be better to start from scratch for another implementation. Would you be available for a VC? Are you on the PyTorch slack channel by any chance? Or is there another way I can reach out to you to set this up?

You can find me on PyTorch slack channel by first and last name, or by using nick name (same as here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generalize SwiGLU related python code
3 participants