Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: implement llama3 RoPE scaling type and fix converter #1751

Merged
merged 2 commits into from
Aug 12, 2024

Conversation

ebraraktas
Copy link
Contributor

Fixes #1745 .

Implementation is ported from tranformers

@ebraraktas ebraraktas force-pushed the fixllama3.1-conversion branch from 5787b0c to 5503c93 Compare July 31, 2024 23:20
@ebraraktas
Copy link
Contributor Author

While implementing this, I saw that rotary embedding layer is shared among layers of Llama3 and huggingface implementation is refactored to use that shared one. Maybe we can implement this feature in ctranslate2 to reduce memory usage.

@minhthuc2502
Copy link
Collaborator

minhthuc2502 commented Aug 2, 2024

Hello, thank you for your PR. I will fix the CI soon and you can rebase on the master. What do you mean "rotary embedding layer is shared among layers of Llama3 and huggingface implementation". Can you add the link here?

@ebraraktas
Copy link
Contributor Author

Thanks for the comment, I will rebase once you fix it. @minhthuc2502

For RoPE sharing:

  • rotary_emb will be removed from LlamaAttention, see this comment
  • position_embeddings (tuple of cos and sin) will be generated from input embeddings at the beginning of inference inside LlamaModel.forward and passed to the layers using it.
  • And position_embeddings will be passed as an input to LlamaAttention.forward, implemented here

@minhthuc2502
Copy link
Collaborator

Ah I understand your point. Actually, we can keep the current architecture because It requires more changes than HF to do this.

@minhthuc2502
Copy link
Collaborator

Could you rebase on the master branch? please

@ebraraktas ebraraktas force-pushed the fixllama3.1-conversion branch from 5503c93 to 8429e5e Compare August 7, 2024 14:03
@ebraraktas ebraraktas force-pushed the fixllama3.1-conversion branch from 585a300 to 5d3c89e Compare August 7, 2024 19:17
@ebraraktas
Copy link
Contributor Author

ebraraktas commented Aug 7, 2024

As in this PR, no space left error occurred during docker step of the CI/CD. @minhthuc2502

@minhthuc2502 minhthuc2502 merged commit a386cbd into OpenNMT:master Aug 12, 2024
13 checks passed
@ebraraktas ebraraktas deleted the fixllama3.1-conversion branch September 9, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Llama 3.1 support please?
2 participants