falcon3 support please? #1834

Open
BBC-Esq opened this issue Dec 18, 2024 · 1 comment

BBC-Esq commented Dec 18, 2024

Falcon3 was just released today:

https://huggingface.co/collections/tiiuae/falcon3-67605ae03578be86e4e87026

More details:

https://huggingface.co/blog/falcon3

For reference, here is the full tensor dump from one of the checkpoints:
Tensor | Shape | Dtype
-- | -- | --
model.embed_tokens.weight | [131072, 3072] | BF16
model.layers.0 |   |  
model.layers.0.input_layernorm.weight | [3072] | BF16
model.layers.0.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.0.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.0.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.0.post_attention_layernorm.weight | [3072] | BF16
model.layers.0.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.0.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.0.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.0.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.1 |   |  
model.layers.1.input_layernorm.weight | [3072] | BF16
model.layers.1.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.1.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.1.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.1.post_attention_layernorm.weight | [3072] | BF16
model.layers.1.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.1.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.1.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.1.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.2 |   |  
model.layers.2.input_layernorm.weight | [3072] | BF16
model.layers.2.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.2.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.2.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.2.post_attention_layernorm.weight | [3072] | BF16
model.layers.2.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.2.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.2.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.2.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.3 |   |  
model.layers.3.input_layernorm.weight | [3072] | BF16
model.layers.3.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.3.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.3.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.3.post_attention_layernorm.weight | [3072] | BF16
model.layers.3.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.3.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.3.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.3.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.4 |   |  
model.layers.4.input_layernorm.weight | [3072] | BF16
model.layers.4.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.4.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.4.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.4.post_attention_layernorm.weight | [3072] | BF16
model.layers.4.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.4.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.4.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.4.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.5 |   |  
model.layers.5.input_layernorm.weight | [3072] | BF16
model.layers.5.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.5.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.5.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.5.post_attention_layernorm.weight | [3072] | BF16
model.layers.5.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.5.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.5.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.5.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.6 |   |  
model.layers.6.input_layernorm.weight | [3072] | BF16
model.layers.6.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.6.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.6.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.6.post_attention_layernorm.weight | [3072] | BF16
model.layers.6.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.6.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.6.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.6.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.7 |   |  
model.layers.7.input_layernorm.weight | [3072] | BF16
model.layers.7.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.7.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.7.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.7.post_attention_layernorm.weight | [3072] | BF16
model.layers.7.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.7.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.7.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.7.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.8 |   |  
model.layers.8.input_layernorm.weight | [3072] | BF16
model.layers.8.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.8.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.8.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.8.post_attention_layernorm.weight | [3072] | BF16
model.layers.8.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.8.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.8.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.8.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.9 |   |  
model.layers.9.input_layernorm.weight | [3072] | BF16
model.layers.9.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.9.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.9.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.9.post_attention_layernorm.weight | [3072] | BF16
model.layers.9.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.9.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.9.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.9.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.10 |   |  
model.layers.10.input_layernorm.weight | [3072] | BF16
model.layers.10.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.10.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.10.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.10.post_attention_layernorm.weight | [3072] | BF16
model.layers.10.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.10.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.10.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.10.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.11 |   |  
model.layers.11.input_layernorm.weight | [3072] | BF16
model.layers.11.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.11.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.11.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.11.post_attention_layernorm.weight | [3072] | BF16
model.layers.11.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.11.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.11.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.11.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.12 |   |  
model.layers.12.input_layernorm.weight | [3072] | BF16
model.layers.12.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.12.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.12.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.12.post_attention_layernorm.weight | [3072] | BF16
model.layers.12.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.12.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.12.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.12.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.13 |   |  
model.layers.13.input_layernorm.weight | [3072] | BF16
model.layers.13.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.13.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.13.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.13.post_attention_layernorm.weight | [3072] | BF16
model.layers.13.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.13.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.13.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.13.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.14 |   |  
model.layers.14.input_layernorm.weight | [3072] | BF16
model.layers.14.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.14.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.14.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.14.post_attention_layernorm.weight | [3072] | BF16
model.layers.14.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.14.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.14.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.14.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.15 |   |  
model.layers.15.input_layernorm.weight | [3072] | BF16
model.layers.15.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.15.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.15.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.15.post_attention_layernorm.weight | [3072] | BF16
model.layers.15.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.15.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.15.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.15.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.16 |   |  
model.layers.16.input_layernorm.weight | [3072] | BF16
model.layers.16.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.16.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.16.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.16.post_attention_layernorm.weight | [3072] | BF16
model.layers.16.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.16.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.16.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.16.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.17 |   |  
model.layers.17.input_layernorm.weight | [3072] | BF16
model.layers.17.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.17.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.17.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.17.post_attention_layernorm.weight | [3072] | BF16
model.layers.17.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.17.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.17.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.17.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.18 |   |  
model.layers.18.input_layernorm.weight | [3072] | BF16
model.layers.18.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.18.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.18.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.18.post_attention_layernorm.weight | [3072] | BF16
model.layers.18.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.18.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.18.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.18.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.19 |   |  
model.layers.19.input_layernorm.weight | [3072] | BF16
model.layers.19.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.19.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.19.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.19.post_attention_layernorm.weight | [3072] | BF16
model.layers.19.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.19.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.19.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.19.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.20 |   |  
model.layers.20.input_layernorm.weight | [3072] | BF16
model.layers.20.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.20.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.20.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.20.post_attention_layernorm.weight | [3072] | BF16
model.layers.20.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.20.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.20.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.20.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.21 |   |  
model.layers.21.input_layernorm.weight | [3072] | BF16
model.layers.21.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.21.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.21.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.21.post_attention_layernorm.weight | [3072] | BF16
model.layers.21.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.21.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.21.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.21.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.22 |   |  
model.layers.22.input_layernorm.weight | [3072] | BF16
model.layers.22.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.22.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.22.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.22.post_attention_layernorm.weight | [3072] | BF16
model.layers.22.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.22.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.22.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.22.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.23 |   |  
model.layers.23.input_layernorm.weight | [3072] | BF16
model.layers.23.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.23.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.23.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.23.post_attention_layernorm.weight | [3072] | BF16
model.layers.23.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.23.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.23.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.23.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.24 |   |  
model.layers.24.input_layernorm.weight | [3072] | BF16
model.layers.24.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.24.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.24.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.24.post_attention_layernorm.weight | [3072] | BF16
model.layers.24.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.24.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.24.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.24.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.25 |   |  
model.layers.25.input_layernorm.weight | [3072] | BF16
model.layers.25.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.25.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.25.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.25.post_attention_layernorm.weight | [3072] | BF16
model.layers.25.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.25.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.25.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.25.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.26 |   |  
model.layers.26.input_layernorm.weight | [3072] | BF16
model.layers.26.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.26.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.26.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.26.post_attention_layernorm.weight | [3072] | BF16
model.layers.26.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.26.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.26.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.26.self_attn.v_proj.weight | [1024, 3072] | BF16
model.layers.27 |   |  
model.layers.27.input_layernorm.weight | [3072] | BF16
model.layers.27.mlp.down_proj.weight | [3072, 23040] | BF16
model.layers.27.mlp.gate_proj.weight | [23040, 3072] | BF16
model.layers.27.mlp.up_proj.weight | [23040, 3072] | BF16
model.layers.27.post_attention_layernorm.weight | [3072] | BF16
model.layers.27.self_attn.k_proj.weight | [1024, 3072] | BF16
model.layers.27.self_attn.o_proj.weight | [3072, 3072] | BF16
model.layers.27.self_attn.q_proj.weight | [3072, 3072] | BF16
model.layers.27.self_attn.v_proj.weight | [1024, 3072] | BF16
model.norm.weight | [3072] | BF16
lm_head.weight | [131072, 3072] | BF16
jncraton (Contributor) commented

The transformer-based Falcon 3 models use the Llama architecture, so they are already supported. I played around with the 1B, 3B, and 7B models yesterday.
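
You can confirm this directly from the checkpoint metadata. A quick check, assuming the Hugging Face `transformers` package and the `tiiuae/Falcon3-7B-Instruct` repo name:

```python
from transformers import AutoConfig

# Falcon 3 checkpoints declare the Llama architecture in their config,
# so any Llama-compatible loader should pick them up unchanged.
config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")
print(config.model_type)     # expected: "llama"
print(config.architectures)  # expected: ["LlamaForCausalLM"]
```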

I did run into an issue because the Falcon 3 models did not specify a bos token. Once I set this manually, the model appeared to work correctly.
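
For anyone who hits the same thing, here is a minimal sketch of that workaround, assuming the Hugging Face tokenizer API; the fallback token choice is my assumption, not something the checkpoint defines:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Instruct")

if tokenizer.bos_token is None:
    # Assumption: reuse the eos token as bos, since the Falcon 3 tokenizer
    # config does not define one. Any consistent choice the model was
    # trained with would also work here.
    tokenizer.bos_token = tokenizer.eos_token

prompt = tokenizer.bos_token + "Hello, Falcon 3!"
print(tokenizer.encode(prompt)[:8])
```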
