Does Xformers offer any extra speed over PyTorch anymore? And why is my Xformers file so big? #1174

Open
Mescalamba opened this issue Dec 14, 2024 · 6 comments

Comments

@Mescalamba

❓ Questions and Help

I tested the official xFormers build for PyTorch 2.5.1 against plain PyTorch, and the difference was actually negative, i.e. xFormers was slower than PyTorch. That said, I have an old GPU, so it could be different on something newer, as I can only use torch attn v1.

I built xFormers 0.29 myself for nightly torch 2.6 (which, by the way, seems to work with any nightly version; currently I have 2.6.0.dev20241212). And the speed, well, it's the same? Although I suspect xFormers somehow has an edge when it comes to image quality.

That said, I wonder why the 0.29 I built is so big. The resulting file is 446MB, which is a lot bigger than the last 0.28. Did I build it wrong? I mean, it works fine.

@danthe3rd
Contributor

Hi,
What GPU do you have?
How did you build xFormers?
And what is your benchmark for measuring speed?

Some of the components from xFormers have been integrated in PyTorch, so that might be a reason why you don't see any speedup.

@Mescalamba
Author

That would explain it.

python -m pip wheel . --no-deps

Guess --no-deps is why it's big?
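
One way to see where the size comes from: a wheel is an ordinary zip archive, so its largest members can be listed with the standard library. This is a minimal sketch; `largest_wheel_members` is a hypothetical helper name and the path in the usage comment is a placeholder. Note that `--no-deps` only stops pip from also building wheels for dependencies, so it would not by itself make the xFormers wheel larger. The number of GPU architectures the kernels were compiled for is a more likely factor, though that is an assumption about this particular build.

```python
import zipfile


def largest_wheel_members(wheel_path, top=10):
    """Return (uncompressed_bytes, compressed_bytes, name) for the largest files in a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        entries = [
            (info.file_size, info.compress_size, info.filename)
            for info in wheel.infolist()
        ]
    # Largest uncompressed size first.
    entries.sort(reverse=True)
    return entries[:top]


# Usage (placeholder path, replace with the wheel that pip produced):
# for size, packed, name in largest_wheel_members("path/to/xformers.whl"):
#     print(f"{size / 1e6:8.1f} MB ({packed / 1e6:8.1f} MB packed)  {name}")
```

If most of the space is taken by a handful of compiled extension files, the build configuration rather than the pip flags is the place to look.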

The benchmark is simply running the same ComfyUI workflow with everything locked. One can gauge the difference between samplers, or in this case between xFormers and PyTorch.

I just have an old Titan Xp.
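
A whole-workflow run mixes attention time with everything else in the pipeline, so a small isolated timing loop can make the comparison easier to read. The following is a rough sketch, not the benchmark used in this thread; `benchmark` and its parameters are hypothetical names. The optional `sync` hook is there because GPU work is queued asynchronously, so the clock should only stop after the device has finished.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock seconds per call of fn()."""
    # Warm-up calls absorb one-time costs such as kernel selection and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Usage on a GPU, assuming q, k, v are CUDA tensors that already exist:
# import torch, torch.nn.functional as F
# t = benchmark(lambda: F.scaled_dot_product_attention(q, k, v),
#               sync=torch.cuda.synchronize)
```

The median is used instead of the mean so that a single slow iteration does not dominate the result.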

@danthe3rd
Contributor

The Titan kernels haven't changed in a while, and they are now available as part of PyTorch. You will get exactly the same speed/result with PyTorch's scaled_dot_product_attention :)
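
For reference, calling PyTorch's `scaled_dot_product_attention` directly looks roughly like the sketch below. It checks the built-in against an explicit softmax(QK^T / sqrt(d))V computation on small random CPU tensors. The function name `sdpa_matches_reference`, the tensor shapes, and the tolerance are illustrative assumptions, and the snippet says nothing about which backend kernel PyTorch picks on a given GPU.

```python
def sdpa_matches_reference(batch=2, heads=4, seq=16, dim=32, atol=1e-4):
    """Compare F.scaled_dot_product_attention with a hand-written reference."""
    # Imported lazily so the snippet can be loaded on a machine without PyTorch.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    # SDPA expects tensors laid out as (batch, heads, seq_len, head_dim).
    q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))

    # Reference: softmax(Q K^T / sqrt(d)) V, written out explicitly.
    scores = (q @ k.transpose(-2, -1)) / dim ** 0.5
    expected = torch.softmax(scores, dim=-1) @ v

    actual = F.scaled_dot_product_attention(q, k, v)
    return torch.allclose(actual, expected, atol=atol)
```

Code written against xFormers would need its tensors permuted to this (batch, heads, seq_len, head_dim) layout if it currently uses a different one.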

bertmaher pushed a commit to bertmaher/xformers that referenced this issue Dec 20, 2024
@danthe3rd
Contributor

Updating this: we added support for Flash3 by default in xFormers. This is not yet supported in PyTorch's scaled_dot_product_attention, so we expect xFormers to be noticeably faster on H100s until PyTorch supports Flash3.

@Mescalamba
Author

Very cool, but I'm stuck with my Titan Xp, so I guess no improvements are possible there?

@danthe3rd
Contributor

Yeah, we're no longer working on Titan-type GPUs; we're focusing mostly on H100s at the moment...
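
To tell which side of this split a given card falls on, the CUDA compute capability is a reasonable first check. The sketch below maps a (major, minor) pair to a rough kernel tier. The function name, the tier labels, and the thresholds are assumptions for illustration: Flash3 is treated as Hopper-only (9.x, e.g. H100) as described in this thread, FlashAttention-2-class kernels are assumed to need Ampere (8.0) or newer, and anything older, such as the Titan Xp at 6.1, falls back to the older kernels.

```python
def attention_kernel_tier(capability):
    """Map a CUDA compute capability (major, minor) to a rough attention-kernel tier."""
    major, _minor = capability
    if major == 9:
        return "flash3"  # Hopper, e.g. H100 (assumed Flash3 target)
    if major >= 8:
        return "flash2"  # Ampere or newer (assumed FlashAttention-2 class)
    return "legacy"      # pre-Ampere, e.g. Titan Xp (6.1)


# Usage with PyTorch on a machine with a CUDA device:
# import torch
# print(attention_kernel_tier(torch.cuda.get_device_capability()))
```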


2 participants