Enabling FAv3 by default, removed deprecated components

@danthe3rd danthe3rd released this 27 Dec 09:39
· 5 commits to main since this release

Pre-built binary wheels require PyTorch 2.5.1

Improved:

  • [fMHA] Creating a LowerTriangularMask no longer creates a CUDA tensor
  • [fMHA] Updated Flash-Attention to v2.7.2.post1
  • [fMHA] memory_efficient_attention now uses Flash-Attention v3 by default when it is available, unless a specific operator is enforced with the op keyword argument. Switching from Flash2 to Flash3 can make transformer training ~10% faster end-to-end on H100s
  • [fMHA] Fixed a performance regression in the backward pass of the cutlass backend (#1176), which is mostly used on older GPUs (e.g. V100)
  • Fixed swiglu operator compatibility with torch.compile on PyTorch 2.6
  • Fixed activation checkpointing of SwiGLU when AMP is enabled (#1152)
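
The backend selection described in the fMHA items above can be sketched as follows. This is a minimal illustration, not code from the release: the helper name causal_attention and its force_flash2 flag are hypothetical, and it assumes xformers.ops exposes memory_efficient_attention, LowerTriangularMask and the Flash-Attention v2 operator pair MemoryEfficientAttentionFlashAttentionOp. Which backend is actually chosen when op is left unset depends on the installed build and the GPU.

```python
def causal_attention(q, k, v, force_flash2=False):
    """Causal attention through xFormers (hypothetical helper for illustration).

    q, k, v are expected to be CUDA tensors of shape
    [batch, seq_len, num_heads, head_dim] in fp16 or bf16.
    """
    # Imported lazily so this module can be loaded on machines
    # where xFormers is not installed.
    import xformers.ops as xops

    # Per the release notes, building this mask no longer creates a CUDA tensor.
    mask = xops.LowerTriangularMask()

    # op=None leaves the choice of backend to xFormers (Flash-Attention v3
    # when available, according to this release). Passing a
    # (forward, backward) operator pair enforces that backend instead.
    op = xops.MemoryEfficientAttentionFlashAttentionOp if force_flash2 else None

    return xops.memory_efficient_attention(q, k, v, attn_bias=mask, op=op)
```

Calling causal_attention(q, k, v) uses the default dispatch, while causal_attention(q, k, v, force_flash2=True) pins the call to the Flash-Attention v2 operators, which may be useful when comparing the two backends on the same hardware.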

Removed:

  • Following PyTorch, xFormers no longer builds binaries for conda. Pip is now the only recommended way to get xFormers
  • Removed unmaintained/deprecated components in xformers.components.* (see #848)