
Releases: facebookresearch/xformers

Open sourcing indexing operators

30 Mar 15:38
Open source experimental indexing ops

ghstack-source-id: 7f0b1213844a6454e582536c683822d6a0a435b6
Pull Request resolved: https://github.com/fairinternal/xformers/pull/536

__original_commit__ = fairinternal/xformers@539a1388e4031fbc2b847f31a5ca341395659e67

Binaries for PT 2.0, mem-eff with bias & dropout, and varying seqlen

28 Mar 12:59

This release brings some improvements to memory_efficient_attention.

Pip wheels now target PyTorch 2.0.0. Conda builds are available for PyTorch 2.0.0, 1.13.1 and 1.12.1.

Fixed

  • fMHA: Fixed the backward pass on Sm86/Sm89 GPUs (RTX 3090, RTX 4090, A6000, ...) when K > 64 [#631]

Added

  • fMHA/CUTLASS: Added tensor attn bias support [#587] - contribution from @jfc4050
  • fMHA/CUTLASS: Added tensor attn bias grad support [#587] - contribution from @jfc4050
  • fMHA/CUTLASS: Added dropout support [#587] - contribution from @jfc4050
  • fMHA: Added support for varying sequence lengths [#500]
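To illustrate what attending over varying sequence lengths involves, here is a minimal pure-Python sketch, not the xformers API. It assumes sequences of different lengths are concatenated along one token axis and that an additive bias keeps tokens from attending across sequence boundaries. The names `block_diagonal_bias` and `softmax_row` are hypothetical helpers introduced only for this example.

```python
import math
from typing import List

NEG_INF = float("-inf")


def block_diagonal_bias(seqlens: List[int]) -> List[List[float]]:
    """Additive attention bias for sequences concatenated along one axis.

    Entry [i][j] is 0.0 when tokens i and j belong to the same sequence
    and -inf otherwise, so softmax gives cross-sequence pairs zero weight.
    """
    total = sum(seqlens)
    # seq_id[t] is the index of the sequence that token t belongs to.
    seq_id = [idx for idx, n in enumerate(seqlens) for _ in range(n)]
    return [
        [0.0 if seq_id[i] == seq_id[j] else NEG_INF for j in range(total)]
        for i in range(total)
    ]


def softmax_row(scores: List[float], bias_row: List[float]) -> List[float]:
    """Softmax over one row of attention scores after adding the bias."""
    shifted = [s + b for s, b in zip(scores, bias_row)]
    m = max(shifted)  # finite, because the diagonal bias entry is always 0.0
    exps = [math.exp(x - m) for x in shifted]
    z = sum(exps)
    return [e / z for e in exps]


# Two sequences of lengths 2 and 3 packed into 5 tokens.
bias = block_diagonal_bias([2, 3])
weights = softmax_row([1.0] * 5, bias[0])
```

With uniform scores, the first token splits its attention evenly over the two tokens of its own sequence and gives none to the other three. A fused kernel would typically avoid materializing such a dense mask, but the arithmetic it has to reproduce is the same.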

v0.0.17rc482

23 Mar 12:40
Pre-release
Fix conda with GLIBC (attempt 2)

ghstack-source-id: fd6e6a4f4909f787c8398b1b04ef13fd994ac1ec
Pull Request resolved: https://github.com/fairinternal/xformers/pull/510

__original_commit__ = fairinternal/xformers@0fbef8c5feb7b76307db32bbc6df8f39afd90751

v0.0.17rc481

21 Mar 18:23
Pre-release
Fix CI - anaconda upload + disable fairinternal wheels

ghstack-source-id: 8a817e879758a391894b3b6829de74d173c2fa67
Pull Request resolved: https://github.com/fairinternal/xformers/pull/505

__original_commit__ = fairinternal/xformers@c13d138e19030bf6e290721a96fe52814eb19a70

Pip wheels, improvements to mem-eff and more

31 Jan 12:27

This release contains many improvements to memory_efficient_attention, along with pip wheels now available on Windows and Linux!

New Features

Improvements

  • Strip lineinfo from binaries, reducing the binary size [#549]
  • fMHA: Stricter input validation to avoid CUDA errors for unsupported inputs [#592]
  • fMHA/Flash-Attention: Updated to Dao-AILab/flash-attention@a1f49a2 with multiple changes from @TriDao that make the operator up to 20% faster
  • Updated triton dependency [#418]

Bug fixes

  • Fixed compatibility with Python 3.7 [#541] - thanks to @susumuota
  • fMHA: Fixed strides for QKV gradients for cutlass attention [#535]
  • fMHA/Flash-Attention: Fixed backward pass wrapper, where non-contiguous gradients could give the wrong result [#548]

v0.0.13

26 Sep 19:07
1d31a3a

Lots of improvements and bug fixes around the memory efficient attention.

v0.0.12

08 Aug 15:24

[0.0.12] - 2022-08-08

Fixed

  • Removed duplicated biases in the FusedMLP layers [#317]
  • Rotary embeddings respecting input types [#326]
  • Poolformer style instantiating useless projection layers [#349]
  • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
  • Pass use_triton flag to LayerNorm module [#336]

Added

  • Four blocksparsity layouts from DeepSpeed [#320]
  • Support several initialization options [#312]
  • Conv2DFeedforward feedforward part [#321]
  • VisualAttention [#329]
  • Automatic blocksparse for causal attention [#334]
  • Better hierarchical transformer generation [#345]
  • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
  • Refactor LRA code to use PyTorch Lightning [#343]

v0.0.11

30 May 21:25
8a2ef26

[0.0.11] - 2022-05-30

Fixed

  • Fix some torchscriptability [#246]
  • Fix FourierMix being compatible with AMP [#258]
  • Better asserts on QKV dimensions [#264]
  • Better performance for FusedMLP and FusedLinearLayer [#283]
  • Deepnorm init missing self-attention [#284]

Added

  • Simplicial Embeddings [#259]
  • Mem efficient attention, FW pass [#267]
  • MHA benchmark
  • MLP benchmark
  • Move all triton kernels to triton v2 [#272]
  • Mem efficient attention, BW pass [#281]
  • Metaformer support [#294]

v0.0.10

15 Mar 15:52
fd5e5c0

Fixed

  • Expose bias flag for feedforwards, same default as Timm [#220]
  • Update eps value for layernorm, same default as torch [#221]
  • PreNorm bugfix, only one input was normalized [#233]

Added

  • Add DeepNet (DeepNorm) residual path and init [#227]

v0.0.9

09 Feb 20:50
4c888f2

Added

  • Compositional Attention [#41]
  • Experimental Ragged attention [#189]
  • Mixture of Experts [#181]
  • BlockSparseTensor [#202]
  • nd-tensor support for triton softmax [#210]

Fixed

  • bugfix Favor, single feature map [#183]
  • sanity check blocksparse settings [#207]
  • fixed some picklability [#204]