Add Ulysses DistributedAttention compatibility #5525

Kwen-Chen · 2024-05-13T03:55:39Z

The `DistributedAttention` in DeepSpeed-Ulysses is compatible with the training code in [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L811) because its forward pass accepts only positional arguments. However, this does not cover the common case of passing keyword arguments to the local attention function, as in the following Flash Attention example:

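```python
from deepspeed.sequence.layer import DistributedAttention
from flash_attn import flash_attn_func

# Wrap Flash Attention as the local attention used by Ulysses.
ulysses_attn = DistributedAttention(local_attention=flash_attn_func, sequence_process_group=None, scatter_idx=2, gather_idx=1)

# dropout and softmax_scale are passed positionally; causal is passed by keyword.
attn_output = ulysses_attn(
    query_states,
    key_states,
    value_states,
    dropout,
    softmax_scale,
    causal=causal,
)
```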

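Therefore, a `**kwargs` parameter has been added so that keyword arguments are forwarded to the local attention function. This makes `DistributedAttention` compatible with more local attention implementations while keeping code changes minimal.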
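The forwarding pattern described here can be illustrated with a minimal, dependency-free sketch. It is not the DeepSpeed implementation: `ForwardingAttention` and `fake_flash_attn` are hypothetical names invented for illustration, and the sequence-parallel communication that the real `DistributedAttention` performs is omitted entirely. It only shows how extra positional and keyword arguments can reach the local attention callable unchanged.

```python
from typing import Any, Callable


class ForwardingAttention:
    """Toy stand-in for a distributed attention wrapper (hypothetical, not DeepSpeed code)."""

    def __init__(self, local_attention: Callable[..., Any]) -> None:
        self.local_attn = local_attention

    def __call__(self, query: Any, key: Any, value: Any, *args: Any, **kwargs: Any) -> Any:
        # A real implementation would exchange q, k, v across the
        # sequence-parallel group before and after this call; this sketch
        # skips all communication.
        # Extra positional and keyword arguments pass through untouched.
        return self.local_attn(query, key, value, *args, **kwargs)


def fake_flash_attn(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    """Hypothetical local attention that only records the options it received."""
    return {"dropout_p": dropout_p, "softmax_scale": softmax_scale, "causal": causal}


attn = ForwardingAttention(fake_flash_attn)
# Two extra positional arguments plus one keyword argument, as in the PR example.
received = attn("q", "k", "v", 0.1, 0.125, causal=True)
print(received)
```

Without `**kwargs` in the wrapper's call signature, the `causal=True` argument above would raise a `TypeError` before the local attention function is ever reached.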

Co-authored-by: Kwen-Chen
Co-authored-by: Olatunji Ruwase
Co-authored-by: Logan Adams

xs1997zju · 2024-07-22T09:13:58Z

How do I configure Ulysses in the DeepSpeed config JSON?

Add Ulysses DistributedAttention compatibility

bca1a9e

Kwen-Chen requested a review from mrwyattii as a code owner May 13, 2024 03:55

Kwen-Chen closed this May 13, 2024

Kwen-Chen reopened this May 13, 2024

Merge branch 'master' into add-ulysses-DistributedAttention-compatibility

984d2fb

samadejacobs approved these changes May 14, 2024

tjruwase and others added 3 commits May 15, 2024 16:42

Merge branch 'master' into add-ulysses-DistributedAttention-compatibility

0cf487d

Merge branch 'master' into add-ulysses-DistributedAttention-compatibility

2023ff4

Merge branch 'master' into add-ulysses-DistributedAttention-compatibility

bbccb77

loadams added this pull request to the merge queue May 22, 2024

Merged via the queue into microsoft:master with commit f86824b May 22, 2024
12 checks passed

Kwen-Chen deleted the add-ulysses-DistributedAttention-compatibility branch May 23, 2024 04:30
