Hi! I want to use vllm==0.6.6 to accelerate inference. Everything works fine with Qwen2VL-2B, but when I switch to InternVL2.5-4B, I get this error:
[rank0]: NotImplementedError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250106-123649.pkl): No operator found for `memory_efficient_attention_forward` with inputs:
[rank0]: query : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]: key : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]: value : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]: attn_bias : <class 'NoneType'>
[rank0]: p : 0.0
[rank0]: `[email protected]` is not supported because:
[rank0]: xFormers wasn't build with CUDA support
[rank0]: `cutlassF-pt` is not supported because:
[rank0]: xFormers wasn't build with CUDA support
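For reference, this is roughly how I am invoking vLLM; the image path and prompt template below are placeholders rather than my exact script:

```python
# Rough sketch of the vLLM call that triggers the error above.
# "example.jpg" and the prompt template are placeholders.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL2_5-4B",  # swapping in a Qwen2-VL-2B checkpoint works fine
    trust_remote_code=True,
    dtype="bfloat16",
)

image = Image.open("example.jpg")
outputs = llm.generate(
    {
        "prompt": "<image>\nDescribe this image.",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```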
The output of `python -m xformers.info`:
xFormers 0.0.28.post3
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
[email protected]: available
[email protected]: available
[email protected]: unavailable
[email protected]: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
[email protected]: available
[email protected]: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.5.1+cu121
pytorch.cuda: available
gpu.compute_capability: 8.0
gpu.name: NVIDIA A100-SXM4-80GB
dcgm_profiler: unavailable
build.info: available
build.cuda_version: None
build.hip_version: None
build.python_version: 3.10.15
build.torch_version: 2.5.1+cu121
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
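From the info above, `build.cuda_version: None` matches the "xFormers wasn't build with CUDA support" message, so the installed wheel seems to be missing the CUDA kernels. To check whether the problem is the xFormers install rather than vLLM, a direct call like the sketch below (shapes copied from the traceback, assuming a visible CUDA device) should raise the same "No operator found" error:

```python
# Minimal reproduction outside vLLM: dispatch memory_efficient_attention
# with the same dtype/shape family as in the traceback above. If the wheel
# has no CUDA kernels, this raises the same "No operator found" error.
import torch
import xformers.ops as xops

q = torch.randn(2, 1025, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v)  # (batch, seq, heads, head_dim)
print(out.shape)
```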
How could I solve this?