
No operator found for memory_efficient_attention_forward with inputs #1191

Closed
Carkham opened this issue Jan 6, 2025 · 1 comment

Carkham commented Jan 6, 2025

❓ Questions and Help

Hi! I want to use vllm==0.6.6 to accelerate inference. Everything works when I use Qwen2VL-2B, but when I switch to InternVL2.5-4B I get this error:

[rank0]: NotImplementedError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250106-123649.pkl): No operator found for `memory_efficient_attention_forward` with inputs:
[rank0]:      query       : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]:      key         : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]:      value       : shape=(104, 1025, 16, 64) (torch.bfloat16)
[rank0]:      attn_bias   : <class 'NoneType'>
[rank0]:      p           : 0.0
[rank0]: `[email protected]` is not supported because:
[rank0]:     xFormers wasn't build with CUDA support
[rank0]: `cutlassF-pt` is not supported because:
[rank0]:     xFormers wasn't build with CUDA support
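
For reference, a minimal standalone sketch (my own, not vLLM code) that triggers the same operator dispatch, using the query/key/value shapes reported in the error above:

import torch
import xformers.ops as xops

# Shapes taken from the error message: (batch, seq_len, heads, head_dim)
q = torch.randn(104, 1025, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# On an xFormers install without its CUDA kernels, this raises the same
# NotImplementedError: "No operator found for memory_efficient_attention_forward".
out = xops.memory_efficient_attention(q, k, v, attn_bias=None, p=0.0)
print(out.shape)  # torch.Size([104, 1025, 16, 64])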

The output of `python -m xformers.info`:

xFormers 0.0.28.post3
memory_efficient_attention.ckF:                    unavailable
memory_efficient_attention.ckB:                    unavailable
memory_efficient_attention.ck_decoderF:            unavailable
memory_efficient_attention.ck_splitKF:             unavailable
memory_efficient_attention.cutlassF-pt:            available
memory_efficient_attention.cutlassB-pt:            available
[email protected]:         available
[email protected]:         available
[email protected]:             unavailable
[email protected]:             unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              available
sequence_parallel_fused.wait_values:               available
sequence_parallel_fused.cuda_memset_32b_async:     available
sp24.sparse24_sparsify_both_ways:                  available
sp24.sparse24_apply:                               available
sp24.sparse24_apply_dense_output:                  available
sp24._sparse24_gemm:                               available
[email protected]:                 available
[email protected]:                        available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
pytorch.version:                                   2.5.1+cu121
pytorch.cuda:                                      available
gpu.compute_capability:                            8.0
gpu.name:                                          NVIDIA A100-SXM4-80GB
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                None
build.hip_version:                                 None
build.python_version:                              3.10.15
build.torch_version:                               2.5.1+cu121
build.env.TORCH_CUDA_ARCH_LIST:                    None
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
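
(Note that build.cuda_version is None here, which matches the dispatch error above: this xFormers build has no CUDA support, even though PyTorch itself sees the GPU.)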

How can I solve this?

lw (Contributor) commented Jan 6, 2025

How did you install xFormers? Did you build from source? Please install our pre-built packages from PyPI.
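
For example (a sketch based on the install instructions in the xFormers README; pick the index URL matching your CUDA version, here cu121 to match torch 2.5.1+cu121):

pip uninstall -y xformers
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121

After reinstalling, `python -m xformers.info` should report a non-None build.cuda_version, and the cutlass/flash forward operators should then be usable.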

Carkham closed this as completed on Jan 6, 2025.