vllm 0.6.3 createLLM error TypeError: autotune() got an unexpected keyword argument 'use_cuda_graph' on windows #1138

Open
xiezhipeng-git opened this issue Oct 30, 2024 · 4 comments


xiezhipeng-git commented Oct 30, 2024

🐛 Bug

from vllm import LLM, SamplingParams
llm = LLM(model=model_dir, enforce_eager=True)  # model_dir is defined elsewhere (model name or path)

This then fails with:

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\_triton\splitk_kernels.py:614, in autotune_kernel(kernel)
    604 WARPS_VALUES = [1, 2, 4]
    606 TRITON_CONFIGS = [
    607     gen_config(block_m, block_n, stages, warps)
    608     for block_m in BLOCK_M_VALUES
   (...)
    611     for warps in WARPS_VALUES
    612 ]
--> 614 kernel = triton.autotune(
    615     configs=TRITON_CONFIGS,
    616     key=AUTOTUNER_KEY,
    617     use_cuda_graph=True,
    618 )(kernel)
    619 return kernel

TypeError: autotune() got an unexpected keyword argument 'use_cuda_graph'
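A `TypeError` like this usually means the installed callee is older (or newer) than the caller expects. As a rough diagnostic sketch, not something taken from xformers or Triton, the hypothetical helper `accepts_keyword` below uses the standard `inspect` module to check whether the installed `triton.autotune` accepts `use_cuda_graph`:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`.

    Hypothetical helper for illustration; it is not part of Triton or xformers.
    """
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        import triton
    except ImportError as exc:
        print(f"triton could not be imported: {exc}")
    else:
        print("triton version:", triton.__version__)
        print(
            "autotune accepts use_cuda_graph:",
            accepts_keyword(triton.autotune, "use_cuda_graph"),
        )
```

If the check prints `False`, the installed Triton predates the keyword that this xformers build passes.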

Command

To Reproduce

Steps to reproduce the behavior:

1. pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu124
2. git clone https://github.com/vllm-project/vllm.git
   cd vllm
   python use_existing_torch.py
   pip install -r requirements-common.txt
   python setup.py install
3. Use it in vLLM:
   from vllm import LLM, SamplingParams
   llm = LLM(model=model_dir, enforce_eager=True)

Expected behavior

Environment

Please copy and paste the output from the
environment collection script from PyTorch
(or fill out the checklist below manually).

You can run the script with: python -m torch.utils.collect_env

PyTorch version: 2.5.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-Builds project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.5.40
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 560.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3200
DeviceID=CPU0
Family=207
L2CacheSize=32768
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3200
Name=13th Gen Intel(R) Core(TM) i9-13900KS
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.5.0+cu124
[pip3] torchaudio==2.5.0+cu124
[pip3] torchvision==0.20.0+cu124
[pip3] triton==2.1.0
[pip3] vector-quantize-pytorch==1.14.24
[conda] Could not collect
  • PyTorch Version (e.g., 1.0): torch 2.5.0+cu124
  • OS (e.g., Linux): Windows
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10.10
  • CUDA/cuDNN version: Build cuda_12.5.r12.5/compiler.34177558_0
  • GPU models and configuration: NVIDIA GeForce RTX 4090
  • Any other relevant information:

Additional context

Full traceback:

TypeError Traceback (most recent call last)
Cell In[2], line 5
1 from vllm import LLM, SamplingParams
3 # model_dir='Qwen2.5-14B-Instruct-GPTQ-Int4'
----> 5 llm = LLM(model=model_dir,enforce_eager=True)
6 sampling_params = SamplingParams( top_p=0.9, max_tokens=512,top_k=10)
8 prompt = "1+1等于几"  # "What is 1+1?"

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\utils.py:1023, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
1016 msg += f" {additional_message}"
1018 warnings.warn(
1019 DeprecationWarning(msg),
1020 stacklevel=3, # The inner function takes up one level
1021 )
-> 1023 return fn(*args, **kwargs)

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\entrypoints\llm.py:198, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_context_len_to_capture, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, mm_processor_kwargs, task, **kwargs)
172 kwargs["disable_log_stats"] = True
174 engine_args = EngineArgs(
175 model=model,
176 task=task,
(...)
196 **kwargs,
197 )
--> 198 self.llm_engine = LLMEngine.from_engine_args(
199 engine_args, usage_context=UsageContext.LLM_CLASS)
200 self.request_counter = Counter()

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\engine\llm_engine.py:582, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
580 executor_class = cls._get_executor_cls(engine_config)
581 # Create the LLM engine.
--> 582 engine = cls(
583 **engine_config.to_dict(),
584 executor_class=executor_class,
585 log_stats=not engine_args.disable_log_stats,
586 usage_context=usage_context,
587 stat_loggers=stat_loggers,
588 )
590 return engine

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\engine\llm_engine.py:341, in LLMEngine.__init__(self, model_config, cache_config, parallel_config, scheduler_config, device_config, load_config, lora_config, speculative_config, decoding_config, observability_config, prompt_adapter_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, use_cached_outputs)
337 self.input_registry = input_registry
338 self.input_processor = input_registry.create_input_processor(
339 model_config)
--> 341 self.model_executor = executor_class(
342 model_config=model_config,
343 cache_config=cache_config,
344 parallel_config=parallel_config,
345 scheduler_config=scheduler_config,
346 device_config=device_config,
347 lora_config=lora_config,
348 speculative_config=speculative_config,
349 load_config=load_config,
350 prompt_adapter_config=prompt_adapter_config,
351 observability_config=self.observability_config,
352 )
354 if self.model_config.task != "embedding":
355 self._initialize_kv_caches()

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\executor\executor_base.py:47, in ExecutorBase.__init__(self, model_config, cache_config, parallel_config, scheduler_config, device_config, load_config, lora_config, speculative_config, prompt_adapter_config, observability_config)
45 self.prompt_adapter_config = prompt_adapter_config
46 self.observability_config = observability_config
---> 47 self._init_executor()

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\executor\gpu_executor.py:38, in GPUExecutor._init_executor(self)
33 """Initialize the worker and load the model.
34 """
35 assert self.parallel_config.world_size == 1, (
36 "GPUExecutor only supports single GPU.")
---> 38 self.driver_worker = self._create_worker()
39 self.driver_worker.init_device()
40 self.driver_worker.load_model()

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\executor\gpu_executor.py:105, in GPUExecutor._create_worker(self, local_rank, rank, distributed_init_method)
101 def _create_worker(self,
102 local_rank: int = 0,
103 rank: int = 0,
104 distributed_init_method: Optional[str] = None):
--> 105 return create_worker(**self._get_create_worker_kwargs(
106 local_rank=local_rank,
107 rank=rank,
108 distributed_init_method=distributed_init_method))

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\executor\gpu_executor.py:24, in create_worker(worker_module_name, worker_class_name, worker_class_fn, **kwargs)
16 def create_worker(worker_module_name: str, worker_class_name: str,
17 worker_class_fn: Optional[Callable[[], Type[WorkerBase]]],
18 **kwargs):
19 wrapper = WorkerWrapperBase(
20 worker_module_name=worker_module_name,
21 worker_class_name=worker_class_name,
22 worker_class_fn=worker_class_fn,
23 )
---> 24 wrapper.init_worker(**kwargs)
25 return wrapper.worker

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\worker\worker_base.py:449, in WorkerWrapperBase.init_worker(self, *args, **kwargs)
446 mod = importlib.import_module(self.worker_module_name)
447 worker_class = getattr(mod, self.worker_class_name)
--> 449 self.worker = worker_class(*args, **kwargs)
450 assert self.worker is not None

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\worker\worker.py:99, in Worker.__init__(self, model_config, parallel_config, scheduler_config, device_config, cache_config, load_config, local_rank, rank, distributed_init_method, lora_config, speculative_config, prompt_adapter_config, is_driver_worker, model_runner_cls, observability_config)
97 elif self._is_encoder_decoder_model():
98 ModelRunnerClass = EncoderDecoderModelRunner
---> 99 self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
100 model_config,
101 parallel_config,
102 scheduler_config,
103 device_config,
104 cache_config,
105 load_config=load_config,
106 lora_config=self.lora_config,
107 kv_cache_dtype=self.cache_config.cache_dtype,
108 is_driver_worker=is_driver_worker,
109 prompt_adapter_config=prompt_adapter_config,
110 observability_config=observability_config,
111 **speculative_args,
112 )
113 # Uninitialized cache engine. Will be initialized by
114 # initialize_cache.
115 self.cache_engine: List[CacheEngine]

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\worker\model_runner.py:1013, in GPUModelRunnerBase.__init__(self, model_config, parallel_config, scheduler_config, device_config, cache_config, load_config, lora_config, kv_cache_dtype, is_driver_worker, prompt_adapter_config, return_hidden_states, observability_config, input_registry, mm_registry)
1008 num_attn_heads = self.model_config.get_num_attention_heads(
1009 self.parallel_config)
1010 needs_attn_backend = (num_attn_heads != 0
1011 or self.model_config.is_attention_free)
-> 1013 self.attn_backend = get_attn_backend(
1014 self.model_config.get_head_size(),
1015 self.model_config.dtype,
1016 self.kv_cache_dtype,
1017 self.block_size,
1018 self.model_config.is_attention_free,
1019 ) if needs_attn_backend else None
1020 if self.attn_backend:
1021 self.attn_state = self.attn_backend.get_state_cls()(
1022 weakref.proxy(self))

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\attention\selector.py:120, in get_attn_backend(head_size, dtype, kv_cache_dtype, block_size, is_attention_free, is_blocksparse)
118 if backend == _Backend.XFORMERS:
119 logger.info("Using XFormers backend.")
--> 120 from vllm.attention.backends.xformers import ( # noqa: F401
121 XFormersBackend)
122 return XFormersBackend
123 elif backend == _Backend.ROCM_FLASH:

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\attention\backends\xformers.py:6
3 from typing import Any, Dict, List, Optional, Tuple, Type
5 import torch
----> 6 from xformers import ops as xops
7 from xformers.ops.fmha.attn_bias import (AttentionBias,
8 BlockDiagonalCausalMask,
9 BlockDiagonalMask,
10 LowerTriangularMaskWithTensorBias)
12 from vllm.attention.backends.abstract import (AttentionBackend, AttentionImpl,
13 AttentionMetadata, AttentionType)

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\__init__.py:8
1 # Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
2 #
3 # This source code is licensed under the BSD license found in the
4 # LICENSE file in the root directory of this source tree.
6 import torch
----> 8 from .fmha import (
9 AttentionBias,
10 AttentionOp,
11 AttentionOpBase,
12 LowerTriangularMask,
13 MemoryEfficientAttentionCkOp,
14 MemoryEfficientAttentionCutlassFwdFlashBwOp,
15 MemoryEfficientAttentionCutlassOp,
16 MemoryEfficientAttentionFlashAttentionOp,
17 MemoryEfficientAttentionSplitKCkOp,
18 memory_efficient_attention,
19 memory_efficient_attention_backward,
20 memory_efficient_attention_forward,
21 memory_efficient_attention_forward_requires_grad,
22 )
23 from .indexing import index_select_cat, scaled_index_add
24 from .ipc import init_ipc

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\__init__.py:10
6 from typing import Any, List, Optional, Sequence, Tuple, Type, Union, cast
8 import torch
---> 10 from . import (
11 attn_bias,
12 ck,
13 ck_decoder,
14 ck_splitk,
15 cutlass,
16 flash,
17 flash3,
18 triton_splitk,
19 )
20 from .attn_bias import VARLEN_BIASES, AttentionBias, LowerTriangularMask
21 from .common import (
22 AttentionBwOpBase,
23 AttentionFwOpBase,
(...)
29 bmk2bmhk,
30 )

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\triton_splitk.py:110
94 return (
95 super(InputsFp8, self).nbytes
96 + (
(...)
105 )
106 )
109 if TYPE_CHECKING or _is_triton_available():
--> 110 from ._triton.splitk_kernels import _fwd_kernel_splitK, _splitK_reduce
111 else:
112 _fwd_kernel_splitK = None

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\_triton\splitk_kernels.py:632
629 if sys.version_info >= (3, 9):
630 # unroll_varargs requires Python 3.9+
631 for num_groups in [1, 2, 4, 8]:
--> 632 _fwd_kernel_splitK_autotune[num_groups] = autotune_kernel(
633 _get_splitk_kernel(num_groups)
634 )
636 def get_autotuner_cache(
637 num_groups: int,
638 ) -> Dict[Tuple[Union[int, str]], triton.Config]:
639 """Returns a triton.runtime.autotuner.AutoTuner.cache object, which
640 represents mappings from kernel autotune keys (tuples describing kernel inputs)
641 to triton.Config
642 """

File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\_triton\splitk_kernels.py:614, in autotune_kernel(kernel)
604 WARPS_VALUES = [1, 2, 4]
606 TRITON_CONFIGS = [
607 gen_config(block_m, block_n, stages, warps)
608 for block_m in BLOCK_M_VALUES
(...)
611 for warps in WARPS_VALUES
612 ]
--> 614 kernel = triton.autotune(
615 configs=TRITON_CONFIGS,
616 key=AUTOTUNER_KEY,
617 use_cuda_graph=True,
618 )(kernel)
619 return kernel

TypeError: autotune() got an unexpected keyword argument 'use_cuda_graph'

Contributor

lw commented Oct 30, 2024

I'd guess something is up with your version of Triton. Did you perhaps install a very old version on top of the one provided by PyTorch?
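One way to check this guess is to list the versions that are actually installed. The sketch below uses only the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names (`torch`, `triton`, `xformers`, `vllm`) are assumptions that may differ for custom or platform-specific builds:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match `pip list`.
    report = installed_versions(["torch", "triton", "xformers", "vllm"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

In the environment dump above, `triton==2.1.0` sits alongside `torch==2.5.0+cu124`, which is the kind of mismatch this listing would surface.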

Author

xiezhipeng-git commented Oct 30, 2024

That seems to be the case. I'll try to update it.

Author

xiezhipeng-git commented Oct 30, 2024

File d:\my\env\python3.10.10\lib\site-packages\vllm-0.6.3.post2.dev156+g04a3ae0a.d20241030-py3.10.egg\vllm\model_executor\layers\fused_moe\fused_moe.py:8
      5 from typing import Any, Callable, Dict, Optional, Tuple
      7 import torch
----> 8 import triton
      9 import triton.language as tl
     11 import vllm.envs as envs

File d:\my\env\python3.10.10\lib\site-packages\triton\__init__.py:8
      2 __version__ = '3.0.0'
      4 # ---------------------------------------
      5 # Note: import order is significant here.
      6
      7 # submodules
----> 8 from .runtime import (
      9     autotune,
     10     Config,
     11     heuristics,
     12     JITFunction,
     13     KernelInterface,
     14     reinterpret,
     15     TensorWrapper,
     16     OutOfResources,
     17     InterpreterError,
     18     MockTensor,
     19 )
     20 from .runtime.jit import jit
     21 from .compiler import compile, CompilationError

File d:\my\env\python3.10.10\lib\site-packages\triton\runtime\__init__.py:1
----> 1 from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
      2 from .cache import RedisRemoteCacheBackend, RemoteCacheBackend
      3 from .driver import driver

File d:\my\env\python3.10.10\lib\site-packages\triton\runtime\autotuner.py:9
      6 import inspect
      7 from typing import Dict
----> 9 from ..testing import do_bench, do_bench_cudagraph
     10 from .jit import KernelInterface
     11 from .errors import OutOfResources

File d:\my\env\python3.10.10\lib\site-packages\triton\testing.py:7
      5 from contextlib import contextmanager
      6 from typing import Any, Dict, List
----> 7 from . import language as tl
     10 def nvsmi(attrs):
     11     attrs = ','.join(attrs)

File d:\my\env\python3.10.10\lib\site-packages\triton\language\__init__.py:4
      1 """isort:skip_file"""
      2 # Import order is significant here.
----> 4 from . import math
      5 from . import extra
      6 from .standard import (
      7     argmax,
      8     argmin,
   (...)
     24     zeros_like,
     25 )

File d:\my\env\python3.10.10\lib\site-packages\triton\language\math.py:1
----> 1 from . import core
      2 from . import semantic
      3 from functools import wraps

File d:\my\env\python3.10.10\lib\site-packages\triton\language\core.py:10
      8 from typing import Union, Callable, List, Sequence, TypeVar, Optional
      9 import builtins
---> 10 from ..runtime.jit import jit
     11 import inspect
     12 import os

File d:\my\env\python3.10.10\lib\site-packages\triton\runtime\jit.py:12
     10 from functools import cached_property
     11 from typing import Callable, Generic, Iterable, Optional, TypeVar, Union, overload, Dict, Any, Tuple
---> 12 from ..runtime.driver import driver
     13 from types import ModuleType
     15 TRITON_MODULE = __name__[:-len(".runtime.jit")]

File d:\my\env\python3.10.10\lib\site-packages\triton\runtime\driver.py:1
----> 1 from ..backends import backends
      2 from ..backends import DriverBase
      5 def _create_driver():

File d:\my\env\python3.10.10\lib\site-packages\triton\backends\__init__.py:50
     45         backends[name] = Backend(_find_concrete_subclasses(compiler, BaseBackend),
     46                                  _find_concrete_subclasses(driver, DriverBase))
     47     return backends
---> 50 backends = _discover_backends()

File d:\my\env\python3.10.10\lib\site-packages\triton\backends\__init__.py:43, in _discover_backends()
     41 if name.startswith('__'):
     42     continue
---> 43 compiler = _load_module(name, os.path.join(root, name, 'compiler.py'))
     44 driver = _load_module(name, os.path.join(root, name, 'driver.py'))
     45 backends[name] = Backend(_find_concrete_subclasses(compiler, BaseBackend),
     46                          _find_concrete_subclasses(driver, DriverBase))

File d:\my\env\python3.10.10\lib\site-packages\triton\backends\__init__.py:12, in _load_module(name, path)
     10 spec = importlib.util.spec_from_file_location(name[:-3], path)
     11 module = importlib.util.module_from_spec(spec)
---> 12 spec.loader.exec_module(module)
     13 return module

File d:\my\env\python3.10.10\lib\site-packages\triton\backends\amd\compiler.py:2
      1 from triton.backends.compiler import BaseBackend, GPUTarget
----> 2 from triton._C.libtriton import ir, passes, llvm, amd
      3 from dataclasses import dataclass
      4 from typing import Any, Tuple

ImportError: DLL load failed while importing libtriton: A dynamic link library (DLL) initialization routine failed.

After updating, there is a new error. On Windows, the only Triton version I can get is 3.0.0.

Contributor

lw commented Oct 31, 2024

You seem to be using AMD on Windows, which are two setups we don't fully support because we're unable to test them ourselves.

The issue though still seems to come from your installation of Triton, I'd suggest you check with them.
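To take vLLM and xformers out of the picture before reporting upstream, the failing import can be tried on its own. This is only a sketch: `try_import` is a hypothetical helper, and the module names come from the traceback above.

```python
import importlib


def try_import(module_name):
    """Try to import a module; return (ok, error_text) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # Broad on purpose: this is a diagnostic, not production code.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    # Module names taken from the traceback in this thread.
    for name in ["triton", "triton._C.libtriton"]:
        ok, error = try_import(name)
        print(f"{name}: {'OK' if ok else error}")
```

If `triton` fails here too, the problem reproduces without xformers and can be reported with just this output.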

bertmaher pushed a commit to bertmaher/xformers that referenced this issue Dec 20, 2024
* Update SAC to work with latest PyTorch

This will break for older PyTorch though

* Lint