
xformers on nvidia jetson orin agx #1173

Open
obliviate1230 opened this issue Dec 12, 2024 · 3 comments
@obliviate1230
❓ Questions and Help

I want to use xformers on nvidia jetson orin agx, but I got this error.

I ran torch.distributed.is_available() in Python and got False. This may be what prevents xformers from working properly, but I need to figure out how to fix it. Thanks for your reply.

Environment

JetPack: 6.1
Python: 3.10
PyTorch: 2.5
Torchvision: 0.20

Error

Traceback (most recent call last):
  File "/home/nvidia/cogvlm-chat.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 553, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 553, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module, force_reload=force_download)
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 250, in get_class_in_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/nvidia/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf/e29dc3ba206d524bf8efbfc60d80fc4556ab0e3c/modeling_cogvlm.py", line 19, in <module>
    from .visual import EVA2CLIPModel
  File "/home/nvidia/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf/e29dc3ba206d524bf8efbfc60d80fc4556ab0e3c/visual.py", line 4, in <module>
    import xformers.ops as xops
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/__init__.py", line 25, in <module>
    from .modpar_layers import ColumnParallelLinear, RowParallelLinear
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/modpar_layers.py", line 11, in <module>
    from .differentiable_collectives import (
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/differentiable_collectives.py", line 27, in <module>
    ) -> Tuple[torch.Tensor, Optional[torch.distributed.Work]]:
AttributeError: module 'torch.distributed' has no attribute 'Work'
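The traceback fails at import time because xformers' type annotations reference torch.distributed.Work, which this build does not expose. A minimal sketch of a check for both symptoms (the helper name and the stand-in objects are illustrative, not part of xformers or real torch):

```python
import types

def distributed_ready(torch_module) -> bool:
    """Check whether a PyTorch build exposes a usable torch.distributed.

    Probes the two things the traceback above depends on: is_available()
    and the Work class that xformers references in its type annotations.
    (Helper name and structure are illustrative, not part of xformers.)
    """
    dist = getattr(torch_module, "distributed", None)
    if dist is None or not dist.is_available():
        return False
    return hasattr(dist, "Work")

# Stand-ins mimicking the two cases (hypothetical objects, not real torch):
jetson_like = types.SimpleNamespace(
    distributed=types.SimpleNamespace(is_available=lambda: False)
)
full_build = types.SimpleNamespace(
    distributed=types.SimpleNamespace(is_available=lambda: True, Work=object)
)

print(distributed_ready(jetson_like))  # False
print(distributed_ready(full_build))   # True
```

Running this against the real torch module on the Jetson build described above would presumably return False on the first check.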
@lw
Contributor

lw commented Dec 12, 2024

Yes, indeed, if torch.distributed.is_available() returns False that might explain this issue. I didn't think this was possible (i.e., I thought the modules/classes would still be importable even if they don't work). I'd be interested in understanding what's going on in your system for this to happen. Could you elaborate on what system you have (Linux? Mac? Windows?) and how you installed PyTorch (the exact build configuration, with or without CUDA, ...)? If you run python -m xformers.info you should be able to get some of that info.

In the meantime you might be able to unblock yourself by replacing the torch.distributed.Work annotation with the string "torch.distributed.Work". Although it's possible that this will just uncover more errors down the line.
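The reason quoting the annotation can help: a string annotation is stored as-is rather than evaluated at definition time, so Python never looks up the missing Work attribute on import. A minimal sketch of the mechanism, using a hypothetical stand-in namespace rather than real torch:

```python
import types
from typing import Optional

# Stand-in for a torch.distributed build that lacks the Work attribute
# (hypothetical object, not real torch):
dist = types.SimpleNamespace()

# With an eager annotation, `dist.Work` is evaluated when the function is
# defined, reproducing the AttributeError from the traceback:
try:
    def broken() -> Optional[dist.Work]:
        return None
except AttributeError:
    print("eager annotation fails at definition time")

# Quoting the annotation stores it as a plain string, so the definition
# succeeds even though dist.Work does not exist:
def patched() -> "Optional[dist.Work]":
    return None

print(patched())  # None
```

The same deferral can also be obtained module-wide with `from __future__ import annotations`, though that would require editing the xformers source file in question.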

@obliviate1230
Author

Thanks for your reply.
The details of the system are:

  • Machine: aarch64
  • System: Linux
  • Distribution: Ubuntu 22.04 Jammy
  • Release: 5.15.148-tegra
  • Python: 3.10.12
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75

I got PyTorch from the Jetson download center, using the PyTorch build for JetPack version JP6.1.

I ran python -m xformers.info and got a similar error.

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/info.py", line 11, in <module>
    from . import __version__, _cpp_lib, _is_opensource, _is_triton_available, ops
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/__init__.py", line 25, in <module>
    from .modpar_layers import ColumnParallelLinear, RowParallelLinear
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/modpar_layers.py", line 11, in <module>
    from .differentiable_collectives import (
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/differentiable_collectives.py", line 27, in <module>
    ) -> Tuple[torch.Tensor, Optional[torch.distributed.Work]]:
AttributeError: module 'torch.distributed' has no attribute 'Work'

It seems to be an issue with my platform.

@lw
Contributor

lw commented Dec 12, 2024

I'm not familiar with the "jetson download center". Could you try installing PyTorch from the official channels? You can find details here: https://pytorch.org/
