
xformers on nvidia jetson orin agx #1173

Open
obliviate1230 opened this issue Dec 12, 2024 · 3 comments
@obliviate1230
❓ Questions and Help

I want to use xformers on nvidia jetson orin agx, but I got this error.

I ran torch.distributed.is_available() in Python and got False. This may be what prevents xformers from working properly, but I need to figure out how to fix it. Thanks for your reply.

Environment

JetPack: 6.1
Python: 3.10
PyTorch: 2.5
Torchvision: 0.20

Error

Traceback (most recent call last):
  File "/home/nvidia/cogvlm-chat.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 553, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 553, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module, force_reload=force_download)
  File "/home/nvidia/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 250, in get_class_in_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/nvidia/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf/e29dc3ba206d524bf8efbfc60d80fc4556ab0e3c/modeling_cogvlm.py", line 19, in <module>
    from .visual import EVA2CLIPModel
  File "/home/nvidia/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf/e29dc3ba206d524bf8efbfc60d80fc4556ab0e3c/visual.py", line 4, in <module>
    import xformers.ops as xops
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/__init__.py", line 25, in <module>
    from .modpar_layers import ColumnParallelLinear, RowParallelLinear
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/modpar_layers.py", line 11, in <module>
    from .differentiable_collectives import (
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/differentiable_collectives.py", line 27, in <module>
    ) -> Tuple[torch.Tensor, Optional[torch.distributed.Work]]:
AttributeError: module 'torch.distributed' has no attribute 'Work'
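The traceback fails at import time because xformers' type annotations reference torch.distributed.Work, which this build does not expose. A minimal sketch of a check for both symptoms (the helper name and the stand-in objects are illustrative, not part of xformers or real torch):

```python
import types

def distributed_ready(torch_module) -> bool:
    """Check whether a PyTorch build exposes a usable torch.distributed.

    Probes the two things the traceback above depends on: is_available()
    and the Work class that xformers references in its type annotations.
    (Helper name and structure are illustrative, not part of xformers.)
    """
    dist = getattr(torch_module, "distributed", None)
    if dist is None or not dist.is_available():
        return False
    return hasattr(dist, "Work")

# Stand-ins mimicking the two cases (hypothetical objects, not real torch):
jetson_like = types.SimpleNamespace(
    distributed=types.SimpleNamespace(is_available=lambda: False)
)
full_build = types.SimpleNamespace(
    distributed=types.SimpleNamespace(is_available=lambda: True, Work=object)
)

print(distributed_ready(jetson_like))  # False
print(distributed_ready(full_build))   # True
```

Running this against the real torch module on the Jetson build described above would presumably return False on the first check.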
@lw
Contributor

lw commented Dec 12, 2024

Yes, indeed, if torch.distributed.is_available() returns False that might explain this issue. I didn't think this was possible (i.e., I thought the modules/classes would still be importable even if they don't work). I'd be interested in understanding what's going on in your system for this to happen. Could you elaborate on what system you have (Linux? Mac? Windows?) and how you installed PyTorch (the exact build configuration, with or without CUDA, ...)? If you run python -m xformers.info you should be able to get some of that info.

In the meantime you might be able to unblock yourself by replacing the torch.distributed.Work annotation with the string "torch.distributed.Work". Although it's possible that this will just uncover more errors down the line.
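The reason quoting the annotation can help: a string annotation is stored as-is rather than evaluated at definition time, so Python never looks up the missing Work attribute on import. A minimal sketch of the mechanism, using a hypothetical stand-in namespace rather than real torch:

```python
import types
from typing import Optional

# Stand-in for a torch.distributed build that lacks the Work attribute
# (hypothetical object, not real torch):
dist = types.SimpleNamespace()

# With an eager annotation, `dist.Work` is evaluated when the function is
# defined, reproducing the AttributeError from the traceback:
try:
    def broken() -> Optional[dist.Work]:
        return None
except AttributeError:
    print("eager annotation fails at definition time")

# Quoting the annotation stores it as a plain string, so the definition
# succeeds even though dist.Work does not exist:
def patched() -> "Optional[dist.Work]":
    return None

print(patched())  # None
```

The same deferral can also be obtained module-wide with `from __future__ import annotations`, though that would require editing the xformers source file in question.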

@obliviate1230
Author

Thanks for your reply.
The details of the system are:

  • Machine: aarch64
  • System: Linux
  • Distribution: Ubuntu 22.04 Jammy
  • Release: 5.15.148-tegra
  • Python: 3.10.12
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75

I got PyTorch from the Jetson download center, using the PyTorch build for JetPack version JP6.1.

I ran python -m xformers.info and got a similar error.

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/info.py", line 11, in <module>
    from . import __version__, _cpp_lib, _is_opensource, _is_triton_available, ops
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/__init__.py", line 25, in <module>
    from .modpar_layers import ColumnParallelLinear, RowParallelLinear
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/modpar_layers.py", line 11, in <module>
    from .differentiable_collectives import (
  File "/home/nvidia/.local/lib/python3.10/site-packages/xformers/ops/differentiable_collectives.py", line 27, in <module>
    ) -> Tuple[torch.Tensor, Optional[torch.distributed.Work]]:
AttributeError: module 'torch.distributed' has no attribute 'Work'

It seems to be an issue with my platform.

@lw
Contributor

lw commented Dec 12, 2024

I'm not familiar with the "jetson download center". Could you try installing PyTorch from the official channels? You can find details here: https://pytorch.org/
