cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) #3233

Merged

Contributor

@shwina shwina commented Jan 2, 2025

Description

Closes #3223.
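For readers unfamiliar with the term, here is a minimal sketch of what a contiguity check can look like. It is not the code added by this PR: the helper names `contiguous_strides`, `is_contiguous`, and `check_contiguous` are hypothetical, and the sketch works on plain shape and stride tuples (in bytes, as exposed by the array interface) so that it runs without a GPU. Only the error message is taken from the test in this PR.

```python
def contiguous_strides(shape, itemsize, order="C"):
    """Byte strides of a densely packed array with the given shape."""
    dims = list(reversed(shape)) if order == "C" else list(shape)
    strides = []
    step = itemsize
    for extent in dims:
        strides.append(step)
        step *= extent
    if order == "C":
        strides.reverse()
    return tuple(strides)


def is_contiguous(shape, strides, itemsize):
    """Return True for a C- or Fortran-contiguous layout (hypothetical helper)."""
    # In the array interface, strides=None means C-contiguous.
    if strides is None:
        return True
    # Empty arrays hold no elements, so any strides are acceptable.
    if 0 in shape:
        return True
    for order in ("C", "F"):
        expected = contiguous_strides(shape, itemsize, order)
        # Dimensions of extent 1 are never stepped over, so their stride is ignored.
        if all(n == 1 or s == e for n, s, e in zip(shape, strides, expected)):
            return True
    return False


def check_contiguous(shape, strides, itemsize):
    """Raise the error this PR's test expects for non-contiguous input."""
    if not is_contiguous(shape, strides, itemsize):
        raise ValueError("Non-contiguous arrays are not supported.")
```

Under these assumptions, a float64 column slice such as `cp.zeros((size, 2))[:, 0]` has shape `(size,)` and byte strides `(16,)`, so the check rejects it for any `size` greater than 1.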

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@shwina shwina requested a review from a team as a code owner January 2, 2025 16:08
@shwina shwina requested a review from leofang January 2, 2025 16:08
Contributor

github-actions bot commented Jan 2, 2025

🟩 CI finished in 23m 32s: Pass: 100%/1 | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
  • 🟩 python: Pass: 100%/1 | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 32s | Avg: 23m 32s | Max: 23m 32s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
+/- python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 1)

# Runner
1 linux-amd64-gpu-v100-latest-1

Contributor

github-actions bot commented Jan 3, 2025

🟥 CI finished in 25m 36s: Pass: 0%/1 | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
  • 🟥 python: Pass: 0%/1 | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 25m 36s | Avg: 25m 36s | Max: 25m 36s
    

Contributor

github-actions bot commented Jan 3, 2025

🟥 CI finished in 24m 57s: Pass: 0%/1 | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
  • 🟥 python: Pass: 0%/1 | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 24m 57s | Avg: 24m 57s | Max: 24m 57s
    

@shwina shwina force-pushed the cuda-parallel-forbid-non-contiguous-arrays branch from 3770e06 to 8b1fdd9 Compare January 3, 2025 15:46
Contributor

github-actions bot commented Jan 3, 2025

🟩 CI finished in 23m 48s: Pass: 100%/1 | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
  • 🟩 python: Pass: 100%/1 | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 48s | Avg: 23m 48s | Max: 23m 48s
    

@shwina shwina requested a review from leofang January 3, 2025 18:55
Contributor

github-actions bot commented Jan 3, 2025

🟩 CI finished in 23m 45s: Pass: 100%/1 | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
  • 🟩 python: Pass: 100%/1 | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    

Member

@leofang leofang left a comment

LGTM! Sorry for the delay in reviewing.

if f_contiguous:
    try:
        return cp.asfortranarray(arr)
    except ImportError:  # cublas unavailable
Member

Q: Which version of CuPy triggered this? Importing CuPy should not fail because a CTK library is missing. If it does, we might have something to fix.

Contributor Author

cupy-cuda12x-13.3.0.

Here's what the failure looks like:

2025-01-03T00:16:23.4239558Z _________________ ERROR at setup of test_reduce_2d_array[True] _________________
2025-01-03T00:16:23.4241093Z 
2025-01-03T00:16:23.4242663Z request = <SubRequest 'array_2d' for <Function test_reduce_2d_array[True]>>
2025-01-03T00:16:23.4244087Z 
2025-01-03T00:16:23.4245368Z     @pytest.fixture(params=[True, False])
2025-01-03T00:16:23.4246832Z     def array_2d(request):
2025-01-03T00:16:23.4248065Z         f_contiguous = request.param
2025-01-03T00:16:23.4249357Z         arr = cp.random.rand(5, 10)
2025-01-03T00:16:23.4250661Z         if f_contiguous:
2025-01-03T00:16:23.4252062Z >           return cp.asfortranarray(arr)
2025-01-03T00:16:23.4253052Z 
2025-01-03T00:16:23.4253482Z tests/test_reduce.py:511: 
2025-01-03T00:16:23.4255090Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2025-01-03T00:16:23.4257843Z ../../build/cuda12.6-gcc13/python/cupy/_manipulation/kind.py:62: in asfortranarray
2025-01-03T00:16:23.4259978Z     return _core.asfortranarray(a, dtype)
2025-01-03T00:16:23.4261749Z cupy/_core/core.pyx:2766: in cupy._core.core.asfortranarray
2025-01-03T00:16:23.4263333Z     ???
2025-01-03T00:16:23.4264614Z cupy/_core/core.pyx:2784: in cupy._core.core.asfortranarray
2025-01-03T00:16:23.4266229Z     ???
2025-01-03T00:16:23.4267531Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2025-01-03T00:16:23.4268823Z 
2025-01-03T00:16:23.4269155Z >   ???
2025-01-03T00:16:23.4270863Z E   ImportError: libcublas.so.12: cannot open shared object file: No such file or directory
2025-01-03T00:16:23.4272563Z 
2025-01-03T00:16:23.4273016Z cupy/_core/core.pyx:2710: ImportError

@shwina shwina force-pushed the cuda-parallel-forbid-non-contiguous-arrays branch from cc8a9fa to abaa341 Compare January 4, 2025 12:32
Contributor

github-actions bot commented Jan 4, 2025

🟩 CI finished in 23m 45s: Pass: 100%/1 | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
  • 🟩 python: Pass: 100%/1 | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 45s | Avg: 23m 45s | Max: 23m 45s
    

@shwina shwina merged commit 7b8563f into NVIDIA:main Jan 4, 2025
18 checks passed

d_in = cp.zeros((size, 2))[:, 0]
with pytest.raises(ValueError, match="Non-contiguous arrays are not supported."):
    _ = algorithms.reduce_into(d_in, d_out, binary_op, h_init)
Contributor

I'm seeing this warning:

=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>
  
  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================

I'm working around it like this: bcf0de8
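For context, the warning above arises because Python still calls `__del__` on an object whose `__init__` raised before every attribute was set. The sketch below shows one common way to guard against that; it is not necessarily what commit bcf0de8 does. The class `Reduce`, its `contiguous` parameter, and the `released` list are hypothetical stand-ins for `_Reduce`, its input validation, and the native cleanup call.

```python
released = []  # hypothetical record of cleaned-up results, standing in for native cleanup


class Reduce:
    """Hypothetical stand-in for an object that owns a native build result."""

    def __init__(self, contiguous):
        # Define the attribute before any validation that may raise,
        # so that __del__ always finds it.
        self.build_result = None
        if not contiguous:
            raise ValueError("Non-contiguous arrays are not supported.")
        self.build_result = object()  # stands in for the real build step

    def __del__(self):
        # __del__ runs even when __init__ raised part-way through,
        # so skip cleanup when nothing was built.
        if self.build_result is None:
            return
        released.append(self.build_result)
```

With this guard, a failed construction no longer triggers an `AttributeError` during finalization, while a successfully built object is still cleaned up.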

Contributor Author

That makes sense - nice catch!

While we're at it, we should probably turn warnings into errors so that things like this don't slip through in the future, as they did here. Perhaps we could start with these three lines?

https://github.com/rapidsai/cudf/blob/955b1f4566abccf920a022dc78a1e654acf0de16/python/cudf/pyproject.toml#L97-L100
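To illustrate the idea of promoting warnings to errors, here is a minimal standard-library sketch. It does not reproduce the linked cudf configuration; the helpers `run_strict` and `noisy` are hypothetical. In a pytest project the same effect is usually achieved through the `filterwarnings` ini option or the `-W error` command-line flag rather than in code.

```python
import warnings


def run_strict(func):
    """Call func with every warning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func()


def noisy():
    """Hypothetical function that emits a warning and then succeeds."""
    warnings.warn("resource was not cleaned up", RuntimeWarning)
    return "done"
```

Calling `run_strict(noisy)` raises `RuntimeWarning` instead of printing it, which is the behavior that would have made the `__del__` problem above fail the test run.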

Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: cuda.parallel - forbid (or handle) non-contiguous arrays as inputs to algorithms
3 participants