Panzer: Memory access error with HIP #13668

Open
kgottiparthi opened this issue Dec 11, 2024 · 3 comments

Comments

@kgottiparthi

We get the following error when we run our code on Frontier (OLCF). We are not sure where or how the memory access is failing and would be glad of any suggestions for mitigating it.

CFL = 2.828e-08; dt = 1.000e-01; Time = 0.0000000000000e+00
| Nonlinear | F 2-Norm | # Linear | R 2-Norm |
0 3.19e-03
Memory access fault by GPU node-4 (Agent handle: 0xa77bbf0) on address 0xffff00000000. Reason: Unknown.
Aborted

rocgdb report:

#0 0x00007ff2e28d9124 in PHX::MDField<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> > const, panzer::Cell, panzer::Point, panzer::Dim>::operator()<int, int, int> (this=0x7ff2e28fbdb0 <kokkos_impl_hip_constant_memory_buffer+272>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>,
indices=<error reading variable: Cannot access memory at address 0x2000000000afc>)
at libs/Trilinos-install-16/include/Phalanx_MDField.hpp:461
461 return m_view(indices...);

Thank you,
Kalyan

@ccober6
Contributor

ccober6 commented Dec 11, 2024

@trilinos/panzer @rppawlo

@rppawlo
Contributor

rppawlo commented Dec 11, 2024

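I suspect the derivative dimension of the Fad objects is not being set correctly. An MDField is a lightweight wrapper around a Kokkos::View (its operator() just forwards to m_view(indices...), as the backtrace shows), so View bounds checking also covers MDField accesses. You can configure your build to do array bounds checking with: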

-D Kokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON
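As a rough, host-only illustration of why bounds checking helps here (not Kokkos or Phalanx code): a thin view wrapper only turns indices into an offset into flat storage, so a bad index becomes a bad address. The hypothetical `Rank3ViewSketch` below indexes storage as (cell, point, dim) and makes checking a compile-time switch, loosely mirroring the role of `Kokkos_ENABLE_DEBUG_BOUNDS_CHECK`. It throws `std::out_of_range` only so the failure is easy to observe; that is an assumption of the sketch, not necessarily how Kokkos reports errors.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a rank-3 array indexed as (cell, point, dim).
// operator() only computes an offset into flat storage; bounds checking is
// an opt-in, compile-time choice selected by the BoundsCheck parameter.
template <typename T, bool BoundsCheck>
class Rank3ViewSketch {
 public:
  Rank3ViewSketch(std::size_t n0, std::size_t n1, std::size_t n2)
      : extents_{n0, n1, n2}, data_(n0 * n1 * n2) {
    assert(n0 > 0 && n1 > 0 && n2 > 0);
  }

  T& operator()(std::size_t i, std::size_t j, std::size_t k) {
    if (BoundsCheck) {
      check(0, i);
      check(1, j);
      check(2, k);
    }
    // Unchecked, an out-of-range index silently yields some offset: it may
    // alias another valid entry or land outside data_ (undefined behavior;
    // on a GPU this can surface as a memory access fault).
    return data_[(i * extents_[1] + j) * extents_[2] + k];
  }

  std::size_t extent(std::size_t rank) const { return extents_[rank]; }

 private:
  // Throw with the offending rank, index, and extent.
  void check(std::size_t rank, std::size_t index) const {
    if (index >= extents_[rank]) {
      std::ostringstream msg;
      msg << "index " << index << " out of bounds in rank " << rank
          << " (extent " << extents_[rank] << ")";
      throw std::out_of_range(msg.str());
    }
  }

  std::size_t extents_[3];
  std::vector<T> data_;
};
```

For example, with extents (2, 3, 4) and no checking, `u(0, 3, 0)` maps to the same flat offset (12) as `u(1, 0, 0)`, so an invalid index can quietly read a valid but wrong entry instead of faulting; with `BoundsCheck = true` the same call is rejected.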

If that doesn't help, try printing the derivative array dimensions of the MDFields in the failing functor.
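A minimal sketch of the second suggestion, assuming a host-side model in which each field entry is a dynamic-storage Fad: a value plus a runtime-sized derivative array. `DynFadSketch`, `make_field`, `report_derivative_dims`, and `sum_derivatives` are hypothetical names; the actual way to query derivative dimensions from a Phalanx MDField or Sacado Fad is not shown here. The idea is to check every entry against the derivative count the kernel assumes before the loop runs, since one entry with a shorter derivative array is enough to cause an out-of-bounds read.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <ostream>
#include <vector>

// Hypothetical host-side stand-in for a dynamic-storage forward-AD scalar:
// a value plus a runtime-sized derivative array. Not the Sacado API.
struct DynFadSketch {
  double val = 0.0;
  std::vector<double> dx;  // dx.size() plays the role of the derivative dimension
};

// Build n entries whose derivative arrays all have num_derivs components.
std::vector<DynFadSketch> make_field(std::size_t n, std::size_t num_derivs,
                                     double seed) {
  std::vector<DynFadSketch> field(n);
  for (auto& a : field) a.dx.assign(num_derivs, seed);
  return field;
}

// Print every entry whose derivative dimension differs from `expected`
// and return how many such entries were found.
std::size_t report_derivative_dims(const std::vector<DynFadSketch>& field,
                                   std::size_t expected, std::ostream& out) {
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < field.size(); ++i) {
    if (field[i].dx.size() != expected) {
      out << "entry " << i << ": derivative dimension " << field[i].dx.size()
          << ", expected " << expected << '\n';
      ++mismatches;
    }
  }
  return mismatches;
}

// Kernel-style loop that trusts num_derivs when indexing dx. Without the
// guard, an entry with fewer derivatives would be read out of bounds.
double sum_derivatives(const std::vector<DynFadSketch>& field,
                       std::size_t num_derivs) {
  // Debug guard: report mismatches and stop before the loop
  // (compiled out when NDEBUG is defined).
  assert(report_derivative_dims(field, num_derivs, std::cerr) == 0);
  double sum = 0.0;
  for (const auto& a : field)
    for (std::size_t k = 0; k < num_derivs; ++k) sum += a.dx[k];
  return sum;
}
```

Because the guard lives inside `assert`, it disappears in `NDEBUG` builds, much like a debug-only bounds check; printing the dimensions unconditionally is the simpler choice when chasing a single failure.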

@kgottiparthi
Author

Thank you. We will try this and update you.

@jhux2 changed the title from "kokkos: Memory access error with HIP" to "Panzer: Memory access error with HIP" on Dec 11, 2024