Profiling CUDA UVM host device memory transfers using Caliper
David Poliakoff's experimental branch of the Caliper profiling library can track the automatic transfers of UVM pages between host and device. Caliper is a standalone dynamic library that conforms to the Kokkos profiling interface. It can do much more than just UVM tracking, but these instructions are only about UVM.
Limitation: this does not work with MPI yet; full MPI support in the Tpetra stack is in progress.
module load $cuda_stuff
Build your app with -DKokkos_ENABLE_PROFILING=ON, but otherwise use your usual CUDA configuration.
Building with -DCMAKE_BUILD_TYPE=Debug is recommended so that backtraces show accurate kernel names.
Download Caliper and check out the UVM branch (this is an experimental extension to https://github.com/LLNL/Caliper):
git clone git@github.com:DavidPoliakoff/caliper
cd caliper
git checkout feature/uvm
Build and install Caliper. The install prefix can be anywhere; here it's called $CALIPER_ROOT.
mkdir build
cd build
cmake -DWITH_KOKKOS_PROFILING=ON -DWITH_CUPTI=ON -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT -DCUPTI_PREFIX=$CUDA_ROOT/extras/CUPTI -DCMAKE_INSTALL_PREFIX=$CALIPER_ROOT ..
make install
export KOKKOS_PROFILE_LIBRARY=$CALIPER_ROOT/lib64/libcaliper-serial.so
export CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE=process
export CALI_LOG_VERBOSITY=1
export CALI_REPORT_CONFIG="SELECT SUM(cupti.uvm.bytes),* GROUP BY alloc.label#cupti.uvm.address,cupti.uvm.direction,function FORMAT TABLE"
export CALI_ALLOC_RESOLVE_ADDRESSES=TRUE
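If Kokkos cannot find the file named in KOKKOS_PROFILE_LIBRARY, it is easy to end up with no report and no obvious reason why. As a minimal sketch, the hypothetical helper below (find_caliper_lib is not part of Caliper or Kokkos) looks for libcaliper-serial.so under an install prefix. It assumes the library may land in lib/ instead of lib64/ on some systems; the lib/ fallback is an assumption, not something the instructions above state.

```shell
# Hypothetical helper: locate libcaliper-serial.so under an install prefix.
# lib64/ matches the export above; the lib/ fallback is an assumption for
# systems whose install layout does not use lib64/.
find_caliper_lib() {
    prefix="$1"
    for dir in lib64 lib; do
        candidate="$prefix/$dir/libcaliper-serial.so"
        if [ -f "$candidate" ]; then
            # Print the first match and stop.
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "libcaliper-serial.so not found under $prefix" >&2
    return 1
}
```

It could be used in place of the hard-coded path, for example: export KOKKOS_PROFILE_LIBRARY="$(find_caliper_lib "$CALIPER_ROOT")"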
Now, just run a program that uses Kokkos+CUDA+UVM. After Kokkos is finalized, Caliper will print out a report. Here is an example line from running the TpetraExt MatrixMatrix unit tests:
KokkosSparse::StructureC::GPU_EXEC BYTES_TRANSFER_HTOD 98304 Tpetra::CrsMatrix::val
This 4-column format (defined in $CALI_REPORT_CONFIG) has the demangled kernel/functor name, the transfer direction (here, host to device), the total number of bytes transferred, and the Kokkos::View label.
The reported byte count is a sum over all the times the transfer happened in the same kernel, on the same view in the same transfer direction.
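Because each row is already a per-kernel, per-view, per-direction sum, a coarser total can be obtained by adding rows together. The sketch below totals bytes per transfer direction with awk. It assumes rows look like the example above, with the last three whitespace-separated fields being direction, byte count, and view label; a view label containing spaces would break that assumption. The function name sum_uvm_bytes_by_direction is hypothetical.

```shell
# Hypothetical helper: total the UVM bytes per transfer direction.
# Reads report rows on stdin. Fields are counted from the end of the line
# so that a kernel name containing spaces does not shift the columns.
sum_uvm_bytes_by_direction() {
    awk 'NF >= 4 && $(NF-1) ~ /^[0-9]+$/ {
             # $(NF-2) = direction, $(NF-1) = byte count
             total[$(NF-2)] += $(NF-1)
         }
         END {
             for (dir in total) printf "%s %.0f\n", dir, total[dir]
         }' | sort
}
```

Rows whose byte column is not a plain integer (such as a table header) are skipped. Given a report saved to a file (report.txt is a placeholder name), it could be run as: sum_uvm_bytes_by_direction < report.txt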
Copyright © Trilinos a Series of LF Projects, LLC
For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.