Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbt2.4 calc TSC once, fixes BW calcs #60

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 12 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -109,9 +109,15 @@ elseif("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "x86")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -m32")
endif()

find_package(hsa-runtime64 REQUIRED
PATHS /opt/rocm
if (DEFINED ROCM_HOME)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add extra variables. To find hsa-runtime64-config.cmake from a path other than /opt/rocm, CMAKE_PREFIX_PATH can be used.
CMAKE_PREFIX_PATH="custom path for hsa"
This will function same was as ROCM_HOME="custom path for hsa or rocm"

Copy link
Author

@srinivamd srinivamd Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add extra variables

What is the optimization of you get out of "No need to add extra variables" or what is the downside? The CMAKE_PREFIX_PATH and other cmake defined paths have side-effects and using those variables to get what user wants cmake to do is incorrect application of those variables, which leads to more headaches.
Read https://cmake.org/cmake/help/v3.0/variable/CMAKE_INSTALL_PREFIX.html
CMAKE_PREFIX_PATH is to specify destination dir. That is not the purpose of the change here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are referring here is CMAKE_INSTALL_PREFIX.

CMAKE_PREFIX_PATH gives a different purpose.

https://cmake.org/cmake/help/v3.0/variable/CMAKE_PREFIX_PATH.html

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, my mistake. But please see my comments on line 170+ as to avoid "deduction logic" and providing user-defined parameter to use with runpath, etc.
My changes allow DevOps and other cmake experts to continue to use CMAKE_PREFIX_PATH to do a find as well as simple users who want to specify the lcoation to use, and use that in runpath.

find_package(hsa-runtime64 REQUIRED
PATHS ${ROCM_HOME}
)
else()
find_package(hsa-runtime64 REQUIRED
PATHS /opt/rocm
)
endif()
message( " hsa-runtime64 found @ ${hsa-runtime64_DIR} " )

# Set project requirements
Expand Down Expand Up @@ -161,6 +167,10 @@ if( DEFINED ENV{ROCM_RPATH})
set ( CMAKE_EXE_LINKER_FLAGS "-Wl,--enable-new-dtags -Wl,--rpath,$ENV{ROCM_RPATH}" )
endif()

if (DEFINED ROCM_HOME)
set ( CMAKE_EXE_LINKER_FLAGS "-Wl,--enable-new-dtags -Wl,--rpath,${ROCM_HOME}/hsa/lib:${ROCM_HOME}/lib64:${ROCM_HOME}/lib" )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path for hsa library is already known to the cmake from find_package(hsa_runtime64).
${hsa-runtime64_DIR} will give the path for /lib/cmake/hsa-runtime64
So it will be better to deduce the path from there rather than adding a new variable. That will ensure RPATH is the same as the one linked against.

Copy link
Author

@srinivamd srinivamd Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it will be better to deduce the path..

No, @paulfreddy. The changes allow the user to specify the parameter value to use instead o relying on cmake crap to find stuff, and code in deduction paths in cmakefiles. This changes take a keep it simple approach, and leaves existing users/jenkins/devops who rely on cmake gymnastics for their environment to continue with them.
I had to trace cmake logic to figure what was going wrong with runpath settings - no one should have to go through that IMHO.
So, the user specifiable define parameter, just like compiler defines makes it easier to specify the parameters needed to generate the makefiles. IOW, relying on another packages cmake files, their relative locations, etc. to build an "application" is not "better" approach as it relies on cmake too much.

endif()

# Add install directives for rocm_bandwidth_test
install(TARGETS ${TEST_NAME} RUNTIME DESTINATION bin)

Expand Down
8 changes: 5 additions & 3 deletions hsatimer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,9 +45,11 @@
#define NANOSECONDS_PER_SECOND 1000000000

PerfTimer::PerfTimer() {
freq_in_100mhz = MeasureTSCFreqHz();
freq_in_100mhz = PerfTimer::MeasureTSCFreqHz();
}

PerfTimer::PerfTimer(double tscfreq) : freq_in_100mhz{tscfreq} {}

PerfTimer::~PerfTimer() {
while (!_timers.empty()) {
Timer *temp = _timers.back();
Expand Down Expand Up @@ -176,13 +178,13 @@ uint64_t PerfTimer::MeasureTSCFreqHz() {
unsigned int unused;
uint64_t tscTicksEnd;

uint64_t coarseBeginUs = CoarseTimestampUs();
uint64_t coarseBeginUs = PerfTimer::CoarseTimestampUs();
uint64_t tscTicksBegin = __rdtscp(&unused);
do {
tscTicksEnd = __rdtscp(&unused);
} while (tscTicksEnd - tscTicksBegin < 1000000000);

uint64_t coarseEndUs = CoarseTimestampUs();
uint64_t coarseEndUs = PerfTimer::CoarseTimestampUs();

// Compute the TSC frequency and round to nearest 100MHz.
uint64_t coarseIntervalNs = (coarseEndUs - coarseBeginUs) * 1000;
Expand Down
7 changes: 5 additions & 2 deletions hsatimer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -81,13 +81,14 @@ class PerfTimer {
public:

PerfTimer();
PerfTimer(double);
~PerfTimer();

private:

// AMD timing method
uint64_t CoarseTimestampUs();
uint64_t MeasureTSCFreqHz();
// uint64_t CoarseTimestampUs();
// uint64_t MeasureTSCFreqHz();
Comment on lines +90 to +91

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be remove this commented code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! How about the cleanup of comments be taken care of in the next change in the interest of time? Let's get this integrated so users can relying on RBT to generate correct results can get the fix and build and use RBT for performance tuning.


// General Linux timing method

Expand All @@ -97,6 +98,8 @@ class PerfTimer {
int StartTimer(int index);
int StopTimer(int index);
void ResetTimer(int index);
static uint64_t MeasureTSCFreqHz();
static uint64_t CoarseTimestampUs();

public:

Expand Down
9 changes: 6 additions & 3 deletions rocm_bandwidth_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -588,7 +588,7 @@ void RocmBandwidthTest::RunCopyBenchmark(async_trans_t& trans) {
}

// Create a timer object and reset signals
PerfTimer timer;
PerfTimer timer(tscFreq);
uint32_t index = timer.CreateTimer();

// Start the timer and launch forward copy operation
Expand Down Expand Up @@ -803,8 +803,8 @@ RocmBandwidthTest::RocmBandwidthTest(int argc, char** argv) : BaseTest() {

// Initialize version of the test
version_.major_id = 2;
version_.minor_id = 3;
version_.step_id = 11;
version_.minor_id = 4;
version_.step_id = 0;
version_.reserved = 0;

bw_iter_cnt_ = getenv("ROCM_BW_ITER_CNT");
Expand All @@ -821,6 +821,9 @@ RocmBandwidthTest::RocmBandwidthTest(int argc, char** argv) : BaseTest() {
set_num_iteration(num);
}

tscFreq = PerfTimer::MeasureTSCFreqHz();
//printf("TSC measure: %f \n", tscFreq);

exit_value_ = 0;
}

Expand Down
3 changes: 3 additions & 0 deletions rocm_bandwidth_test.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -519,6 +519,9 @@ class RocmBandwidthTest : public BaseTest {
// Flag to print Cpu time
bool print_cpu_time_;

// tsc freq
double tscFreq;

// Determines if user has requested initialization
bool init_;

Expand Down