-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nearbyhint: Fix for ffast-math #550
base: master
Are you sure you want to change the base?
Conversation
The fix on 45cad81 wasn't working on Clang. On ffast-math the compiler is free to assume that "x + v -v = x". 45cad81 was workarounding this fact by storing "x + v" on a volatile variable. For Clang this wasn't enough to stop optimizing, as it correctly detected that the variable is local-scope, so no one can take a reference to it. This commit reworks the fix by defining a function to do the operation and disabling optimizations on that function for all supported compilers (and those using the same frontend). For non-supported compilers an #error is emitted, as the workaround wasn't safe enough. It could even break between compiler versions. This avoids potentially weird behaviour on the future.
@@ -1707,6 +1707,40 @@ namespace xsimd { | |||
} | |||
|
|||
|
|||
#if !defined(__FAST_MATH__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting approach! I wonder if the following works as expected and is faster or not:
namespace details {
template<class T>
T force_add(T x, T y) {
#if defined(__FAST_MATH__) && (defined(__clang__) || defined(__GNUC__))
__attribute__((noinline))
#endif
return x + y;
}
}
then call it appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now on $DAILY_JOB.
Without trying I'd say that a competent compiler will still optimize it away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But notice that the optimization we want to break is ffast-math related. Inlining should make no difference.
Can you give a try to the following patch instead: if it works, it's much less intrusive
|
Now I'm at $DAILY_JOB I don't have the codebase at hand, but I'd say that local scope volatile variables are not enough to stop Clang. It seems that it sees through the intrinsic types, otherwise it woudn't have correctly applied the optimization. The problem with the volatile approach, is that even if it worked now we don't know if a compiler update would silently break it. Notice that to remove the volatile instead of doing "(batch_type)(void*)(&d0)" you can do "const_cast<batch_type const&> (d0)" |
#551 achieves the same result on my laptop, while being less intrusive on the codebase. Can you confirm it works for you too? |
6c6dc1f
to
52984ef
Compare
See commit text.
This fixes:
#548
Notice that I wasn't able to verify it on Visual Studio, just g++ and clang.