Makefile updates #198
Pull request from dev to master to create COMPAS beta
This should be a pull request into dev not master |
@SimonStevenson Oops - fixed now |
This compiled fine for me, would like to get @jeffriley 's eyes on it as well if possible |
time make -j $(nproc) |
The changes look fine. Happy somebody is looking at the makefile - I've basically been ignoring it. I'll officially review a bit later. |
Running ./COMPAS -n 1000 on master and on this makefile branch shows a speed-up of more than a factor of 2. The outputs, however, are not identical. I need to check whether that is expected, and that identical inputs produce identical outputs. |
OK, spotted a possible bug in the default random seed (issue #199). Fixing the seed to be 42, I now get identical outputs, and these are the run times: master makefile |
@SimonStevenson Never-mind - fixed :) |
This is awesome @manodeep ! |
As a point of reference, SSE2 operates on 128 bit vector registers (4x floats, 2x doubles) and AVX-512 operates on 512-bit vector registers (16x floats, 8x doubles) - so the best case speedup would be 4x. |
I have checked that this compiles and runs and produces the same binaries as master. I found that for a small test run this leads to a speed up of >2! I am happy signing off on this.
I just approved this, but @jeffriley pointed out that we should document Manodeep's suggestion |
"time make -j $(nproc) Query: does it make sense to have two compilation options, one that can compile quickly (for quick tests) at the cost of less efficient running, and another that optimises runtime for longer runs at the cost of long compilation times? |
Thank you very much for your advice and help, @manodeep -- we really appreciate it! |
Btw, the reason why I recommended If you guys haven't done so already, I would recommend using Intel Vtune to visually profile the code and see if there are any easy gains. Intel Vtune is available as a module on @SimonStevenson Happy to chat via slack if that would be easier. |
@ilyamandel Not a problem - glad I could help! |
When I compile with this makefile, using gcc version
I get the following warning for every source file:
If I add
to CXXFLAGS the warning goes away |
@jeffriley Good solve - that error message was annoying. I wonder if your solution is identical to |
@manodeep I think -fno-var-tracking-assignments turns assignment tracking off, whereas --param=max-vartrack-size=0 changes the default limit from 10000 (I think) to unlimited. I'm not sure we need it, so we could just use -fno-var-tracking-assignments |
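A minimal sketch of how the flag discussed above could be appended to the compile flags. The base `CXXFLAGS` value is an assumption for illustration; only the two GCC options named in this comment come from the thread.

```makefile
# Assumed base flags (illustrative; not copied from the COMPAS Makefile).
CXXFLAGS := -g -O3 -march=native

# GCC option named in this thread: turn off variable-tracking assignments.
CXXFLAGS += -fno-var-tracking-assignments

# Alternative named in this thread (left disabled here): change the
# var-tracking size limit instead of turning assignment tracking off.
# CXXFLAGS += --param=max-vartrack-size=0
```

Both options affect only the debug information that GCC generates, so whether to keep either in a production build is a separate choice.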
From a quick googling (earlier), that option seems to facilitate debugging. It may be worth removing from production runs, particularly if the compile times change depending on that parameter setting. |
@SimonStevenson @jeffriley I made a few more changes that incorporate good Should not affect the build, but now has better dependency checker. For instance, if the header files are altered, then the associated object file will be rebuilt. Same for all object files, if the I am done with the edits. Any other edits will be in response to code-review from you guys. |
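One common way to get the header-aware rebuilds described above is compiler-generated dependency files. The sketch below assumes that approach and uses hypothetical source file names; it is not necessarily what this PR's Makefile does.

```makefile
# NOTE: recipe lines below must begin with a TAB character.
# Hypothetical names for illustration only.
EXE     := COMPAS
SOURCES := main.cpp Star.cpp
OBJECTS := $(SOURCES:.cpp=.o)
DEPS    := $(OBJECTS:.o=.d)

CXXFLAGS := -g -O3 -march=native
# -MMD writes a .d file next to each object listing the headers it includes;
# -MP adds a phony target per header so a deleted header does not break make.
DEPFLAGS := -MMD -MP

$(EXE): $(OBJECTS)
	$(CXX) $(OBJECTS) -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) $(DEPFLAGS) -c $< -o $@

# Read the generated dependency files; the leading '-' ignores missing ones
# (for example on the first build).
-include $(DEPS)
```

With this in place, touching a header rebuilds only the object files whose `.d` file lists that header.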
Thanks @manodeep , I'll rerun my tests. Can you respond to @ilyamandel 's query above regarding optimising for compilation time vs optimising for runtime?
|
Compiles and runs fine for me! manodeep/makefile |
@SimonStevenson Ohh I didn't realise that query was directed at me! Yup - totally possible to create two compilation pathways, but that requires a choice about usability. How do you guys envision running these two different compilation tracks? I have implemented one solution, where the non-optimised build is the default (i.e., mirrors the setup prior to this PR). Running @SimonStevenson When you re-ran the code just now - was the output still identical to before? |
Substantial changes since approved, should be reapproved
@manodeep -- thank you, exactly what I had in mind! |
Hi @manodeep , thanks for this! Compiling with real 3m6.504s Compiling with real 7m10.527s Checked that it runs and produces the same output in each case using
|
I have checked that the code compiles using both `make` (which compiles faster and runs slower) and `make fast` (which compiles slower and runs faster), and produces the same output in each case.
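A minimal sketch of how a default `make` and an optimised `make fast` could share one Makefile, using a GNU make target-specific variable. The flag sets and source file names are assumptions; the thread only states that the default build is non-optimised and that `make fast` is the optimised one.

```makefile
# NOTE: recipe lines below must begin with a TAB character.
# Hypothetical names and flag values for illustration only.
EXE     := COMPAS
SOURCES := main.cpp Star.cpp
OBJECTS := $(SOURCES:.cpp=.o)

# Default: quick to compile, slower to run (assumed flags).
OPTFLAGS := -O0
# Recursively expanded (=) so OPTFLAGS is looked up when a recipe runs.
CXXFLAGS  = -g $(OPTFLAGS)

all: $(EXE)

# Target-specific variable: in effect for 'fast' and all of its prerequisites.
fast: OPTFLAGS := -O3 -march=native
fast: $(EXE)

$(EXE): $(OBJECTS)
	$(CXX) $(OBJECTS) -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@

.PHONY: all fast
```

In this sketch, object files left over from one mode are considered up to date by the other, so the objects need to be removed before switching between `make` and `make fast`.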
Thanks for releasing the code-base. I took a quick look at the `Makefile` and fixed a couple of issues.

- Optimisation flags should be on the compile-time and not link-time. I moved the `-O3` flag to the compile stage. Plus, the `-march=k8` was only targeting the AMD K8 architecture that was released in ~2003-2004 (SSE2 is the "fastest" instructions used on K8, see details below). Modern CPUs have more advanced instructions, and the generic `-march=native` will optimise for the CPU that `COMPAS` is being compiled on. (If you are compiling on `OzSTAR`, then I would recommend using `-march=skylake-avx512` instead of `-march=native`.) You guys should re-run any benchmarks and check if there are any speedups.
- Duplicate `CFLAGS` on two different lines containing `-g` and the include options. Replaced with the more appropriate `CXXFLAGS` for `C++` code.
- Replaced all hard-coded `COMPAS` with a single-sourced variable (`$EXE`).
- Embedded the path to the `BOOST` library into the executable by using the `-Xlinker -rpath -Xlinker </path/to/boost/lib>`. This will allow the executable to always find the `BOOST` library used at compile-time (regardless of whether that library is loaded at run-time) -- I suspect this will remove the need for the static linking.
- Numerous other small-updates to the Makefile structure, including removing of include flags at link-time.
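The changes listed above can be sketched as a compact Makefile fragment. This is an illustration under stated assumptions, not the Makefile from this PR: the `BOOST` path, the source file names, and the linked library name are placeholders.

```makefile
# NOTE: recipe lines below must begin with a TAB character.
# Placeholders for illustration only: BOOST path, SOURCES, and LDLIBS.
EXE   := COMPAS
BOOST := /path/to/boost

# Compile stage: debug info, optimisation, architecture, and include paths.
CXXFLAGS := -g -O3 -march=native -I$(BOOST)/include

# Link stage: library search path plus an rpath embedded in the executable,
# and no include flags.
LDFLAGS := -L$(BOOST)/lib -Xlinker -rpath -Xlinker $(BOOST)/lib
LDLIBS  := -lboost_program_options

SOURCES := main.cpp \
           Star.cpp
OBJECTS := $(SOURCES:.cpp=.o)

# The executable name is single-sourced through $(EXE).
$(EXE): $(OBJECTS)
	$(CXX) $(OBJECTS) $(LDFLAGS) $(LDLIBS) -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@
```

Keeping `-O3` and `-march` in `CXXFLAGS` means they are applied when each object file is compiled, which is where code generation happens in a build without link-time optimisation.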
I have also attempted to line-up trailing backslash on the `SOURCES` lines.

Comparison of instruction-sets enabled by compile-time options.

Here are the instruction sets enabled by `-march=k8` and `-march=skylake-avx512`, checked with `g++-7.5.0` on my OSX laptop: