Applying @time to simulate reveals discrepancies on resources and timing reports #392
Comments
As you said, there are some preprocessing steps before the simulation that are not accounted for in the reported time. Can you send me the sequence you are using?
I used the Pulseq gradient echo sequence in the examples folder for the screenshot I shared. However, I have two other sequences with different ADC durations that show increased times and resources too. I'm attaching them here: the zip contains a JLD2 file with a dictionary including two sequences.
Hello there, by checking the source I found this block: `if sim_params["gpu"] GC.gc(true); CUDA.reclaim() end`. Since I'm using a dedicated GPU, it makes sense that it gets executed every time; CPU-only users don't need it and should not experience these significant timing discrepancies. In any case, I wonder whether that line can be removed or skipped, given that, according to https://cuda.juliagpu.org/stable/usage/memory/#Garbage-collection, "There is no need for manual memory management, just make sure your objects are not reachable (i.e., there are no instances or references)."
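For anyone who wants to gauge the cost of that block on their own machine, here is a minimal sketch (assuming CUDA.jl is installed and a GPU is present; this is a manual check, not part of Koma's API):

```julia
using CUDA

# Time a full garbage-collection pass plus reclaiming cached GPU memory,
# i.e. the same two calls quoted from the simulator above.
@time begin
    GC.gc(true)
    CUDA.reclaim()
end
```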
We could just remove that line; maybe it doesn't help. If you do a …
@gsahonero, can you confirm that this did not generate any problems?
This past month, I have been using the modified version (without the garbage-collection function) of …
Where would you put it in the docs? In the docstring for …
Yes! Actually, after checking https://juliahealth.org/KomaMRI.jl/dev/reference/3-koma-core/#KomaMRICore.simulate, I can see that there is no mention of the …
Sounds like a good first PR 😄 (You can do it directly on GitHub by pressing the ✏️ icon): https://github.com/JuliaHealth/KomaMRI.jl/blob/master/KomaMRICore/src/simulation/SimulatorCore.jl
The PR is there: #431 :)
Hi, I'm using KomaMRI to simulate MR Fingerprinting sequences. For such long sequences with multiple pulses, I observe a much longer preprocessing time than simulation time, either on CPU or GPU. For instance, on the simulation below, I get a simulation time of 9 min 12 s while the total run takes 49 min, giving a preprocessing time of about 40 min! I don't know how this could be optimized on a single run of the …

Anyway, it is a pleasure to use KomaMRI! :-) Thanks to the team for developing this powerful tool!

Best,

PS1: My goal is to simulate much longer sequences (about 300 pulses), but it seems infeasible today, as the preprocessing time grows more than linearly with the number of pulses: for 40 pulses I get approximately 15 min of preprocessing.

PS2: Apart from that, when computing on CPU (with smaller sequences), I have trouble with parallel processing: increasing the number of threads …
Hi @Tooine, first of all, thanks for using Koma 😄! That definitely sounds like a problem; thanks for attaching the sequence so we can investigate. Are you using the newest version of all the Koma-related packages (v0.8.x)? We fixed a huge slowdown for sequences with too many ADC points in KomaMRICore v0.8.3. If you cannot update to v0.8.x, that probably means you need to update Julia to >=1.9. The CPU problem could be that you did not start Julia with multiple threads. Let me know if that helps; if not, I will fix it 💪. @rkierulf, maybe this is a good sequence to test the performance improvements coming in v0.9.
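As a side note, a quick way to check the multi-threading setup being discussed (plain Julia, nothing Koma-specific; the thread count of 4 is just an example):

```julia
# Launch Julia with multiple threads from a terminal, e.g.:
#     julia --threads 4        # or: julia -t auto
# then confirm inside the session that they are available:
Threads.nthreads()   # should return 4 (or however many threads you requested)
```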
I forgot to answer this:
You can already do that in a few ways (without waiting for me to fix the issue):
```julia
struct MyMethod <: SimulationMethod end # You can add parameters to this

KomaMRICore.sim_output_dim(obj::Phantom{T}, seq::Sequence, sys::Scanner, sim_method::MyMethod) where {T<:Real}

KomaMRICore.run_spin_precession!(p::Phantom{T}, seq::DiscreteSequence{T}, sig::AbstractArray{Complex{T}}, M::Mag{T}, sim_method::MyMethod) where {T<:Real}

# Optional
KomaMRICore.run_spin_excitation!(p::Phantom{T}, seq::DiscreteSequence{T}, sig::AbstractArray{Complex{T}}, M::Mag{T}, sim_method::MyMethod) where {T<:Real}
```

and then

```julia
sim_param["sim_method"] = MyMethod()
```

Check …

Important: note that you don't need to "touch" Koma's source code to extend it! You can do it completely externally.
The first one would be the recommended method, as you have more flexibility. If you found the PR useful and can help, (1) clean it up so we can merge it, or (2) help document the simulation-method extensibility (it is a very powerful but little-known feature) as a tutorial (just create it in …
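For context, a hypothetical end-to-end sketch of how a custom method like the one sketched above would be plugged in (`obj`, `seq`, `sys`, and the `sim_params` keyword are assumptions based on the `simulate` usage discussed in this thread, not code from the comment itself):

```julia
# Hypothetical usage sketch: obj, seq, and sys stand in for your phantom,
# sequence, and scanner; MyMethod is the struct defined above.
sim_params = Dict{String,Any}("sim_method" => MyMethod())
raw = simulate(obj, seq, sys; sim_params)
```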
I agree with @cncastillo! You need to use the latest version to reduce the simulation time; the speedup from v0.7.x to v0.8.x is significant. Complementing …
In this case, … Finally, please check whether the GPU garbage collection is affecting performance when you use a GPU. For this, check this line: …
Hi, thanks for those quick and detailed answers! First of all, the CPU problem is solved: I'm using Julia in a Jupyter notebook in VS Code and, indeed, it required a Julia environment with a specified number of threads. I'm not a Julia expert, so I missed this. Thanks!
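For other notebook users hitting the same thing, a minimal sketch of one way to register a multi-threaded Jupyter kernel (assuming IJulia is the kernel provider; the kernel name and thread count are just examples):

```julia
using IJulia

# Register an extra Jupyter kernel that starts Julia with 4 threads.
installkernel("Julia (4 threads)", env=Dict("JULIA_NUM_THREADS" => "4"))
```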
Yes, I'm using …
In my code, I only imported …
So, in a nutshell (not sure I was very clear), I'm now sure to use …
Thanks for highlighting those possibilities. Currently, I'm more in the setting where I get one phantom and several sequences to test, but this might be interesting for me in the future (in which case I can try to submit a PR, but I don't know if I have the necessary Julia skills for this).
Ok, I'm going to check this now. I can see this line in the …
Hi, just a quick answer: that GPU garbage-collection line may take a long time and significant resources. I understand the line will be removed soon (@cncastillo, I can send a PR just for this if needed). If you want to look deeper at the time and resources, to profile what is taking too long for your simulation, use … In any case, …
Yeah, I agree that the GC line is not worth checking. I will profile your sequence during the weekend. (Btw, for this I recommend using `@profview` in VS Code instead of sprinkling `@time`s. I reaaaally suggest reading this website, it is full of tips and tricks: https://modernjuliaworkflows.github.io/) My intuition is that the problem comes from …
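For reference, a minimal sketch of the `@profview` workflow suggested above (it assumes the Julia VS Code extension, which provides `@profview` in its integrated REPL; the profiled function is a placeholder, not Koma code):

```julia
# Placeholder workload standing in for an expensive call such as simulate(...).
work(n) = sum(sqrt(i) for i in 1:n)

work(10)              # warm up first, so compilation does not dominate the profile
@profview work(10^8)  # opens an interactive flame graph in the VS Code editor
```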
Hi all, I'm a colleague of @Tooine. I just looked into this, and it seems the problem is in the discretization of the sequence, and more specifically in `get_samples`. I think the issue may be the one encountered in this thread on the Julia Discourse. Using the sequence attached by @Tooine, I see the following: … Whereas by modifying the line slightly, according to one of the proposed alternatives in that thread, I get: … Disclaimer: I'm an absolute Julia newbie, so even though I checked that the results were identical for our case, there may be something I'm missing.
Oh god! That could make sense. I believe a similar problem was detected by @gabuzi and fixed in #220, but in that case the problem was a little different. Could you check if `@btime t_rf3 = reduce(vcat, [T0[i] .+ times(seq.RF[1,i], :A) for i in range])` also fixes the problem? We can push a fix for that very easily!
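As general background (the original Koma line is not shown in this thread), a generic micro-benchmark of why the way many small arrays are concatenated can change timings dramatically, e.g. splatting into `vcat` versus `reduce(vcat, ...)`; this is an illustration, not Koma code:

```julia
using BenchmarkTools

# Concatenate many small arrays, two ways.
xs = [ones(100) for _ in 1:2_000]

@btime vcat($xs...);        # splatting thousands of arguments: slow and compiler-heavy
@btime reduce(vcat, $xs);   # specialized reduce(vcat, ...): one allocation for the result
```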
IIRC I tried exactly that line too, and the result was off. My interpretation was that the shapes of `[...]` and `collect(...)` seemed to be different, but I didn't look closer. I'll try to find some time to check again later today.
Hey, thank you all for your consideration of the issue, and thank you @JanWP for probably identifying its cause! I didn't have much time to test the proposed solutions today, and I will be off for a few weeks, so don't worry if the problem is not solved immediately.
I believe it doesn't hurt to push the … We run automated benchmarking and will notice if it generates any performance/accuracy regression. I will push a branch with the fix using KomaMRICore v0.9.0-DEV (which has the benchmark setup), and if everything looks good, backport it to KomaMRICore v0.8.4.
```julia
julia> Meta.@lower [i for i in 1:10]
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = 1:10
│   %2 = Base.Generator(Base.identity, %1)
│   %3 = Base.collect(%2)
└──      return %3
))))

julia> Meta.@lower collect(i for i in 1:10)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = 1:10
│   %2 = Base.Generator(Base.identity, %1)
│   %3 = collect(%2)
└──      return %3
))))
```

Also:

```julia
julia> @btime reduce(vcat, collect(ones(100) for i in 1:10_000));
  1.622 ms (10006 allocations: 16.33 MiB)

julia> @btime reduce(vcat, [ones(100) for i in 1:10_000]);
  1.622 ms (10006 allocations: 16.33 MiB)
```

I will now use …
Ok! The benchmarks are in! We can see a clear speed increase and reduced memory allocation size: https://juliahealth.org/KomaMRI.jl/benchmarks/?option=memory
I just realized the changes are in KomaMRIBase, so I will tag a new non-breaking release for that sub-package (KomaMRIBase v0.8.5).
I'm a bit late to the party, but I can confirm that this fixes our issue. I put a … Before commit e7bfc1f, I get: … After the commit: … Quite the improvement! Thanks for fixing this so quickly!
Hi,

I simulated some sequences using `simulate` and found that the timed data differed from what `@time` provides. Check this screenshot for reference:

There, the last line refers to the data that `@time` produces, and the previous one refers to what `simulate` provides.

To reproduce this, the following code can be used:

I think the differences are due to other functions unrelated to `run_sim_time_iter` (which is the one with `@time` in the `simulate` function), but it would be useful to trace them and check what could be optimized even further (~2 GB is much larger than 300 MB!). Also, it would be useful to clarify that the timing information returned by `simulate` considers only the simulation. Thinking of end-users, they might get confused - like me - by the reported and the perceived time.

Best,
Guillermo
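As a rough stand-in for the reproduction snippet described above (the original code is not shown here; the phantom, scanner, and sequence below are illustrative choices, not the ones from the issue):

```julia
using KomaMRI

# Illustrative setup: a built-in phantom, a default scanner, and an example EPI
# sequence from PulseDesigner (not the sequences used in the issue).
obj = brain_phantom2D()
sys = Scanner()
seq = PulseDesigner.EPI(23e-2, 101, sys)

# Compare the wall-clock time and allocations reported by @time against the
# simulation time that simulate itself reports.
@time raw = simulate(obj, seq, sys)
```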