Fix: Automatically finalize t8code objects when Trixi.jl shuts down #2172

Merged: @ranocha merged 50 commits into main from fix-t8code-finalize-before-shutdown on Dec 3, 2024

Contributor

@jmark jmark commented Nov 19, 2024

This PR automatically finalizes all active t8code-related objects before Trixi.jl or MPI, respectively, shuts down. Before this PR, t8code objects had to be finalized explicitly by hand before shutting down Trixi.jl; otherwise nasty segfaults occurred.

This PR is related to this discussion: DLR-AMR/t8code#1295.

This PR depends on DLR-AMR/T8code.jl#75 and DLR-AMR/T8code.jl#76.
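
A minimal sketch of the general idea described above, not the code from this PR: keep a registry of live wrapper objects and release whatever is left from an `atexit` hook. `TrackedObject`, `ACTIVE_OBJECTS`, `track!`, `release!`, and `release_all!` are hypothetical names. A real version would call the t8code C API inside `release!` and would have to make sure the hook runs before MPI is finalized, which is not shown here.

```julia
# Hypothetical stand-in for a wrapper around a t8code C object.
mutable struct TrackedObject
    name::String
    finalized::Bool
end

# Registry of all objects that still need to be released (assumption:
# a Vector, so that insertion order is preserved).
const ACTIVE_OBJECTS = TrackedObject[]

function track!(obj::TrackedObject)
    push!(ACTIVE_OBJECTS, obj)
    return obj
end

# Release a single object; calling it a second time is a no-op.
function release!(obj::TrackedObject)
    obj.finalized && return nothing
    # A real implementation would call into the C library here.
    obj.finalized = true
    return nothing
end

# Release everything that is still registered, then clear the registry.
function release_all!()
    foreach(release!, ACTIVE_OBJECTS)
    empty!(ACTIVE_OBJECTS)
    return nothing
end

# Run the cleanup automatically when the Julia process exits.
atexit(release_all!)

forest = track!(TrackedObject("forest", false))
```

With this in place the user no longer has to call `release!(forest)` by hand; calling `release_all!()` manually before exit is still harmless because the registry is emptied afterwards.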

@jmark jmark added enhancement New feature or request refactoring Refactoring code without functional changes labels Nov 19, 2024
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

jmark and others added 2 commits November 19, 2024 20:15
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Member

@andrewwinters5000 andrewwinters5000 left a comment

Great that you determined an "under-the-hood" way to handle this finalization process!

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.39%. Comparing base (2edf9cd) to head (5458fb8).
Report is 1 commit behind head on main.

Files with missing lines    Patch %   Lines
src/auxiliary/t8code.jl     50.00%    2 Missing ⚠️
@@            Coverage Diff             @@
##             main    #2172      +/-   ##
==========================================
- Coverage   96.39%   96.39%   -0.00%     
==========================================
  Files         483      483              
  Lines       38325    38333       +8     
==========================================
+ Hits        36941    36948       +7     
- Misses       1384     1385       +1     
Flag Coverage Δ
unittests 96.39% <91.30%> (-<0.01%) ⬇️

…i-framework/Trixi.jl into fix-t8code-finalize-before-shutdown
Contributor Author

jmark commented Nov 20, 2024

Although this fix works nicely in serial Julia sessions, it is flawed for MPI-based parallel execution. First, the garbage collector instances on the individual Julia processes seem to work independently in terms of when and what to free. This leads to MPI processes stalling while they wait for other processes that are not finalizing the t8code object at that moment. A similar effect can happen during the last finalization call while shutting down, since the Set data structure most probably does not retain the order of insertion. Retaining that order is crucial with multiple MPI processes so that the objects sharing the same MPI arrays are freed.

Phew ...
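
To illustrate the ordering concern, here is a small sketch of a registry that remembers insertion order, unlike a plain `Set`. `OrderedRegistry`, `register!`, `unregister!`, and `release_order` are hypothetical names, and the assumption that every rank creates its objects in the same order is mine; this is not necessarily how the PR ended up solving the problem.

```julia
# Hypothetical registry that remembers the order of insertion.
mutable struct OrderedRegistry
    next_id::Int
    entries::Dict{Int, String}
end

OrderedRegistry() = OrderedRegistry(0, Dict{Int, String}())

# Store an object name under a monotonically increasing id.
function register!(reg::OrderedRegistry, name::String)
    reg.next_id += 1
    reg.entries[reg.next_id] = name
    return reg.next_id
end

# An object that was released early simply drops out of the registry.
unregister!(reg::OrderedRegistry, id::Int) = delete!(reg.entries, id)

# Remaining names sorted by id, i.e. in insertion order. The result is the
# same on every rank as long as all ranks registered in the same order.
function release_order(reg::OrderedRegistry)
    ids = sort!(collect(keys(reg.entries)))
    return [reg.entries[id] for id in ids]
end

reg = OrderedRegistry()
a = register!(reg, "mesh_a")
b = register!(reg, "mesh_b")
c = register!(reg, "mesh_c")
unregister!(reg, b)
order = release_order(reg)  # ["mesh_a", "mesh_c"]
```

Because the order depends only on the ids and not on hashing or on when the garbage collector happens to run, collective cleanup calls issued in this order would line up across ranks.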

@jmark jmark marked this pull request as draft November 20, 2024 14:00
@jmark jmark marked this pull request as ready for review November 21, 2024 17:31
Contributor Author

jmark commented Nov 21, 2024

@andrewwinters5000 @benegee @sloede @ranocha @JoshuaLampert Ready for review.

Member

@JoshuaLampert JoshuaLampert left a comment

Only a minor thing, but I'm wondering if we should add something like Base.pointer(fw::ForestWrapper) = fw.pointer in T8code.jl to avoid explicitly accessing the field here. (Sorry I missed that in the review in T8code.jl.)
Otherwise this looks good to me.
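
A minimal sketch of the suggested accessor. The `ForestWrapper` struct here is a hypothetical stand-in with a single `pointer` field of type `Ptr{Cvoid}`; the real type lives in T8code.jl and may look different.

```julia
# Hypothetical minimal wrapper; the real ForestWrapper is defined in T8code.jl.
struct ForestWrapper
    pointer::Ptr{Cvoid}
end

# Accessor as suggested above, so callers do not touch the field directly.
Base.pointer(fw::ForestWrapper) = fw.pointer

# Dummy address for illustration only; it is never dereferenced.
fw = ForestWrapper(Ptr{Cvoid}(UInt(4096)))
p = pointer(fw)
```

Call sites would then write `pointer(fw)` instead of `fw.pointer`, which keeps the field name an implementation detail of the wrapper.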

Contributor Author

jmark commented Nov 25, 2024

Only a minor thing, but I'm wondering if we should add something like Base.pointer(fw::ForestWrapper) = fw.pointer in T8code.jl to avoid explicitly accessing the field here. (Sorry I missed that in the review in T8code.jl.) Otherwise this looks good to me.

@JoshuaLampert I had already thought about doing this, similar to PointerWrapper in P4est.jl. But it didn't work right away, so I opted for the stupid way of writing it out explicitly.

I then found out that you can actually mess with unsafe_convert and dispatch on it. So I did: DLR-AMR/T8code.jl#76
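
A rough sketch of what dispatching on `unsafe_convert` can look like. The `ForestWrapper` struct and the `Ptr{Cvoid}` pointer type are assumptions for illustration; the actual method added in DLR-AMR/T8code.jl#76 may differ.

```julia
# Hypothetical minimal wrapper; the real type is defined in T8code.jl.
struct ForestWrapper
    pointer::Ptr{Cvoid}
end

# ccall converts each argument via Base.cconvert followed by
# Base.unsafe_convert. For pointer argument types, cconvert passes the
# object through unchanged, so this method is what turns the wrapper
# into the raw pointer at the call boundary.
Base.unsafe_convert(::Type{Ptr{Cvoid}}, fw::ForestWrapper) = fw.pointer

# Dummy address for illustration only; it is never dereferenced.
fw = ForestWrapper(Ptr{Cvoid}(UInt(4096)))
raw = Base.unsafe_convert(Ptr{Cvoid}, fw)
```

With such a method defined, a wrapper can be passed directly to a `ccall` whose argument type is declared as `Ptr{Cvoid}`, without writing out `fw.pointer` at every call site.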

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Contributor Author

jmark commented Nov 27, 2024

Thanks a lot! The segfault https://github.com/trixi-framework/Trixi.jl/actions/runs/12052335616/job/33605413710?pr=2172#step:7:1119 happened in two consecutive CI runs, so it looks like it is a real issue.

It seems so. Pretty tricky to narrow down the cause for that. On my machine the forest wrapper test runs just fine.

Member

ranocha commented Nov 27, 2024

Do you run them in interactive mode locally?

Contributor Author

jmark commented Nov 28, 2024

Do you run them in interactive mode locally?

Both! Tested in interactive session and in "script" mode. Works as expected on my local machine.

Johannes Markert added 2 commits November 28, 2024 16:04
Member

@JoshuaLampert JoshuaLampert left a comment

Thanks!

Contributor Author

jmark commented Dec 3, 2024

All green, besides CodeCov. Is there anything else to do?

Member

sloede commented Dec 3, 2024

Good to go from my side, but I'd like to have @ranocha sign off as well.

Thanks for all your effort @jmark!

@jmark jmark requested a review from ranocha December 3, 2024 14:46
Member

@ranocha ranocha left a comment

Thanks!

@ranocha ranocha merged commit 9c69e10 into main Dec 3, 2024
39 of 40 checks passed
@ranocha ranocha deleted the fix-t8code-finalize-before-shutdown branch December 3, 2024 15:26
@bennibolm bennibolm mentioned this pull request Dec 5, 2024
9 tasks
Labels
enhancement New feature or request refactoring Refactoring code without functional changes
5 participants