
Error after run for a few hours #122

Open
hptian opened this issue Nov 14, 2024 · 12 comments

Comments

@hptian

hptian commented Nov 14, 2024

Hi, my case crashed after running for a few hours and returned the following message:
"Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 0 with PID 0 on node ycy-3508 exited on signal 9".
Without libAcoustics, however, the case runs to the end. Can anyone tell me why? Thanks a lot.
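For context, signal 9 in the mpirun message is SIGKILL, which a process cannot catch or ignore. On Linux, one common sender is the kernel's out-of-memory (OOM) killer, and the kernel log (`dmesg`) usually records an out-of-memory message when that happens. A minimal Python sketch for decoding the number, assuming a POSIX system (the helper name `describe_signal` is hypothetical):

```python
import signal

def describe_signal(signum: int) -> str:
    """Return the symbolic name of a POSIX signal number, e.g. 9 -> 'SIGKILL'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a signal number known to this platform.
        return f"unknown signal {signum}"

# The rank in the mpirun message "exited on signal 9".
print(describe_signal(9))
```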

@mkraposhin
Contributor

Perhaps libAcoustics exhausted your memory.

Use top to analyze the memory consumption.
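top gives a live view; to keep a record up to the moment of a crash, a small logger can sample `/proc/meminfo` instead. A minimal sketch, assuming a Linux node whose kernel reports the `MemAvailable` field; the function names are hypothetical:

```python
import time
from pathlib import Path

def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:  # e.g. "MemAvailable:  1234 kB" -> ["1234", "kB"]
            info[key.strip()] = int(fields[0])
    return info

def available_fraction(info: dict[str, int]) -> float:
    """Fraction of total RAM the kernel still reports as available."""
    return info["MemAvailable"] / info["MemTotal"]

def log_memory(interval_s: float = 10.0, samples: int = 6) -> None:
    """Print available memory periodically while the solver runs (Linux only)."""
    for _ in range(samples):
        info = parse_meminfo(Path("/proc/meminfo").read_text())
        print(f"{time.strftime('%H:%M:%S')} available: "
              f"{info['MemAvailable'] / 1024**2:.1f} GiB "
              f"({100 * available_fraction(info):.1f}%)")
        time.sleep(interval_s)
```

Running `log_memory(interval_s=5, samples=10000)` in a second terminal on the compute node, with its output redirected to a file, leaves a trace of how memory evolved right before the job was killed.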

@hptian
Author

hptian commented Nov 14, 2024

Thanks for your kind reply. I used top to observe what happened when it crashed. The memory usage seems normal, but the CPU usage is 200% for a single process, while normally it is 100%. After about one minute at 200%, it crashed. I also noticed that the case always crashes at step 1285 after the start time, but without libAcoustics it runs normally. Do you have any idea? Thanks again for providing this useful package for noise calculation.

@hptian
Author

hptian commented Nov 14, 2024

By the way, I am using the FW-H method. What does the OAP value printed during the calculation mean? Thanks.

@mkraposhin
Contributor

> Thanks for your kind reply. I used top to observe what happened when it crashed. The memory usage seems normal, but the CPU usage is 200% for a single process, while normally it is 100%. After about one minute at 200%, it crashed. I also noticed that the case always crashes at step 1285 after the start time, but without libAcoustics it runs normally. Do you have any idea? Thanks again for providing this useful package for noise calculation.

Well, most likely this is a memory-related problem. How many CPUs do you use? Are they on the same node? How many copies of the solver do you run per node?

@mkraposhin
Contributor

> By the way, I am using the FW-H method. What does the OAP value printed during the calculation mean? Thanks.

OAP = Observed Acoustic Pressure
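Since OAP is a pressure signal, a common way to summarize it is the overall sound pressure level in dB re 20 µPa, i.e. 20 log10(p_rms / p_ref). A minimal sketch, assuming the OAP values are acoustic pressure in pascals (check the units your case writes) and that the record spans many periods of the frequencies of interest; `overall_spl` is a hypothetical helper, not part of libAcoustics:

```python
import math

P_REF = 20e-6  # standard reference pressure in air, Pa

def overall_spl(pressure: list[float]) -> float:
    """Overall SPL in dB re 20 uPa from an acoustic pressure time series in Pa."""
    mean = sum(pressure) / len(pressure)
    # Remove the mean so only the fluctuating (acoustic) part contributes.
    p_rms = math.sqrt(sum((p - mean) ** 2 for p in pressure) / len(pressure))
    return 20.0 * math.log10(p_rms / P_REF)
```

As a check, a sine of amplitude 0.02·√2 Pa sampled over whole periods has p_rms = 0.02 Pa, which gives 20·log10(1000) = 60 dB.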

@hptian
Author

hptian commented Nov 21, 2024

Thanks. I use 128 processors, and yes, they are on the same node. After I reduced the number of observer points, the error disappeared. However, after running for about a day, the program stopped at "Starting acoustics probe". I used top to see what happened, but everything looked normal: CPU and memory usage were both fine. The calculation does not crash; it just stalls waiting for the acoustic probing step, and no errors are reported. Do you have any ideas? Thanks a lot.

@mkraposhin
Contributor

As a first step, it's necessary to gather the call stack when your simulation fails.
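One way to get the call stack of a failing or stalled run is to attach gdb to each solver process and print the backtraces of all threads. A minimal sketch, assuming gdb and pgrep are installed on the node and that it runs as the same user as the job (some distributions restrict attaching to processes via ptrace); the function names are hypothetical, and `pgrep -f` may also match the `mpirun` launcher, whose backtrace is harmless but not useful:

```python
import subprocess

def gdb_backtrace_cmd(pid: int) -> list[str]:
    """gdb command that attaches to a running process, prints all thread backtraces, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def pids_of(name: str) -> list[int]:
    """PIDs whose full command line matches `name` (via pgrep -f)."""
    out = subprocess.run(["pgrep", "-f", name], capture_output=True, text=True)
    return [int(p) for p in out.stdout.split()]

def dump_backtraces(solver_name: str) -> None:
    """Write one backtrace file per process running `solver_name`."""
    for pid in pids_of(solver_name):
        result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
        with open(f"backtrace_{pid}.txt", "w") as f:
            f.write(result.stdout + result.stderr)
```

For example, calling `dump_backtraces("pimpleFoam")` (substitute your own solver name) while the run is stuck writes one `backtrace_<pid>.txt` per rank, which shows where each rank is waiting.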

Can you also try different decomposition methods (scotch, simple, etc.)?
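Switching methods is done in `system/decomposeParDict`, followed by re-running `decomposePar -force`. A minimal sketch that writes the dictionary text, assuming the standard OpenFOAM keys (`numberOfSubdomains`, `method`, and `simpleCoeffs` with `n` and `delta`; check them against your OpenFOAM version). The helper is hypothetical, and the split `(16 8 1)` is only an example for 128 ranks:

```python
import math

HEADER = """FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}
"""

def decompose_par_dict(n_procs: int, method: str, n=(1, 1, 1)) -> str:
    """Return decomposeParDict text for 'scotch' or 'simple' decomposition."""
    if method not in ("scotch", "simple"):
        raise ValueError(f"unsupported method: {method}")
    body = f"numberOfSubdomains {n_procs};\nmethod {method};\n"
    if method == "simple":
        # simple splits the domain geometrically; nx*ny*nz must equal n_procs.
        if math.prod(n) != n_procs:
            raise ValueError(f"{n} gives {math.prod(n)} subdomains, expected {n_procs}")
        body += "simpleCoeffs\n{\n    n (%d %d %d);\n    delta 0.001;\n}\n" % tuple(n)
    return HEADER + body

print(decompose_par_dict(128, "simple", (16, 8, 1)))
```

scotch needs no coefficients because it balances the cell count between subdomains automatically, while simple cuts the domain along the coordinate axes, so the chosen split should follow the shape of the mesh.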

@hptian
Author

hptian commented Nov 26, 2024

Thanks a lot, I will try

@mkraposhin
Contributor

> Thanks a lot, I will try

I ask because I need some information to localize the source of the error.

@mkraposhin
Contributor

One more question: is your mesh static or dynamic (moving)?

@hptian
Author

hptian commented Dec 3, 2024

Thanks for your reply. My case is a numerical simulation of a NACA airfoil; the airfoil is static and the incoming flow velocity is 30 m/s. Strangely, the PSD shape at the observer position is similar to the experiments, but approximately 30 dB lower. I am using the FW-H method, and the flow field agrees well with experiments. Do you have any ideas? Thanks a lot. P.S. The case has not crashed since then.
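For scale, a 30 dB gap is large. Because PSD levels are 10 log10 of a power-like quantity, it corresponds to a factor of 1000 in PSD, or about 31.6 in acoustic pressure amplitude:

$$\Delta L = 10\log_{10}\frac{S_{\mathrm{exp}}}{S_{\mathrm{sim}}} = 30\ \mathrm{dB} \;\Rightarrow\; \frac{S_{\mathrm{exp}}}{S_{\mathrm{sim}}} = 10^{3}, \qquad \frac{p_{\mathrm{exp}}}{p_{\mathrm{sim}}} = 10^{30/20} \approx 31.6$$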

@mkraposhin
Contributor

Is your case 2D or 3D?

If the case no longer crashes, that suggests an intermittent memory access error, and I hope it will show itself again.
