Child process sometimes hangs, pegs CPU @ 100% #234
Hmm, how could it deadlock there? Which version of httpd and mod_http2 are you running?
Good point, just trying to sound competent and failing. Server version: Apache/2.4.54 (codeit); Name: mod_http2 …
Same problem here on several different servers (OS: CentOS 7, …). Hope this may help to track down the problem.
@alexskynet you do not have to switch the MPM to disable http/2. http/2 is only used if you configure it via the `Protocols` directive in your server. If you comment that out, or configure `Protocols` without `h2`, http/2 is not negotiated. That allows you to determine whether the problem is related to http/2 or whether there is another problem in your server.
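A minimal sketch of what that toggle looks like in httpd configuration (the exact lines from the original comment were not preserved, so this is an assumption based on the standard `Protocols` directive):

```apache
# HTTP/2 enabled: h2 is offered alongside HTTP/1.1
Protocols h2 http/1.1

# To rule out mod_http2, list only HTTP/1.1 so h2 is never negotiated
# Protocols http/1.1
```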
Thank you very much @icing
I want to add a little piece to the puzzle that may help.
Another piece of info:
The difference is that the hanging servers all use Joomla (both versions 3 and 4) with FastCGI, but there are no errors in the site logs or in the PHP logs. The working server uses the same configuration but runs a custom PHP application; from its logs, users are using http/2 without any kind of hang. We are a bit confused at this point.
Yeah, hard to see the pattern from here.
I think you'd need to vary one thing at a time in your setup in order to find out which combination is the critical one. For example, place your …
Yes @icing, this is a nightmare: too many parameters moving ...
Tried to downgrade mod_http2 from 2.0.4 to 2.0.3 on one server.
Downgrading to 2.0.3 seems to fix the problem.
I was experiencing this issue on 2.0.3 and continue to experience it on 2.0.4. NFS is something we use too, so that is an interesting avenue. @icing is there any way we can get better data to troubleshoot this?
Thank you for your answer @Adam7288, this sounds interesting.
I looked at the changes between 2.0.3 and 2.0.4 and would be surprised if they made the difference. BUT I can make you a 2.0.5 with the changes I did in Apache trunk. There are differences in connection handling. Hard to tell if they improve your situation, but I'd like to know how those changes fare for you. The alternative in your case would be to try the mod_http2 that was released in Apache 2.4.54.
Regarding NFS usage: I have no personal experience with running a server that way. As I understand your setups, you use NFS to share the … The question is: are files on the NFS share being modified, and may that have an impact? Do you have `EnableMMAP on` on this, for example? Do you have …
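For context, when content is served from an NFS mount, a commonly recommended httpd setting is to turn off memory mapping and sendfile for that path. A sketch with a hypothetical mount point (this is background, not the elided text of the comment above):

```apache
# Hypothetical NFS-backed path; adjust to the actual mount point
<Directory "/mnt/nfs/share">
    # mmap and sendfile can misbehave when files change underneath NFS
    EnableMMAP off
    EnableSendfile off
</Directory>
```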
The only time NFS is invoked on our end is within a PHP script that accesses files in some way, for instance a document download. We do not serve static or dynamic content directly from the NFS share.
Thank you @icing
Ok, I'll make a v2.0.5 for testing. In the meantime, you might want to configure …
Thank you @icing
Sorry @icing …
"Stuck in read" as seen from the status handler, or from a backtrace?
`strace -p <pid>` …
Status shows …
Rolled back to 2.0.3: no hangs so far.
Hmm, I assume your site is too busy to raise the http2 log level much?
I can reinstall 2.0.5 and try to increase the log level for a few minutes if that would help.
If you can take it, …
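The specific level suggested was not preserved here; as an assumption, a temporary per-module increase matching the `[http2:debug]` lines quoted below could look like this:

```apache
# Keep the global log level moderate, raise only mod_http2 output
LogLevel warn http2:debug
```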
Ok, give me a few minutes.
Stuck within seconds with 2.0.5:
[Wed Sep 21 16:09:53.574077 2022] [http2:debug] [pid 23712:tid 139812816349376] h2_workers.c(318): h2_workers: cleanup 25 workers idling
Thanks @alexskynet for putting v2.0.6 into your grinder. Sorry that it did nothing to improve things. 😢 Analyzing the backtraces now.
There was one change from v2.0.3 to the newer versions that involved thread creation for the h2 workers. A quick check to see whether that is causing problems would be to configure your server with a fixed number of h2 workers, like in the sketch below, so that all workers are created at startup and no dynamic creation/destruction of threads happens. Could you give this a shot?
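The directive lines from the original comment were not preserved; a plausible sketch using mod_http2's worker directives, with an arbitrary example count:

```apache
# Fixed-size h2 worker pool: min equals max, so no worker threads
# are created or destroyed at runtime
H2MinWorkers 32
H2MaxWorkers 32
```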
Unfortunately, no change.
Thread 17 (Thread 0x7ff9e1ff3700 (LWP 8225)): …
Thread 25 (Thread 0x7ff9e5ffb700 (LWP 8217)): …
v2.0.7 released with a fix for … Background: the v2.0.x line had improvements to return … To @nono303: I mentioned …
Time to compile and run; I'll let you know.
Running.
Very first impression is good: no immediate lock, so I'm crossing my fingers.
A giant step, @icing!
Thanks for your patience. Happy to hear that. Let's see what the day brings. As to performance: I made some conservative changes to bring stability. When this version proves stable, I can dare to tighten the screws again somewhat.
You're doing a great job @icing. I'm happy to have been able to help a little bit: that's what open source is about!
Running OK so far.
I just want to confirm that 2.0.7 works like a charm. CodeIt has already released the mod_http2-2.0.7 rpm with the fix, so all the world should be happy now :-) It has been running for several hours on two servers with absolutely no trouble. Problem solved: well done @icing! 🥇
Thanks again @alexskynet for the help! Closing this as fixed in v2.0.7.
… I came after the battle (quite busy now), but many thanks @icing for your work and responsiveness.
Just maybe a little more CPU time spent in mod_watchdog, but I'm not sure that is related to h2.
Sorry to reopen this, but today we had hangs on two different servers, both running 2.0.7.
@alexskynet this stack trace does not look right. I assume a version of mod_http2 other than v2.0.7 is loaded. v2.0.7 never invokes … Could you double check?
v2.0.8 released. The fixes are unrelated to this, but I added an assertion before the point where the 100% CPU loop seems to happen in the reports by @alexskynet. It would be interesting to know whether this triggers and, if so, what is logged (at level critical) in such a case.
The epic battle continues with v2.0.9:
* Fixed a bug where errors during response body handling did not lead to a proper RST_STREAM; instead, processing went into an infinite loop. Extended test cases to catch this condition.
Hi @icing, and thank you for your hard work. Testing 2.0.9 with worker … So far what I see in mod_status looks OK: it has been running for 20 minutes now. I'll cross my fingers and wait ... I'll let you know.
You tricked me before! I remain sceptical... 😉
12 hours, still running fine ... It looks promising.
Hi @icing, I have been testing for three days now and it works fine.
Hi @alexskynet! This is excellent news. We won. Thank you very much for helping on this. In the meantime, I have added more edge test cases and made some more improvements on reliability. I will release that in a few days as v2.0.10.
Thank you!
I'll start a test as soon as possible.
With v2.0.10 just released and Alessandro's extensive testing, I think we have solved the issues. Many, many thanks to everyone.
Here is a pstack trace for one of the hung processes pegging CPU @ 100%; it looks like some kind of deadlock:
#0 __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
#1 0x00007f2d659f0f9e in _L_unlock_738 () from /lib64/libpthread.so.0
#2 0x00007f2d659f0f10 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:55
#3 __GI___pthread_mutex_unlock (mutex=0x7f2c7c5248f8) at pthread_mutex_unlock.c:330
#4 0x00007f2d57cb140d in h2_beam_receive () from /etc/httpd/modules/mod_http2.so
#5 0x00007f2d57cc8ee3 in buffer_output_receive () from /etc/httpd/modules/mod_http2.so
#6 0x00007f2d57ccb1ec in stream_data_cb () from /etc/httpd/modules/mod_http2.so
#7 0x00007f2d66cba171 in nghttp2_session_pack_data () from /lib64/libnghttp2.so.14
#8 0x00007f2d66cbaedd in nghttp2_session_mem_send_internal () from /lib64/libnghttp2.so.14
#9 0x00007f2d66cbbae9 in nghttp2_session_send () from /lib64/libnghttp2.so.14
#10 0x00007f2d57cc7544 in h2_session_send () from /etc/httpd/modules/mod_http2.so
#11 0x00007f2d57cc777a in h2_session_process () from /etc/httpd/modules/mod_http2.so
#12 0x00007f2d57cb2149 in h2_c1_run () from /etc/httpd/modules/mod_http2.so
#13 0x00007f2d57cb2569 in h2_c1_hook_process_connection () from /etc/httpd/modules/mod_http2.so
#14 0x00005571f66c33c0 in ap_run_process_connection (c=c@entry=0x7f2d4006d5e0) at connection.c:42
#15 0x00007f2d5a8ab40a in process_socket (thd=thd@entry=0x5571f72c4510, p=<optimized out>, sock=<optimized out>, cs=0x7f2d4006d530, my_child_num=my_child_num@entry=11, my_thread_num=my_thread_num@entry=12) at event.c:1086
#16 0x00007f2d5a8ae6ae in worker_thread (thd=0x5571f72c4510, dummy=<optimized out>) at event.c:2179
#17 0x00007f2d659edea5 in start_thread (arg=0x7f2d3a7f4700) at pthread_create.c:307
#18 0x00007f2d65512b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111