This is a pyzmq-specific bug, not an issue of zmq socket behavior. Don't worry if you're not sure! We'll figure it out together.
What pyzmq version?
26.0.3
What libzmq version?
4.3.5
Python version (and how it was installed)
Python 3.12 installed via apt from ppa:deadsnakes/ppa
OS
ubuntu 20.04 inside docker running on ubuntu 20.04 host
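The version fields above can be collected from a running interpreter. This is a minimal sketch: `environment_report` is a hypothetical helper name, while `zmq.pyzmq_version()` and `zmq.zmq_version()` are the pyzmq calls that report the binding and library versions.

```python
import platform

import zmq


def environment_report():
    """Return the version details a pyzmq bug report asks for."""
    return {
        "pyzmq": zmq.pyzmq_version(),    # version of the Python bindings
        "libzmq": zmq.zmq_version(),     # version of the linked libzmq
        "python": platform.python_version(),
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

Inside a container, `platform.platform()` describes the kernel of the host, so the image's distribution may still need to be stated by hand.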
What happened?
With a REQ socket and a REP socket running in a separate thread, sending request-reply pairs over and over with SNDHWM=1 on the REP socket, the exchange eventually hangs. The REP does not raise zmq.Again or zmq.ZMQError, but the reply appears to get lost. In the example, the REQ times out receiving the response and sets a global error flag; the REP thread's output then shows that it moved past the send (where the message is lost) and is waiting for the next request.
SNDHWM does not really make sense for a REQ socket, but we inadvertently set HWM=1 on all our sockets and occasionally hit this failure, so we thought it should be reviewed. Setting HWM=2 appears to resolve the issue.
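The workaround described above amounts to setting the send high-water mark on the REP socket before binding. Below is a minimal sketch of one round trip with that option applied. The `inproc` endpoint and the `round_trip` helper are assumptions made for illustration; the sketch only shows where the option is set and does not demonstrate that the lost-reply failure is avoided.

```python
import zmq

URL = "inproc://hwm_workaround_demo"  # hypothetical endpoint for this sketch


def round_trip(snd_hwm=2):
    """Run one REQ/REP exchange with SNDHWM set on the REP socket."""
    ctx = zmq.Context()
    rep = ctx.socket(zmq.REP)
    req = ctx.socket(zmq.REQ)
    try:
        # Set options before bind/connect so they apply to the connection.
        rep.setsockopt(zmq.SNDHWM, snd_hwm)
        for sock in (rep, req):
            sock.setsockopt(zmq.LINGER, 0)
            sock.setsockopt(zmq.RCVTIMEO, 1000)  # fail with zmq.Again, never block forever
            sock.setsockopt(zmq.SNDTIMEO, 1000)
        rep.bind(URL)
        req.connect(URL)

        req.send(b"test")
        rep.recv()
        rep.send(b"x" * 100)
        reply = req.recv()
        return rep.getsockopt(zmq.SNDHWM), reply
    finally:
        req.close()
        rep.close()
        ctx.term()


if __name__ == "__main__":
    hwm, reply = round_trip(2)
    print(f"Rep HWM={hwm}, reply length={len(reply)}")
```

Applying the option per socket type, instead of one blanket HWM for every socket, keeps a small limit on the sockets where it is wanted without touching the REQ/REP pair.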
Code to reproduce bug
"""Test ZMQ REQ-REP with REP HWM=1

Example output:
Rep HWM=1
Req timed out receiving response 2659829
Waiting for request after error
Rep thread exiting
Rep HWM=2
Rep thread exiting
"""
import zmq
import time
import threading

URL = 'ipc:///tmp/zmq_test_pipe'
global running, error
error = False
running = True

def reply_sock(hwm):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    print(f'Rep HWM={hwm}')
    sock.SNDHWM = hwm
    sock.RCVTIMEO = 100
    sock.SNDTIMEO = 1
    sock.LINGER = 0
    sock.bind(URL)
    resp = 'x' * 100
    resp_bin = resp.encode('utf-8')
    global running, error
    while running:
        try:
            if error:
                print('Waiting for request after error')
                error = False  # We caught the error, so clear flag
            rx = sock.recv()
        except zmq.Again:
            continue
        try:
            sock.send(resp_bin)
        except zmq.Again:
            print('Reply timed out sending response')
            break
        except Exception as ex:
            print(f'Unexpected exception: {ex}')
    sock.close()
    print(f'Rep thread exiting')

def req_sock(count):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.RCVTIMEO = 100
    sock.REQ_RELAXED = 1  # Does not seem to matter
    sock.LINGER = 0
    sock.connect(URL)
    i = 0
    global running, error
    while running:
        i += 1
        try:
            msg = 'test'.encode('utf-8')
            sock.send(msg)
        except zmq.Again:
            print('Req timed out sending')
        try:
            resp = sock.recv()
        except zmq.Again:
            print(f'Req timed out receiving response {i}')
            error = True
            time.sleep(1)  # Let Rep thread print
            break
        except Exception as ex:
            print(f'Unexpected exception: {ex}')
        if i > count:  # Unclear how many are needed for 100% prob of failure. 1M was not enough sometimes.
            break
    running = False
    sock.close()

def run_test(hwm, N):
    global running, error
    running = True
    error = False
    rep_thread = threading.Thread(target=reply_sock, daemon=True, args=(hwm,))
    rep_thread.start()
    time.sleep(0.1)  # Let Req get started
    req_sock(N)
    rep_thread.join(timeout=3)
    if rep_thread.is_alive():
        print('Req thread did not join')

if __name__ == "__main__":
    TEST = None
    N = 5000000
    run_test(hwm=1, N=N)  # Will usually fail < 5M msgs
    run_test(hwm=2, N=N)  # Never fails
Traceback, if applicable
No response
More info
No response
This does sound like a libzmq bug. Req/rep with hwm greater than 1 doesn't make a lot of sense since it cannot ever have more than one message outstanding due to the req/rep cycle. Maybe it has to do with the timeouts leaving messages unsent for a short time. Feel free to report it on the libzmq repo, since I don't think there's anything pyzmq can do about low-level socket behavior like hwm.