You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We had investigated a weird issue happening on the channel level. On several channels we have identified that it is possible to have duplicate message ids.
To reach that state you have to reset the backlog_id_key (message_id) key without resetting the backlog_key (channel backlog) itself.
The only way message_id gets reset as far as I know is through expiration inside the transactional LUA script but in many cases we didn't expect this to expire at all since new messages where added daily to the channel (default expiration is 7.days).
since it gets the first of the items having the same score without taking into account the global_id order. In the usual case you will always have 1 item there, but in our case it happens to be 2, so it's very possible that we can consume stale data.
Questions
Are there other ways to reset or expire just the backlog_id_key in message bus channel lifecycle? That could explain this inconsistent state of the channel?
Did you ever face anything similar? The LUA script is nearly identical through the gem versions.
Is it reasonable to handle this edge case and always return the last item for a message_id based on the global_id on the redis.zrangebyscore backlog_key, message_id, message_id with a PR?
Versions
redis: 6.x
redis gem: 4.8.1
message_bus: 3.3.8 (we are upgrading to 4.3.8 since having duplicates results in infinite loops on the old version 🙃 )
The text was updated successfully, but these errors were encountered:
The eviction policy on Redis itself is the most important factor when it comes to invalidate keys. Even though message bus is using the same TTL for keys inside the LUA script, the global eviction policy in Redis can affect keys in random ways when memory pressure is present.
Using volatile-lru for example can invalidate backlogs without invalidating the backlog_id_key and vice versa.
In my opinion this could be part of the documentation for the Redis backend in the project.
Description
We had investigated a weird issue happening on the channel level. On several channels we have identified that it is possible to have duplicate message ids.
The result of this is something like
To reach that state you have to reset the
backlog_id_key
(message_id) key without resetting thebacklog_key
(channel backlog) itself.The only way
message_id
gets reset as far as I know is through expiration inside the transactional LUA script but in many cases we didn't expect this to expire at all since new messages where added daily to the channel (default expiration is7.days
).The code that affects us the most right now is the following
https://github.com/discourse/message_bus/blob/main/lib/message_bus/backends/redis.rb#L228-L238
since it gets the first of the items having the same score without taking into account the
global_id
order. In the usual case you will always have 1 item there, but in our case it happens to be 2, so it's very possible that we can consume stale data.Questions
backlog_id_key
in message bus channel lifecycle? That could explain this inconsistent state of the channel?message_id
based on theglobal_id
on theredis.zrangebyscore backlog_key, message_id, message_id
with a PR?Versions
redis: 6.x
redis gem: 4.8.1
message_bus: 3.3.8 (we are upgrading to 4.3.8 since having duplicates results in infinite loops on the old version 🙃 )
The text was updated successfully, but these errors were encountered: