-
Describe the bug
When a queue that has many bindings is deleted, we observe that publishing any message is blocked for a significant amount of time.

Reproduction steps
Minimal steps to reproduce:
Initial state:
Actions:
Behavior observed:
Given the number of bindings (we sometimes have up to 10,000) and some other factor not yet identified, this lock can last up to 30 minutes in production. If we explicitly remove all bindings before deleting the queue, we do not observe any lock.

Expected behavior
Deleting a queue should not affect exchange performance.

Additional context
RabbitMQ 3.11.13, clusters of 3 nodes (8 CPU, 64 GB each), 400 messages per second, 250 exchanges, 1600 queues, 3000 channels, 900 connections.
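A minimal sketch of the workaround mentioned above (explicitly removing every binding before deleting the queue). This is illustrative only: it assumes Node.js 18+, amqplib, the management plugin on its default port, and placeholder queue names and credentials.

import amqplib from 'amqplib';

const QUEUE = 'com.queue';                              // illustrative name
const AMQP_URL = 'amqp://guest:guest@localhost:5672/';  // assumption
const MGMT_URL = 'http://localhost:15672';              // management plugin, default port (assumption)
const AUTH = 'Basic ' + Buffer.from('guest:guest').toString('base64');

(async () => {
  const conn = await amqplib.connect(AMQP_URL);
  const ch = await conn.createChannel();

  // AMQP 0-9-1 cannot enumerate bindings, so list them via the management HTTP API
  const res = await fetch(`${MGMT_URL}/api/queues/%2F/${QUEUE}/bindings`, {
    headers: { Authorization: AUTH },
  });
  const bindings = await res.json();

  // drop every explicit binding first, so the queue deletion itself no longer
  // has to remove thousands of bindings at once
  for (const b of bindings) {
    if (b.source === '') continue; // skip the implicit default-exchange binding
    await ch.unbindQueue(QUEUE, b.source, b.routing_key, b.arguments);
  }

  await ch.deleteQueue(QUEUE);
  await conn.close();
})();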
-
@lchenay FYI, all "upvote fests" are perceived very negatively by our team. A 31-minute-old issue immediately attracts a few 👍s; I wonder what may be going on… We are not dummies.
We need a way to reproduce against […]. I have good news for you: in […]. Finally, I am not intelligent enough to know what "KPI of control for the bugs" even means.
-
Another observation: a queue with thousands of bindings is a very rare workload. An exchange with thousands of bindings is fairly common. If that exchange is a fanout, these days you arguably should use a single stream (streams shipped in 3.9.x) with non-destructive, repeated consumption instead of fanning out to thousands of queues, which need thousands of bindings, which in turn need a lot of schema data store locking; that is expensive and will affect publishers, because routing needs to access those bindings. While not specifically useful for avoiding mass deletions of a large number of bindings, 3.11.x even includes super streams, which shows that by 3.11 streams were fairly mature and had features built on top of them, in both open source and Tanzu RabbitMQ.
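A rough sketch of that stream-based approach (the queue name, URL and offset below are illustrative assumptions, not anything from this thread): declare one durable stream and let every consumer read it repeatedly and non-destructively, instead of binding thousands of classic queues to a fanout.

import amqplib from 'amqplib';

(async () => {
  const conn = await amqplib.connect('amqp://guest:guest@localhost:5672/');
  const ch = await conn.createChannel();

  // one stream replaces thousands of per-consumer queues and their bindings
  await ch.assertQueue('events.stream', {
    durable: true,
    arguments: { 'x-queue-type': 'stream' },
  });

  // publish through the default exchange straight to the stream
  ch.sendToQueue('events.stream', Buffer.from('hello'));

  // streams consumed over AMQP 0-9-1 need a prefetch limit and manual acks;
  // each consumer keeps its own offset and messages are never removed
  await ch.prefetch(100);
  await ch.consume('events.stream', (msg) => {
    console.log(msg.content.toString());
    ch.ack(msg);
  }, { noAck: false, arguments: { 'x-stream-offset': 'first' } });
})();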
-
Thank you sincerely for the quick answer! I was not expecting that much reactivity.
@mkuratczyk I will work on an exact reproduction script, as agnostic as possible; share it; and run it against as many versions as possible to add data to the discussion.
@michaelklishin I had missed the end-of-support information. I don't have that visibility and will ping the infrastructure team right now.
@michaelklishin Sorry for the "emoji fest". All of them are from my company and are, I suppose, more an expression of joy after 2 weeks of intense investigation into our multiple production incidents than an attempt to bypass issue prioritisation. I will pass the feedback on to them.
@michaelklishin I will carefully read all of that material on streams. Clearly our implementation is not good; having thousands of bindings on a single queue clearly seems to be a misuse of the tooling.
-
Using this quick docker-compose to simulate a RabbitMQ cluster locally, testing versions 3.11.13, 3.13.7 and 4.0.4:
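Roughly along these lines (a sketch only; the image tag, hostnames, cookie value and mounted rabbitmq.conf are placeholders, and the same layout can be re-used with the 3.13.7 and 4.0.4 images):

# three nodes clustered via classic-config peer discovery
services:
  rabbit1:
    image: rabbitmq:3.11.13-management
    hostname: rabbit1
    environment:
      RABBITMQ_ERLANG_COOKIE: "local-test-cookie"
    ports: ["5672:5672", "15672:15672"]
    volumes: ["./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro"]
  rabbit2:
    image: rabbitmq:3.11.13-management
    hostname: rabbit2
    environment:
      RABBITMQ_ERLANG_COOKIE: "local-test-cookie"
    volumes: ["./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro"]
  rabbit3:
    image: rabbitmq:3.11.13-management
    hostname: rabbit3
    environment:
      RABBITMQ_ERLANG_COOKIE: "local-test-cookie"
    volumes: ["./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro"]

# rabbitmq.conf mounted into every container:
#   cluster_formation.peer_discovery_backend = classic_config
#   cluster_formation.classic_config.nodes.1 = rabbit@rabbit1
#   cluster_formation.classic_config.nodes.2 = rabbit@rabbit2
#   cluster_formation.classic_config.nodes.3 = rabbit@rabbit3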
Here are the results I got locally:
Code to automate all the bindings / measurements:

import amqplib from 'amqplib';

// publish one message on a confirm channel and log how long the broker
// takes to confirm it
const sendAndMeasure = async (ch1, exchangeName, cb) => {
  const start = new Date();
  ch1.publish(exchangeName, '', Buffer.from('Hello World!'), undefined, () => {
    console.log('Time taken to publish 1 message:', (new Date()) - start, 'ms');
    cb && cb();
  });
};

(async () => {
  const exchangeName = "com.exchange";
  const queueName = "com.queue";
  const conn = await amqplib.connect('amqp://guest:[email protected]:5672/');
  const ch1 = await conn.createConfirmChannel();
  await ch1.assertExchange(exchangeName, 'fanout', { durable: false });
  await ch1.assertQueue(queueName, { durable: false, autoDelete: true });

  // measure when nothing is bound to the queue
  sendAndMeasure(ch1, exchangeName);

  const subCh = await conn.createChannel();
  for (let i = 0; i < 10000; i++) {
    await subCh.bindQueue(queueName, exchangeName, 'random_binding_' + (Math.round(Math.random() * 100000)));
  }

  // deliberately not awaited, so the publish below overlaps with the deletion
  subCh.deleteQueue(queueName);

  // measure during the deletion of the queue
  sendAndMeasure(ch1, exchangeName, process.exit);
})();
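To run the script above (assuming Node.js 18+ and the file saved with an .mjs extension, e.g. measure.mjs — the filename is just an example):

npm install amqplib
node measure.mjs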
-
Adding a 1 ms delay on the network using tc in each container (to simulate inter-AZ latency and be more realistic):
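That is, a netem rule along these lines (the interface name inside the container is an assumption):

tc qdisc add dev eth0 root netem delay 1ms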
I did reproduce it:
With Khepri, version 4.0.4 and 4 nodes:
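(For anyone reproducing this: on 4.0.x Khepri is opt-in; as far as I know it is enabled with the khepri_db feature flag, e.g. rabbitmqctl enable_feature_flag khepri_db.)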
In that case, I will not dig deeper into this Mnesia issue / reproduction case.
I will close the topic and deal with a potential upgrade.
Thanks all!