Problem scaling Logstash cluster for single Azure Event Hub #41
Comments
I was facing a similar issue when scaling Logstash in our setup and found out that the official Logstash image was still shipping version 1.1.1 of the plugin, which doesn't have the critical bug fix #52. Unless you update the image with the most recent Azure Event Hubs plugin, you are stuck with a batch size of 10 no matter what batch_size value is in the config.
In the end, we opted to use the Kafka interface to Azure Event Hub instead, which yielded much better performance and scaling.
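For reference, roughly what that looks like with the standard Logstash kafka input; the namespace, event hub name, and connection string below are placeholders, and the exact SASL settings should be double-checked against the Event Hubs Kafka documentation:

```
# Sketch of consuming an Event Hub through its Kafka-compatible endpoint.
# <NAMESPACE>, <EVENT_HUB_NAME> and the connection string are placeholders.
input {
  kafka {
    bootstrap_servers => "<NAMESPACE>.servicebus.windows.net:9093"
    topics            => ["<EVENT_HUB_NAME>"]   # an event hub maps to a Kafka topic
    group_id          => "logstash"             # shared by all Logstash instances
    security_protocol => "SASL_SSL"
    sasl_mechanism    => "PLAIN"
    # Event Hubs authenticates with the literal username "$ConnectionString"
    # and the namespace connection string as the password.
    sasl_jaas_config  => "org.apache.kafka.common.security.plain.PlainLoginModule required username='$ConnectionString' password='Endpoint=sb://<NAMESPACE>.servicebus.windows.net/;...';"
    consumer_threads  => 4                      # keep total threads across instances <= partition count
  }
}
```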
Is it an option to scale out the Logstash instances on a single event hub? We are running this setup with about 6 containers of Logstash pointing to the same single event hub. It works OK, but does this plugin support this scaling method without possible issues?
Yes, and also no. It depends on how many partitions are in the underlying topic. Each topic-partition can only be assigned to one consumer in the consumer group, so scaling your Logstashes beyond the total number of partitions in the topic will not yield higher throughput.

The log message you reference is part of a "rebalance" of consumers in the group, which can be either a normal part of operation or a signal of back-pressure. The consumers in the group attempt to acquire exclusive leases on partitions in the topic, and some negotiation and log noise happens when two consumers are both attempting to acquire a lease on the same partition. This can occur as a normal part of startup when one or more new consumers are added to the group, or when back-pressure from the pipeline prevents an input from getting its items into the queue fast enough (in which case the partition is reassigned to another consumer).
We have a single Azure Event Hub from which we want to read and process event logs. Around 200k+ events are fed into the Hub every 30 seconds. We are currently hosting everything in Azure. If we configure a single Logstash VM, after some optimisation and tinkering with settings, we are able to read around 180k messages every 30 seconds (±10k). The machine then runs on average between 95-100% CPU usage and uses 9 out of 16 GB of RAM (stats pulled from htop).

As soon as I add the storage_connection option to the config, the single machine drops to around 100k messages per 30 seconds. After some tweaking, I'm able to get it up to around 120k. The machine now runs between 30-50% CPU usage and uses about 7 GB of RAM. If I add another machine with identical specs and the same config, the total number of messages fed into ES is around 140k; adding a third machine raises the number to around 150k.

Does anyone know what could be the cause of the problem? Just adding the blob storage to a single machine almost halves the performance, though this can be mitigated by adding more threads and higher batch sizes. All VMs, the Storage Account and the Azure Event Hub are located under the same Azure subscription and in the same region. I noticed that upgrading to a premium Storage Account raised the number by about 5k messages per 30 seconds.
Input config:
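Roughly along these lines (connection strings, names and tuning values here are illustrative placeholders rather than our exact settings):

```
# Sketch of the azure_event_hubs input; connection strings and tuning
# values are placeholders, not the actual production settings.
input {
  azure_event_hubs {
    event_hub_connections => ["Endpoint=sb://<NAMESPACE>.servicebus.windows.net/;...;EntityPath=<EVENT_HUB_NAME>"]
    consumer_group        => "logstash"
    # Shared blob storage lets multiple Logstash instances coordinate
    # partition leases and checkpoints; this is the part that introduced
    # the slowdown described above.
    storage_connection    => "DefaultEndpointsProtocol=https;AccountName=<STORAGE_ACCOUNT>;..."
    storage_container     => "logstash-offsets"
    threads               => 16
    max_batch_size        => 300
    prefetch_count        => 600
    initial_position      => "end"
  }
}
```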
pipelines.yml:
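Something along these lines (path and numbers are placeholders, not our exact values):

```yaml
# Sketch of pipelines.yml; path and tuning values are placeholders.
- pipeline.id: azure-event-hubs
  path.config: "/etc/logstash/conf.d/azure_event_hubs.conf"
  pipeline.workers: 16        # roughly the number of CPU cores
  pipeline.batch.size: 512    # larger batches reduce per-event overhead toward ES
  pipeline.batch.delay: 50
```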
Our cluster was initially deployed using the Azure Marketplace Elasticsearch template. I do not believe ES to be the bottleneck, given that we were able to feed 180k messages into it from a single machine, and at that point it only maxed out at around 50-70% CPU usage.
Any tips or help in improving our performance would be much appreciated. If this is the incorrect place to post such a problem, then I apologise; however, this seems to be a problem either in the azure_event_hub input plugin itself or in my configuration of it.