Problem scaling Logstash cluster for single Azure Event Hub #41

Open
kristianvld opened this issue Jun 24, 2019 · 4 comments

@kristianvld

We have a single Azure Event Hub from which we want to read and process event logs. Around 200k+ events are fed into the Hub every 30 seconds. We are currently hosting everything in Azure. With a single Logstash VM, after some optimisation and tinkering with settings, we are able to read around 180k messages every 30 seconds (±10k). The machine then runs at 95-100% CPU usage on average and uses 9 of its 16 GB of RAM (stats pulled from htop). As soon as I add the storage_connection option to the config, the single machine drops to around 100k messages every 30 seconds. After some tweaking, I'm able to get it up to around 120k. The machine now runs at 30-50% CPU usage with about 7 GB of RAM used. If I add another machine with identical specs and the same config, the total number of messages processed and fed into ES is around 140k; adding a third machine raises the number to around 150k.

Does anyone know what could be causing this? Just adding the blob storage to a single machine almost halves the performance, although this can be partly mitigated by adding more threads and larger batch sizes. All VMs, the Storage Account and the Azure Event Hub are in the same Azure subscription and the same region. I noticed that upgrading to a premium Storage Account raised the number by about 5k messages per 30 seconds.
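For anyone comparing against their own setup, here is a rough sketch that puts the figures reported above on a per-second basis and estimates how well the extra VMs scale. It uses only the approximate numbers quoted in this issue; the names `OBSERVED`, `per_second`, `shortfall_per_second` and `scaling_efficiency` are made up for illustration, and "efficiency" here simply means observed throughput divided by what perfectly linear scaling of the single-VM figure would give.

```python
# Rough arithmetic on the figures reported in this issue.
# All inputs are the approximate numbers quoted above, not new measurements.
WINDOW_SECONDS = 30
INGRESS_PER_WINDOW = 200_000  # "around 200k+" events arriving per 30 s

# Events processed per 30 s window, keyed by (VM count, storage_connection set?)
OBSERVED = {
    (1, False): 180_000,
    (1, True): 120_000,
    (2, True): 140_000,
    (3, True): 150_000,
}


def per_second(events_per_window):
    """Convert a per-window event count into events per second."""
    return events_per_window / WINDOW_SECONDS


def shortfall_per_second(events_per_window):
    """Events per second arriving faster than they are processed (0 if keeping up)."""
    return max(0, INGRESS_PER_WINDOW - events_per_window) / WINDOW_SECONDS


def scaling_efficiency(vm_count):
    """Observed throughput relative to linear scaling of the 1-VM, storage-enabled figure."""
    baseline = OBSERVED[(1, True)]
    return OBSERVED[(vm_count, True)] / (vm_count * baseline)


for (vms, storage), events in OBSERVED.items():
    print(
        f"{vms} VM(s), storage_connection={storage}: "
        f"{per_second(events):.0f} events/s, "
        f"falling behind by {shortfall_per_second(events):.0f} events/s"
    )

for vms in (2, 3):
    print(f"{vms} VMs: {scaling_efficiency(vms):.0%} of linear scaling")
```

On these numbers the second and third VM each add far less than a full VM's worth of throughput, which suggests the limit is somewhere other than raw CPU on the Logstash hosts.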

Input config:

input {
  azure_event_hubs {
     event_hub_connections => ["Endpoint=sb://.....servicebus.windows.net/;SharedAccessKeyName=....;SharedAccessKey=....;EntityPath=...."]
     threads => 32
     codec => plain {
       charset => "ISO-8859-1"
     }
     max_batch_size => 1000
     storage_connection => "DefaultEndpointsProtocol=https;AccountName=....;AccountKey=....;EndpointSuffix=core.windows.net"
     storage_container => "logstash-proxy"
     decorate_events => false
  }
}

pipelines.yml:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 16
  pipeline.batch.size: 500

Our cluster was initially deployed using the Azure Marketplace Elasticsearch template. I do not believe ES is the bottleneck, given that we were able to feed 180k messages into it from a single machine, and the cluster only peaked at around 50-70% CPU usage at that point.

Any tips or help in improving our performance would be much appreciated. If this is the wrong place to post such a problem, then I apologise; however, this seems to be a problem either in the azure_event_hubs input plugin itself or in my configuration of it.

@robbavey robbavey self-assigned this Feb 28, 2020
@henrylilei

I was facing a similar issue when scaling Logstash in our setup and found that the official Logstash image was still using version 1.1.1 of the plugin, which doesn't include the critical bug fix in #52. Unless you update the image with the most recent Azure Event Hubs plugin, you will be stuck with a batch size of 10 no matter what max_batch_size is set to in the config.
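To put that cap in perspective, here is a minimal sketch of how many batches would be needed per 30-second window at a batch size of 10 versus the configured 1000. It assumes the roughly 200k events per 30 seconds from the original report, and `batches_needed` is a made-up helper. It only counts batches and says nothing about the actual per-batch overhead in the plugin.

```python
import math

EVENTS_PER_WINDOW = 200_000  # approximate ingress per 30 s from the original report


def batches_needed(events, batch_size):
    """Number of batches required to drain `events` at the given batch size."""
    return math.ceil(events / batch_size)


for batch_size in (10, 1000):
    count = batches_needed(EVENTS_PER_WINDOW, batch_size)
    print(f"batch size {batch_size}: {count} batches per 30 s window")
```

A batch size of 10 means one hundred times as many batches for the same volume, so any fixed cost per batch would be paid one hundred times as often.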

@kristianvld
Author

In the end, we opted for using the Kafka interface to Azure Event Hub, which yielded much better performance and scaling.

@lucianaparaschivei

Is it an option to scale out the Logstash instances on a single event hub? We are running this setup with about 6 Logstash containers pointing to the same single event hub. It works OK, but does this plugin support this scaling method without possible issues?
We have seen some exceptions like:
Partition: 22 experienced an error com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'nil' with higher epoch of '0' is created hence current receiver 'nil' with epoch '0' is getting disconnected.

@yaauie
Contributor

yaauie commented Nov 30, 2021

is it an option to scale out the logstash instances on a single event hub? We are running this setup with about 6 containers of logstash pointing to same single event hub
#41 (comment)

Yes, and also no.

It depends on how many partitions are in the underlying topic. Each topic-partition can only be assigned to one consumer in the consumer group, so scaling your Logstashes beyond your total number of partitions in the topic will not yield higher throughput.
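As a toy illustration of that limit, the sketch below spreads partitions over consumers round-robin and counts how many consumers end up with work. This is not the plugin's real lease-based assignment; `assign_partitions` and `active_consumers` are hypothetical helpers, and the partition counts used (4 and 32) are made-up examples, since the actual partition count of the hub in question is not stated in this thread.

```python
def assign_partitions(partition_count, consumer_count):
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in range(consumer_count)}
    for partition in range(partition_count):
        assignment[partition % consumer_count].append(partition)
    return assignment


def active_consumers(assignment):
    """Count consumers that own at least one partition."""
    return sum(1 for partitions in assignment.values() if partitions)


# Hypothetical example: 6 Logstash instances reading a hub with only 4 partitions.
few = assign_partitions(partition_count=4, consumer_count=6)
print(few)
print("consumers doing work:", active_consumers(few))

# Hypothetical example: the same 6 instances reading a hub with 32 partitions.
many = assign_partitions(partition_count=32, consumer_count=6)
print({consumer: len(parts) for consumer, parts in many.items()})
```

With fewer partitions than consumers, the surplus consumers own nothing and add no throughput; with more partitions than consumers, every consumer has work and adding instances can still help.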

The log message you reference is part of a "rebalance" of consumers in the group, which can be either a normal part of operation or a signal of back-pressure. The consumers in the group attempt to acquire exclusive leases on partitions in the topic, and some negotiation and log noise happens when two consumers both attempt to acquire a lease on the same partition. This can occur as a normal part of startup when one or more new consumers are added to the group, or when back-pressure from the pipeline prevents an input from getting its items into the queue fast enough (in which case the partition is reassigned to another consumer).

@roaksoax roaksoax added invalid This doesn't seem right int-shortlist labels Nov 1, 2022