Problem scaling Logstash cluster for single Azure Event Hub #41

kristianvld · 2019-06-24T10:54:43Z

We have a single Azure Event Hub from which we want to read and process event logs. Around 200k+ events are fed into the Hub every 30 seconds. We are currently hosting everything in Azure. With a single Logstash VM, after some optimisation and tinkering with settings, we are able to read around 180k messages every 30 seconds (±10k). The machine then runs at 95-100% CPU usage on average and uses 9 of 16 GB of RAM (stats pulled from htop). As soon as I add the storage_connection option to the config, the single machine drops to around 100k messages per 30 seconds. After some tweaking, I'm able to get it up to around 120k. The machine now runs at 30-50% CPU usage with about 7 GB of RAM used. If I add another machine with identical specs and the same config, the total number of messages processed and fed into ES is around 140k; adding a third machine raises the number to around 150k.
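To make the scaling problem concrete, the reported figures can be normalised to events per second and compared against ideal linear scaling. This is only a rough sketch: it assumes every figure is a total per 30-second window with storage_connection enabled, and the helper names (`per_second`, `scaling_efficiency`) are made up for illustration.

```python
# Rough arithmetic on the throughput figures quoted in the post.
# Assumption: each figure is a total per 30-second window.
WINDOW_SECONDS = 30

# Number of Logstash machines -> total events per window (with storage_connection).
observed = {1: 120_000, 2: 140_000, 3: 150_000}


def per_second(events_per_window, window=WINDOW_SECONDS):
    """Convert a per-window event count into events per second."""
    return events_per_window / window


def scaling_efficiency(observed_by_machines):
    """Observed throughput divided by ideal linear scaling of the 1-machine rate."""
    baseline = observed_by_machines[1]
    return {n: total / (n * baseline) for n, total in observed_by_machines.items()}


rates = {n: per_second(total) for n, total in observed.items()}
efficiency = scaling_efficiency(observed)

for n in sorted(observed):
    print(f"{n} machine(s): {rates[n]:.0f} events/s, "
          f"{efficiency[n]:.0%} of linear scaling")
```

Under these assumptions the second machine adds roughly 20k events per window rather than 120k, which points at a shared limit (for example checkpointing to blob storage or partition ownership) rather than per-machine CPU.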

Does anyone know what could be causing this? Just adding the blob storage to a single machine almost halves the performance, although this can be partly mitigated by adding more threads and larger batch sizes. All VMs, the Storage Account and the Azure Event Hub are under the same Azure subscription and in the same region. I noticed that upgrading to a premium Storage Account raised the number by about 5k messages per 30 seconds.

Input config:

input {
  azure_event_hubs {
     event_hub_connections => ["Endpoint=sb://.....servicebus.windows.net/;SharedAccessKeyName=....;SharedAccessKey=....;EntityPath=...."]
     threads => 32
     codec => plain {
       charset => "ISO-8859-1"
     }
     max_batch_size => 1000
     storage_connection => "DefaultEndpointsProtocol=https;AccountName=....;AccountKey=....;EndpointSuffix=core.windows.net"
     storage_container => "logstash-proxy"
     decorate_events => false
  }
}

pipelines.yml:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 16
  pipeline.batch.size: 500
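As a quick sanity check on these settings: the Logstash documentation describes the number of in-flight events in a pipeline as the product of pipeline.workers and pipeline.batch.size. A minimal sketch of that arithmetic, using the values above (the function name is hypothetical):

```python
# In-flight events for a pipeline = workers * batch size (per the Logstash docs).
# The values are copied from the pipelines.yml shown above.
settings = {"pipeline.workers": 16, "pipeline.batch.size": 500}


def in_flight_events(workers, batch_size):
    """Upper bound on events held by filter/output workers at one time."""
    return workers * batch_size


max_in_flight = in_flight_events(
    settings["pipeline.workers"], settings["pipeline.batch.size"]
)
print(f"Up to {max_in_flight} events in flight")
```

This only bounds what the workers hold at once; it says nothing about how quickly the input can fill the queue, which is the part in question here.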

Our cluster was initially deployed using the Azure Marketplace Elasticsearch template. I do not believe ES to be the bottleneck, given that we were able to feed 180k messages into it from a single machine, and the cluster peaked at only around 50-70% CPU usage at that point.

Any tips or help in improving our performance would be much appreciated. If this is the wrong place to post such a problem, I apologise; however, this seems to be a problem either in the azure_event_hubs input plugin itself or in my configuration of it.


henrylilei · 2020-04-20T10:02:47Z

I faced a similar issue when scaling Logstash in our setup and found that the official Logstash image was still using version 1.1.1 of the plugin, which doesn't include the critical bug fix from #52. Unless you update the image with the most recent Azure Event Hubs plugin, you will be stuck with a batch size of 10 no matter what max_batch_size is set to in the config.

kristianvld · 2020-04-28T18:35:14Z

In the end, we opted for using the Kafka interface to Azure Event Hub, which yielded much better performance and scaling.

lucianaparaschivei · 2021-10-07T08:55:06Z

Is it an option to scale out the Logstash instances on a single event hub? We are running this setup with about 6 Logstash containers pointing to the same single event hub. It works OK, but does this plugin support this scaling method without possible issues?
We have seen some exceptions like:
Partition: 22 experienced an error com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'nil' with higher epoch of '0' is created hence current receiver 'nil' with epoch '0' is getting disconnected.

yaauie · 2021-11-30T16:28:54Z

Is it an option to scale out the Logstash instances on a single event hub? We are running this setup with about 6 Logstash containers pointing to the same single event hub
— #41 (comment)

Yes, and also no.

It depends on how many partitions are in the underlying topic. Each topic-partition can only be assigned to one consumer in the consumer group, so scaling your Logstashes beyond your total number of partitions in the topic will not yield higher throughput.
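The partition limit can be illustrated with a small sketch. It assumes partitions end up spread as evenly as possible across consumers, which is a simplification of what the lease negotiation converges to; the partition counts used (32 and 4) are hypothetical, and only the 6 instances come from the question.

```python
# Simplified model: each partition is owned by exactly one consumer in the group,
# and ownership is spread as evenly as possible across consumers.
def assign_partitions(partition_count, consumer_count):
    """Return how many partitions each consumer owns under an even spread."""
    base, extra = divmod(partition_count, consumer_count)
    return [base + (1 if i < extra else 0) for i in range(consumer_count)]


def active_consumers(partition_count, consumer_count):
    """Consumers that own at least one partition; the rest sit idle."""
    return min(partition_count, consumer_count)


# Hypothetical partition counts, with the 6 Logstash instances from the question.
many_partitions = assign_partitions(32, 6)
few_partitions = assign_partitions(4, 6)

print(many_partitions, active_consumers(32, 6))
print(few_partitions, active_consumers(4, 6))
```

With more partitions than instances, every instance has work; with fewer partitions than instances, the surplus instances own nothing and add no throughput.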

The log message you reference is part of a "rebalance" of consumers in the group, which can be either a normal part of operation or a signal of back-pressure. The consumers in the group attempt to acquire exclusive leases on partitions in the topic, and some negotiation and log-noise happens when two consumers are both attempting to acquire a lease on the same partition. This can occur as a normal part of startup when one or more new consumers are added to the group, or when back-pressure from the pipeline prevents an input from getting its items into the queue fast enough (in which case the partition is reassigned to another consumer).

robbavey self-assigned this Feb 28, 2020

roaksoax added the invalid and int-shortlist labels Nov 1, 2022
