You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I use diskqueue as a staging local queue before pushing to Kafka. I use sarama go kafka library for Kafka part.
I need to have high reliability of delivery, and can tolerate small amount of duplicates (they have extra metadata in data to deduplicate easily on a consumer side). To achieve high reliability and also absorb traffic spikes, this is what I do: In one go routine I queue to diskqueue, and in other I do blocking peek, and when there is something I send it to Kafka, once Kafka is happy, I do a read, removing it from a local diskqueue.
I could achieve this also in more complex fashion by using various synchronization primitives and switching between local diskqueue and kafka depending on a backpressure and in-memory queue size, but that would be really quite complex and error prone. So I just decopouled the two (at the cost of allways going to disk, which in the end is not bad, because this protect me from any program crash or exit).
The issue is peeking performance. Because I can only peek one element, I cannot do Kafka batching, which limits throughput, and makes compression less effective.
I would like to peek at say up to 100 next elements, send them in batch, then remove all sent elements (using readChan).
The text was updated successfully, but these errors were encountered:
I use diskqueue as a staging local queue before pushing to Kafka. I use
sarama
go kafka library for Kafka part.I need to have high reliability of delivery, and can tolerate small amount of duplicates (they have extra metadata in data to deduplicate easily on a consumer side). To achieve high reliability and also absorb traffic spikes, this is what I do: In one go routine I queue to diskqueue, and in other I do blocking peek, and when there is something I send it to Kafka, once Kafka is happy, I do a read, removing it from a local diskqueue.
I could achieve this also in more complex fashion by using various synchronization primitives and switching between local diskqueue and kafka depending on a backpressure and in-memory queue size, but that would be really quite complex and error prone. So I just decopouled the two (at the cost of allways going to disk, which in the end is not bad, because this protect me from any program crash or exit).
The issue is peeking performance. Because I can only peek one element, I cannot do Kafka batching, which limits throughput, and makes compression less effective.
I would like to peek at say up to 100 next elements, send them in batch, then remove all sent elements (using readChan).
The text was updated successfully, but these errors were encountered: