-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add use_metadata option for index, id, and pipeline #1041
base: main
Are you sure you want to change the base?
Conversation
e82b2aa
to
03d12d0
Compare
I should also mention the |
However that would only work with dynamic values (sprintf) if this was also implemented Default value for sprintf interpolation (Feature)... also outstanding since 2016 |
03d12d0
to
a052928
Compare
a052928
to
e6278b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIrst: thank you so much for taking the time to put this together, and for including helpful docs and specs to demonstrate the behavior. I can certainly see the value of using this feature while "reindexing" existing Elasticsearch data.
A couple things to note:
-
In the upcoming Logstash 8,
pipeline.ecs_compatibility
will be on-by-default, so we need to consider where the metadata will be if we are consuming from an Elasticsearch input that is running in an ECS compatibility mode:When ECS compatibility is disabled,
docinfo_target
uses the "@metadata
" field as a default, with ECS enabled the plugin uses a naming convention "[@metadata][input][elasticsearch]
" as a default target for placing document information.
-- Elasticsearch input plugin: Compatibility with the Elastic Common Schema (ECS)This may be as simple as selecting our own docinfo target when the plugin is initialized based on its effective ECS compatibility mode, or we may need to add a config option (or overload the proposed
use_metadata
option) to allow a user to configure where on the event they expect the docinfo to be. -
I've also left a note inline questioning the utility of resolving sprintf format strings. I think this is a risky feature, since it can cause a poorly-formed event to override the plugin's explicit configuration in a way that could cause very confusing behavior.
I will take the time to come back to this in a week or so after considering these things.
* Value type is <<boolean,boolean>> | ||
* Default value is `false` | ||
|
||
Use and preference output parameters defined in the document metadata. The <<plugins-{type}s-{plugin}-index>> (`@metadata._index`), <<plugins-{type}s-{plugin}-document_id>> (`@metadata._id_`), and <<plugins-{type}s-{plugin}-pipeline>> (`@metadata.pipeline`) can be set by their respective `@metadata` fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like _id
has a trailing underscore:
Use and preference output parameters defined in the document metadata. The <<plugins-{type}s-{plugin}-index>> (`@metadata._index`), <<plugins-{type}s-{plugin}-document_id>> (`@metadata._id_`), and <<plugins-{type}s-{plugin}-pipeline>> (`@metadata.pipeline`) can be set by their respective `@metadata` fields. | |
Use and preference output parameters defined in the document metadata. The <<plugins-{type}s-{plugin}-index>> (`@metadata._index`), <<plugins-{type}s-{plugin}-document_id>> (`@metadata._id`), and <<plugins-{type}s-{plugin}-pipeline>> (`@metadata.pipeline`) can be set by their respective `@metadata` fields. |
I think we should use the Logstash field reference notation for these, instead of the Elasticsearch dot-notation:
Use and preference output parameters defined in the document metadata. The <<plugins-{type}s-{plugin}-index>> (`@metadata._index`), <<plugins-{type}s-{plugin}-document_id>> (`@metadata._id_`), and <<plugins-{type}s-{plugin}-pipeline>> (`@metadata.pipeline`) can be set by their respective `@metadata` fields. | |
Use and preference output parameters defined in the document metadata. The <<plugins-{type}s-{plugin}-index>> (`[@metadata][index]`), <<plugins-{type}s-{plugin}-document_id>> (`[@metadata][_id]`), and <<plugins-{type}s-{plugin}-pipeline>> (`[@metadata][pipeline]`) can be set by their respective `@metadata` fields. |
event_index = event.get("[@metadata][_index]") | ||
params[:_index] = event.sprintf(event_index) if event_index && !event_index.empty? | ||
event_pipeline = event.get("[@metadata][pipeline]") | ||
params[:pipeline] = event.sprintf(event_pipeline) if event_pipeline && !event_pipeline.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specs don't have any coverage for sprintf-ing the value in [@metadata][_index]
, and I personally see this as a liability since Event#sprintf
will does not support default values -- when a en event does include an [@metadata][_event]
with a value like logstash-%{missing}
and does not have a value for that referenced field, we end up setting the event's index to the literal logstash-%{missing}
which can be problematic. I would rather not support sprintf's and have a more constrained surface-area:
event_index = event.get("[@metadata][_index]") | |
params[:_index] = event.sprintf(event_index) if event_index && !event_index.empty? | |
event_pipeline = event.get("[@metadata][pipeline]") | |
params[:pipeline] = event.sprintf(event_pipeline) if event_pipeline && !event_pipeline.empty? | |
event_index = event.get("[@metadata][_index]") | |
params[:_index] = event_index if event_index && !event_index.empty? | |
event_pipeline = event.get("[@metadata][pipeline]") | |
params[:pipeline] = event_pipeline if event_pipeline && !event_pipeline.empty? |
@@ -251,6 +251,9 @@ class LogStash::Outputs::ElasticSearch < LogStash::Outputs::Base | |||
# ILM policy to use, if undefined the default policy will be used. | |||
config :ilm_policy, :validate => :string, :default => DEFAULT_POLICY | |||
|
|||
# ILM policy to use, if undefined the default policy will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✂️ the comment does not make sense - seems to have been copy pasted
Currently it's not possible to have a single elasticsearch output that can set a
document_id
and also use an autogenerated ID. In our case we have a large number of clusters and require at a minimum two outputs per cluster because of this, the increasing number of outputs has a noticeable performance impact.This PR addresses that by implementing functionality similar to Beats where the
index
,id
, andpipeline
can be defined by values in the@metadata
object:More than half the battle while developing locally was testing the plugin. There's very little definitive information on expected environment, the official documentation doesn't really do a good job here. This community presentation was useful, but ultimately didn't help testing. I just ended up trying to replicate the CI pipeline. As such the tests I've added are probably sub-optimal.
Happy to hear any feedback, and if necessary I can submit a PR just focussing on making the
document_id
behaviour similar topipeline
where empty values get dropped.