Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(codecs): Add semantic meaning to gelf decoded data. #22003

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

apollo13
Copy link
Contributor

@apollo13 apollo13 commented Dec 10, 2024

Summary

Set semantic meaning for gelf decoded messages.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Inserted data using the following config and verified that the fields where set as expected.


sources:
  my_source_id:
    type: demo_logs
    format: json
    log_namespace: true

transforms:
  test:
    type: remap
    inputs:
      - my_source_id
    source: |

      log(.)
      log(%)

sinks:
  my_sink_id:
    type: elasticsearch
    endpoints:
      - https://localhost:9200
    inputs:
        - test
    api_version: v8
    tls:
      verify_certificate: false
    mode: data_stream
    auth:
      strategy: basic
      user: admin
      password: xxx
    data_stream:
      auto_routing: true
      sync_fields: false

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@apollo13 apollo13 requested a review from a team as a code owner December 10, 2024 12:11
@apollo13 apollo13 force-pushed the gelf_decoder_metadata branch from 8189ba0 to b16e850 Compare December 10, 2024 12:13
@apollo13 apollo13 force-pushed the gelf_decoder_metadata branch from b16e850 to 591f1ee Compare December 10, 2024 13:07
@jszwedko jszwedko enabled auto-merge December 10, 2024 14:16
auto-merge was automatically disabled December 11, 2024 19:28

Head branch was pushed to by a user without write access

@apollo13 apollo13 force-pushed the gelf_decoder_metadata branch from 591f1ee to 0545b71 Compare December 11, 2024 19:28
@apollo13 apollo13 force-pushed the gelf_decoder_metadata branch from 0545b71 to 9fdceca Compare December 11, 2024 19:29
@pront pront enabled auto-merge December 11, 2024 20:01
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @apollo13 !

@jszwedko
Copy link
Member

@apollo13 it looks like there is a failing gelf test here when you get some time.

@apollo13
Copy link
Contributor Author

@jszwedko Yes, as I have written on discord the failing tests are the new tests that I added. While manual testing did work unless I tested something completely wrong it looks as if the log.get_by_meaning returns nothing. I am not sure what I would need to fix here though or if this is expected at this stage of parsing :/ Do you have any pointers?

@jszwedko
Copy link
Member

@jszwedko Yes, as I have written on discord the failing tests are the new tests that I added. While manual testing did work unless I tested something completely wrong it looks as if the log.get_by_meaning returns nothing. I am not sure what I would need to fix here though or if this is expected at this stage of parsing :/ Do you have any pointers?

Ah, I see, yes, I think the schema isn't applied to the log event at the level this test is at 🤔 Using dbg(&log) seems to show no schema attached. I think that is associated one level up. I see the syslog codec lacks tests for meaning, even though it associates some, so I'd be ok with dropping the tests you have here.

Signed-off-by: Jesse Szwedko <[email protected]>
@jszwedko
Copy link
Member

@apollo13 actually, I was able to modify the tests to set the schema definition (see 45ccc73). It is still failing for message though if you want to take a look. host seems to be working correctly.

@apollo13
Copy link
Contributor Author

Thanks @jszwedko, I am a bit baffled about what is happening though. During the test failure the log event and metadata is shown. The meaning is setup as follows:

                meaning: {
                    "host": Valid(
                        .host,
                    ),
                    "message": Valid(
                        .short_message,
                    ),
                    "timestamp": Valid(
                        .timestamp,
                    ),
                },

Which is what we have defined (ie message should be taken from short_message). But if we look at the log event itself we see:

                KeyString(
                    "message",
                ): Bytes(
                    b"A short message that helps you identify what is going on",
                ),

instead of short_message. Is that a result of parsing into the legacy namespace which rewrites short_message to message or similar?

@apollo13
Copy link
Contributor Author

apollo13 commented Dec 21, 2024

Ah this always parses it into message:

let mut log = LogEvent::from_str_legacy(parsed.short_message.to_string());

@apollo13
Copy link
Contributor Author

I see the syslog codec lacks tests for meaning, even though it associates some, so I'd be ok with dropping the tests you have here.

Looking at the syslog codec it handles the log namespace in schema_definition, I think this might be needed for the gelf decoder as well, what do you think?

@jszwedko
Copy link
Member

jszwedko commented Jan 2, 2025

I see the syslog codec lacks tests for meaning, even though it associates some, so I'd be ok with dropping the tests you have here.

Looking at the syslog codec it handles the log namespace in schema_definition, I think this might be needed for the gelf decoder as well, what do you think?

Ah, yeah, I think you may be right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants