
[Automatic Import] Log Format step fails due to invalid processor config generation #198340

Open
ilyannn opened this issue Oct 30, 2024 · 3 comments
Labels: bug, Feature:AutomaticImport, Team:Security-Scalability

Comments

ilyannn (Contributor) commented Oct 30, 2024

Testing with the log samples from https://github.com/connamara/logstash-filter-fix_protocol. This is a format that looks like it should be supported:

logSamples:
  - "2015-08-26 23:08:38,096 FIX.4.2:DUMMY_INC->ANOTHER_INC: 8=FIX.4.2\x019=184\x0135=F\x0134=2\x0149=ANOTHER_INC\x0150=DefaultSenderSubID\x0152=20150826-23:08:38.094\x0156=DUMMY_INC\x011=DefaultAccount\x0111=clordid_of_cancel\x0141=151012569\x0154=1\x0155=ITER\x0160=20250407-13:14:15\x01167=FUT\x01200=201512\x0110=147\x01"
  - "2015-08-31 20:48:26,536 FIXT.1.1:DUMMY_INC->ANOTHER_INC: 8=FIXT.1.1\x019=189\x0135=W\x0134=5\x0149=DUMMY_INC\x0152=20150831-20:48:26.535\x0156=ANOTHER_INC\x0122=99\x0148=ITRZ21\x01262=req_A\x01268=2\x01269=0\x01270=0.01005\x01271=10\x01272=20150831\x01273=20:48:26.514\x01269=1\x01270=0.0101\x01271=2\x01272=20150831\x01273=20:48:26.514\x0110=123\x01"
  - "2015-08-31 17:48:20,890 FIXT.1.1:DUMMY_INC->ANOTHER_INC: 8=FIXT.1.1\x019=140\x0135=W\x0134=2\x0149=DUMMY_INC\x0152=20150831-17:48:20.890\x0156=ANOTHER_INC\x0122=99\x0148=.AQUA-W\x01262=golden_path_test\x01268=1\x01269=3\x01270=640754\x01272=20150831\x01273=17:48:20.882\x0110=070\x01"

we get the error:

[screenshot of the error message]

The generated processors are:

  - grok:
      field: message
      patterns:
        - "%{TIMESTAMP_ISO8601:ai_fix_202410301257.timestamp}\\s%{DATA:ai_fix_202410301257.protocol}:%{DATA:ai_fix_202410301257.source}->%{DATA:ai_fix_202410301257.destination}:\\s%{GREEDYDATA:message}"
  - kv:
      field: message
      field_split: (?=[0-9]+=)
      value_split: =
      trim_key: " "
      trim_value: " "
      target_field: ai_fix_202410301257.stream 

and the errors are:

  - message:
      - field [message] does not contain value_split [=]
  - message:
      - field [message] does not contain value_split [=]
  - message:
      - field [message] does not contain value_split [=]
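Incidentally, the generated field_split looks problematic on its own: the lookahead `(?=[0-9]+=)` also matches *inside* multi-digit FIX tags. A minimal plain-Python sketch (using `re.split` to approximate the processor's regex splitting; this simulates rather than invokes the actual Elasticsearch kv processor, whose Java split semantics may differ) illustrates the effect on a shortened payload:

```python
import re

# A shortened FIX key=value payload, as the grok step would leave it in
# `message` (the \x01 SOH separators kept literal).
msg = "8=FIX.4.2\x0135=F\x0110=147\x01"

# Approximate the kv processor's field_split with Python's re.split.
chunks = [c for c in re.split(r"(?=[0-9]+=)", msg) if c]
print(chunks)
# -> ['8=FIX.4.2\x01', '3', '5=F\x01', '1', '0=147\x01']

# The lookahead also matches one character into the multi-digit tags
# "35" and "10", so "35=F" is cut into "3" and "5=F". The "3" and "1"
# chunks contain no "=", which is exactly the shape of chunk that would
# make a kv processor complain about a missing value_split.
bad = [c for c in chunks if "=" not in c]
print(bad)  # -> ['3', '1']
```

If the processor splits the same way, a field_split of the SOH delimiter itself (`\x01`) would avoid the lookahead entirely.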

We should look into whether this classification is correct and how to improve these processors.

ilyannn added the Team:Security-Scalability, bug, and Feature:AutomaticImport labels on Oct 30, 2024
ilyannn (Contributor, Author) commented Nov 8, 2024

Also mentioned in this email: https://groups.google.com/a/elastic.co/g/dev/c/2Es1xrDmvns/m/a7Skv2fKCAAJ

They also asked about the possibility of us supporting additional data formats such as FIX TagValue encoding, which is commonly used for financial transactions.

ilyannn self-assigned this on Nov 8, 2024
ebeahan (Member) commented Nov 22, 2024

I'm seeing the same behavior for non-FIX format data as well:

kvProcessor:
  - kv:
      field: message
      field_split: " "
      value_split: =
      trim_key: " "
      trim_value: " "
      target_field: juniper_testing.srx

Errors:

errors:
  - message:
      - field [message] does not contain value_split [=]
  - message:
      - field [message] does not contain value_split [=]
  - message:
      - field [message] does not contain value_split [=]

What's more, an earlier form of the KV processor was actually more accurate for the original logs:

Original log sample

Nov 4 16:23:09 test RT_FLOW : RT_FLOW_SESSION_CREATE: session created 10.0.0.100/24065->10.0.0.100/768 icmp 10.0.0.100/24065->10.0.0.100/768 None None 1 alg-policy untrust trust 100000165 N/A(N/A) reth2.0 UNKNOWN UNKNOWN UNKNOWN
kvProcessor:
  - kv:
      field: message
      field_split: " "
      value_split: ->
      trim_key: " "
      trim_value: " "
      target_field: juniper_testing.srx
errors:
  - message:
      - field [message] does not contain value_split [->]
  - message:
      - field [message] does not contain value_split [->]
  - message:
      - field [message] does not contain value_split [->]

In either case the KV processor's config didn't actually matter, since the preceding grok processor is invalid and never produces a message field for the KV processor to be tested against properly:

grokPattern: "%{MONTH:juniper_testing.srx.month}\\s+%{MONTHDAY:juniper_testing.srx.day}\\s+%{TIME:juniper_testing.srx.time}\\s+%{HOSTNAME:juniper_testing.srx.hostname}\\s+%{WORD:juniper_testing.srx.program}\\s*:\\s*%{GREEDYDATA:message}"

Also worth noting these errors above were produced using Anthropic Claude 3.5 v2.

If I move to Claude 3.5 v1, I still see errors but due to the Grok expression:

grokPattern: "%{SYSLOGTIMESTAMP:juniper_testing.srx.timestamp} %{HOSTNAME:juniper_testing.srx.hostname} %{WORD:juniper_testing.srx.program}:%{GREEDYDATA:message}"

From what I can tell, the mistake in the grok pattern is that it doesn't allow for the single space between juniper_testing.srx.program and the colon:

Nov 4 16:23:09 test RT_FLOW : RT_FLOW_SESSION_CREATE
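The mismatch is easy to reproduce outside grok. A rough plain-regex approximation of the two variants (with `%{WORD}` reduced to `\w+` and `%{HOSTNAME}` to `\S+`, purely for illustration) shows the v1-style pattern failing on the space before the colon, while a `\s*:\s*` variant matches:

```python
import re

line = ("Nov 4 16:23:09 test RT_FLOW : RT_FLOW_SESSION_CREATE: session created "
        "10.0.0.100/24065->10.0.0.100/768 icmp 10.0.0.100/24065->10.0.0.100/768 "
        "None None 1 alg-policy untrust trust 100000165 N/A(N/A) reth2.0 "
        "UNKNOWN UNKNOWN UNKNOWN")

# Colon must immediately follow the program name (like the v1 grok output):
v1    = r"^\w+ +\d+ +\d{2}:\d{2}:\d{2} +(?P<host>\S+) +(?P<program>\w+):(?P<message>.*)$"
# Tolerate optional whitespace around the colon:
fixed = r"^\w+ +\d+ +\d{2}:\d{2}:\d{2} +(?P<host>\S+) +(?P<program>\w+)\s*:\s*(?P<message>.*)$"

print(re.match(v1, line))                      # -> None ("RT_FLOW :" has a space)
print(re.match(fixed, line).group("program"))  # -> RT_FLOW
```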

ebeahan changed the title from "[Automatic Import] FIX format" to "[Automatic Import] Log Format step fails due to invalid processor config generation" on Nov 22, 2024
ilyannn removed their assignment on Dec 18, 2024
haetamoudi self-assigned this on Dec 31, 2024
haetamoudi (Contributor) commented:

  1. FIX log files have been added to the list of unsupported formats by [Automatic Import] Restrict unsupported log formats #202994.
    So now, instead of the standard error message, we get:
    [screenshot of the unsupported-format message]

  2. I tested with the following logs:

  • Nov 4 16:23:09 test RT_FLOW : RT_FLOW_SESSION_CREATE: session created 10.0.0.100/24065->10.0.0.100/768 icmp 10.0.0.100/24065->10.0.0.100/768 None None 1 alg-policy untrust trust 100000165 N/A(N/A) reth2.0 UNKNOWN UNKNOWN UNKNOWN
  • 2025-01-07T10:15:00Z firewall: action=allowed src_ip=10.0.0.5 dst_ip=10.0.0.10 src_port=443 dst_port=8080 protocol=tcp policy=allow-https

with the following connectors:

  • anthropic.claude-3-opus-20240229-v1:0
  • anthropic.claude-3-5-sonnet-20240620-v1:0
    I could successfully create the pipelines and test them with the sample logs.
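For reference, the key=value portion of the second sample log is exactly the shape a kv processor with `field_split: " "` and `value_split: "="` handles. A plain-Python sketch of the split those settings should produce (this simulates, not invokes, the actual processor):

```python
# The message body after the grok step strips "2025-01-07T10:15:00Z firewall: ".
message = ("action=allowed src_ip=10.0.0.5 dst_ip=10.0.0.10 "
           "src_port=443 dst_port=8080 protocol=tcp policy=allow-https")

# field_split " " then value_split "=" (split once, so values may contain "=").
pairs = dict(token.split("=", 1) for token in message.split(" "))
print(pairs["action"], pairs["policy"])  # -> allowed allow-https
```

Every space-separated token contains "=", so no value_split error can occur here.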
