
grok should have a replace functionality like we have for mutate #103

Open
saurabh8585 opened this issue Dec 7, 2016 · 4 comments

@saurabh8585

Use case

I have a few custom regex patterns that look for sensitive information in log messages, such as credit card numbers, social security numbers, etc.

I apply these patterns inside grok, matching each log message against the regexes I wrote in a file inside the patterns folder.

A log message that matches a pattern gets a custom field named "Infosec_Pattern" whose value names the matched pattern, e.g. "CCN" or "SSN".

Logstash version 2.3.1

Below is a sample filter config:

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{SSN}" }
    add_field => { "Infosec_Pattern" => "SSN" }
  }
}

This works perfectly. Now what I want is:

Replace the matched string in the message with a value like "XXXXXXXX", since the matched string contains sensitive information.

To do this, I have to use mutate, where I must find the pattern in the log message again and replace it with the desired value using gsub.

Below is a sample filter config (with the mutate section):

filter
{
  ... 
  ... ## Some groks (See above filter config for example)
  ... 
  mutate {
    remove_field => "tags"
    gsub => [
      "message","[0-9]{16}","XXXXXXXXXXXXX"    # This regex matches a 16-digit credit card number
    ]
  }
}

Output after applying the above sample configs

The parsed log message without the mutate section looks like this:

{
            "message" => "Saurabh ccn is 5123456789012345",
           "@version" => "1",
         "@timestamp" => "2016-12-07T12:01:09.554Z",
               "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}

The parsed log message with the mutate section looks like this:

{
            "message" => "Saurabh ccn is XXXXXXXXXXXXX",
           "@version" => "1",
         "@timestamp" => "2016-12-07T11:57:50.075Z",
               "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}

As we can see, the pattern must be matched twice if I want to replace the matched string in the original message field.

I tried using overwrite inside grok, but it does not help much, since the sensitive data can appear anywhere in the string, and overwrite cannot replace the data with a desired value like "XXXX".
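The single-pass behavior being requested can be sketched in plain Ruby (a standalone illustration, not Logstash plugin code; the PATTERNS table and redact helper are hypothetical names): one scan over the message both records which sensitive pattern matched and masks it in place, so no second regex pass is needed.

```ruby
# Hypothetical sketch: detect and redact in a single pass.
# Pattern names and regexes are illustrative only.
PATTERNS = {
  "CCN" => /\b\d{16}\b/,
  "SSN" => /\b\d{3}-\d{2}-\d{4}\b/
}

def redact(message)
  found = []
  PATTERNS.each do |name, regex|
    # gsub! returns nil when nothing matched, so it doubles
    # as the "did this pattern fire?" check.
    found << name if message.gsub!(regex, "XXXXXXXXXXXXX")
  end
  [message, found]
end

message, patterns = redact("Saurabh ccn is 5123456789012345".dup)
# message  => "Saurabh ccn is XXXXXXXXXXXXX"
# patterns => ["CCN"]
```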

Expectation

  1. Add functionality to grok itself to replace a matched string with a desired value.
    OR
  2. Add functionality to mutate to accept custom regex patterns like grok does.

Option 1 seems to be the best fit for this.
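A config for option 1 might look something like the following. The replace_match option does not exist today; it is purely a sketch of the proposed feature:

```
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern" => "CCN" }
    # Hypothetical option: mask the matched substring in place
    replace_match => "XXXXXXXXXXXXX"
  }
}
```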

@jordansissel
Contributor

Grok is primarily for parsing, not modifying data. The mutate filter (since it does text replacement already), or a new filter, feels like a better place to implement this proposal.

@jordansissel
Contributor

Otherwise, I am in favor of this feature.

@saurabh8585
Author

Thanks @jordansissel for supporting this issue.

Since it interests you, I have one more point to make it more interesting.

Currently, we write one custom regex pattern per line, like below.

../my_pattern_directory/my_pattern_file

CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}

In order to apply the above patterns to a log message, we need to write a filter like the one shown below:

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_VISA}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

As we can see, the number of grok blocks increases with the number of patterns. Also, the "Infosec_Pattern_Found" field is added redundantly.

Proposed solution

Instead of listing custom patterns individually, we could group them like below.

../my_pattern_directory/my_pattern_file

CCN
{
  MASTER [1-2]{16}
  VISA [2-3]{15}
  AMEX [3-4]{14}
  MAESTRO [4-5]{13}
}

And the corresponding filter would look something like this:

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

OR

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN.MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

This way, we would achieve:

  • Flexibility to apply multiple similar or different regexes in a single pass
  • No need to write hundreds of grok blocks for individual regex patterns
  • A smaller, hence more readable, config file
  • Less error-prone configuration

Please consider this point as well if it seems feasible. Let me know if we should track it in a separate ticket.

@jordansissel
Contributor

You can do this today:

CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}

# Create a pattern called CCN that matches any of the above:
CCN %{CCN_MASTER}|%{CCN_VISA}|%{CCN_AMEX}|%{CCN_MAESTRO}
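With that combined CCN pattern, the two grok blocks from the earlier filter collapse into one, in the same shape as the %{CCN} config shown earlier in this thread:

```
filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}
```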
