Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[filebeat] Elasticsearch state storage for httpjson and cel inputs #41446

Open
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

aleksmaus
Copy link
Contributor

@aleksmaus aleksmaus commented Oct 24, 2024

Proposed commit message

[filebeat] Elasticsearch state storage for httpjson input

This is a POC for Elasticsearch as State Store Backend for Security Integrations for Agentless solution.

The scope of this change was narrowed down to supporting only httpjson inputs in order to support Okta integration for the initial release. All the other integrations inputs still use the file storage as before.
This is a short term solution for the state storage for k8s environment.

This is the first cut and the details can change depending on the feedback.

Current feature currently could be enabled AGENTLESS_ELASTICSEARCH_STATE_STORE_ENABLED, to be decided how this would be configurable in k8s.

This change currently contains the hacky approach to the AGENTLESS_ELASTICSEARCH_APIKEY overwrite. This allows to the user to provide the ApiKey with elevated permissions that are required in order to be able to create/write/read the state index per input. THIS IS FOR DEVELOPMENT/TESTING ONLY. REMOVE BEFORE THE MERGE.

The existing code relied on the inputs state storage to be fully configurable before the main beat managers runs. The change delays the configuration of httpjson input to the time when the actual configuration is received from the Agent.

There is an assumption that the index template for the state storage indices is already in place before the storage is used

PUT _index_template/agentless_state_template
{
  "index_patterns": [
    "agentless-state-*"
  ],
  "priority": 300,
  "template": {
    "mappings": {
      "properties": {
        "v": {
          "type": "object",
          "enabled": false
        },
        "updated_at": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    },
    "settings": {
      "number_of_shards": 1
    }
  }
}

Example of the state storage index content for Okta integration:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "agentless-state-httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959",
        "_id": "httpjson::httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959::https://dev-36006609.okta.com/api/v1/logs",
        "_seq_no": 39,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "v": {
            "ttl": 1800000000000,
            "updated": "2024-10-24T20:21:22.032Z",
            "cursor": {
              "published": "2024-10-24T20:19:53.542Z"
            }
          }
        }
      }
    ]
  }
}

The naming convention for all state store is agentless-state-<input id>, since the expectation for agentless we would have only one agent per policy and the agents are ephemeral.

Currently in order to run the agent with Elasticsearch state storage a couple of environment variables would be required:

sudo AGENTLESS_ELASTICSEARCH_STATE_STORE_ENABLED=1 AGENTLESS_ELASTICSEARCH_APIKEY=xxxxxxxx-xvpDXfB:jVMRsW7SRIxxxxxxxxx ./elastic-agent -e

where the ApiKey in the

DEPENDENCIES / TODOS:

  • Approval of teams for this approach
  • Kibana (?) side change is required for the agentless-state index template boostrapping
  • Kibana or the intergration package (or both) change is required in order to include the permissions for agentless-state- with the Elasticsearch ApiKey (Remove the hack). I suspect that Kibana fleet code could be modified to recognize agentless supporting integration and include the proper index name for the agentless-state for the ApiKey permissions.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

The change should have no impact, and without the feature enabled the filebeat should work as before using the file system storage for the state.

@aleksmaus aleksmaus self-assigned this Oct 24, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 24, 2024
@aleksmaus aleksmaus added the Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution label Oct 24, 2024
Copy link
Contributor

mergify bot commented Oct 24, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @aleksmaus? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Oct 24, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 24, 2024
@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 24, 2024
@aleksmaus aleksmaus changed the title [filebeat] Elasticsearch state storage for httpjson input [filebeat] Elasticsearch state storage for httpjson and cel inputs Oct 30, 2024
@aleksmaus
Copy link
Contributor Author

@belimawr @cmacknz (or whoever wants/have time to be involved)
I need your feedback on this draft, if this approach is something that we could eventually merge (the ApiKey workaround will be removed once we adjust the kibana fleet).
I think this is an ok solution given the circumstances:

  1. This is fully backwards compatible. If the feature is not enabled, everything would work as before.
  2. The only inputs that are enabled for Elasticsearch backed state storage are the httpjson and cel. We only enabling the limited number of integration relying on httpjson or cel inputs for the first release.
  3. The state initialization for the inputs is delayed until we get the configuration only if the feature is enabled for the input.
  4. The agent logs monitoring will still use the local storage, since we agreed that loosing the agent log when the pod relocated is acceptable.

@cmacknz
Copy link
Member

cmacknz commented Nov 1, 2024

@leehinman I'd appreciate a review here to make sure this can co-exist with Beats receivers in agent since that would be the long term way we plan to run agentless inputs.

@cmacknz cmacknz requested a review from leehinman November 1, 2024 20:23
filebeat/beater/filebeat.go Outdated Show resolved Hide resolved

// List of input types Elasticsearch state store is enabled for
var esTypesEnabled = map[string]void{
"httpjson": {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be configuration instead of in the code, maybe another env var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure can do. Something like this?
AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES=httpjson,cel

@cmacknz
Copy link
Member

cmacknz commented Jan 20, 2025

/test

type UnsubscribeFunc func()

type Notifier struct {
mx sync.RWMutex
Copy link
Contributor

@orestisfl orestisfl Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mx sync.RWMutex
mx sync.Mutex

Another perhaps overly cautious observation: The only purpose of a RWMutex is to allow calling Notify() concurrently. However, I don't see the benefit of that: Notify() should finish quite fast since it just starts goroutines and more importantly, concurrent calls of Notify() lead to a race condition between the same callback which could mean that the wrong item gets stored due to timing issues.

Changing to a simple Mutex doesn't fix the race condition but there's no reason for that RW there.

orestisfl and others added 12 commits January 21, 2025 16:47
The only purpose of a `RWMutex` is to allow calling `Notify()`
concurrently. However, I don't see the benefit of that: `Notify()`
should finish quite fast since it just starts goroutines and more
importantly, concurrent calls of `Notify()` lead to a race condition
between the same callback which _could_ mean that the wrong item gets
stored due to timing issues.

Changing to a simple `Mutex` doesn't fix the race condition but there's
no reason for that `RW` there.
Copy link
Contributor

mergify bot commented Jan 22, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b poc/es_state_store upstream/poc/es_state_store
git merge upstream/main
git push upstream poc/es_state_store

@cmacknz
Copy link
Member

cmacknz commented Jan 22, 2025

/test

@orestisfl orestisfl mentioned this pull request Jan 23, 2025
6 tasks
@orestisfl
Copy link
Contributor

/test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants