Instance registers the docker0 ip address #553

Open
nunofernandes opened this issue Jan 3, 2024 · 9 comments · May be fixed by #555

Comments

@nunofernandes

Hello,

We have an on-prem server (Rocky Linux 8) running the SSM agent (amazon-ssm-agent-3.2.2016.0-1.x86_64).

In AWS Fleet Manager, that instance is registered with the IP address of docker0 (172.17.0.1):

image

It was working fine until we lost DHCP for a few hours. Now, even after restarting the SSM agent, the docker0 IP is always the one that gets registered.

If I run ifconfig docker0 down; systemctl restart amazon-ssm-agent.service; ifconfig docker0 up, it works (the correct IP is registered), but after some time the registration in SSM reverts to the docker0 IP address.

I think the code in agent/platform/platform.go is sorting the interfaces differently (just a guess):

	if interfaces, err = net.Interfaces(); err == nil {
		interfaces = filterInterface(interfaces)
		sort.Sort(byIndex(interfaces))
		candidates := make([]net.IP, 0)

What would be the best option here (except rebooting the server)?
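
To illustrate the guess above, here is a minimal sketch of index-ordered, first-match selection. It is not the agent's code: the `candidate` type, the `firstByIndex` function, the `eth0` name, and the `10.0.0.5` address are all made up for the example. It only shows how such a scheme could fall through to docker0 when the primary interface has no address at lookup time.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified stand-in for a network interface.
type candidate struct {
	Index int    // kernel interface index
	Name  string // interface name
	IPv4  string // empty when the interface currently has no address
}

// firstByIndex sorts a copy of the input by interface index and returns the
// address of the first interface that currently has one ("" if none do).
func firstByIndex(ifaces []candidate) string {
	sorted := append([]candidate(nil), ifaces...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })
	for _, c := range sorted {
		if c.IPv4 != "" {
			return c.IPv4
		}
	}
	return ""
}

func main() {
	// Made-up data: eth0 holds a DHCP lease in the first case, not in the second.
	withLease := []candidate{{3, "docker0", "172.17.0.1"}, {2, "eth0", "10.0.0.5"}}
	noLease := []candidate{{3, "docker0", "172.17.0.1"}, {2, "eth0", ""}}

	fmt.Println(firstByIndex(withLease))
	fmt.Println(firstByIndex(noLease))
}
```

With this model, the result depends entirely on which interfaces hold an address when the lookup runs, which would be consistent with the registration changing after a DHCP outage. Whether the agent behaves this way is an assumption.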

nunofernandes added a commit to nunofernandes/amazon-ssm-agent that referenced this issue Jan 11, 2024
@Aperocky
Contributor

Aperocky commented Oct 22, 2024

Thanks for reaching out about this. We recently restructured our interface IP reporting to avoid quadratic growth in computation: on Linux, the Go syscall behind interface.Addrs() dumps the entire route table on every call, which leads to escalating CPU usage on systems with many network interfaces.

That change does not address this problem, as the order of the returned interfaces is decided by the OS. However, we do want to verify whether the issue still exists in agent version 3.3.1142.0; if it does, we can evaluate a fix similar to your PR.
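
The cost difference described above can be sketched with the Go standard library. The function names `addrsPerInterface` and `addrsSingleCall` are made up for the example, and this is not the agent's implementation. The assumption is that each `Addrs()` call on Linux asks the kernel for a full dump and then filters it for one interface, so calling it in a loop scales with the square of the interface count, while `net.InterfaceAddrs()` needs only one pass.

```go
package main

import (
	"fmt"
	"net"
)

// addrsPerInterface calls Addrs() once per interface. If every call triggers
// a full kernel dump (the assumption here), total work grows quadratically
// with the number of interfaces.
func addrsPerInterface() ([]net.Addr, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var all []net.Addr
	for _, ifi := range ifaces {
		addrs, err := ifi.Addrs()
		if err != nil {
			continue // skip interfaces that vanish or cannot be queried
		}
		all = append(all, addrs...)
	}
	return all, nil
}

// addrsSingleCall fetches all unicast addresses in one call. The result is
// not grouped by interface, so per-interface filtering is no longer possible.
func addrsSingleCall() ([]net.Addr, error) {
	return net.InterfaceAddrs()
}

func main() {
	perIface, err := addrsPerInterface()
	if err != nil {
		fmt.Println("per-interface lookup failed:", err)
		return
	}
	single, err := addrsSingleCall()
	if err != nil {
		fmt.Println("single-call lookup failed:", err)
		return
	}
	fmt.Println("per-interface:", len(perIface), "single call:", len(single))
}
```

The trade-off is visible in the return types: the single call is cheaper but loses the association between an address and its interface name.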

@nunofernandes
Author

Hello. I tried with the latest one:

# yum update https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
....
Upgraded:
  amazon-ssm-agent-3.3.987.0-1.x86_64

# rpm -qi amazon-ssm-agent
Name        : amazon-ssm-agent
Version     : 3.3.987.0
Release     : 1
Architecture: x86_64
Install Date: 2024-10-23T17:24:47 CEST
Group       : Amazon/Tools
Size        : 127685837
License     : Apache License, Version 2.0
Signature   : RSA/SHA1, 2024-09-23T12:51:02 CEST, Key ID bc1f495c97dd04ed
Source RPM  : amazon-ssm-agent-3.3.987.0-1.src.rpm
Build Date  : 2024-09-23T11:56:15 CEST
Build Host  : build.amazon.com
Relocations : (not relocatable)
Packager    : Amazon.com, Inc. <http://aws.amazon.com>
Vendor      : Amazon.com
URL         : http://docs.aws.amazon.com/ssm/latest/APIReference/Welcome.html
Summary     : Manage EC2 Instances using SSM APIs
Description :
This package provides Amazon SSM Agent for managing EC2 Instances using SSM APIs

That is not the version you mentioned (3.3.1142.0), so I'm waiting for that one to land on the RPM repo/URL. With the version available, it still happens:

image

Once that version lands on the repo, I can try it. Do you know when it will be available?

@Aperocky
Contributor

The version is deploying through regions now and will reach all regions sometime next week. For testing purposes, you can get the latest version here:

$ sudo yum update https://s3.eu-north-1.amazonaws.com/amazon-ssm-eu-north-1/latest/linux_amd64/amazon-ssm-agent.rpm
Last metadata expiration check: 1 day, 19:49:40 ago on Mon Oct 21 19:52:51 2024.
amazon-ssm-agent.rpm                                                                              8.9 MB/s |  24 MB     00:02
Dependencies resolved.
==================================================================================================================================
 Package                            Architecture             Version                         Repository                      Size
==================================================================================================================================
Upgrading:
 amazon-ssm-agent                   x86_64                   3.3.1142.0-1                    @commandline                    24 M

Transaction Summary
==================================================================================================================================
Upgrade  1 Package

@nunofernandes
Author

nunofernandes commented Oct 23, 2024

Hello,

I just tested that new version and I still get the IP address from docker0:

image

So, the issue is still there :(

@Aperocky
Contributor

Aperocky commented Oct 23, 2024

I see. It looks like we would need a dedicated way to filter this out if we decide to go there. When this feature was first designed, we did not define exactly which interface to return. We will evaluate potential changes and/or documentation to define this feature. One of the first things that comes to mind is to use the default network interface, but the way to determine it differs across the platforms we support. Since the Go standard library does not provide that capability out of the box, we would have to implement potentially unstable methods for each OS as they evolve. That needs further evaluation before we take it up.

@nunofernandes
Author

That is why I sent the patch #555, which would allow users to exclude interfaces that they know aren't meant to be used.
Let me know if that is the route forward and if so, I can rebase the patch with the current codebase.

@Aperocky
Contributor

Aperocky commented Oct 24, 2024

Unfortunately that route is blocked now, as we no longer filter by interface. The reason is the Go syscall behavior of dumping the entire route table when looking up the properties of a single interface. For hosts with a large number of interfaces (e.g. a high number of containers), the CPU consumption becomes quadratic if we loop over and filter interfaces, and it is very important to us that we keep our resource consumption low.

@nunofernandes
Author

Well, what about the following scheme (I haven't seen the current codebase, so I'm just in suggestion mode here):

  1. Fetch the IP addresses of the EXCLUDED interfaces. There shouldn't be many of them to loop over, and getting the IP address of a known interface should be faster (maybe without looping through all interfaces and causing the CPU consumption).
  2. Use the current code to find the IP address that would be announced to SSM.
  3. If the IP found in step 2 matches one in the EXCLUDED list, continue to the next IP.

Would that work?
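
The three steps above could be sketched roughly as follows. This is only an illustration, not a patch against the agent: `parseIPs`, `excludedIPs`, and `firstAllowed` are hypothetical helpers, and the candidate addresses in `main` are made up. One caveat is that `net.InterfaceByName` and `Addrs()` may each still trigger a full kernel dump, so the cost would scale with the number of excluded names rather than with the total number of interfaces.

```go
package main

import (
	"fmt"
	"net"
)

// parseIPs is a small helper for the example; it skips strings that do not parse.
func parseIPs(ss ...string) []net.IP {
	var out []net.IP
	for _, s := range ss {
		if ip := net.ParseIP(s); ip != nil {
			out = append(out, ip)
		}
	}
	return out
}

// excludedIPs collects the addresses of the named interfaces (step 1).
// Names that do not exist on this host are silently skipped.
func excludedIPs(names []string) []net.IP {
	var out []net.IP
	for _, name := range names {
		ifi, err := net.InterfaceByName(name)
		if err != nil {
			continue
		}
		addrs, err := ifi.Addrs()
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if ipnet, ok := a.(*net.IPNet); ok {
				out = append(out, ipnet.IP)
			}
		}
	}
	return out
}

// firstAllowed returns the first candidate that is not in the excluded list
// (steps 2 and 3), or nil if every candidate is excluded.
func firstAllowed(candidates, excluded []net.IP) net.IP {
	for _, c := range candidates {
		skip := false
		for _, e := range excluded {
			if c.Equal(e) {
				skip = true
				break
			}
		}
		if !skip {
			return c
		}
	}
	return nil
}

func main() {
	excluded := excludedIPs([]string{"docker0"})
	// Stand-in for whatever ordered list the existing lookup would produce.
	candidates := parseIPs("172.17.0.1", "10.0.0.5")
	fmt.Println(firstAllowed(candidates, excluded))
}
```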

@Aperocky
Contributor

We are considering a solution that potentially uses the default internet route. However, this may raise backwards-compatibility concerns, and we are discussing it internally.
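
For readers unfamiliar with the idea, one common way to find the address tied to the default route in Go is to open a UDP socket toward an external address and read back the local address the kernel picked. This is a minimal sketch of that general technique, not what the maintainers have chosen: `outboundIP` is a hypothetical helper, and `192.0.2.1:9` is an arbitrary documentation address used as the target. Connecting a UDP socket does not send any packets.

```go
package main

import (
	"fmt"
	"net"
)

// outboundIP returns the local address the kernel would use to reach target.
// It fails if there is no route to the target (for example, no default route).
func outboundIP(target string) (net.IP, error) {
	conn, err := net.Dial("udp", target)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, fmt.Errorf("unexpected local address type %T", conn.LocalAddr())
	}
	return addr.IP, nil
}

func main() {
	ip, err := outboundIP("192.0.2.1:9")
	if err != nil {
		fmt.Println("could not determine outbound address:", err)
		return
	}
	fmt.Println(ip)
}
```

This approach works the same way across operating systems, but it reports nothing on hosts without a route to the target, which is one way a change like this could behave differently from the current reporting.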
