[native] Add LinuxMemoryChecker check/warning to ensure system-mem-limit-gb is reasonably set #24149

Open
wants to merge 2 commits into
base: master
from

minhancao
Contributor

@minhancao minhancao commented Nov 26, 2024

Description

Add LinuxMemoryChecker check and warning to ensure system-memory-gb < system-mem-limit-gb < actual total memory capacity.

For cgroup v1:
Actual total memory is the smaller of MemTotal in /proc/meminfo and the value in memory.limit_in_bytes.

For cgroup v2:
Actual total memory is the smaller of MemTotal in /proc/meminfo and the value in memory.max.
If memory.max contains the string "max" (no limit is set), use MemTotal from /proc/meminfo; otherwise compare against the numeric value in memory.max.
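
The rule above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the helper names (`parseMemTotalBytes`, `parseCgroupLimitBytes`, `actualTotalMemoryBytes`) are hypothetical, and the inputs are passed in as text so the sketch does not depend on real /proc or cgroup files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: finds the "MemTotal:" line in meminfo-style text
// (value reported in kB) and returns it in bytes.
std::optional<uint64_t> parseMemTotalBytes(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    uint64_t kb = 0;
    if ((fields >> key >> kb) && key == "MemTotal:") {
      return kb * 1024;
    }
  }
  return std::nullopt;
}

// Hypothetical helper: parses the content of memory.max (cgroup v2) or
// memory.limit_in_bytes (cgroup v1). Returns nullopt when the content is
// "max" (no limit set), empty, or not a number.
std::optional<uint64_t> parseCgroupLimitBytes(const std::string& content) {
  std::istringstream in(content);
  std::string token;
  if (!(in >> token) || token == "max") {
    return std::nullopt;
  }
  try {
    return std::stoull(token);
  } catch (const std::exception&) {
    return std::nullopt;
  }
}

// Smaller of MemTotal and the cgroup limit; MemTotal alone when no limit.
uint64_t actualTotalMemoryBytes(
    uint64_t memTotalBytes,
    std::optional<uint64_t> cgroupLimitBytes) {
  assert(memTotalBytes > 0);
  return cgroupLimitBytes ? std::min(memTotalBytes, *cgroupLimitBytes)
                          : memTotalBytes;
}
```

Taking the minimum also covers the cgroup v1 case where memory.limit_in_bytes holds a very large default value: MemTotal is smaller, so it is the value that gets used.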

VELOX_CHECK_LT(system-mem-limit-gb, actual total memory capacity):

system-mem-limit-gb is higher than the actual total memory capacity. Expected: system-mem-limit-gb < actual total memory capacity.

Warning written to the worker's log:

system-mem-limit-gb is smaller than system-memory-gb. Expected: system-mem-limit-gb >= system-memory-gb.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@minhancao minhancao self-assigned this Nov 26, 2024
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 26, 2024
@prestodb-ci prestodb-ci requested review from a team, psnv03 and pramodsatya and removed request for a team November 26, 2024 02:04
@minhancao minhancao marked this pull request as ready for review November 26, 2024 02:07
@minhancao minhancao requested a review from a team as a code owner November 26, 2024 02:07
@minhancao minhancao changed the title from "[native] Add LinuxMemoryChecker warnings to ensure system-memory-gb < system-mem-limit-gb < actual total memory capacity" to "[native] Add LinuxMemoryChecker warnings to ensure system-mem-limit-gb is reasonably set" Nov 26, 2024
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 4478ae1 to 15f55bb Compare November 26, 2024 02:29
@minhancao minhancao changed the title from "[native] Add LinuxMemoryChecker warnings to ensure system-mem-limit-gb is reasonably set" to "[native] Add LinuxMemoryChecker check/warning to ensure system-mem-limit-gb is reasonably set" Nov 26, 2024
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 15f55bb to 7646600 Compare November 26, 2024 06:06
Contributor

@czentgr czentgr left a comment

Can we add a test with fake files again just like we did with the original tests for this class?
That way we can try the "max" value for cgv2, and gigantic values and reasonable values. Basically testing the various situations we saw when investigating this.

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 2 times, most recently from 8da401b to 4ae2cee Compare December 3, 2024 20:08
Contributor

@pramodsatya pramodsatya left a comment

Thanks @minhancao, could you please squash the commits?

std::string statFile_;
std::string memInfoFile_ = "/proc/meminfo";
Contributor

Nit: const std::string kMemInfoFile_

Contributor Author

This has to stay as a non-const variable since I need to change its path to point to the meminfo test file when I am running the LinuxMemoryCheckerTests.

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 85b3b9d to dab2335 Compare December 13, 2024 00:00
Collaborator

@majetideepak majetideepak left a comment

It is ideal if we can avoid checking in data files for testing.
We only need a few fields from the file for testing.
Can we write these required fields to a temporary file as part of the testing?

VELOX_CHECK_LE(
config_.systemMemLimitBytes,
getActualTotalMemory(),
"system-mem-limit-gb is higher than the actual total memory capacity.");
Contributor

Can we add the actual numbers that were found if this fails? Especially because the total memory capacity doesn't come from the config but is determined.
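
A rough sketch of what such a message could look like is below. It is a standalone stand-in that throws `std::runtime_error` instead of using `VELOX_CHECK_LE`, and the function names and exact wording are hypothetical, not what the PR ends up with.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds a failure message that carries both the
// configured limit and the memory capacity that was determined at runtime.
std::string memLimitErrorMessage(
    uint64_t systemMemLimitBytes,
    uint64_t actualTotalMemoryBytes) {
  // Only meant to be called once the check has already failed.
  assert(systemMemLimitBytes > actualTotalMemoryBytes);
  std::ostringstream out;
  out << "system-mem-limit-gb is higher than the actual total memory "
      << "capacity. system-mem-limit-gb: " << systemMemLimitBytes
      << " bytes, actual total memory: " << actualTotalMemoryBytes
      << " bytes.";
  return out.str();
}

// Stand-in for the check: passes when limit <= actual, throws otherwise.
void checkMemLimit(
    uint64_t systemMemLimitBytes,
    uint64_t actualTotalMemoryBytes) {
  if (systemMemLimitBytes > actualTotalMemoryBytes) {
    throw std::runtime_error(
        memLimitErrorMessage(systemMemLimitBytes, actualTotalMemoryBytes));
  }
}
```

With both values in the message, a failing worker log shows the configured limit next to the capacity that was read from /proc/meminfo or the cgroup files, which is the number an operator cannot see in the config.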

// For cgroup v1, memory.limit_in_bytes can default to a really big numeric
// value in bytes like 9223372036854771712 to represent that
// memory.limit_in_bytes is not set to a value. The default value here is
// set to PAGE_COUNTER_MAX, which is LONG_MAX/PAGE_SIZE on 64-bit platform.
Contributor

Nit: "on the 64-bit platform".
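
For reference, the arithmetic behind the 9223372036854771712 value in the code comment can be worked out as below. This is only an illustration: it assumes a 4 KiB page size, and `pageCounterMaxBytes` and `looksUnlimited` are hypothetical names, not functions from this PR.

```cpp
#include <cassert>
#include <cstdint>

// PAGE_COUNTER_MAX is LONG_MAX / PAGE_SIZE pages on a 64-bit platform.
// Expressed in bytes, that is the page count multiplied back by the page
// size, i.e. LONG_MAX rounded down to a multiple of the page size.
constexpr uint64_t pageCounterMaxBytes(uint64_t pageSize) {
  return (static_cast<uint64_t>(INT64_MAX) / pageSize) * pageSize;
}

// With 4 KiB pages (an assumption), this is 2^63 - 4096.
static_assert(
    pageCounterMaxBytes(4096) == 9223372036854771712ULL,
    "unexpected default value for 4 KiB pages");

// Hypothetical helper: true when a cgroup v1 limit is at or above the
// default, which suggests memory.limit_in_bytes was never set.
bool looksUnlimited(uint64_t limitBytes, uint64_t pageSize) {
  assert(pageSize > 0);
  return limitBytes >= pageCounterMaxBytes(pageSize);
}
```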

@@ -0,0 +1 @@
9223372036854771712
Contributor

As discussed, let's not add example files but generate the content in memory.
The logic relies on reading files. So we could write to the temp dir path which is used in other tests and use those files for the test.
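
One way this could look, as a minimal sketch: the test writes only the fields it needs into files under a temporary directory and points the checker at them. The helpers below (`writeFakeFile`, `readFirstToken`) are hypothetical, and `std::filesystem::temp_directory_path()` stands in for whatever temp dir utility the existing tests use.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: writes `content` to `dir/name`, creating `dir` if
// needed, and returns the full path of the file.
fs::path writeFakeFile(
    const fs::path& dir,
    const std::string& name,
    const std::string& content) {
  fs::create_directories(dir);
  const fs::path path = dir / name;
  std::ofstream out(path, std::ios::trunc);
  out << content;
  out.close();
  assert(!out.fail());
  return path;
}

// Hypothetical helper: reads the first whitespace-delimited token, which
// is all a memory.max or memory.limit_in_bytes file contains.
std::string readFirstToken(const fs::path& path) {
  std::ifstream in(path);
  std::string token;
  in >> token;
  return token;
}
```

A test could then cover the "max" string, the very large cgroup v1 default, and a reasonable value by writing each into its own file, without checking any data files into the repository.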

@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch 2 times, most recently from 71c9fa9 to 89a50a8 Compare January 16, 2025 23:53
@minhancao minhancao force-pushed the linuxmemorychecker_mem_limit_check branch from 89a50a8 to 163880b Compare January 22, 2025 18:16
Labels
from:IBM PR from IBM
5 participants