Detect L1 cache size at compile time #419

Open
wants to merge 1 commit into
base: dev

jserv
Contributor

@jserv jserv commented Jun 20, 2021

Common cache line sizes are 32, 64 and 128 bytes. On x86_64 the standard
cache line size is 64B. Even though this is not architecturally required,
all x86_64 implementations stick to it. Some AArch64 processors also
follow the x86_64 style with 64B cache lines. However, on Apple M1
devices, the underlying hardware uses a 128B cache line size. Quoting
the Apple Developer documentation [1]:

Some features of Apple silicon are decidedly different than those of
Intel-based Mac computers, and may impact your code if you don't fetch
them dynamically. These features include:

  • Cache line sizes are different. Fetch the hw.cachelinesize setting using sysctl.

M1 cache lines are twice the size commonly used by x86_64 and other
Arm implementations. Cache line sizes on Arm depend on the implementation,
not the architecture. For example, TI AM57x (Cortex-A15) uses a 64B cache
line while TI AM437x (Cortex-A9) uses a 32B cache line. There are
even Arm implementations with cache line sizes configurable at boot time.

This patch attempts to detect the L1 cache line size at compile time. For
AArch64 hosts, the build process collects system information and determines
the L1 cache line size. At present, both macOS and Linux are supported.
Arm targets are usually cross-compiled, so developers should specify the
appropriate MI_CACHE_LINE setting in advance.
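
As a rough illustration of the kind of system query described above (not the
build logic in this patch), a small C helper could ask the host for its cache
line size. The function name `query_cache_line_size` and the macro
`FALLBACK_CACHE_LINE` are hypothetical. The sketch assumes `sysctlbyname` on
macOS and the glibc `_SC_LEVEL1_DCACHE_LINESIZE` extension on Linux, and it
returns the fallback wherever neither is available or the query reports
nothing useful.

```c
#include <stddef.h>
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>
#endif
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical fallback, mirroring the 64B default mentioned in the PR text. */
#define FALLBACK_CACHE_LINE 64

/* Hypothetical helper: ask the host OS for its L1 data cache line size. */
static size_t query_cache_line_size(void) {
#if defined(__APPLE__)
  /* macOS: the setting Apple's documentation recommends fetching. */
  size_t line = 0;
  size_t len = sizeof(line);
  if (sysctlbyname("hw.cachelinesize", &line, &len, NULL, 0) == 0 && line > 0) {
    return line;
  }
#elif defined(_SC_LEVEL1_DCACHE_LINESIZE)
  /* Linux with glibc: may report 0 or -1 when the value is unknown. */
  long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (line > 0) {
    return (size_t)line;
  }
#endif
  /* Nothing usable was reported: use the fallback. */
  return FALLBACK_CACHE_LINE;
}
```

A build script could run a tiny program around this helper on the host and pass
the result to the compiler as the MI_CACHE_LINE value. That only makes sense
for native builds, since a cross-compiling host would report its own cache
line size rather than the target's.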

If none of the above sets a value, the cache line size defaults to 64B.
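
A minimal sketch of how such a default might look on the C side, assuming the
value arrives as a preprocessor definition (for example `-DMI_CACHE_LINE=128`
from a cross-compiling developer). Only the name MI_CACHE_LINE comes from this
description. The `padded_counter_t` type and the sanity check are hypothetical,
and the sketch assumes a C11 compiler.

```c
#include <stddef.h>

/* Use the value given on the command line, otherwise fall back to 64B. */
#ifndef MI_CACHE_LINE
#define MI_CACHE_LINE 64
#endif

/* Sanity check: reject values that are too small or not a power of two. */
_Static_assert(MI_CACHE_LINE >= 32 &&
               (MI_CACHE_LINE & (MI_CACHE_LINE - 1)) == 0,
               "MI_CACHE_LINE must be a power of two and at least 32");

/* Hypothetical example type: one counter per cache line, which keeps
   neighbouring counters from sharing a line (false sharing). */
typedef struct padded_counter_s {
  _Alignas(MI_CACHE_LINE) unsigned long value;
} padded_counter_t;
```

With the default in effect, `sizeof(padded_counter_t)` is 64. Building with
`-DMI_CACHE_LINE=128` would make it 128, matching the M1 case above.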

[1] https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code

@jserv jserv changed the base branch from master to dev June 20, 2021 21:31