Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on the Purpose of Red-Black Trees in Box64 #2039

Open
devarajabc opened this issue Nov 17, 2024 · 4 comments
Open

Clarification on the Purpose of Red-Black Trees in Box64 #2039

devarajabc opened this issue Nov 17, 2024 · 4 comments

Comments

@devarajabc
Copy link
Contributor

devarajabc commented Nov 17, 2024

I have noticed that in custommem.c, memory ranges mapped by mmap are simultaneously recorded and removed in multiple red-black trees.
Each node encapsulate a memory range along with detailed information about that range.
My question pertains to the necessity of using three distinct red-black tree nodes to record the same memory range.
Is this approach taken solely to maintain three separate pieces of information related to each memory range?

If that's the case, could a bitwise operation be implemented to consolidate the information, potentially reducing the number of trees required?

@ptitSeb
Copy link
Owner

ptitSeb commented Nov 17, 2024

While mapmem and mapallmem could certainly be fused without too much impact, I think memprot should still be separated because it's changed often, and on a fine granularity. It will be highly fragmented.

On the the other hand, having everything in the same map could make gatthering all needed information on a address space in less calls, but that would require some changes in the code.

For now, everything is separate because it's easier to manipulate that way.

@devarajabc
Copy link
Contributor Author

I've noticed that the functions rb_get and rb_get_end are frequently used in Box64, for example in SDL2 functions, as demonstrated in projects like mado :

Window Event rb_set rb_unset rb_get rb_get_end rb_get_righter
No Action (Only Close Window) 6261 332 37695 14404 6
Drag Window (Around the Border Three Times) 6568 342 45753 15068 6
Mouse Click (Ten Times) 6569 342 40242 15067 6
Mouse Move (Around the Border Three Times) 6420 338 42496 14735 6
Calculator Operations (Three Times) 6627 343 43870 15185 6

However, I've observed that the size of a RB-tree node is too large that it cannot fit more than one node in a cache line, which makes it cache-unfriendly.
To enhance performance, I am considering modifying the node structure.

Before proceeding, I would like to clarify the purpose of using Red-Black Trees in Box64.
From my observations, it seems that Red-Black Trees are used primarily to manage the mapped memory ranges. When a range is mapped, it is either added as a new node or extends an existing node in the red-black trees ( mapmem, mapallmem, and memprot). Upon node creation, the function rbtreeMalloc maps a range of memory for the node and allocates additional memory for two blockmark_t structures, which define relationships between different memory blocks. This mapped range is also added to the read-block tree.
Could you please confirm if my understanding is correct and provide further details if there are additional considerations or functionalities of Red-Black Trees in Box64 that I should be aware of?
Thank you so much for your assistance.

@rajdakin
Copy link
Collaborator

rajdakin commented Nov 19, 2024

The main functionality of the red-black trees is to provide a fast, cheap way to map ranges of addresses to a number and acccessing by providing an address in this range. The only functionalities required are:

  • Data structure creation/destruction;
  • Query an address, get an uint32_t (rb_get);
  • Query the end of the range containing a given address (rb_get_end);
  • Set an address range (rb_set);
  • Unset an address range (rb_unset);
  • Query the first address after all set addresses (rb_get_righter).
    There is also print_rbtree which allows for easy-ish debugging.

The mental model is:

Address: 0        123       456       789      1240    (end)
         +---------+---------+---------+---------+---------+
         | #unset# |    1    | #unset# |    2    | #unset# |
         +---------+---------+---------+---------+---------+
get 200:                ^ 1
get_end 200:                 ^ 456
get_end 100:       ^ 123
rb_get_righter:                                  ^ 1240

The exact data structure itself is not really important. In fact, before the current implementation of red-black trees, it used to be (nested) arrays, but these were too memory expensive. (However, note that rb_(un)set may intersect with range boundaries in a lot of ways.)

EDIT: Also, rbtreeMalloc only allocates space for a new node. rbtreeMalloc is in fact just #defined to customMalloc since the functions may be called in signal handlers, so the functions must also be signal-safe, and an exception may be raised in eg rb_set, so these functions should also be reentrant (though the current implementation is not and seems to work well enough).

@devarajabc
Copy link
Contributor Author

devarajabc commented Nov 21, 2024

Thanks for your explanation.
Yes, the size of an rb-tree node is indeed 48 bytes, which does not include the additional memory required for two blockmark_t structures.

But I don't think the exact data structure is unimportant.
Since common cache line sizes are 64 bytes, reducing the size of the node structure can increase the number of nodes that fit in a cache line, making it more cache-friendly.

To achieve this, I want to make the following modifications:

  1. Fuse mapmem and mapallmem to reduce the total number of nodes (or explore other ways to reduce the number of red-black trees).
  2. Remove the parent pointer, similar to the red-black tree in jemalloc, which can save 8 bytes in the node structure.
  3. Use the LSB of the left pointer to store meta (the actual pointer is left & ~1UL), which can save 1 byte.

However, despite the aforementioned methods, the size of the node remains too large for the cache line. Could you please advise me on any strategies I can employ to reduce the size of the node?

Additionally, what is the difference between a dynablock (native instruction block) and a block? Finally, I want to confirm whether the data for mapallmem and mmapmem refers to the same thing (allocated). Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants