Added an arena allocator for draw calls vertices #805
Closed
The Issue
Switching textures between draw_texture(...) calls causes high memory usage.
Here is a minimal reproducible example made by Calandiel and Luna from the Discord server:
Texture size is not the issue.
Texture loading is not the issue either, as commenting out the draw_texture() calls returns memory usage to normal.
Calling draw_texture() 3200 times isn't the issue either, as calling it with the same texture every time doesn't result in such high memory usage.
The issue is switching textures between calls to draw_texture().
Cause
macroquad does not batch draw calls when the texture changes (in QuadGl::geometry()), so in that case it creates a new DrawCall every time draw_texture() is called.
Each DrawCall allocates a fixed-size buffer for its vertices (10k vertices by default at 40 bytes per vertex, which is 400 KB per draw call), while draw_texture() only needs 4 vertices.
Solution
One way to fix this would be to reduce the space allocated for vertices and only grow it when needed. The problem with this approach is that it would result in too many reallocations, which isn't ideal, especially with the way macroquad implements draw call batching.
Instead, as suggested by @InnocentusLime in the quads Discord server, we could have one big buffer that stores the vertices from all the draw calls, with each DrawCall storing an offset into that buffer and a length. That way the vertices of each draw call are tightly packed together and only take as much space as they need, i.e. an arena allocator.
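To make the offset-and-length idea concrete, here is a minimal sketch of a single-buffer arena. It is not the code from this PR: Vertex, VertexRange, VertexArena and the two byte-counting helpers are hypothetical names, and the 40-byte vertex is modelled as ten f32 values purely so the sizes match the figures quoted above.

```rust
/// Stand-in for a 40-byte vertex; the real macroquad vertex layout may differ.
#[derive(Clone, Copy, Default)]
struct Vertex {
    data: [f32; 10], // 10 * 4 bytes = 40 bytes
}

/// Hypothetical handle: a draw call stores only where its vertices live.
#[derive(Clone, Copy, Debug, PartialEq)]
struct VertexRange {
    offset: usize,
    len: usize,
}

/// Hypothetical arena: one shared buffer plus a bump cursor.
struct VertexArena {
    buffer: Vec<Vertex>,
    used: usize,
}

impl VertexArena {
    fn with_capacity(capacity: usize) -> Self {
        Self { buffer: vec![Vertex::default(); capacity], used: 0 }
    }

    /// Reserve `len` vertices directly after the previous allocation.
    fn alloc(&mut self, len: usize) -> Option<VertexRange> {
        if self.used + len > self.buffer.len() {
            return None; // out of space; this sketch does not grow
        }
        let range = VertexRange { offset: self.used, len };
        self.used += len;
        Some(range)
    }

    /// Borrow the vertices that belong to one draw call.
    fn slice_mut(&mut self, range: VertexRange) -> &mut [Vertex] {
        &mut self.buffer[range.offset..range.offset + range.len]
    }

    /// Forget all allocations (for example at the end of a frame); memory is kept.
    fn clear(&mut self) {
        self.used = 0;
    }
}

/// Vertex bytes when every draw call owns a fixed 10,000-vertex buffer.
fn fixed_buffer_bytes(draw_calls: usize) -> usize {
    draw_calls * 10_000 * std::mem::size_of::<Vertex>()
}

/// Vertex bytes when every draw call takes only 4 vertices from the arena.
fn packed_bytes(draw_calls: usize) -> usize {
    draw_calls * 4 * std::mem::size_of::<Vertex>()
}
```

By this arithmetic, 3200 draw calls need 1,280,000,000 bytes of vertex storage with fixed buffers and 512,000 bytes when packed. This counts vertices only and assumes the 40-byte stand-in above.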
Implementation
My implementation uses a few small buffers (boxed slices) instead of one big buffer, so if more space is needed at any point, we can just allocate another small buffer instead of reallocating the entire thing (as a single growable buffer would require).
When a buffer doesn't have enough space for the requested slice, the next buffer is used, and so on.
When the arena is cleared, it just drops back to using the first buffer.
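The chunked behaviour described above could look roughly like the following. This is a sketch under assumptions rather than the PR's actual code: ChunkedArena and ChunkRange are hypothetical names, plain f32 values stand in for vertices, and the unused tail of a full chunk is simply skipped.

```rust
/// Hypothetical handle: which chunk, and where inside it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ChunkRange {
    chunk: usize,
    offset: usize,
    len: usize,
}

/// Hypothetical chunked arena over `f32` values (stand-ins for vertices).
struct ChunkedArena {
    chunks: Vec<Box<[f32]>>,
    chunk_len: usize,
    current: usize, // index of the chunk being filled
    used: usize,    // elements used in the current chunk
}

impl ChunkedArena {
    fn new(chunk_len: usize) -> Self {
        Self {
            chunks: vec![vec![0.0; chunk_len].into_boxed_slice()],
            chunk_len,
            current: 0,
            used: 0,
        }
    }

    /// Returns `None` if `len` could never fit in a single chunk.
    fn alloc(&mut self, len: usize) -> Option<ChunkRange> {
        if len > self.chunk_len {
            return None;
        }
        if self.used + len > self.chunk_len {
            // Move on to the next chunk, allocating it only if it does not exist yet.
            self.current += 1;
            self.used = 0;
            if self.current == self.chunks.len() {
                self.chunks.push(vec![0.0; self.chunk_len].into_boxed_slice());
            }
        }
        let range = ChunkRange { chunk: self.current, offset: self.used, len };
        self.used += len;
        Some(range)
    }

    /// Drop back to the first chunk; later chunks stay allocated for reuse.
    fn clear(&mut self) {
        self.current = 0;
        self.used = 0;
    }
}
```

Because existing chunks are never moved or resized, earlier ranges stay valid when a new chunk is added, which is the property a single growable buffer would not give.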
Each draw call initially allocates the entire 10,000 vertices, and when it's time to start putting vertices into another DrawCall, the current DrawCall calls realloc() to free any unused space.
As this call to realloc() is guaranteed to shrink the allocation, no copying is needed, and only some numbers in the ArenaAccesor and Arena are changed.
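A minimal sketch of why shrinking the most recent allocation needs no copying: only the cursor and the stored length change. BumpArena, Allocation and shrink_last are hypothetical names for illustration and do not mirror the PR's Arena, ArenaAccesor or realloc() signatures; vertex storage is left out so that only the bookkeeping is visible.

```rust
/// Hypothetical bump arena that only tracks indices; storage is omitted for brevity.
struct BumpArena {
    capacity: usize,
    used: usize,
}

/// Hypothetical handle held by a draw call.
#[derive(Debug, PartialEq)]
struct Allocation {
    offset: usize,
    len: usize,
}

impl BumpArena {
    /// Reserve `len` elements at the cursor, or `None` if they do not fit.
    fn alloc(&mut self, len: usize) -> Option<Allocation> {
        if self.used + len > self.capacity {
            return None;
        }
        let allocation = Allocation { offset: self.used, len };
        self.used += len;
        Some(allocation)
    }

    /// Shrink the most recent allocation to `new_len`, giving the tail back.
    /// Only bookkeeping changes, so no vertex data is copied.
    fn shrink_last(&mut self, allocation: &mut Allocation, new_len: usize) {
        // Only the newest allocation ends at the cursor and can be shrunk this way.
        assert_eq!(allocation.offset + allocation.len, self.used);
        assert!(new_len <= allocation.len);
        self.used = allocation.offset + new_len;
        allocation.len = new_len;
    }
}
```

In this sketch, a draw call that reserved 10,000 vertices but wrote 4 would shrink to 4, and the next draw call would start at offset 4.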
Drawbacks
Compatibility
This shouldn't break anything, as it keeps the original max_vertices.
One thing that might is the addition of self.draw_calls_count = 0 in QuadGl::update_drawcall_capacity(), although that seems unlikely.
Either way, it should be heavily tested, especially its impact on performance and whether the CPU-for-memory tradeoff is worth it.