Added an arena allocator for draw calls vertices #805

Closed
wants to merge 1 commit

Conversation

yui-915
Contributor

yui-915 commented Sep 12, 2024

The Issue

Switching textures between draw_texture(...) calls causes high memory usage.
Here is a minimal reproducible example made by Calandiel and Luna from the Discord server:

use macroquad::prelude::*;

#[macroquad::main("Circle!")]
async fn main() {
    // Load 256 separate Texture2D instances of the same image.
    let mut v: Vec<Texture2D> = vec![];
    for _ in 0..256 {
        v.push(load_texture("ball.png").await.unwrap());
    }
    loop {
        // 3200 draw calls per frame, cycling through the 256 textures,
        // so the bound texture changes on every call.
        for i in 0..40 * 80 {
            let j = i % 256;
            draw_texture(&v[j], 0.0, 0.0, WHITE);
        }
        next_frame().await;
    }
}

Texture size is not the issue.
Nor is it the texture loading, as commenting out the draw_texture() calls returns memory usage to normal.
Calling draw_texture() 3200 times isn't the issue either, as making the same number of calls with a single texture doesn't result in such high memory usage.
The issue is switching textures between draw_texture() calls.

Cause

macroquad does not batch draw calls when the texture changes (in QuadGl::geometry()), so a new DrawCall is created every time draw_texture() is called with a different texture. Each DrawCall allocates a fixed-size buffer for its vertices (10k vertices by default, at 40 bytes per vertex that's 400 KB per draw call!), while draw_texture() only needs 4 vertices.
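
To put rough numbers on that, a back-of-the-envelope sketch (assuming the internal draw-call pool grows to hold all ~3,200 calls from the example above; the constants are just the figures quoted in this paragraph, not macroquad code):

const MAX_VERTICES: usize = 10_000;  // default per-draw-call vertex capacity
const VERTEX_SIZE_BYTES: usize = 40; // bytes per vertex
const DRAW_CALLS: usize = 40 * 80;   // one DrawCall per draw_texture() in the repro

fn main() {
    let per_call = MAX_VERTICES * VERTEX_SIZE_BYTES; // 400_000 bytes ≈ 400 KB
    let total = per_call * DRAW_CALLS;               // 1_280_000_000 bytes ≈ 1.28 GB
    println!("per call: {per_call} bytes, all draw calls: {total} bytes");
}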

Solution

One way of fixing this would be to reduce the space initially allocated for vertices and only grow it when needed. The problem with that approach is that it would result in too many reallocations, which isn't ideal, especially with the way macroquad implements draw call batching.

Instead, as suggested by @InnocentusLime in the quads Discord server, we could have one big buffer that stores the vertices from all the draw calls, with each DrawCall storing an offset into that buffer as well as a length. That way draw call vertices can be tightly packed together and only take as much space as they need, i.e. an arena allocator.
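
A minimal sketch of that idea (illustrative names, not macroquad's actual types): every draw call appends its vertices to one shared buffer and only remembers where its slice starts and how long it is.

#[derive(Clone, Copy)]
struct Vertex { /* position, uv, color: ~40 bytes in macroquad */ }

struct VertexArena {
    vertices: Vec<Vertex>, // one big shared buffer for all draw calls
}

struct DrawCall {
    offset: usize, // where this call's vertices start inside the shared buffer
    len: usize,    // how many vertices are actually used (4 for draw_texture)
    // texture, pipeline state, ... omitted
}

impl VertexArena {
    // Append the vertices of one draw call and record the slice it occupies.
    fn push(&mut self, verts: &[Vertex]) -> DrawCall {
        let offset = self.vertices.len();
        self.vertices.extend_from_slice(verts);
        DrawCall { offset, len: verts.len() }
    }
}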

Implementation

My implementation uses a few small buffers (boxed slices) instead of one big buffer, so if more space is needed at any point we can just allocate another small buffer instead of having to reallocate the entire thing (as we would with a single growable buffer).

When a buffer doesn't have enough space for the required slice, the next buffer is used, and so on.
When the arena is cleared, it simply drops back to using the first buffer.

Draw calls initially allocate the full 10,000 vertices, and when it's time to start putting vertices into another DrawCall, the current DrawCall calls realloc() to free any unused space.
Since the call to realloc() is guaranteed to shrink the allocated space, no copying is needed; only some numbers in the ArenaAccessor and Arena are changed.
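
A simplified sketch of that multi-buffer behaviour (illustrative names, plain Vecs standing in for the boxed slices, and none of the Rc/RefCell/ArenaAccessor plumbing the PR actually uses):

struct Arena<T> {
    chunks: Vec<Vec<T>>, // each chunk is created once with `chunk_size` capacity
    chunk_size: usize,
    current: usize,      // index of the chunk currently being filled
}

impl<T: Default + Clone> Arena<T> {
    fn new(chunk_size: usize) -> Self {
        Self { chunks: vec![Vec::with_capacity(chunk_size)], chunk_size, current: 0 }
    }

    // Reserve `len` slots and return (chunk index, offset). If the current chunk
    // is too full, move on to the next chunk (allocating it if needed); earlier
    // chunks are never revisited, which causes the fragmentation noted below.
    fn alloc(&mut self, len: usize) -> (usize, usize) {
        assert!(len <= self.chunk_size, "allocation larger than a single chunk");
        if self.chunks[self.current].len() + len > self.chunk_size {
            self.current += 1;
            if self.current == self.chunks.len() {
                self.chunks.push(Vec::with_capacity(self.chunk_size));
            }
        }
        let offset = self.chunks[self.current].len();
        self.chunks[self.current].resize(offset + len, T::default());
        (self.current, offset)
    }

    // Shrink the most recent reservation from `reserved` down to `used` slots;
    // since it only ever shrinks, nothing is copied.
    fn shrink_last(&mut self, reserved: usize, used: usize) {
        let chunk = &mut self.chunks[self.current];
        let new_len = chunk.len() - (reserved - used);
        chunk.truncate(new_len);
    }

    // Reset the arena: keep all chunks allocated, but start filling from the first one again.
    fn clear(&mut self) {
        for chunk in &mut self.chunks {
            chunk.clear();
        }
        self.current = 0;
    }
}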

Drawbacks

  • This only solves the problem on the CPU side, as the full-length buffers are still being allocated on the GPU side
  • Increased indirection and overhead due to the use of Rc and RefCell (it may be possible to implement it without those, which is worth considering beyond this PR)
  • The arena allocator doesn't make perfect use of the memory it allocates; take this as an example (see the sketch after this list):
    • initialize an arena allocator with buffer_size=10
    • allocate a slice of length 3, which lands in buffer 1
    • allocate a slice of length 8, which no longer fits in buffer 1, so it's placed in buffer 2
    • allocate a slice of length 7, which would fit in buffer 1, but the allocator only looks at buffer 2, and since that doesn't have enough space either, it's placed in buffer 3
  • The allocator itself is somewhat unsafe: ArenaAccessor doesn't own its underlying data, nor does it have a lifetime associated with it, so it can still be used after clearing the arena (it shouldn't be!). This shouldn't cause a problem in the current usage, as all ArenaAccessors are reinitialized at some point after a clear call and before being used
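
As a hypothetical usage of the sketch Arena from the Implementation section, the fragmentation example above plays out like this (chunk indices are zero-based, so chunk 0 corresponds to "buffer 1"):

fn main() {
    let mut arena: Arena<u8> = Arena::new(10); // buffer_size = 10
    let a = arena.alloc(3); // fits in buffer 1 (3/10 used)
    let b = arena.alloc(8); // no longer fits in buffer 1, placed in buffer 2
    let c = arena.alloc(7); // would fit in buffer 1, but only buffer 2 is checked,
                            // so a third buffer is allocated
    println!("{:?} {:?} {:?}", a, b, c); // (0, 0) (1, 0) (2, 0)
}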

Compatibility

This shouldn't break anything, as it keeps the original max_vertices.
One thing that might is the addition of self.draw_calls_count = 0 in QuadGl::update_drawcall_capacity(), although that seems unlikely.

Either way, it should be heavily tested, especially its impact on performance and whether the CPU-for-memory tradeoff is worth it.

@not-fl3
Owner

not-fl3 commented Sep 12, 2024

Draw calls initially allocate the full 10,000 vertices, and when it's time to start putting vertices into another DrawCall, the current DrawCall calls realloc() to free any unused space.

I like the idea and I can see that it works!

The problem here: by reducing 10000 to something like 8000 we save pretty much the same amount of memory on the example from the PR. Not even mentioning that with the number of draw calls at which memory becomes an issue, we have already lost all the fps.

In other words, I do not really see those CPU buffers as the bottleneck. That said, if this PR solves a problem for you, it might solve it for other people too, so let's keep it open!

@profan
Contributor

profan commented Sep 29, 2024

Does this overlap with #807?

It looks like some part of this has already been implemented there, but maybe they are actually different?

@InnocentusLime
Contributor

Does this overlap with #807?

It looks like some part of this has already been implemented there, but maybe they are actually different?

They are very similar, because yui-915 and I were solving the same problem. I suppose what I did there was more or less a strange Arena impl.

My version simply replaced each DrawCall owning its own Vec with each DrawCall slicing a big Vec instead.

not-fl3 closed this Sep 29, 2024