Added an arena allocator for draw calls vertices #805

Closed
wants to merge 1 commit

Conversation

yui-915
Contributor

yui-915 commented Sep 12, 2024

The Issue

Switching textures between draw_texture(...) calls causes high memory usage.
Here is a minimal reproducible example made by Calandiel and Luna from the Discord server:

use macroquad::prelude::*;

#[macroquad::main("Circle!")]
async fn main() {
    // Load 256 separate Texture2D instances of the same image.
    let mut v: Vec<Texture2D> = vec![];
    for _ in 0..256 {
        v.push(load_texture("ball.png").await.unwrap());
    }
    loop {
        // 3200 draw calls per frame, cycling through the 256 textures,
        // so the bound texture changes on every call.
        for i in 0..40 * 80 {
            let j = i % 256;
            draw_texture(&v[j], 0.0, 0.0, WHITE);
        }
        next_frame().await;
    }
}

Texture size is not the issue.
Nor is it the texture loading, as commenting out the draw_texture() calls returns memory usage to normal.
Calling draw_texture() 3200 times isn't the issue either, as making the same number of calls with a single texture doesn't result in such high memory usage.
The issue is switching textures between draw_texture() calls.

Cause

macroquad does not batch draw calls when the texture changes (in QuadGl::geometry()), so a new DrawCall is created every time draw_texture() is called with a different texture. Each DrawCall allocates a fixed-size buffer for its vertices (10k vertices by default, at 40 bytes per vertex that's 400 KB per draw call!), while draw_texture() only needs 4 vertices.
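
To put rough numbers on that, a back-of-the-envelope sketch (assuming the internal draw-call pool grows to hold all ~3,200 calls from the example above; the constants are just the figures quoted in this paragraph, not macroquad code):

const MAX_VERTICES: usize = 10_000;  // default per-draw-call vertex capacity
const VERTEX_SIZE_BYTES: usize = 40; // bytes per vertex
const DRAW_CALLS: usize = 40 * 80;   // one DrawCall per draw_texture() in the repro

fn main() {
    let per_call = MAX_VERTICES * VERTEX_SIZE_BYTES; // 400_000 bytes ≈ 400 KB
    let total = per_call * DRAW_CALLS;               // 1_280_000_000 bytes ≈ 1.28 GB
    println!("per call: {per_call} bytes, all draw calls: {total} bytes");
}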

Solution

One way of fixing this would be to reduce the space initially allocated for vertices and only grow it when needed. The problem with that approach is that it would result in too many reallocations, which isn't ideal, especially with the way macroquad implements draw call batching.

Instead, as suggested by @InnocentusLime in the quads Discord server, we could have one big buffer that stores the vertices from all the draw calls, with each DrawCall storing an offset into that buffer as well as a length. That way draw call vertices can be tightly packed together and only take as much space as they need, i.e. an arena allocator.
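
A minimal sketch of that idea (illustrative names, not macroquad's actual types): every draw call appends its vertices to one shared buffer and only remembers where its slice starts and how long it is.

#[derive(Clone, Copy)]
struct Vertex { /* position, uv, color: ~40 bytes in macroquad */ }

struct VertexArena {
    vertices: Vec<Vertex>, // one big shared buffer for all draw calls
}

struct DrawCall {
    offset: usize, // where this call's vertices start inside the shared buffer
    len: usize,    // how many vertices are actually used (4 for draw_texture)
    // texture, pipeline state, ... omitted
}

impl VertexArena {
    // Append the vertices of one draw call and record the slice it occupies.
    fn push(&mut self, verts: &[Vertex]) -> DrawCall {
        let offset = self.vertices.len();
        self.vertices.extend_from_slice(verts);
        DrawCall { offset, len: verts.len() }
    }
}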

Implementation

My implementation uses a few small buffers (boxed slices) instead of one big buffer, so if more space is needed at any point we can just allocate another small buffer instead of having to reallocate the entire thing (as we would with a single growable buffer).

When a buffer doesn't have enough space for the required slice, the next buffer is used, and so on.
When the arena is cleared, it simply drops back to using the first buffer.

Draw calls initially allocate the full 10,000 vertices, and when it's time to start putting vertices into another DrawCall, the current DrawCall calls realloc() to free any unused space.
Since the call to realloc() is guaranteed to shrink the allocated space, no copying is needed; only some numbers in the ArenaAccessor and Arena are changed.
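
A simplified sketch of that multi-buffer behaviour (illustrative names, plain Vecs standing in for the boxed slices, and none of the Rc/RefCell/ArenaAccessor plumbing the PR actually uses):

struct Arena<T> {
    chunks: Vec<Vec<T>>, // each chunk is created once with `chunk_size` capacity
    chunk_size: usize,
    current: usize,      // index of the chunk currently being filled
}

impl<T: Default + Clone> Arena<T> {
    fn new(chunk_size: usize) -> Self {
        Self { chunks: vec![Vec::with_capacity(chunk_size)], chunk_size, current: 0 }
    }

    // Reserve `len` slots and return (chunk index, offset). If the current chunk
    // is too full, move on to the next chunk (allocating it if needed); earlier
    // chunks are never revisited, which causes the fragmentation noted below.
    fn alloc(&mut self, len: usize) -> (usize, usize) {
        assert!(len <= self.chunk_size, "allocation larger than a single chunk");
        if self.chunks[self.current].len() + len > self.chunk_size {
            self.current += 1;
            if self.current == self.chunks.len() {
                self.chunks.push(Vec::with_capacity(self.chunk_size));
            }
        }
        let offset = self.chunks[self.current].len();
        self.chunks[self.current].resize(offset + len, T::default());
        (self.current, offset)
    }

    // Shrink the most recent reservation from `reserved` down to `used` slots;
    // since it only ever shrinks, nothing is copied.
    fn shrink_last(&mut self, reserved: usize, used: usize) {
        let chunk = &mut self.chunks[self.current];
        let new_len = chunk.len() - (reserved - used);
        chunk.truncate(new_len);
    }

    // Reset the arena: keep all chunks allocated, but start filling from the first one again.
    fn clear(&mut self) {
        for chunk in &mut self.chunks {
            chunk.clear();
        }
        self.current = 0;
    }
}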

Drawbacks

  • This only solves the problem on the CPU side, as the full-length buffers are still being allocated on the GPU side
  • Increased indirection and overhead due to the use of Rc and RefCell (it may be possible to implement it without those, which is worth considering beyond this PR)
  • The arena allocator doesn't make perfect use of the memory it allocates; take this as an example (see the sketch after this list):
    • initialize an arena allocator with buffer_size=10
    • allocate a slice of length 3, which lands in buffer 1
    • allocate a slice of length 8, which no longer fits in buffer 1, so it's placed in buffer 2
    • allocate a slice of length 7, which would fit in buffer 1, but the allocator only looks at buffer 2, and since that doesn't have enough space either, it's placed in buffer 3
  • The allocator itself is somewhat unsafe: ArenaAccessor doesn't own its underlying data, nor does it have a lifetime associated with it, so it can still be used after clearing the arena (it shouldn't be!). This shouldn't cause a problem in the current usage, as all ArenaAccessors are reinitialized at some point after a clear call and before being used
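
As a hypothetical usage of the sketch Arena from the Implementation section, the fragmentation example above plays out like this (chunk indices are zero-based, so chunk 0 corresponds to "buffer 1"):

fn main() {
    let mut arena: Arena<u8> = Arena::new(10); // buffer_size = 10
    let a = arena.alloc(3); // fits in buffer 1 (3/10 used)
    let b = arena.alloc(8); // no longer fits in buffer 1, placed in buffer 2
    let c = arena.alloc(7); // would fit in buffer 1, but only buffer 2 is checked,
                            // so a third buffer is allocated
    println!("{:?} {:?} {:?}", a, b, c); // (0, 0) (1, 0) (2, 0)
}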

Compatibility

This shouldn't break anything, as it keeps the original max_vertices.
One thing that might is the addition of self.draw_calls_count = 0 in QuadGl::update_drawcall_capacity(), although that seems unlikely.

Either way, it should be heavily tested, especially its impact on performance and whether the CPU-for-memory tradeoff is worth it.

@not-fl3
Owner

not-fl3 commented Sep 12, 2024

Draw calls initially allocate the full 10,000 vertices, and when it's time to start putting vertices into another DrawCall, the current DrawCall calls realloc() to free any unused space.

I like the idea and I can see that it works!

The problem here: by reducing 10000 to something like 8000 we save pretty much the same amount of memory on the example from the PR. Not even mentioning that with the number of draw calls at which memory becomes an issue, we have already lost all the fps.

In other words, I do not really see those CPU buffers as the bottleneck. That said, if this PR solves a problem for you, it might solve it for other people too, so let's keep it open!

@profan
Contributor

profan commented Sep 29, 2024

Does this overlap with #807?

It looks like some part of this has already been implemented there, but maybe they are actually different?

@InnocentusLime
Contributor

Does this overlap with #807?

It looks like some part of this has already been implemented there, but maybe they are actually different?

They are very similar, because yui-915 and I were solving the same problem. I suppose what I did there was more or less a strange Arena impl.

My version simply replaced each DrawCall owning its own Vec with each DrawCall slicing a big Vec instead.

not-fl3 closed this Sep 29, 2024