Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid copy on Refit #6478

Merged
merged 17 commits into from
Jul 10, 2024
Merged

Avoid copy on Refit #6478

merged 17 commits into from
Jul 10, 2024

Conversation

cbourjau
Copy link
Contributor

@cbourjau cbourjau commented Jun 12, 2024

The C-API exposes LGBM_BoosterRefit which receives the predicted leaf indices as a flat buffer that is laid out in the shape of nrow x ncol. The Boosting::RefitTree function, on the other hand, expects these indices as a nested vector object (std::vector<std::vector<int>>). Creating this nested object requires various additional allocations and amounts to an entire copy of the initial buffer doubling the memory requirements.

This PR changes the API of Boosting::RefitTree to take a pointer to a flat buffer of int32s, just like the C-API, thus avoiding the copy. This assumes that this part of the API is not regarded as stable. The C-API remains unchanged.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest in LightGBM. It'll be at least a few days until someone's able to review this, as we're currently focused on finishing a release (#6439 (comment)).

@jameslamb
Copy link
Collaborator

[trigger ci]

Every CI run on this PR will require a maintainer manually approving it, because you've never contributed here before. Sorry for the inconvenience, but GitHub introduced this as a security measure a few years ago and we've decided to leave it enabled. We do occasionally receive malicious pull requests trying to use our CI resources 🙃

@cbourjau
Copy link
Contributor Author

Thanks for the feedback! Some CI does appear to run, though, and some of it exhibited IO failures the first time around which I was trying to address 🤷 .

Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looking forward to this performance improvement! I left a few comments, esp. regarding the changes to the application code

src/application/application.cpp Outdated Show resolved Hide resolved
src/application/application.cpp Outdated Show resolved Hide resolved
src/application/application.cpp Outdated Show resolved Hide resolved
src/boosting/gbdt.cpp Outdated Show resolved Hide resolved
include/LightGBM/boosting.h Outdated Show resolved Hide resolved
@cbourjau cbourjau requested a review from borchero June 17, 2024 15:33
Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, I think we're getting there! 😁

src/application/application.cpp Outdated Show resolved Hide resolved
src/application/application.cpp Outdated Show resolved Hide resolved
src/application/application.cpp Show resolved Hide resolved
src/application/application.cpp Outdated Show resolved Hide resolved
src/application/application.cpp Outdated Show resolved Hide resolved
@borchero
Copy link
Collaborator

Thanks @cbourjau! AppVeyor CI will be fixed by #6490, that only leaves the linting job 😁

Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks a lot for your contribution @cbourjau 🚀

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I'll dismiss my prior blocking review so it doesn't block merging. But I would like @shiyu1994 or @guolinke to also approve before we merge this. They might remember a reason that this copy was used in #1124.

@jameslamb jameslamb dismissed their stale review July 9, 2024 02:33

@borchero reviewed and all my concerns were addressed

ncol = Common::StringToArray<int>(result_reader.Lines()[0], '\t').size();
}
std::vector<int> pred_leaf;
pred_leaf.reserve(nrow * ncol);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use resize instead of reserve?

Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. This should be very helpful.

@jameslamb jameslamb merged commit 1886bf5 into microsoft:master Jul 10, 2024
41 checks passed
@cbourjau cbourjau deleted the avoid-copy-refit branch July 11, 2024 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants