-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid copy on Refit #6478
Avoid copy on Refit #6478
Conversation
898acdb
to
aa3bcff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your interest in LightGBM. It'll be at least a few days until someone's able to review this, as we're currently focused on finishing a release (#6439 (comment)).
Every CI run on this PR will require a maintainer manually approving it, because you've never contributed here before. Sorry for the inconvenience, but GitHub introduced this as a security measure a few years ago and we've decided to leave it enabled. We do occasionally receive malicious pull requests trying to use our CI resources 🙃 |
Thanks for the feedback! Some CI does appear to run, though, and some of it exhibited IO failures the first time around which I was trying to address 🤷 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looking forward to this performance improvement! I left a few comments, esp. regarding the changes to the application code
Co-authored-by: Oliver Borchert <[email protected]>
Co-authored-by: Oliver Borchert <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, I think we're getting there! 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks a lot for your contribution @cbourjau 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
I'll dismiss my prior blocking review so it doesn't block merging. But I would like @shiyu1994 or @guolinke to also approve before we merge this. They might remember a reason that this copy was used in #1124.
@borchero reviewed and all my concerns were addressed
src/application/application.cpp
Outdated
ncol = Common::StringToArray<int>(result_reader.Lines()[0], '\t').size(); | ||
} | ||
std::vector<int> pred_leaf; | ||
pred_leaf.reserve(nrow * ncol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use resize
instead of reserve
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. This should be very helpful.
The C-API exposes
LGBM_BoosterRefit
which receives the predicted leaf indices as a flat buffer that is laid out in the shape ofnrow x ncol
. TheBoosting::RefitTree
function, on the other hand, expects these indices as a nested vector object (std::vector<std::vector<int>>
). Creating this nested object requires various additional allocations and amounts to an entire copy of the initial buffer doubling the memory requirements.This PR changes the API of
Boosting::RefitTree
to take a pointer to a flat buffer ofint32
s, just like the C-API, thus avoiding the copy. This assumes that this part of the API is not regarded as stable. The C-API remains unchanged.