[python-package] remove uses of deprecated NumPy random number generation APIs, require 'numpy>=1.17.0' #6468
Contributes to #6454
This PR moves `lightgbm` closer to compatibility with `numpy` 2.x:

- adds `ruff` rules to the linting config (ruff docs)
- replaces `np.random.{method}` convenience functions (which are deprecated) with the corresponding `np.random.Generator.{method}` calls

And to accompany that:

- adds a floor of `numpy>=1.17.0`

Notes for Reviewers
How I identified these
That `ruff` rule identified 95 instances of the following.

Why these particular replacements?

Based mainly on the docstrings of those `numpy` functions (which recommend replacements), I tried to follow these rules:

- `np.random.choice(a, size)`: Chooses a random sample (with replacement by default) of size `size` from array `a`. Replacement: `np.random.Generator.choice()`
- `np.random.normal(loc, scale)`: Chooses from a normal distribution centered at `loc` with standard deviation `scale`. Replacement: `np.random.Generator.normal()`
- `np.random.permutation()`: Takes in a sequence and randomly permutes it. Replacement: `np.random.Generator.permutation()`
- `np.random.rand()`: Chooses random floats from a uniform distribution `[0, 1)`. Replacement: `np.random.Generator.uniform()`
- `np.random.randint(low, high)`: Chooses random integers from uniform distribution `[low, high)`. Replacement: `np.random.Generator.integers()`
- `np.random.randn()`: Chooses random floats from the standard normal distribution. Replacement: `np.random.Generator.standard_normal()`
- `np.random.uniform(low, high)`: Chooses random floats from uniform distribution `[low, high)`. Replacement: `np.random.Generator.uniform()`
- `np.random.seed(seed)`: Resets the random number generator seed for the `np.random.RandomState` object. That affects all future `np.random.{method}()` calls. Replacement: explicitly creating a `np.random.Generator` with that seed, e.g. `np.random.default_rng(seed)`

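To make the rules above concrete, here is a minimal sketch of each replacement side by side with the call it stands in for. The seed value, array contents, and shapes are illustrative assumptions, not taken from the `lightgbm` code base.

```python
import numpy as np

seed = 42  # arbitrary seed, chosen only for this illustration

# was: np.random.seed(seed), which mutates global state
rng = np.random.default_rng(seed)

a = np.arange(10)

# was: np.random.choice(a, 5)
sample = rng.choice(a, size=5)

# was: np.random.normal(0.0, 1.0, 5)
noise = rng.normal(loc=0.0, scale=1.0, size=5)

# was: np.random.permutation(a)
shuffled = rng.permutation(a)

# was: np.random.rand(3, 2)
# Note: rand() takes dimensions positionally, Generator methods take size=
unit = rng.uniform(size=(3, 2))

# was: np.random.randint(0, 10, 5); the upper bound stays exclusive by default
ints = rng.integers(low=0, high=10, size=5)

# was: np.random.randn(3, 2)
std_norm = rng.standard_normal(size=(3, 2))

# was: np.random.uniform(-1.0, 1.0, 5)
floats = rng.uniform(low=-1.0, high=1.0, size=5)

# A second Generator built from the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed)
same_sample = rng_again.choice(a, size=5)
```

One thing to keep in mind when reviewing: a `Generator` seeded with a given value will generally not produce the same numbers as the legacy functions after `np.random.seed()` with that same value, so any test that hard-codes values drawn from the old global stream may need updating.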
Why mark this `breaking` and add a floor on `numpy`?

The `np.random.Generator` class was introduced in `numpy==1.17.0`. That release was in July 2019... almost 5 years ago. The highest Python version it had wheels for was Python 3.7 (https://pypi.org/project/numpy/1.17.0/#files).

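For anyone who wants to confirm that a given environment meets the proposed floor, a rough sketch follows. The helper `parse_release` and the constant `NUMPY_FLOOR` are hypothetical names introduced only for this example; they are not part of `lightgbm` or `numpy`.

```python
import re

import numpy as np

NUMPY_FLOOR = (1, 17, 0)  # the floor proposed in this PR


def parse_release(version):
    """Return the leading numeric part of a version string as a tuple.

    Hypothetical helper for illustration: '1.26.4' -> (1, 26, 4).
    Suffixes such as '.dev0+git...' on nightly builds are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


installed = parse_release(np.__version__)
meets_floor = installed >= NUMPY_FLOOR

# default_rng() arrived together with np.random.Generator, so its presence
# is a direct check that the non-deprecated API is available.
has_generator_api = hasattr(np.random, "default_rng")

print(f"numpy {np.__version__}: meets floor={meets_floor}, "
      f"Generator API available={has_generator_api}")
```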
`scikit-learn` has also had a higher floor than this (`numpy>=1.17.3`) for 2 years (scikit-learn/scikit-learn#22674), so anyone using `lightgbm` with any version of `scikit-learn` released in the last 2 years won't be affected at all by this change.

Given that, I don't think adding a `>=1.17.0` floor should be too disruptive, and in exchange we can use the non-deprecated `np.random.Generator` API unconditionally.

What else remains for NumPy 2.0 support?

I'm not sure. In #6467, I'm working on adding a CI job that tests `lightgbm` against NumPy, `pandas`, `pyarrow`, `scipy`, and `scikit-learn` nightlies.

I'm pulling these randomness changes out of that PR because I think they're well-defined and ready to review.