
[python-package] remove uses of deprecated NumPy random number generation APIs, require 'numpy>=1.17.0' #6468

Merged: jameslamb merged 1 commit into master from fix/numpy-2.0-randomness on Jun 4, 2024

Conversation

@jameslamb (Collaborator) commented May 30, 2024

Contributes to #6454

This PR moves lightgbm closer to compatibility with numpy 2.x.

  • adds the NumPy 2.0 deprecation ruff rules to linting config (ruff docs)
  • removes all uses of np.random.{method} convenience functions (which are deprecated) in favor of the corresponding np.random.Generator.{method} calls

And to accompany that:

  • adds a floor of numpy>=1.17.0

Notes for Reviewers

How I identified these

That ruff rule (NPY002) flagged 95 instances of the following:

NPY002 Replace legacy `np.random.choice` call with `np.random.Generator`
NPY002 Replace legacy `np.random.normal` call with `np.random.Generator`
NPY002 Replace legacy `np.random.rand` call with `np.random.Generator`
NPY002 Replace legacy `np.random.randint` call with `np.random.Generator`
NPY002 Replace legacy `np.random.permutation` call with `np.random.Generator`
NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
NPY002 Replace legacy `np.random.uniform` call with `np.random.Generator`

Why these particular replacements?

These choices are based mainly on the docstrings of those numpy functions, which recommend replacements. I tried to follow these rules:


np.random.choice(a, size)

Chooses a random sample of size `size` (with replacement by default) from array `a`.

replacement: np.random.Generator.choice()
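A minimal sketch of the `choice` migration, assuming numpy>=1.17.0 (where `np.random.default_rng` exists). The seed and sizes are arbitrary illustration values, not taken from the lightgbm test suite.

```python
import numpy as np

# Legacy global-state call (flagged by ruff NPY002):
#   idx = np.random.choice(10, size=5)
# Replacement: create one explicit Generator and draw from it.
rng = np.random.default_rng(708)

# Sample 5 values from range(10), with replacement (the default, as before).
with_replacement = rng.choice(10, size=5)

# replace=False gives 5 distinct values, e.g. for picking unique row indices.
without_replacement = rng.choice(10, size=5, replace=False)
```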

np.random.normal(loc, scale)

Chooses from a normal distribution centered at loc with standard deviation scale.

replacement: np.random.Generator.normal()
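A sketch of the `normal` replacement, assuming the same numpy>=1.17.0 floor. `Generator.normal()` accepts the same `loc`, `scale`, and `size` keywords, so the change is mostly mechanical. The values below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Legacy: np.random.normal(loc=0.0, scale=2.0, size=(1000, 3))
# Generator.normal() keeps the same loc/scale/size keywords.
noise = rng.normal(loc=0.0, scale=2.0, size=(1000, 3))
```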

np.random.permutation()

Takes in a sequence (or an integer n, meaning np.arange(n)) and returns a randomly permuted copy of it.

replacement: np.random.Generator.permutation()
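A sketch of the `permutation` replacement, assuming numpy>=1.17.0. Like the legacy function, `Generator.permutation()` returns a shuffled copy rather than shuffling in place. The array contents are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

original = np.arange(10)

# Returns a shuffled copy; `original` itself is left unchanged.
shuffled = rng.permutation(original)

# An integer argument n permutes np.arange(n), e.g. for shuffling row indices.
row_order = rng.permutation(5)
```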

np.random.rand()

Chooses random floats from a uniform distribution [0, 1)

replacement: np.random.Generator.uniform()
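A sketch of the `rand` replacement, assuming numpy>=1.17.0. The main difference to watch for is the signature: `rand()` takes each dimension as a separate positional argument, while `Generator.uniform()` takes the shape as a single `size` tuple. `Generator.random()` is shown as an equivalent alternative; the shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Legacy: np.random.rand(100, 5), with each dimension as a separate argument.
# Generator.uniform() takes the shape as one `size` tuple; its defaults
# low=0.0, high=1.0 give the same [0, 1) interval.
X = rng.uniform(size=(100, 5))

# Generator.random() draws from the same [0, 1) interval.
X_alt = rng.random((100, 5))
```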

np.random.randint(low, high)

Chooses random integers from uniform distribution [low, high).

replacement: np.random.Generator.integers()
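A sketch of the `randint` replacement, assuming numpy>=1.17.0. By default `Generator.integers()` uses the same half-open [low, high) interval as `randint()`, and it additionally offers `endpoint=True` for a closed interval. The label and dice examples are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Legacy: np.random.randint(0, 3, size=200), e.g. labels for a 3-class problem.
# Default interval is [low, high), same as randint().
labels = rng.integers(low=0, high=3, size=200)

# endpoint=True makes the interval closed: [1, 6].
dice = rng.integers(1, 6, size=50, endpoint=True)
```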

np.random.randn()

Chooses random floats from the standard normal distribution

replacement: np.random.Generator.standard_normal()
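A sketch of the `randn` replacement, assuming numpy>=1.17.0. As with `rand()`, the legacy function takes dimensions as separate arguments, while `Generator.standard_normal()` takes a single `size`. The shape is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Legacy: np.random.randn(500, 4)
# Generator.standard_normal() takes the shape as one `size` argument.
Z = rng.standard_normal(size=(500, 4))
```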

np.random.uniform(low, high)

Chooses random floats from uniform distribution [low, high).

replacement: np.random.Generator.uniform()
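A sketch of the `uniform` replacement, assuming numpy>=1.17.0. Here only the call site changes, because `Generator.uniform()` accepts the same `low`, `high`, and `size` keywords. The bounds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Legacy: np.random.uniform(low=-1.0, high=1.0, size=20)
# Same keywords, called on an explicit Generator instead of global state.
weights = rng.uniform(low=-1.0, high=1.0, size=20)
```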

np.random.seed(seed)

Re-seeds the global np.random.RandomState instance. That affects all subsequent np.random.{method}() calls.

replacement: explicitly creating a np.random.Generator with that seed, e.g. np.random.default_rng(seed)
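A sketch of the `seed` replacement, using a hypothetical helper `make_data()` (not a lightgbm function) that routes all randomness through one locally created Generator. One caveat: `default_rng()` uses the PCG64 bit generator, while the legacy `RandomState` uses MT19937, so the same seed produces different numbers. Any test that hard-codes values computed under `np.random.seed()` may need new expected values.

```python
import numpy as np


def make_data(seed):
    """Hypothetical test-data helper: all randomness flows through one Generator."""
    # Legacy pattern: np.random.seed(seed); X = np.random.rand(50, 3)
    # That mutates the global RandomState shared by every caller.
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(50, 3))
    y = rng.integers(0, 2, size=50)
    return X, y


X1, y1 = make_data(123)
X2, y2 = make_data(123)
X3, _ = make_data(124)
```

The same seed reproduces the same data, and no global state is touched, so tests cannot affect each other's random streams.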

Why mark this breaking and add a floor on numpy?

The np.random.Generator class was introduced in numpy==1.17.0.

That release was in July 2019... almost 5 years ago. The highest Python version it had wheels for was Python 3.7 (https://pypi.org/project/numpy/1.17.0/#files).

scikit-learn has also had a higher floor than this (numpy>=1.17.3) for 2 years (scikit-learn/scikit-learn#22674), so anyone using lightgbm with any scikit-learn release from the last 2 years won't be affected by this change at all.

Given that, I don't think adding a numpy>=1.17.0 floor should be too disruptive, and in exchange we can use the non-deprecated np.random.Generator API unconditionally.

What else remains for NumPy 2.0 support?

I'm not sure. In #6467, I'm working on adding a CI job that tests lightgbm against NumPy, pandas, pyarrow, scipy, and scikit-learn nightlies.

I'm pulling these randomness changes out of that PR because I think they're well-defined and ready for review.

@borchero (Collaborator) left a comment

Nice! Good to see that the API changes should be minimal 😄

@jameslamb (Collaborator, Author)

Thanks very much @borchero!

I'm going to merge this to keep making progress towards the release... hopefully we can get it out before NumPy 2.0 is released June 16th.

@jameslamb jameslamb merged commit e0cda88 into master Jun 4, 2024
39 checks passed
@jameslamb jameslamb deleted the fix/numpy-2.0-randomness branch June 4, 2024 01:17