[python-package] remove uses of deprecated NumPy random number generation APIs, require 'numpy>=1.17.0' #6468

jameslamb · 2024-05-30T03:24:04Z

Contributes to #6454

This PR moves lightgbm closer to compatibility with numpy 2.x:

adds the NumPy 2.0 deprecation ruff rules to linting config (ruff docs)
removes all uses of np.random.{method} convenience functions (which are deprecated) in favor of the corresponding np.random.Generator.{method} calls

And to accompany that:

adds a floor of numpy>=1.17.0

Notes for Reviewers

How I identified these

That ruff rule identified 95 instances of the following.

NPY002 Replace legacy `np.random.choice` call with `np.random.Generator`
NPY002 Replace legacy `np.random.normal` call with `np.random.Generator`
NPY002 Replace legacy `np.random.rand` call with `np.random.Generator`
NPY002 Replace legacy `np.random.randint` call with `np.random.Generator`
NPY002 Replace legacy `np.random.permutation` call with `np.random.Generator`
NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
NPY002 Replace legacy `np.random.uniform` call with `np.random.Generator`

Why these particular replacements?

Based mainly on the docstrings of those numpy functions (which recommend replacements). I tried to follow these rules:

np.random.choice(a, size)

Chooses a random sample (with replacement by default) of size `size` from array `a`.

replacement: np.random.Generator.choice()
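
A minimal sketch of this swap, for illustration only. The seed and the array are arbitrary values chosen here, not taken from the PR's diff.

```python
import numpy as np

# Seed 42 is an arbitrary illustrative value.
rng = np.random.default_rng(seed=42)
a = np.arange(10)

# legacy: np.random.choice(a, size=5)
sample = rng.choice(a, size=5)  # samples with replacement by default

# Pass replace=False when every drawn element must be distinct.
unique_sample = rng.choice(a, size=5, replace=False)
```

The call signature carries over largely unchanged; the main difference is that the draw comes from an explicit generator object instead of global state.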

np.random.normal(loc, scale)

Chooses from a normal distribution centered at `loc` with standard deviation `scale`.

replacement: np.random.Generator.normal()
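
As a rough sketch (the seed, `loc`, `scale`, and sample count below are made-up values for illustration), the same keyword arguments apply on the generator method:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed

# legacy: np.random.normal(loc=2.0, scale=0.5, size=10_000)
x = rng.normal(loc=2.0, scale=0.5, size=10_000)

# With this many draws, the sample statistics should sit close to loc and scale.
sample_mean = x.mean()
sample_std = x.std()
```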

np.random.permutation()

Takes in a sequence and randomly permutes it.

replacement: np.random.Generator.permutation()
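
A small sketch of the equivalent call, assuming an arbitrary seed and an index array invented for this example:

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed
idx = np.arange(8)

# legacy: np.random.permutation(idx)
shuffled = rng.permutation(idx)  # returns a shuffled copy; idx is left untouched

# An integer n is also accepted and is treated like np.arange(n).
shuffled_range = rng.permutation(8)
```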

np.random.rand()

Chooses random floats from a uniform distribution [0, 1)

replacement: np.random.Generator.uniform()
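
One detail worth sketching here: `np.random.rand()` takes the output dimensions as separate positional arguments, while `Generator.uniform()` takes a single `size` argument. The seed and shape below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed

# legacy: np.random.rand(3, 2)  (dimensions passed positionally)
X = rng.uniform(size=(3, 2))  # low=0.0 and high=1.0 are the defaults
```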

np.random.randint(low, high)

Chooses random integers from uniform distribution [low, high).

replacement: np.random.Generator.integers()
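
A hedged sketch of this replacement, with an arbitrary seed and bounds chosen for illustration. By default `integers()` keeps the same half-open interval as `randint()`.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed

# legacy: np.random.randint(0, 10, size=20)
labels = rng.integers(low=0, high=10, size=20)  # values in [0, 10)

# endpoint=True makes the upper bound inclusive, here [1, 6].
dice = rng.integers(low=1, high=6, size=20, endpoint=True)
```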

np.random.randn()

Chooses random floats from the standard normal distribution

replacement: np.random.Generator.standard_normal()
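
As with `rand()`, the legacy `randn()` takes dimensions positionally, so the shape moves into a `size` argument. A minimal sketch with an arbitrary seed and shape:

```python
import numpy as np

rng = np.random.default_rng(seed=5)  # arbitrary seed

# legacy: np.random.randn(100, 4)
X = rng.standard_normal(size=(100, 4))
```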

np.random.uniform(low, high)

Chooses random floats from uniform distribution [low, high).

replacement: np.random.Generator.uniform()

np.random.seed(seed)

Resets the random number generator seed for the np.random.RandomState object. That affects all future np.random.{method}() calls.

replacement: explicitly creating a np.random.Generator with that seed, e.g. np.random.default_rng(seed)
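
To illustrate the pattern, here is a sketch in which a generator is created once and passed explicitly to the code that needs randomness. The helper `make_data` and the seed 708 are hypothetical names and values for this example, not taken from lightgbm's code.

```python
import numpy as np


def make_data(rng, n_rows=5, n_cols=3):
    """Hypothetical helper: draws a feature matrix from the given generator."""
    return rng.standard_normal(size=(n_rows, n_cols))


# legacy: np.random.seed(708), followed by np.random.randn(5, 3)
X1 = make_data(np.random.default_rng(seed=708))
X2 = make_data(np.random.default_rng(seed=708))

# Two generators built from the same seed produce identical streams.
same = np.array_equal(X1, X2)
```

Note that a `Generator` seeded with a given value does not reproduce the numbers that `np.random.seed()` with the same value produced, so any hard-coded expected values in tests may need to be regenerated after the switch.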

Why mark this `breaking` and add a floor on `numpy`?

The np.random.Generator class was introduced in numpy==1.17.0.

That release was in July 2019... almost 5 years ago. The highest Python version it had wheels for was Python 3.7 (https://pypi.org/project/numpy/1.17.0/#files).

scikit-learn has also had a higher floor than this (numpy>=1.17.3) for 2 years (scikit-learn/scikit-learn#22674), so anyone using lightgbm with any version of scikit-learn released in the last 2 years won't be affected at all by this change.

Given that, I don't think adding a numpy>=1.17.0 floor should be too disruptive, and in exchange we can use the non-deprecated np.random.Generator API unconditionally.

What else remains for NumPy 2.0 support?

I'm not sure. In #6467, I'm working on adding a CI job that tests lightgbm against NumPy, pandas, pyarrow, scipy, and scikit-learn nightlies.

I'm pulling these randomness changes out of that PR because I think they're well-defined and ready to review.

borchero

Nice! Good to see that the API changes should be minimal 😄

jameslamb · 2024-06-04T01:17:31Z

Thanks very much @borchero !

I'm going to merge this to keep making progress towards the release... hopefully we can get it out before NumPy 2.0 is released June 16th.

[python-package] remove uses of deprecated NumPy random number genera…

f361c7f

…tion APIs

jameslamb added the breaking label May 30, 2024

jameslamb requested review from guolinke, shiyu1994, jmoralez and borchero as code owners May 30, 2024 03:24

jameslamb added the awaiting review label May 31, 2024

borchero approved these changes Jun 3, 2024

jameslamb removed the awaiting review label Jun 4, 2024

jameslamb merged commit e0cda88 into master Jun 4, 2024
39 checks passed

jameslamb deleted the fix/numpy-2.0-randomness branch June 4, 2024 01:17

jameslamb mentioned this pull request Jun 20, 2024

update floors, add run_constrained entries for optional dependencies conda-forge/lightgbm-feedstock#54

Merged

5 tasks
