
Some equivalent sources tests failing on scikit-learn 1.6.0 #546

Closed
santisoler opened this issue Dec 10, 2024 · 3 comments · Fixed by #548


santisoler commented Dec 10, 2024

After the recent release of scikit-learn 1.6.0, some of the tests on equivalent sources have started to fail. The failures come from differences between the actual and expected predictions made by some of the equivalent sources. Although the tolerances for those tests are a little bit arbitrary, it worries me that our equivalent sources became less accurate after this release.

We could increase some of the tolerances to make the tests pass, but I'm worried that something more complex under the hood might bite us in the future. What concerns me most is that the test using float32 is failing on ~1/3 of the predicted points: https://github.com/fatiando/harmonica/actions/runs/12265983495/job/34223197151#step:12:1241
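
For reference, the failing comparisons boil down to checks of this general form (a hypothetical sketch: the arrays and the rtol value below are placeholders, not the ones in Harmonica's test suite):

import numpy as np

# Stand-ins for the arrays the tests compare (hypothetical values).
expected = np.array([1.0, 2.0, 3.0])
predicted = expected * (1 + 1e-6)

# The failing assertions follow this general pattern; the tolerance is
# illustrative, not the value used in the actual tests.
np.testing.assert_allclose(predicted, expected, rtol=1e-5)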

santisoler commented:

cc @leouieda

santisoler commented:

After investigating this a little, I think the differences come from the fitting process of LinearRegression objects between sklearn 1.5.2 and 1.6.0: they seem to produce different coefficients given the same Jacobian and data (when these are float32).

Here's a minimal example:

import harmonica as hm
import numpy as np
import verde as vd

# Define some observation points
region = (-3e3, -1e3, 5e3, 7e3)
shape = (40, 40)
coordinates = vd.grid_coordinates(region=region, shape=shape, extra_coords=0)

# Define some synthetic point sources and compute the g_z field they generate
points = vd.grid_coordinates(region=region, shape=(6, 6), extra_coords=-1e3)
masses = vd.synthetic.CheckerBoard(amplitude=1e13, region=region).predict(points)
data = hm.point_gravity(coordinates, points, masses, field="g_z")


# Define two equivalent sources: one with f64 and one with f32.
# Set damping to None so that Verde's least squares uses sklearn's
# LinearRegression.
damping = None
eqs_f64 = hm.EquivalentSources(damping=damping, dtype=np.float64)
eqs_f32 = hm.EquivalentSources(damping=damping, dtype=np.float32)

# Fit the two equivalent sources with the same data
eqs_f64.fit(coordinates, data)
eqs_f32.fit(coordinates, data)

# Print out the fitted coefficients
print(f"{eqs_f64.coefs_=}")
print(f"{eqs_f32.coefs_=}")

When I run it with sklearn 1.5.2 I get the following results:

eqs_f64.coefs_=array([  9553.2290226 , -12750.25574555,  15314.744578  , ...,
       -15314.74474181,  12750.25585894,  -9553.22906564])
eqs_f32.coefs_=array([ 2237.7256814 ,   746.18660681,  1715.03647668, ...,
       -3354.99101948,  3256.82847107, -5389.38652496])

When I run it with sklearn 1.6.0 I get the following results:

eqs_f64.coefs_=array([  9553.2290226 , -12750.25574555,  15314.744578  , ...,
       -15314.74474181,  12750.25585894,  -9553.22906564])
eqs_f32.coefs_=array([ 560.50814581,  364.78633867,  236.63370232, ..., -237.56391109,
       -364.06362227, -557.84896517])

The coefficients fitted for the f64 equivalent sources are the same, but the ones for f32 differ significantly.
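
To check whether this comes from sklearn itself rather than from Harmonica, a sketch like the following could be run under both sklearn versions (the synthetic matrix is an assumption, built only to mimic an ill-conditioned Jacobian; it's not our actual system):

import numpy as np
from sklearn.linear_model import LinearRegression

# Build a deliberately ill-conditioned system: every column is a small
# perturbation of the same base column.
rng = np.random.default_rng(42)
base = rng.normal(size=(500, 1))
jacobian = base + 1e-4 * rng.normal(size=(500, 100))
true_coefs = rng.normal(size=100)
data = jacobian @ true_coefs

# Fit the same system in float64 and float32 and print a few
# coefficients. A difference in the float32 output between sklearn
# 1.5.2 and 1.6.0 would point at a change in the solver.
for dtype in (np.float64, np.float32):
    model = LinearRegression(fit_intercept=False)
    model.fit(jacobian.astype(dtype), data.astype(dtype))
    print(dtype.__name__, model.coef_[:3])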

I don't see this behaviour when using sklearn's Ridge (which we can trigger by setting a damping different from None): even with very small damping values, the coefficients are consistent between sklearn 1.5.2 and 1.6.0 (although they differ from the f64 ones).
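
For reference, this is how the Ridge path can be exercised (a sketch reusing coordinates and data from the example above; the damping value is arbitrary, chosen only to be very small):

import harmonica as hm
import numpy as np

# A non-None damping makes Verde's least squares go through sklearn's
# Ridge instead of LinearRegression.
eqs_f32_damped = hm.EquivalentSources(damping=1e-6, dtype=np.float32)
eqs_f32_damped.fit(coordinates, data)
print(f"{eqs_f32_damped.coefs_=}")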

santisoler commented:

Besides the behaviour change, I think these failing tests raise a few points:

  • It's not advisable to fit the sources without damping. The problem is usually ill-constrained, and we need to regularize it.
  • Using float32 elements in the Jacobian may not produce very accurate results, so perhaps it's not the best way to save memory (see the sketch below).
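
A quick way to quantify the second point (a sketch reusing the fitted eqs_f64, eqs_f32, coordinates, and data from the example above; the max relative misfit is an illustrative metric, not the one used in the tests):

import numpy as np

# Compare how well each fitted source reproduces the observed data.
for label, eqs in [("float64", eqs_f64), ("float32", eqs_f32)]:
    prediction = eqs.predict(coordinates)
    misfit = np.abs(prediction - data).max() / np.abs(data).max()
    print(f"{label}: max relative misfit = {misfit:.2e}")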
