Speed up wave.resource module #352
Conversation
This reverts commit a2d5f61.
…aframes and 2+ var datasets
@ssolson This PR is ready for review. Tests should pass now with some modifications to the type handling functions, and an appropriate …
Thanks @ssolson. I'll merge that in here and fix a couple of minor items with some examples.
@akeeste TODO:
@ssolson this PR is now ready for review and all tests are passing. I tightened up the timing on the environmental contours, 3 extreme response, and PacWave examples. A straightforward test case on the difference in computational expense is using a wave resource function (e.g. …)
@akeeste overall this addresses the issue. Thanks for putting this together. I have just a couple of questions and a few minor clean-up items.
```diff
@@ -442,11 +440,9 @@ def test_mler_export_time_series(self):
     mler["WaveSpectrum"] = self.mler["Norm_Spec"].values
     mler["Phase"] = self.mler["phase"].values
     k = resource.wave_number(wave_freq, 70)
-    k = k.fillna(0)
+    np.nan_to_num(k, 0)
```
This returns `k`, so it would need to be:

```python
k = nan_to_num(k, 0)
```

However, `k` has no nans, so I don't think this is needed.
The zero frequency in `wave_freq` results in a nan value. The call to `np.nan_to_num()` updates the input `k` in my testing, which is useful if `k` is not a numpy array, as the input data is both updated and retains its type. If it's clearer for this particular test, I can update to redefine `k`.
This is not a limiting factor to getting the PR through, but are you saying `np.nan_to_num` modifies `k` in place when you run it? The docs say the function returns the modified array, and that was my experience when I paused the code here: https://numpy.org/doc/2.0/reference/generated/numpy.nan_to_num.html
@ssolson you are correct. I didn't look at the docstring closely enough when converting pandas `fillna` to numpy `nan_to_num`. `np.nan_to_num(k, 0)` is setting `copy` to False and changing `k` in place, instead of specifying the fill value, which already defaults to 0. I'll update this line so that we're using the function correctly.
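To make the pitfall concrete, here is a plain-numpy illustration (standalone example, not MHKiT code) of the difference between the positional and keyword forms of `np.nan_to_num`:

```python
import numpy as np

# The second positional argument of np.nan_to_num is `copy`, not the
# fill value, so nan_to_num(k, 0) means copy=False: k is modified in place.
k = np.array([np.nan, 1.0, 2.0])
np.nan_to_num(k, 0)
assert k[0] == 0.0  # k was mutated in place

# The clearer, copying form names the fill value explicitly and
# assigns the returned array.
k2 = np.array([np.nan, 1.0, 2.0])
k2 = np.nan_to_num(k2, nan=0.0)
assert k2[0] == 0.0
```

Both forms end up filling NaNs with 0 because `nan` already defaults to 0.0; the keyword spelling just makes the intent explicit and avoids the in-place surprise.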
```diff
@@ -95,7 +95,8 @@ def test_kfromw(self):
     expected = self.valdata1[i]["k"]
     k = wave.resource.wave_number(f, h, rho)
-    calculated = k.loc[:, "k"].values
+    # calculated = k.loc[:, "k"].values
```
delete
mhkit/utils/type_handling.py
Outdated
```python
# Rename to "variable" to match how multiple Dataset variables get converted into a DataArray dimension
data = xr.DataArray(data)
if data.dims[1] == "dim_1":
    # Slight chance their is already a name for the columns
```
their => there
mhkit/wave/performance.py
Outdated
```diff
+    LM: pandas DataFrame, xarray DatArray, or xarray Dataset
         Capture length
-    JM: pandas DataFrame or xarray Dataset
+    JM: pandas DataFrame, xarray DatArray, or xarray Dataset
         Wave energy flux
-    frequency: pandas DataFrame or xarray Dataset
+    frequency: pandas DataFrame, xarray DatArray, or xarray Dataset
```
xarray "DataArray", not "DatArray"
```diff
@@ -87,26 +86,24 @@ def elevation_spectrum(
             + "temporal spacing for eta."
         )

-    S = xr.Dataset()
-    for var in eta.data_vars:
```
This for loop allowed users to process multiple wave heights in a DataFrame, etc. Removing it means the user can only process one eta at a time.

Does this update require us to remove this functionality? Would it make sense to be able to parse each variable in a Dataset into a DataArray? Or to have the user create any needed loop outside of the function for simplicity?

E.g. the following test will fail. Can we make this work?
```python
def test_elevation_spectrum_multiple_variables(self):
    time = np.linspace(0, 100, 1000)
    eta1 = np.sin(2 * np.pi * 0.1 * time)
    eta2 = np.sin(2 * np.pi * 0.2 * time)
    eta3 = np.sin(2 * np.pi * 0.3 * time)
    eta_dataset = xr.Dataset({
        'eta1': (['time'], eta1),
        'eta2': (['time'], eta2),
        'eta3': (['time'], eta3)
    }, coords={'time': time})
    sample_rate = 10
    nnft = 256
    spectra = wave.resource.elevation_spectrum(
        eta_dataset, sample_rate, nnft
    )
```
If so, let's finish out this test and add it to the test suite.
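For what it's worth, the "loop over each variable" behavior requested here can be sketched in plain numpy, independent of xarray. `column_spectra` below is a hypothetical helper, not MHKiT's `elevation_spectrum`, and the unscaled periodogram is only a stand-in for the real spectral estimate; the point is just that each variable (column) is transformed independently:

```python
import numpy as np

def column_spectra(eta, sample_rate, nnft):
    """Compute a one-sided FFT-based spectrum for each column of eta
    independently, mirroring the per-variable loop discussed above.
    eta: 2-D array, time along axis 0, one column per variable."""
    eta = np.asarray(eta, dtype=float)
    freqs = np.fft.rfftfreq(nnft, d=1.0 / sample_rate)
    spectra = {}
    for i in range(eta.shape[1]):
        seg = eta[:nnft, i]                       # one variable at a time
        amp = np.fft.rfft(seg, n=nnft)
        spectra[f"eta{i + 1}"] = np.abs(amp) ** 2 / (nnft * sample_rate)
    return freqs, spectra

# Each column gets its own spectrum, as in the Dataset test above.
time = np.linspace(0, 100, 1000)
eta = np.column_stack([np.sin(2 * np.pi * 0.1 * time),
                       np.sin(2 * np.pi * 0.2 * time)])
freqs, spectra = column_spectra(eta, sample_rate=10, nnft=256)
```

The spectral peaks land near 0.1 Hz and 0.2 Hz respectively, matching the input sines.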
Users can still input Datasets, but right now all variables must have the same dimensions; the input is converted to a DataArray up front, and the function then returns a DataArray.

I'll look at reinstating a Dataset/DataFrame loop so that those types are returned.
I added this loop again. Our previous slowdown of large pandas DataFrames --> xarray Datasets could occur in these two functions now, but I don't think that is a typical use case. For example, in the case described in #331, it's unlikely a user would have thousands of different wave elevation time series and convert them all to wave spectra (likewise for thousands of distinct spectra being converted to elevation time series). If that case does come up, the slowdown should not be due to our implementation but to the large quantity of data involved.
Adam, the last thing I was checking this morning was adding the above test to the testing suite. Currently, though, when I pass a Dataset I am getting a DataFrame back. I think the idea was for the function to return the same type as the user passes?
Right now if a user inputs a multivariate type (Dataset, DataFrame), they will get a multivariate type back, but whether it's a Dataset or DataFrame is still controlled by the `to_pandas` flag, which is consistent with other functions containing this flag.

It might be worth making a more complex default for `to_pandas`, where a user could specify True/False, and if they don't specify, the function defaults to returning the type (xarray/pandas) that they input. However, this should be a separate PR that addresses all instances of this flag for consistency.
Apologies. That was an oversight on my part on the `to_pandas` parameter.

I still want to add the above test to the testing suite: akeeste#6
```diff
 omega = xr.DataArray(
     data=2 * np.pi * f, dims=frequency_dimension, coords={frequency_dimension: f}
 )

-eta = xr.Dataset()
-for var in S.data_vars:
```
Same as above: this removed the ability to iterate over multiple columns, but we still accept Datasets and multi-column pandas.
mhkit/wave/resource.py
Outdated
```diff
@@ -1153,7 +1164,7 @@ def wave_number(f, h, rho=1025, g=9.80665, to_pandas=True):
     """
     if isinstance(f, (int, float)):
         f = np.asarray([f])
-    f = convert_to_dataarray(f)
+    # f = convert_to_dataarray(f)
```
delete
…levation_spectrum
@ssolson I addressed all your comments and again allowed datasets into …
* matplotlib >=3.8
* remove debug
All tests passed. Merging!
@ssolson This is a follow-up to my other wave PRs and resolves #331. Handling the various edge cases robustly in pure numpy is difficult, so I want to first resolve #331 by using DataArrays throughout the wave resource functions instead of Datasets.
Similar to Ryan's testing mentioned in #331, I found that using DataArrays/pandas gives a 1000x speed-up vs Datasets for very large input data. This should restore MHKiT's speed to its previous state. Using a pure numpy base would give an additional 5-10x speed-up over DataArrays, but I think the current work with DataArrays will:
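For context on `wave_number` itself, the function at the center of this PR's timing comparisons: it solves the linear dispersion relation (2*pi*f)**2 = g*k*tanh(k*h) for the wave number k. The Newton iteration below is an illustrative, numpy-only sketch (not MHKiT's implementation, which also handles the type conversions discussed above):

```python
import numpy as np

def wave_number_sketch(f, h, g=9.80665, tol=1e-10, maxiter=100):
    """Solve (2*pi*f)**2 = g*k*tanh(k*h) for k by Newton's method,
    starting from the deep-water guess k = omega**2 / g."""
    omega = 2 * np.pi * np.asarray(f, dtype=float)
    k = omega**2 / g
    for _ in range(maxiter):
        t = np.tanh(k * h)
        residual = g * k * t - omega**2
        slope = g * t + g * k * h * (1.0 - t**2)  # d/dk of g*k*tanh(k*h)
        step = residual / slope
        k = k - step
        if np.all(np.abs(step) < tol):
            break
    return k

# In deep water (k*h large), tanh -> 1 and k approaches omega**2 / g.
k_deep = wave_number_sketch(0.1, h=1000.0)
```

Because the whole iteration is vectorized numpy, it runs on a scalar frequency or a large frequency array alike, which is the kind of workload where the DataArray-vs-Dataset overhead discussed in this PR shows up.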