Nexus transforms improvements #126

Merged on Nov 5, 2024 (11 commits)

Conversation

@SimonHeybrock (Member) commented Nov 4, 2024

This collects a number of small necessary improvements I ran into when trying to use GenericNeXusWorkflow on NMX files. I recommend looking at the individual commit messages.

Related: #96 (solving the simplest case).

@SimonHeybrock force-pushed the nexus-transforms-improvements branch from 16955ab to 853544d on November 5, 2024 05:13
@@ -192,10 +192,10 @@ class Filename(sciline.Scope[RunType, Path], Path): ...


 @dataclass
-class PulseSelection(Generic[RunType]):
+class TimeInterval(Generic[RunType]):
     """Range of neutron pulses to load from NXevent_data or NXdata groups."""
Member

Only pulses or also logs?

Member Author

For the logs in NXtransformations it is loading the full log. The time interval is later used to determine which values are relevant.

The workflow is not loading any other logs currently.

Member

But is your intention to use TimeInterval for other logs, too?

Member Author

Not really at the moment, since the label-based slicing in Scipp and ScippNeXus does not do what we need (include the previous value). We thus want to load "more" than the naive slice says. Unless we get very large logs it seems easier to just load everything, and then move events to log values.
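
To illustrate the point about label-based slicing, here is a minimal pure-Python sketch (not Scipp or ScippNeXus code) of selecting log entries for a time interval while also keeping the last entry at or before the interval start. The function name `log_entries_for_interval` and the list-based log representation are assumptions made for this illustration only.

```python
from bisect import bisect_left, bisect_right


def log_entries_for_interval(times, values, start, stop):
    """Return the log entries that are in effect during [start, stop).

    `times` must be sorted. Hypothetical helper, not part of ess.reduce.
    """
    # Index of the last entry with time <= start: its value still holds at
    # `start`. If `start` precedes the whole log, fall back to the first entry.
    first = max(bisect_right(times, start) - 1, 0)
    # Entries with time >= stop only take effect after the interval.
    last = max(bisect_left(times, stop), first + 1)
    return times[first:last], values[first:last]


times = [0, 10, 20, 30]
values = [1.0, 2.0, 3.0, 4.0]
selected_times, selected_values = log_entries_for_interval(times, values, 12, 25)

# A naive slice keeps only entries whose timestamp lies inside the interval
# and therefore drops the value that was set at t=10 and is valid at t=12.
naive_times = [t for t in times if 12 <= t < 25]
```

In this sketch the naive selection misses the entry at `t=10`, which is the reason given above for loading the full log and picking the relevant values afterwards.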

src/ess/reduce/nexus/types.py (outdated)
# "end" time in the files. We add a dummy end so we can use Scipp's label-
# based indexing for histogram data.
time = t.value.coords['time']
delta = sc.scalar(86_400_000, unit='s', dtype='int64').to(unit=time.unit)
Member

Where does this number come from? Can't you just use np.iinfo('int64').max as the last value?

Member Author

I think that is tricky, since we don't know the input dtype (could be signed or unsigned, or a datetime). There probably is a way (can you think of a simple one?), but just adding 1000 days seemed "safe".

Member

I think it is safe enough. I was more surprised by the concrete number and wondered whether it has some significance because it is not simply 10**10 or something like that.
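
For readers wondering about the same number: 86_400_000 s is 1000 days (86_400 s per day times 1000), matching the author's reply. The sketch below spells out that arithmetic and the idea of appending a dummy end so that N log times become N + 1 bin edges. It uses plain Python lists and `datetime` instead of Scipp variables, and `with_dummy_end` is a hypothetical helper, not a function from the PR.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # 86_400
DUMMY_END_OFFSET_S = 1000 * SECONDS_PER_DAY  # 86_400_000, i.e. 1000 days


def with_dummy_end(times, offset):
    """Append `times[-1] + offset` so that N times become N + 1 bin edges."""
    return [*times, times[-1] + offset]


# Integer timestamps in seconds.
int_edges = with_dummy_end([0, 5, 9], DUMMY_END_OFFSET_S)

# datetime timestamps: the same offset expressed as a timedelta. Adding a
# relative offset works for either kind of timestamp, whereas a fixed maximum
# value would depend on the input dtype.
start = datetime(2024, 11, 5)
dt_edges = with_dummy_end([start], timedelta(seconds=DUMMY_END_OFFSET_S))
```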

src/ess/reduce/nexus/workflow.py (outdated)
If one or more transformations in the chain are time-dependent, the time interval
is used to select a specific time point. If the interval is not a single time point,
an error is raised. This may be extended in the future to a more sophisticated
mechanism, e.g., averaging over the interval to remove noise.
Member

The last sentence is not really usage documentation. If you want to track work on this, I would say it should be an issue.

Member Author

I feel it kind of is usage documentation: Someone will look for a way of processing the time-series, and this tells them it is not implemented.
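
One possible reading of the documented behavior, sketched in pure Python: look up the transformation value in effect during the interval and raise if more than one value applies. The function name, the list-based log, and the exact error condition are assumptions for illustration. The workflow's actual implementation is not shown in this thread.

```python
from bisect import bisect_left, bisect_right


def transform_value_for_interval(times, values, start, stop):
    """Return the single log value in effect during [start, stop).

    `times` must be sorted. Raises ValueError if the value changes within
    the interval. Hypothetical helper, not part of ess.reduce.
    """
    # Last entry at or before `start` (its value is valid at `start`).
    first = max(bisect_right(times, start) - 1, 0)
    # First entry that takes effect at or after `stop`.
    last = max(bisect_left(times, stop), first + 1)
    if last - first != 1:
        raise ValueError(
            f"Interval [{start}, {stop}) covers {last - first} log values, "
            "expected exactly one."
        )
    return values[first]


angle_times = [0, 100, 200]
angles = [0.0, 1.5, 3.0]

# The value set at t=100 holds for the whole interval [120, 180).
single = transform_value_for_interval(angle_times, angles, 120, 180)

# The interval [50, 150) spans a change at t=100, so the lookup is ambiguous.
try:
    transform_value_for_interval(angle_times, angles, 50, 150)
    raised = False
except ValueError:
    raised = True
```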

# If the NXdetector in the file is not 1-D, we want to match the order of dims.
# zip_pixel_offsets otherwise yields a vector with dimensions in the order given
# by the x/y/z offsets.
offsets = snx.zip_pixel_offsets(da.coords).transpose(da.dims).copy()
Member

Why copy?

Member Author

I felt there was little to lose, whereas we still run into some Scipp operations that do not handle non-contiguous data.

Member

When I see copy() somewhere, my assumption is that it has some significance. E.g., that the result will be modified in-place. So I went looking but didn't find anything.
Essentially, it increases 'noise' for the reader. But leave it or remove it, whichever you prefer.
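
As background on the contiguity argument, the following sketch uses NumPy as an analogy. It is not Scipp code, and it makes no claim about which Scipp operations need contiguous input. It only shows that a transpose is typically a view with swapped strides, while a copy of that view has its own contiguous buffer.

```python
import numpy as np

# Stand-in for a 2-D array of pixel offsets (values are arbitrary).
offsets = np.arange(6).reshape(2, 3)

transposed = offsets.T      # view: same memory, swapped strides
copied = transposed.copy()  # new buffer laid out in C order

is_view = np.shares_memory(offsets, transposed)
view_contiguous = transposed.flags.c_contiguous
copy_contiguous = copied.flags.c_contiguous
```

Here `is_view` is true and `view_contiguous` is false, whereas `copy_contiguous` is true. A short comment next to a defensive `copy()` stating this purpose would address the readability concern raised above.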

@SimonHeybrock SimonHeybrock merged commit f5748ef into main Nov 5, 2024
4 checks passed
@SimonHeybrock SimonHeybrock deleted the nexus-transforms-improvements branch November 5, 2024 14:09