Scores for `FDataIrregular` objects #609
Comments
(testing included to assert equality with the `FDataGrid` case)
Hi, thank you for opening this issue. I'm trying to apply FPCA using this code. I have functions from $\mathbb{R}^3$ to $\mathbb{R}$. Can FPCA be implemented on `FDataIrregular` too? (Or should I open another issue?)
Hello, @ooodragon94. As I understand, your case is very different from the one I outlined in this issue. There are ways to implement FPCA for irregular data, but we haven't implemented that yet, as `FDataIrregular` is a very recent addition to the package. You should definitely open another issue explaining the type of data that you have and what you want to do in detail. The development efforts tend to be steered towards what users request, so it will be very useful to know what you would like to have in the package.
After discussing this issue with @vnmabus and Alberto Suárez, we concluded that the integral of a functional data object should always be the integral over its domain. In #610, I have implemented the changes explained above; that is, dividing each integral by the measure $V_i$ of the corresponding curve's domain. However, once the integral of discretized datasets is properly defined in #619 (over the domain of the functional data object), these scores must be redefined so that the integrals are divided by the domain's measure.
Implement scores for `FDataIrregular` objects as described in #609
Motivation
Computing scores between `FDataIrregular` objects is a missing functionality of the package, and it can be useful when measuring the quality of conversions from irregular objects to basis representation.
Desired functionality
Compute scores when both `y_true` and `y_pred` are `FDataIrregular` objects.
How to implement each score?
There is a big problem when implementing scores for `FDataIrregular`: the mean of an `FDataIrregular` object is not well defined, and most of the scores (for FData objects) involve computing the mean of an FData object.
We can work around this issue in some cases, namely when we want the `"uniform_average"` of the score and not the `"raw_values"`.
An example where we can avoid computing the mean is `mean_absolute_error`. To avoid having to calculate the mean of the `FDataIrregular` when `multioutput="uniform_average"`, we can change the order of the mean and the integral. That is, instead of first averaging the absolute errors across curves and then integrating the resulting mean function, we can integrate the absolute error of each curve over $D_i$, divide the result by $V_i$, and then average these values across curves.
Here $D_i$ and $V_i$ correspond to the domain of the $i$-th irregular curve and its Lebesgue measure, respectively. I am not sure whether not using the whole domain $D$ and its volume $V$ is the best choice. Perhaps it would be less confusing not to bother computing the $V_i$'s, but I believe that the result would be less accurate, as it would implicitly give more weight to curves that have more spread-out points.
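The reordering described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the package's implementation: the score is taken to be $\frac{1}{n}\sum_{i=1}^{n}\frac{1}{V_i}\int_{D_i}|y_i(t)-\hat{y}_i(t)|\,dt$ (a reconstruction from the description of $D_i$ and $V_i$), each curve is a hypothetical `(points, values)` pair with sorted one-dimensional points, `y_true` and `y_pred` are assumed to be observed at the same points, $D_i$ is taken as the interval between the first and last point, and the integral is approximated with the trapezoidal rule. The names `trapezoid`, `irregular_mae` and `grid_mae` are made up for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for one irregular curve: (sorted points, values).
Curve = Tuple[Sequence[float], Sequence[float]]


def trapezoid(points: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal rule on sorted, possibly unevenly spaced points."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(points, points[1:], values, values[1:])
    )


def irregular_mae(y_true: List[Curve], y_pred: List[Curve]) -> float:
    """Integrate each curve's absolute error, divide by V_i, then average."""
    total = 0.0
    for (t_true, v_true), (t_pred, v_pred) in zip(y_true, y_pred):
        # Assumption: both curves of a pair share the same observation points.
        if list(t_true) != list(t_pred):
            raise ValueError("y_true and y_pred must share observation points")
        abs_err = [abs(a - b) for a, b in zip(v_true, v_pred)]
        v_i = t_true[-1] - t_true[0]  # measure of D_i = [first, last point]
        total += trapezoid(t_true, abs_err) / v_i
    return total / len(y_true)


def grid_mae(grid, true_values, pred_values) -> float:
    """Original order on a common grid: pointwise mean first, then integrate."""
    n = len(true_values)
    pointwise = [
        sum(abs(yt[j] - yp[j]) for yt, yp in zip(true_values, pred_values)) / n
        for j in range(len(grid))
    ]
    return trapezoid(grid, pointwise) / (grid[-1] - grid[0])


# When every curve lives on the same grid, both orderings agree (0.625 here).
grid = [0.0, 0.5, 1.0, 2.0]
true_values = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
pred_values = [[0.0, 0.0, 2.0, 5.0], [2.0, 1.0, 0.0, 1.0]]
same_grid_true = [(grid, v) for v in true_values]
same_grid_pred = [(grid, v) for v in pred_values]
reordered = irregular_mae(same_grid_true, same_grid_pred)
original = grid_mae(grid, true_values, pred_values)
```

On a shared grid the two orderings coincide because both the mean and the trapezoidal rule are linear, which is the kind of equality a test against the `FDataGrid` case can assert. With genuinely irregular curves only `irregular_mae` is available, and dividing by each $V_i$ keeps a curve observed over a wide range from dominating the average.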
This idea can be applied to `mean_absolute_error`, `mean_absolute_percentage_error`, `mean_squared_error` and `mean_squared_log_error`. I am going to implement these in feature/scoring-fdatairregular.
`r2_score`
I believe that the `r2_score` cannot be implemented for the `FDataIrregular` case, as its definition is to compare how well `y_pred` predicts the values of `y_true` in relation to how well the mean does, and the mean is not defined.
A possible implementation of `r2_score` for `FDataIrregular` objects would be to just compute the `r2_score` of `(y_true.values, y_pred.values)`. However, I do not think this is a good option: it disregards the functional structure of the curves by ignoring the points where they are measured, and the mean of the values does not have the same meaning as in the other cases (`FDataGrid` and `FDataBasis`). Moreover, a user can call `r2_score(y_true.values, y_pred.values)` explicitly, so I do not think we should implement this score for irregular data, as it is not properly defined.
The case of `explained_variance_score` is very similar to that of `r2_score`.
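To make the objection to the values-only `r2_score` concrete, here is a minimal sketch of what such a pooled computation amounts to. It is a hand-written stand-in, not a call into any library: `pooled_r2` is a hypothetical name, and the inputs are assumed to be flat lists holding the observed values of all curves concatenated, with a scalar codomain.

```python
from typing import Sequence


def pooled_r2(true_values: Sequence[float], pred_values: Sequence[float]) -> float:
    """R^2 over pooled observations; measurement points are never used."""
    # The reference predictor is the mean of all pooled values, not a mean
    # function of t, so curves with more observations weigh more.
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, pred_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    # Note: ss_tot is zero when all true values are equal; not handled here.
    return 1.0 - ss_res / ss_tot


# Values of two curves concatenated; where they were measured plays no role.
score = pooled_r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])  # 0.8
```

The function never receives the observation points, so moving them around leaves the score unchanged. That is the sense in which this option ignores the functional structure of the curves.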