Serializability of SVMs with user-defined/callable kernels #91
Thanks for this great summary! I am not sure what the correct approach is. Scikit-learn appears to just pass the model to a general serialization library. If that library is pickle, then the serialization fails.
JLD.jl has a way to ensure that modules containing the necessary definitions get imported before loading the serialized data. But the user still needs to specify which modules should be loaded, and I don't think this is the way to go anyway. I think that users would tend to believe that everything was taken care of by LIBSVM, and that they can just conveniently load their model anytime, which is not true -- they would still need to maintain the same (or equivalent) kernel definitions in the specific module. So I feel that we should leave the responsibility completely to users (or downstream libraries). After all, LIBSVM.jl is only a thin wrapper for libsvm.
...which makes me wonder -- did we design the feature of callable kernels carefully enough in the first place? I mean -- it's just for convenience; the full, dense Gram matrix is computed anyway. What if we just provided a function that produces the Gram matrix given a kernel function, and kept only the precomputed kernel? This would, of course, be a step aside, and I am not sure how widely the callable kernel feature has been adopted by other packages already (for example, MLJ seems to address this only now?).
Perhaps I'm missing something, but wouldn't this restrict prediction? Don't we need to evaluate the kernel function for any new data on which we want to predict: k(x, x_i) for each new pattern x and each training pattern x_i? For what it's worth, I doubt MLJ will move to support pre-computed kernels; it's difficult to do without introducing API changes that would be very specific to SVM models. The pending PR referenced above will add callable kernel support, which I think will be much appreciated by our users.
I see. I thought it might be possible to define serialization strategies for specific types/structs.
This is actually not the case. For training it's done this way (because it is necessary), but for predicting on new data it's sparse, see here.
That's certainly doable -- you could export the functions that construct the matrices, even the function that constructs the sparse kernel matrix. However, if you want to make use of a serialized model, that would mean that you would also need to provide the training data (I think this is what @ablaom was referring to). Thus, instead of making sure the kernel function is available -- which I guess you'd still have to do -- one additionally would have to ensure the training data is available. Considering the current state, i.e. with the error message being rather descriptive, I think this would be more complicated.
Looking back now, I somewhat agree with you here: considering that LIBSVM.jl is supposed to be a thin wrapper, it would probably have been more appropriate to provide it upstream. Ultimately, this option still exists. However, since both the functionality and the associated issue are quarantined away, i.e. they don't impact any other functionality of the package, I'm not sure it justifies the measure.
You are not missing anything, you are completely right. My point is that (because I don't see an apparent, universal, totally satisfactory solution to this) maybe this is what higher-level libraries should take care of (or just the users of LIBSVM.jl) and not LIBSVM.jl. Libraries like MLJ could build the callable kernel feature upon the LIBSVM's precomputed kernel feature, picking the most suitable solution for them.
I am not an expert in any way here but I could not find a way to do that reliably.
You are right that for prediction we evaluate the kernel only for some points. In that sense the matrix is sparse. However, when preparing
Our function could provide two methods:
# for training, produce l×l matrix
gram(k, X::AbstractMatrix)
# for prediction, produce sparse l×n matrix, evaluate only at SV indices
gram(k, X::AbstractMatrix, svm::SVM)
We would only need the support vectors, right? They are in the
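A rough sketch of what such a pair of methods might look like, using only the standard library. The signature of the prediction method is an assumption and differs from the one proposed above: it takes the training data and the support vector indices explicitly (`Xtrain`, `Xnew` and `sv_indices` are hypothetical names) instead of an `SVM` object. Instances are assumed to be stored as columns.

```julia
using SparseArrays  # standard library

# Training: dense l×l Gram matrix. Columns of X are the l training instances.
# Assumes k is symmetric, so only the upper triangle is evaluated.
function gram(k, X::AbstractMatrix)
    l = size(X, 2)
    K = Matrix{Float64}(undef, l, l)
    for j in 1:l, i in 1:j
        K[i, j] = K[j, i] = k(view(X, :, i), view(X, :, j))
    end
    return K
end

# Prediction: sparse l×n matrix, evaluated only at the rows in sv_indices.
# Hypothetical signature: training data and SV indices are passed explicitly.
function gram(k, Xtrain::AbstractMatrix, Xnew::AbstractMatrix,
              sv_indices::AbstractVector{<:Integer})
    l, n = size(Xtrain, 2), size(Xnew, 2)
    rows, cols, vals = Int[], Int[], Float64[]
    for j in 1:n, i in sv_indices
        push!(rows, i)
        push!(cols, j)
        push!(vals, k(view(Xtrain, :, i), view(Xnew, :, j)))
    end
    return sparse(rows, cols, vals, l, n)
end
```

Note that the prediction method needs the feature vectors of the support vectors, so whoever calls it has to keep the training data (or at least the support vectors) around.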
I see, fair point.
Yes and no, I think. Currently, if one provides the pre-computed kernel matrix, the SVM never actually gets access to the features of the support vectors. The only things stored in the pre-computed case are the indices of the SVs, i.e. their positions in the training set, but to calculate the kernel matrix for prediction one does need the actual features. But maybe I am missing some obvious solution to this problem -- everything I can think of massively clutters either the call signatures or the process of constructing the SVM, neither of which seems desirable. Of course, if the idea is just to provide these functions for a higher-level library such as MLJ, it's not an issue since they would just store the actual data in their
True. I did not realize.
IMHO this is our responsibility.
abstract type Kernel end
struct Linear <: Kernel end
struct Polynomial <: Kernel end
# ...
struct Precomputed <: Kernel
    support_vectors  # maybe keep the indices too
end
kernel_id(::Linear) = 0
kernel_id(::Polynomial) = 1
# ...
kernel_id(::Precomputed) = 4
EDIT: After some more thinking, it would be a bigger change than I originally figured. So don't mind the above.
Let me just sum up my thoughts on the matter -- take this with a grain of salt as there is most likely some personal bias involved: @barucden In either case I am happy to defer to more experienced maintainers on these issues though. |
As has been noted by @barucden in #88, SVMs with user-defined/callable kernels are generally not (de-)serializable. Since the issue has recently been brought up again in conjunction with downstream changes in JuliaAI/MLJLIBSVMInterface.jl#13, it would probably be worth having an issue one can reference to track the problem and collate discussion.
Current situation:
An SVM with a user-defined/callable kernel can be serialized and deserialized without problem, while the kernel function is available:
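A minimal stand-in for this situation, using only the Serialization standard library. `ToyModel` is a hypothetical struct that merely stores a kernel function, not the LIBSVM.jl model type, and `kernel` is an illustrative linear kernel. The assumption here is that a named function is stored essentially as a reference to its name, so the round trip only succeeds while that definition exists in the session.

```julia
using Serialization  # standard library

# Hypothetical stand-in for a fitted model that stores its kernel function.
struct ToyModel{K}
    kernel::K
    coefs::Vector{Float64}
end

# Named, user-defined kernel (a plain linear kernel for illustration).
kernel(x, y) = sum(x .* y)

model = ToyModel(kernel, [0.5, -0.5])

# Round trip within the same session, where `kernel` is still defined.
io = IOBuffer()
serialize(io, model)
seekstart(io)
restored = deserialize(io)

# The restored model can call the kernel again.
value = restored.kernel([1.0, 2.0], [3.0, 4.0])
```

Writing to a file instead of an `IOBuffer` and calling `deserialize` in a fresh session without first defining `kernel` would be the case described next.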
After exiting and re-entering the REPL, kernel is undefined: execution fails with
If kernel is defined at the time deserialize is called, the code works:
In contrast, serialization using built-in kernels works without a problem:
After exiting and re-entering REPL:
Possible Courses
I don't have too much experience with Julia and Serialization.jl in particular, but I see a few ways of tackling this issue:
Serialization.jl, since its functionality seems to be rather restricted, but JLD.jl can do it, I think?