
How to do hyperparameter tuning #76

Open · simsurace opened this issue Mar 31, 2022 · 7 comments

@simsurace (Member) commented Mar 31, 2022

I tried to AD aug_elbo in the NegBinomialLikelihood example (with the unnecessary bits removed), purposefully avoiding ParameterHandling.jl and trying only ForwardDiff.gradient:

# # Negative Binomial

# We load all the necessary packages
using AbstractGPs
using ApproximateGPs
using AugmentedGPLikelihoods
using Distributions
using ForwardDiff # <-- try this first
using LinearAlgebra

# We create some random data (sorted for plotting reasons)
N = 100
x = range(-10, 10; length=N)
kernel = with_lengthscale(SqExponentialKernel(), 2.0)
gp = GP(kernel)
lik = NegBinomialLikelihood(15)
lf = LatentGP(gp, lik, 1e-6)
f, y = rand(lf(x));

# ## ELBO
# How can one compute the Augmented ELBO?
# Again AugmentedGPLikelihoods provides helper functions
# to not have to compute everything yourself
function aug_elbo(lik, u_post, x, y)
    qf = marginals(u_post(x))
    qΩ = aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)     # approx.fz is the prior and approx.q is the posterior 
end

function u_posterior(fz, m, S)
    return posterior(SparseVariationalApproximation(Centered(), fz, MvNormal(m, S)))
end

# ## Try to differentiate loss function

function makeloss(x, y)
    N = length(x)
    function loss(θ)
        k = ScaledKernel(
            RBFKernel() ∘ ScaleTransform(inv(θ[1])),
            θ[2]
        )
        gp = GP(k)
        lik = NegBinomialLikelihood(θ[3])
        fz = gp(x, 1e-8);
        u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
        return aug_elbo(lik, u_post, x, y)
    end
end

θ = [1., 1., 15.]

loss = makeloss(x, y)
loss(θ) # works!
ForwardDiff.gradient(loss, θ) # MethodError

There is an easy fix (happy to open a PR): change the definition of aux_posterior as follows:

function aux_posterior(lik::NegBinomialLikelihood, y, f)
    c = sqrt.(second_moment.(f))
    return For(TupleVector(; y=y, c=c)) do φ
        NTDist(PolyaGamma(φ.y + lik.r, φ.c)) # Distributions uses a different parametrization
    end
end
With this change, the gradient call goes through:

julia> ForwardDiff.gradient(loss, θ)
3-element Vector{Float64}:
  5.790557942012172e7
 -1.9761748845444782e9
 16.184871970106013

BTW: is it expected that the values of the augmented ELBO are so much larger in magnitude than the normal ELBO?

@theogf (Member) commented Apr 1, 2022

Hi @simsurace,

Sorry I did not answer earlier (and save you some time), but I was struck down by Covid. Actually, aux_posterior should not be differentiated! It should be ignored during the AD pass. When using Zygote, I wrap the block in Zygote.@ignore; I don't know whether the same is possible with ForwardDiff, though.
The reason is that the aux_posterior step is already an implicit optimization.
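
A minimal sketch of what that looks like, reusing the aug_elbo from above (this only illustrates the Zygote.@ignore approach described here; the aug_elbo_ignore name is just for this sketch and it is untested, not code from the package or its docs):

using Zygote

function aug_elbo_ignore(lik, u_post, x, y)
    qf = marginals(u_post(x))
    # The auxiliary posterior is the result of an implicit optimization,
    # so its contribution is deliberately dropped from the AD pass.
    qΩ = Zygote.@ignore aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)
end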

@simsurace (Member, Author) commented:

Thanks, I did not notice that this is an implicit optimization. So is it independent of the hyperparameters? If yes, this and PR #77 would be unnecessary. I will give it a try and see if the results change! Sorry to hear about your illness, hope you get better soon.

@simsurace (Member, Author) commented:

If it works out with the ignore statements, I could convert the PR into a documentation thing where this is explained.

@theogf (Member) commented Apr 1, 2022

> Thanks, I did not notice that this is an implicit optimization. So is it independent of the hyperparameters? If yes, this and PR #77 would be unnecessary. I will give it a try and see if the results change! Sorry to hear about your illness, hope you get better soon.

That's an interesting question, actually: it depends on the parametrization. Right now I am parametrizing with m and S, the mean and covariance. But one could parametrize the covariance as (K^{-1} + D)^{-1} (and similarly for the mean); there one could optimize the hyperparameters as well, but that's a more complicated matter.

In summary, for full GPs the kernel parameters only matter for KL(q(f)||p(f)), and for sparse GPs they also enter the expected log-likelihood, but that's it.
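
Spelled out, the two parametrizations mentioned above look roughly like this (the symbol D below stands for the likelihood-dependent term produced by the auxiliary variables; that notation is an assumption of this sketch, not something defined in the thread):

% Direct parametrization of the variational distribution:
q(f) = \mathcal{N}(m, S)

% Alternative parametrization in which the kernel matrix K enters the
% covariance directly (and analogously for the mean), so the kernel
% hyperparameters could in principle be optimized through it:
S = \left(K^{-1} + D\right)^{-1}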

@simsurace (Member, Author) commented:

Just to clarify my understanding:
The qΩ = aux_posterior(lik, y, qf) call should be ignored by the AD system, even though lik and qf depend on the parameters one wants to optimize over, such as likelihood parameters, inducing-point locations, and variational parameters?

@theogf (Member) commented Apr 1, 2022

Oh yeah, sorry, somehow I got confused with the updates on q(f).
But it's the same thing: qΩ is optimized via aux_posterior, and once this is obtained we can compute the ELBO and optimize the remaining hyperparameters.

@simsurace (Member, Author) commented Apr 1, 2022

EDIT: Ah, I think I now understand: one should not expose the variational parameters to the optimizer, but run an internal CAVI loop for them.

Still struggling to make it work, though. Do you have a working example of hyperparameter optimization of the augmented ELBO?

No hurry though. This is not urgent, but it would be nice to make this work and compare it to the normal SVGP optimization loop for speed.
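
For concreteness, here is a rough sketch of the alternating scheme discussed above: an inner CAVI loop updates the variational parameters m and S with the hyperparameters fixed, and an outer gradient step updates only the hyperparameters θ. The fit_hyperparameters name, the cavi_update! helper, and the plain gradient-ascent step are all placeholders invented for this illustration (not AugmentedGPLikelihoods API), and the snippet is an untested sketch of the structure, not a recipe from the package:

using AbstractGPs, AugmentedGPLikelihoods, ForwardDiff, LinearAlgebra

function fit_hyperparameters(θ, x, y; outer_iters=100, inner_iters=10, lr=1e-3)
    N = length(x)
    m, S = zeros(N), Matrix{Float64}(I, N, N)
    for _ in 1:outer_iters
        # Inner loop: CAVI updates of the variational parameters with θ held fixed.
        # cavi_update! is a hypothetical placeholder for whatever update rule is used.
        for _ in 1:inner_iters
            cavi_update!(m, S, θ, x, y)
        end
        # Outer step: differentiate the augmented ELBO w.r.t. θ only,
        # with (m, S) held fixed and aux_posterior treated as constant,
        # as discussed above.
        g = ForwardDiff.gradient(θ) do p
            k = ScaledKernel(RBFKernel() ∘ ScaleTransform(inv(p[1])), p[2])
            fz = GP(k)(x, 1e-8)
            lik = NegBinomialLikelihood(p[3])
            aug_elbo(lik, u_posterior(fz, m, S), x, y)
        end
        θ += lr * g  # gradient ascent on the ELBO
    end
    return θ, m, S
end

Note that with plain ForwardDiff the aux_posterior call inside aug_elbo still sees dual numbers, so in practice this would be combined with the ignore approach sketched earlier (or the patched aux_posterior from the top of the thread).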

@simsurace changed the title from "aux_posterior is not AD-ready" to "How to do hyperparameter tuning" on May 19, 2022