
How to do hyperparameter tuning #76

Open · simsurace opened this issue Mar 31, 2022 · 7 comments

@simsurace (Member) commented Mar 31, 2022

I tried to AD aug_elbo in the NegBinomialLikelihood example (with the unnecessary bits removed), purposefully avoiding ParameterHandling.jl and trying only ForwardDiff.gradient:

# # Negative Binomial

# We load all the necessary packages
using AbstractGPs
using ApproximateGPs
using AugmentedGPLikelihoods
using Distributions
using ForwardDiff # <-- try this first
using LinearAlgebra

# We create some random data (sorted for plotting reasons)
N = 100
x = range(-10, 10; length=N)
kernel = with_lengthscale(SqExponentialKernel(), 2.0)
gp = GP(kernel)
lik = NegBinomialLikelihood(15)
lf = LatentGP(gp, lik, 1e-6)
f, y = rand(lf(x));

# ## ELBO
# How can one compute the Augmented ELBO?
# Again AugmentedGPLikelihoods provides helper functions
# to not have to compute everything yourself
function aug_elbo(lik, u_post, x, y)
    qf = marginals(u_post(x))
    qΩ = aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)     # approx.fz is the prior and approx.q is the posterior 
end

function u_posterior(fz, m, S)
    return posterior(SparseVariationalApproximation(Centered(), fz, MvNormal(m, S)))
end

# ## Try to differentiate loss function

function makeloss(x, y)
    N = length(x)
    function loss(θ)
        k = ScaledKernel(
            RBFKernel() ∘ ScaleTransform(inv(θ[1])),
            θ[2]
        )
        gp = GP(k)
        lik = NegBinomialLikelihood(θ[3])
        fz = gp(x, 1e-8);
        u_post = u_posterior(fz, zeros(N), Matrix{Float64}(I(N)))
        return aug_elbo(lik, u_post, x, y)
    end
end

θ = [1., 1., 15.]

loss = makeloss(x, y)
loss(θ) # works!
ForwardDiff.gradient(loss, θ) # MethodError

There is an easy fix (happy to open a PR): change the definition of aux_posterior as follows:

function aux_posterior(lik::NegBinomialLikelihood, y, f)
    c = sqrt.(second_moment.(f))
    return For(TupleVector(; y=y, c=c)) do φ
        NTDist(PolyaGamma(φ.y + lik.r, φ.c)) # Distributions uses a different parametrization
    end
end
With this change, the gradient call goes through:

julia> ForwardDiff.gradient(loss, θ)
3-element Vector{Float64}:
  5.790557942012172e7
 -1.9761748845444782e9
 16.184871970106013

BTW: is it expected that the values of the augmented ELBO are so much larger in magnitude than the normal ELBO?

@theogf (Member) commented Apr 1, 2022

Hi @simsurace,

Sorry I did not answer earlier (and save you some time), but I was struck down by Covid. Actually, aux_posterior should not be differentiated! It should be ignored during the AD pass. When using Zygote, I wrap the block in Zygote.@ignore; I don't know whether the same is possible with ForwardDiff, though.
The reason is that the aux_posterior step is already an implicit optimization.
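
A minimal sketch of what that looks like, reusing the aug_elbo from above (this only illustrates the Zygote.@ignore approach described here; the aug_elbo_ignore name is just for this sketch and it is untested, not code from the package or its docs):

using Zygote

function aug_elbo_ignore(lik, u_post, x, y)
    qf = marginals(u_post(x))
    # The auxiliary posterior is the result of an implicit optimization,
    # so its contribution is deliberately dropped from the AD pass.
    qΩ = Zygote.@ignore aux_posterior(lik, y, qf)
    return expected_logtilt(lik, qΩ, y, qf) - aux_kldivergence(lik, qΩ, y) -
           kldivergence(u_post.approx.q, u_post.approx.fz)
end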

@simsurace (Member, Author) commented:

Thanks, I did not notice that this is an implicit optimization. So is it independent of the hyperparameters? If yes, this and PR #77 would be unnecessary. I will give it a try and see if the results change! Sorry to hear about your illness, hope you get better soon.

@simsurace (Member, Author) commented:

If it works out with the ignore statements, I could convert the PR into a documentation thing where this is explained.

@theogf (Member) commented Apr 1, 2022

> Thanks, I did not notice that this is an implicit optimization. So is it independent of the hyperparameters? If yes, this and PR #77 would be unnecessary. I will give it a try and see if the results change! Sorry to hear about your illness, hope you get better soon.

That's an interesting question, actually: it depends on the parametrization. Right now I am parametrizing with m and S, the mean and covariance. But one could parametrize the covariance as (K^{-1} + D)^{-1} (and similarly for the mean); there one could optimize the hyperparameters as well, but that's a more complicated matter.

In summary, for full GPs the kernel parameters only matter for KL(q(f)||p(f)), and for sparse GPs they also enter the expected log-likelihood, but that's it.
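
Spelled out, the two parametrizations mentioned above look roughly like this (the symbol D below stands for the likelihood-dependent term produced by the auxiliary variables; that notation is an assumption of this sketch, not something defined in the thread):

% Direct parametrization of the variational distribution:
q(f) = \mathcal{N}(m, S)

% Alternative parametrization in which the kernel matrix K enters the
% covariance directly (and analogously for the mean), so the kernel
% hyperparameters could in principle be optimized through it:
S = \left(K^{-1} + D\right)^{-1}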

@simsurace (Member, Author) commented:

Just to clarify my understanding:
The qΩ = aux_posterior(lik, y, qf) call should be ignored by the AD system, even though lik and qf depend on the parameters one wants to optimize over, such as likelihood parameters, inducing-point locations, and variational parameters?

@theogf (Member) commented Apr 1, 2022

Oh yeah, sorry, somehow I got confused with the updates on q(f).
But it's the same thing: qΩ is optimized via aux_posterior, and once this is obtained we can compute the ELBO and optimize the remaining hyperparameters.

@simsurace (Member, Author) commented Apr 1, 2022

EDIT: Ah, I think I now understand: one should not expose the variational parameters to the optimizer, but run an internal CAVI loop for them.

Still struggling to make it work, though. Do you have a working example of hyperparameter optimization of the augmented ELBO?

No hurry though. This is not urgent, but it would be nice to make this work and compare it to the normal SVGP optimization loop for speed.
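
For concreteness, here is a rough sketch of the alternating scheme discussed above: an inner CAVI loop updates the variational parameters m and S with the hyperparameters fixed, and an outer gradient step updates only the hyperparameters θ. The fit_hyperparameters name, the cavi_update! helper, and the plain gradient-ascent step are all placeholders invented for this illustration (not AugmentedGPLikelihoods API), and the snippet is an untested sketch of the structure, not a recipe from the package:

using AbstractGPs, AugmentedGPLikelihoods, ForwardDiff, LinearAlgebra

function fit_hyperparameters(θ, x, y; outer_iters=100, inner_iters=10, lr=1e-3)
    N = length(x)
    m, S = zeros(N), Matrix{Float64}(I, N, N)
    for _ in 1:outer_iters
        # Inner loop: CAVI updates of the variational parameters with θ held fixed.
        # cavi_update! is a hypothetical placeholder for whatever update rule is used.
        for _ in 1:inner_iters
            cavi_update!(m, S, θ, x, y)
        end
        # Outer step: differentiate the augmented ELBO w.r.t. θ only,
        # with (m, S) held fixed and aux_posterior treated as constant,
        # as discussed above.
        g = ForwardDiff.gradient(θ) do p
            k = ScaledKernel(RBFKernel() ∘ ScaleTransform(inv(p[1])), p[2])
            fz = GP(k)(x, 1e-8)
            lik = NegBinomialLikelihood(p[3])
            aug_elbo(lik, u_posterior(fz, m, S), x, y)
        end
        θ += lr * g  # gradient ascent on the ELBO
    end
    return θ, m, S
end

Note that with plain ForwardDiff the aux_posterior call inside aug_elbo still sees dual numbers, so in practice this would be combined with the ignore approach sketched earlier (or the patched aux_posterior from the top of the thread).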

@simsurace changed the title from "aux_posterior is not AD-ready" to "How to do hyperparameter tuning" on May 19, 2022