Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adaptive Metropolis algorithm #57

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

kaandocal
Copy link
Collaborator

I've implemented a simple version of the standard AM algorithm as a small extension to #39 (this PR is independent of #39). The pull request features a multivariate Gaussian proposal which uses a rescaled version of the posterior covariance (Haario et al.'s famous 2.38 rule) + a small epsilon term for positive-definiteness and initial exploration. In order to update the proposal I had to create a new sampler class, AMSampler, which is basically MetropolisHastings but updates the proposal at each sampling step via the trackstep function. The default implementation of trackstep does nothing.

The update mechanism is slightly different from that in #39: this one performs adaptation in AbstractMCMC.step whereas #39 does it in propose. I think the new AMSampler class should be able to handle both, and since it consists of minimal modifications to the original MetropolisHastings it could provide a general interface for adaptive MH algorithms.

Notes:

  • Proposals are shared between threads - how can this be changed?
  • I haven't implemented all overloads of propose for AMSampler but this should not be an issue.
  • This PR would benefit from Return accept/reject stat #38 to make the trackstep interface more uniform.
  • The test example is minimal, I would welcome any suggestions!

@devmotion
Copy link
Member

Is it possible to just add a new proposal and extend the existing sampler, if needed, but not add a new sampler? I think this would simplify the code and also be easier if one wants to combine different proposals.

I am also wondering if trackstep is needed at all - maybe one could just update the proposal right before proposing a new sample in the propose step?

Comment on lines 66 to 73
mutable struct AMProposal <: Proposal{MvNormal}
epsilon::Symmetric
scalefactor::Float64
proposal::MvNormal
samplemean::AbstractArray
samplesqmean::AbstractMatrix
N::Int
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add type parameters such that the fields are concretely typed: https://docs.julialang.org/en/v1/manual/performance-tips/index.html#Avoid-fields-with-abstract-type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now.

N::Int
end

function AMProposal(epsilon::AbstractMatrix, scalefactor=2.38^2 / size(epsilon, 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this specific scale factor? Is there a reference for it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaandocal
Copy link
Collaborator Author

Is it possible to just add a new proposal and extend the existing sampler, if needed, but not add a new sampler? I think this would simplify the code and also be easier if one wants to combine different proposals.

I didn't want to modify existing code in order to not break something but the changes could be trivially implemented in MetropolisHastings! I can move things there if that'd be better.

I am also wondering if trackstep is needed at all - maybe one could just update the proposal right before proposing a new sample in the propose step?

I wanted to leave users the option of querying the sampler freely for new proposals without changing the state, hence why I thought updating the proposal in step would be a good idea. Furthermore the accept/reject decision is taken in step, so the proposed samples might not end up in the chain - the AM algorithm only uses samples in the chain to do its adaptation.

@devmotion
Copy link
Member

Furthermore the accept/reject decision is taken in step, so the proposed samples might not end up in the chain - the AM algorithm only uses samples in the chain to do its adaptation.

My suggestion was not to update the proposal based on the proposed sample but on the previous sample - this should always be the accepted sample. And since the previous sample is available in the propose step, I thought this could be sufficient.

@kaandocal
Copy link
Collaborator Author

I see, so like the approach in #39 ! I personally think having a separate callback interface for adaptation would be cleaner, and being able to track acceptances/rejections might be useful for some algorithms. And compared to the accepted! callback in #39 this should be a bit more consistent/flexible.

@devmotion
Copy link
Member

I'm not strictly against adding such a function but there is one major problem: transitions contain all parameters but only for a subset of them (e.g. just one parameter) one might want to use the adaptive proposal. Thus in general it is not correct to call trackstep with the AMProposal and a Transition, it should be called only with the corresponding sample.

One way to achieve this is to handle it in propose. Another one would be to iterate through samples and proposals.

In particular due to these mixed cases, I think it would be better to only add a new proposal but not a new sampler.

@devmotion
Copy link
Member

Another advantage of handling it in the propose step is that you don't have to construct the MvNormal proposal, which probably is not very efficient.

@devmotion
Copy link
Member

Slightly unrelated thought:

being able to track acceptances/rejections might be useful

I think probably one would not want to keep track of these statistics in a separate trackstep function - otherwise every proposal would have to reimplement the tracking functionality. Probably it would be better to hardcode it in the Transition object and the implementation of step, and then possibly forward it to the proposals if needed.

# When the proposal is initialised the empirical posterior covariance is zero
function trackstep(proposal::AMProposal, trans::Transition)
proposal.samplemean .= trans.params
proposal.samplesqmean .= trans.params * trans.params'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still computes trans.params * trans.params' first and then copies it to proposal.samplesqmean. You can avoid this by writing

Suggested change
proposal.samplesqmean .= trans.params * trans.params'
mul!(proposal.samplesqmean, trans.params, trans.params')


# Recompute the empirical posterior covariance matrix
function trackstep(proposal::AMProposal, trans::Transition, ::Union{Val{true}, Val{false}})
proposal.samplemean .= (proposal.samplemean .* proposal.N .+ trans.params) ./ (proposal.N + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it is numerically more stable to write

Suggested change
proposal.samplemean .= (proposal.samplemean .* proposal.N .+ trans.params) ./ (proposal.N + 1)
proposal.N += 1
proposal.samplemean .+= (trans.params .- proposal.samplemean) ./ proposal.N

Comment on lines 97 to 98
proposal.samplesqmean .= (proposal.samplesqmean .* proposal.N + trans.params * trans.params') ./
(proposal.N + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fuse all operations (might require Compat.jl on older Julia versions) but is not as stable as the version for the mean above:

Suggested change
proposal.samplesqmean .= (proposal.samplesqmean .* proposal.N + trans.params * trans.params') ./
(proposal.N + 1)
mul!(proposal.samplesqmean, trans.params, trans.params', true, N)
ldiv!(N + 1, proposal.samplesqmean)

proposal.samplesqmean .= (proposal.samplesqmean .* proposal.N + trans.params * trans.params') ./
(proposal.N + 1)

samplecov = proposal.samplesqmean .- proposal.samplemean * proposal.samplemean'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One should be aware that this is the "naive" algorithm for computing the variance which can lead to catastrophic cancellation (see e.g. https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Na%C3%AFve_algorithm). Maybe it would be better to use Welford's algorithm (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look at this later. In combination with creating a new MvNormal distribution every round, which is not very efficient, I think we could make this whole step more efficient by using the rank-one Cholesky update algorithm, although I am not sure how best to do this given that MvNormals are immutable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Welford implementation in AdvancedHMC is pretty good. I think it's an adaptation of the Stan version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

@kaandocal
Copy link
Collaborator Author

Slightly unrelated thought:

being able to track acceptances/rejections might be useful

I think probably one would not want to keep track of these statistics in a separate trackstep function - otherwise every proposal would have to reimplement the tracking functionality. Probably it would be better to hardcode it in the Transition object and the implementation of step, and then possibly forward it to the proposals if needed.

I agree, which is why I think implementing #38 would be useful! It shouldn't be a big change but probably belongs in another pull request

@kaandocal
Copy link
Collaborator Author

kaandocal commented May 26, 2021

I'm not strictly against adding such a function but there is one major problem: transitions contain all parameters but only for a subset of them (e.g. just one parameter) one might want to use the adaptive proposal. Thus in general it is not correct to call trackstep with the AMProposal and a Transition, it should be called only with the corresponding sample.

One way to achieve this is to handle it in propose. Another one would be to iterate through samples and proposals.

I have just merged AMSampler into MetropolisHastings. I haven't seen multiple independent adaptive proposals in the wild, but this implementation just calls trackstep for every proposal in case of AbstractArrays or NamedTuples. Note that acceptances/rejections are not determined by a single proposal, but by a combination of all of them so having more than one adaptive proposal might not work very well.

Copy link
Member

@cpfiffer cpfiffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff -- I'm happy with this for the moment but will defer to David's judgement if he has further suggestions.

src/adaptivemetropolis.jl Outdated Show resolved Hide resolved
proposal.samplesqmean .= (proposal.samplesqmean .* proposal.N + trans.params * trans.params') ./
(proposal.N + 1)

samplecov = proposal.samplesqmean .- proposal.samplemean * proposal.samplemean'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Welford implementation in AdvancedHMC is pretty good. I think it's an adaptation of the Stan version.

src/AdvancedMH.jl Outdated Show resolved Hide resolved
@kaandocal
Copy link
Collaborator Author

Any way we could reuse the Welford implementation from AdvancedHMC?

@kaandocal
Copy link
Collaborator Author

By the way, it would be easy to combine #39 with this pull request, but we should maybe fix nomenclature for the methods as the names almost overlap - any suggestions?

@devmotion
Copy link
Member

Any way we could reuse the Welford implementation from AdvancedHMC?

Only if it's moved to a separate package I assume.

@kaandocal
Copy link
Collaborator Author

kaandocal commented May 26, 2021

By the way - and this is embarassing - #39 already implemented pretty much the same adaptive algorithm. I didn't realise this until now, so this PR is a bit of a duplicate really. Apologies to @arzwa.

@kaandocal
Copy link
Collaborator Author

There's a subpackage called Adaptation in AdvancedHMC - for now we could probably borrow code from there (this should be permitted by the BSD-3 and MIT Licences). So we could merge that computation of the sample covariance matrix with this and #39.

One idea would be basically copying the interface in AdvancedHMC, ie. having an abstract Adaptor struct with initialize!(adaptor, proposal, initial_sample, model) and update!(adaptor, proposal, sample_prev, sample, model, accepted). That way we could avoid creating new Proposal structs and cleanly separate the adaptation logic.

@yebai
Copy link
Member

yebai commented May 26, 2021

How about moving the AdvancedHMC mass matrix adaption code into AbstractMCMC.Adaption submodule?

@kaandocal
Copy link
Collaborator Author

How about moving the AdvancedHMC mass matrix adaption code into AbstractMCMC.Adaption submodule?

It might make sense to define an abstract Adaptation interface in AbstractMCMC. As far as I know AdvancedHMC does not use AbstractMCMC though.

@yebai
Copy link
Member

yebai commented May 26, 2021

There is a PR in progress - see TuringLang/AdvancedHMC.jl#259

@arzwa
Copy link

arzwa commented May 26, 2021

By the way - and this is embarassing - #39 already implemented pretty much the same adaptive algorithm. I didn't realise this until now, so this PR is a bit of a duplicate really. Apologies to @arzwa.

No problem at all, that PR got stalled due to myself not really finding the time to finish it off decently (it worked for what I needed, and I got sidetracked).

@kaandocal
Copy link
Collaborator Author

kaandocal commented May 26, 2021

I created a new branch adaptive (available on my GitHub) which implements a more generic interface (I don't want to spam PRs).

Main changes:

  • MetropolisHastings has a second field named adaptor
  • AbstractAdaptor{ProposalType} as an abstract type for adaptors with methods initialize! and update!
  • NoAdaptor{PT} <: AbstractAdaptor{PT} as a default no-op adaptor
  • Adaptors can be combined for arrays/named tuples of proposals (yielding an array/named tuple of adaptors)
  • AMAdaptor for symmetric MvNormal random walk proposals

Not sure if the type interface is the cleanest, I would appreciate any feedback!

Note: I'm not sure if we really want to support multiple adaptives for arrays/tuples of proposals, this feature could be removed...

@devmotion
Copy link
Member

Intuitively I would have thought adaptor should be part of an AdaptiveProposal but not a general property of MetropolisHastings. That seems a bit cleaner but maybe you noticed some advantages if it is part of the sampler?

@kaandocal
Copy link
Collaborator Author

Not really to be honest! I tried to see if it would be cleaner but at least my new "solution" is more complicated than what we have already. Unless you or @cpfiffer see any advantages in separating out the adaptor logic (I don't really) we can just keep it as it is.

@cpfiffer
Copy link
Member

Unless you or @cpfiffer see any advantages in separating out the adaptor logic (I don't really) we can just keep it as it is.

For right now, I think it's fine to just go as-is. We can add in more general adaptation interfaces later on.

@kaandocal
Copy link
Collaborator Author

I have uploaded a new version that uses Welford's algorithm. The code now more or less covers the AdaptiveMvNormal features in #39 . The current adaptation interface is via trackstep! (added a bang).

@cpfiffer
Copy link
Member

Cool. I like it. I've turned on the test suite so we can see how the tests run.

@kaandocal
Copy link
Collaborator Author

kaandocal commented May 27, 2021

Cool. I like it. I've turned on the test suite so we can see how the tests run.

There was an error due to my using a different version of LinearAlgebra.jl, the AdaptiveMH test should now run correctly (hopefully)

@cpfiffer
Copy link
Member

I'll approve for now. Could you add a little bit of usage info to the README file?

@@ -5,6 +5,8 @@ version = "0.6.0"
[deps]
AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
PDMats = "90014a1f-27ba-587c-ab20-58faa44d9150"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a [compat] entry for PDMats?

src/adaptivemetropolis.jl Outdated Show resolved Hide resolved
src/adaptivemetropolis.jl Outdated Show resolved Hide resolved
src/mh-core.jl Outdated
# Called to update proposal when the first sample is drawn
trackstep!(proposal::Proposal, params) = nothing
trackstep!(proposal::AbstractArray, params) = foreach(trackstep!, proposal, params)
trackstep!(proposal::NamedTuple, params) = foreach(trackstep!, proposal, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters in the NamedTuple params are not necessarily in the same order as the proposal.

src/mh-core.jl Outdated
Comment on lines 151 to 158
trackstep!(proposal::Proposal, params,
::Union{Val{true}, Val{false}}) = nothing

trackstep!(proposal::AbstractArray, params, accept::Union{Val{true},Val{false}}) =
foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)

trackstep!(proposal::NamedTuple, params, accept::Union{Val{true},Val{false}}) =
foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the type annotations on accept? They are not needed.

Suggested change
trackstep!(proposal::Proposal, params,
::Union{Val{true}, Val{false}}) = nothing
trackstep!(proposal::AbstractArray, params, accept::Union{Val{true},Val{false}}) =
foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)
trackstep!(proposal::NamedTuple, params, accept::Union{Val{true},Val{false}}) =
foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)
trackstep!(proposal::Proposal, params, accept) = nothing
function trackstep!(proposal::AbstractArray, params, accept)
return foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)
end
function trackstep!(proposal::NamedTuple, params, accept)
return foreach((prop, par) -> trackstep!(prop, par, accept), proposal, params)
end

Additionally, again the NamedTuple implementation is not completely correct.

src/mh-core.jl Outdated
@@ -157,6 +173,7 @@ function AbstractMCMC.step(
transition = AdvancedMH.transition(spl, model, init_params)
end

trackstep!(spl.proposal, transition.params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The step implementations are written for generic AbstractTransitions and MHSampler, so neither spl.proposal nor transition.params might exist.

test/runtests.jl Outdated Show resolved Hide resolved
Co-authored-by: Cameron Pfiffer <[email protected]>
Co-authored-by: David Widmann <[email protected]>
@kaandocal
Copy link
Collaborator Author

@arzwa We should also incorporate your univariate adaptive proposal. Any suggestions for a name? We can also change AMProposal.

@ParadaCarleton
Copy link
Member

@kaandocal Are you still interested in getting this PR merged?

[Diff since v0.6.7](TuringLang/AdvancedMH.jl@v0.6.7...v0.6.8)

**Merged pull requests:**
- Remove type restriction of log density function (TuringLang#68) (@devmotion)
@kaandocal
Copy link
Collaborator Author

We should probably get this PR out of the way. I did not pursue this much further because I had trouble getting it to work on my sampling problems - this approach needs a lot of samples to get a decent estimate of the covariance matrix and seems quite fickle in practice. Due to the nature of MH it just takes a while to explore the target distribution even if you have a decent covariance matrix... AdvancedHMC does a fantastic job at adaptation in my experience, and this is nowhere near that. While it is quite different @ParadaCarleton I do recommend having a look at Emcee in the meanwhile.

@ParadaCarleton
Copy link
Member

We should probably get this PR out of the way. I did not pursue this much further because I had trouble getting it to work on my sampling problems - this approach needs a lot of samples to get a decent estimate of the covariance matrix and seems quite fickle in practice. Due to the nature of MH it just takes a while to explore the target distribution even if you have a decent covariance matrix... AdvancedHMC does a fantastic job at adaptation in my experience, and this is nowhere near that. While it is quite different @ParadaCarleton I do recommend having a look at Emcee in the meanwhile.

Yep, I totally get that.

The reason an AMH sampler would be useful is less to do with me wanting to use AMH itself, and more to do with AMH being a useful building block in other algorithms. I think @fipelle mentioned an interest in this earlier?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants