Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Affine distributions #1442

Draft
wants to merge 40 commits into
base: master
Choose a base branch
from

Conversation

ParadaCarleton
Copy link
Contributor

No description provided.

.gitignore Outdated Show resolved Hide resolved
src/Distributions.jl Outdated Show resolved Hide resolved
src/Distributions.jl Show resolved Hide resolved
src/Distributions.jl Outdated Show resolved Hide resolved
src/matrix/lkj.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
@devmotion
Copy link
Member

Some more high-level comments:

  • I think we should not introduce any additional API. and hence in particular not add affine: We should continue to use * and + (and / and -) to construct affine transformations of random variables/distributions.
  • Maybe we should use the opportunity and not just rename LocationScale and remove the check for the scale but actually change the design and allow general array-like variates, i.e., make it a subtype of Distribution{ArrayLikeVariate{N},<:ValueSupport} and allow arrays as location and scale.
  • I wonder if it would be useful to split AffineDistribution into a ShiftedDistribution and a ScaledDistribution

@ParadaCarleton
Copy link
Contributor Author

I agree on the array-like variates bit and actually started work on that yesterday.

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

I'm not sure what benefits a split would give, and it might make type-checking and conversions more of a chore since Scaled{Shifted} would be different from Shifted{Scaled}. In particular we'd have to add more methods to collapse Affine down when performing repeated affine transformations.

@devmotion
Copy link
Member

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

There are default fallbacks for / and -, so generally one just has to define x + distribution and x * distribution. I think this is simple enough for developers and not more difficult than defining affine. Additionally, having both affine and +, ... seems confusing - if possible, we should not provide multiple ways but one recommended one for constructing affine transformations.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Nov 29, 2021

Unrelated to the previous comments, we might want some field name other than scale, which seems to be a bit too easy to confuse with the scale function. Greek letters work in the worst-case scenario but are annoying to type in the REPL.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 1, 2021

@devmotion it turns out that this is already implemented in MeasureTheory.jl, and quite nicely too. Chad offered to move it to MeasuresBase, since you mentioned you'd be interested in adding MeasuresBase as a dependency. Should we maybe just add MeasuresBase as a dependency and reexport this feature?

@devmotion
Copy link
Member

No, this is not sufficient. We want to create and work with Distributions, * etc. should not suddenly lift Distributions to Measures.

@mschauer
Copy link
Member

mschauer commented Dec 1, 2021

If we go with + and -, the docstring should mention ⊕ from #1445

@devmotion
Copy link
Member

We have to support +, * etc. anyway if we want to avoid a breaking change since it is the current API.

To me it seems an additional function would not have any advantage but only extend the API, which is possibly confusing for users, developers, and an additional source for breaking changes in the future. The main argument above for it, it seems, was that it would be easier to implement. Actually, I think this is not correct:

  • As the PR for Normal etc. shows, one just has to implement two functions for univariate distributions currently
  • The more modular design makes it much easier to add specialized implementations for specific types: With affine one would have to handle all combinations of the types for scale and shift whereas otherwise one can implement a method for eg a specific type of scales without worrying about types of shifts.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 1, 2021

Got it; I'll rewrite this to handle + and * directly.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 1, 2021

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

src/matrix/lkj.jl Outdated Show resolved Hide resolved
@devmotion
Copy link
Member

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 1, 2021

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

Hmm; could we memoize in cases where base_dist is continuous, since everything has to be converted to eltype(base_dist) (after promoting with the scale+shift params) anyways?

@devmotion
Copy link
Member

No, eltype should (and will) not match the parameters in general. If one wants to e.g. evaluate just the log density it is irrelevant what the eltype is - one should be able to provide e.g. a point in Float32 and obtain a result that is not promoted to Float64.

@devmotion
Copy link
Member

I'm very certain memoization just causes problems here and is not worth it. As an example in a different package, similar issues occurred with memoization of inverses in PDMats and hence in the latest release it was removed.

@devmotion
Copy link
Member

I think some things should be designed a bit differently. E.g., it is completely fine to translate Distribution{<:ArrayLikeVariate} of arbitrary dimension but a scale of Diagonal(Ones(...)) or I only makes sense for MultivariateDistributions and MatrixDistributions. Therefore I think it would be better to split AffineDistribution in two different distributions, one that shifts and one that applies linear transformations.

Maybe we could use the following design, which seems to address also your concerns about complicated dispatches and constructions of affine transformations above:

  • Add a ShiftedDistribution/TranslatedDistribution that is the fallback for x::Real + dist::UnivariateDistribution and x::AbstractArray{<:Real,N} + dist::Distribution{ArrayLikeVariate{N}}, i.e., the distribution of x + Z where the law of Z is dist
  • Add a ScaledDistribution (not sure about the name) that is the fallback for x::Real * dist::Distribution{<:ArrayLikeVariate} and x::AbstractArray * dist::Distribution{<:ArrayLikeVariate} (for x of the correct last dimension), i.e., the distribution of x * Z where the law of Z is dist
  • Add an AffineDistribution that is the fallback for x::Real * dist::ShiftedDistribution, x::AbstractArray * dist::ShiftedDistribution, x::Real + dist::ScaledDistribution{Univariate}, and x::AbstractArray{<:Real,N} + dist::ScaledDistribution{ArrayLikeVariate{N}}

Then we could support all transformations of the form a + Z, b * Z, a + b * Z where a and b are of the correct sizes and we don't have to deal with neutral elements of addition and multiplication as in this PR. We could still dispatch on the fallback for affine transformations explicitly and there would be no problems with ShiftedDistribution of ScaledDistribution versus ScaledDistribution of ShiftedDistribution.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional suggestion: Just copy the existing methods for LocationScale for univariate AffineDistributions. This makes it also less likely that we accidentally introduce bugs.

src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
Comment on lines 88 to 95
if iszero(σ)
throw(ArgumentError("scale must be non-zero"))
elseif size(σ)[end] ≠ size(ρ)[1]
throw(DimensionMismatch(
"scaling matrix and distribution have incompatible dimensions" *
"$(size(σ)) and $(size(ρ)))"
))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to check iszero here, I assume? The second check only makes sense if the type is defined more generally and allows to increase or decrease the number of dimensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about the second check. The first check is possible -- it returns true if σ is the zero matrix.

src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
src/univariate/affine.jl Outdated Show resolved Hide resolved
mean(d::AffineDistribution) = d.μ + d.σ * mean(d.ρ)
median(d::AffineDistribution) = d.μ + d.σ * median(d.ρ)
mode(d::AffineDistribution) = d.μ + d.σ * mode(d.ρ)
modes(d::AffineDistribution) = d.μ .+ d.σ .* modes(d.ρ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong e.g. for MultivariateDistributions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how median might be wrong (since that's not uniquely defined in multiple dimensions), although that should just throw an error when it tries to call median on a multivariate distribution. But what's wrong with the rest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broadcast operations are not compatible, e.g.,

julia> rand(4) .+ rand(4, 3) .* [rand(3) for _ in 1:5]
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 4 and 5")
Stacktrace:
  [1] _bcs1
    @ ./broadcast.jl:516 [inlined]
  [2] _bcs
    @ ./broadcast.jl:510 [inlined]
  [3] broadcast_shape
    @ ./broadcast.jl:504 [inlined]
  [4] combine_axes
    @ ./broadcast.jl:499 [inlined]
  [5] _axes
    @ ./broadcast.jl:224 [inlined]
  [6] axes
    @ ./broadcast.jl:222 [inlined]
  [7] combine_axes
    @ ./broadcast.jl:499 [inlined]
  [8] instantiate
    @ ./broadcast.jl:281 [inlined]
  [9] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Vector{Float64}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(*), Tuple{Matrix{Float64}, Vector{Vector{Float64}}}}}})
    @ Base.Broadcast ./broadcast.jl:860
 [10] top-level scope
    @ REPL[1]:1

julia> rand(3) .+ rand() .* [rand(3) for _ in 1:5]
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 3 and 5")
Stacktrace:
 [1] _bcs1
   @ ./broadcast.jl:516 [inlined]
 [2] _bcs
   @ ./broadcast.jl:510 [inlined]
 [3] broadcast_shape
   @ ./broadcast.jl:504 [inlined]
 [4] combine_axes
   @ ./broadcast.jl:499 [inlined]
 [5] instantiate
   @ ./broadcast.jl:281 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(+), Tuple{Vector{Float64}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(*), Tuple{Float64, Vector{Vector{Float64}}}}}})
   @ Base.Broadcast ./broadcast.jl:860
 [7] top-level scope
   @ REPL[2]:1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm; would map work?

skewness(d::UnivariateAffine) = Base.sign(d) * skewness(d.ρ)
kurtosis(d::UnivariateAffine) = kurtosis(d.ρ)

cov(d::AffineDistribution) = d.σ * cov(d.ρ) * d.σ'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be defined only for multivariate distributions, and we should have different implementations based on the type of the scale to reduce allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, but do we want to deviate from Base's behavior (where cov returns the variance for univariate outcomes)?

return d.μ + d.σ * quantile(d.ρ, q)
end

rand(rng::AbstractRNG, d::AffineDistribution) = d.μ + d.σ * rand(rng, d.ρ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For higher-dimensional distributions we should define _rand!.


rand(rng::AbstractRNG, d::AffineDistribution) = d.μ + d.σ * rand(rng, d.ρ)

function gradlogpdf(d::ContinuousAffine, x::Real)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should restrict it to univariate distributions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it is only defined for univariate distributions. In fact it's completely undocumented but I learnt just that it is also defined for MvNormal and MvTDist. So maybe it's fine to keep it. Then the bigger issue is that the implementation is incorrect.

Copy link
Contributor Author

@ParadaCarleton ParadaCarleton Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? I'm not sure I see the mistake.

@ParadaCarleton
Copy link
Contributor Author

All tests now pass, except for support when called on normal distributions that have ForwardDiff.Dual numbers as arguments. I'm going to be completely honest and say I'm not sure what support is supposed to do there.

@devmotion
Copy link
Member

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

  1. make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;
  2. make a separate PR that allows and tests negative scales for univariate distributions;
  3. make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
  4. make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 8, 2021

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

  1. make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;
  2. make a separate PR that allows and tests negative scales for univariate distributions;
  3. make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
  4. make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 8, 2021

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

  1. Bug fixes for the location and scale methods, which are currently returning incorrect answers. (They always return locationscale.μ and locationscale.σ, regardless of the location and scale of the underlying distribution. For instance, LocationScale(0, 1, Normal(2, 4)) currently returns (0, 1).)
  2. The implementations of new constructors and deprecation of the old LocationScale()-style constructor, as you requested.

All bugs were related to dealing with negative transformations of discrete distributions, where I forgot to account for the probability that x is exactly equal to X. There may be bugs regarding multivariate distributions that I haven't caught yet, since I was still writing the tests, but this PR should not contain any breaking changes.

If you'd like to rewrite the code for multivariate affine transformations from scratch, work out a new interface to replace the old LocationScale interface from scratch, etc. that's completely reasonable. For now, I'll gladly make a separate PR implementing the code I had ready a week and a half ago. This code made no changes except removing the positivity checks for LocationScale and throwing up a couple absolute value bars, before I'd rewritten all the constructors from scratch, deprecated or rewritten all of the old constructors to favor your own syntax using +, -, etc. instead of my original Affine which was a simple reskin of LocationScale, and implemented several features beyond the original scope of this PR at your request.

@devmotion
Copy link
Member

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

Just to be sure there's no misunderstanding: We should not have separate distributions but at most type aliases (maybe they re not needed though). As mentioned above, it would be preferable to allow negative scales only in a second PR to keep changes in the different PRs minimal and make them easier to review.

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

Unfortunately, while the behaviour might not have changed (I don't know) the implementation has to a larger extent than what seemed necessary. During the different rounds of preliminary review I noticed and commented on methods that were modified slightly or generalized (sometimes incorrectly). Due to the renaming and other unrelated changes it is not possible for me to check with certainty if changes are needed, are bug fixes, are necessary or unnecessary generalizations, or are just style changes.

@ParadaCarleton
Copy link
Contributor Author

ParadaCarleton commented Dec 8, 2021

Ok; if you'd like to implement renaming and all the different changes in a different PR, go ahead, then. I'll make a single PR that only changes LocationScale to allow negative scale values, or else (if you don't like this idea) I'll add reflections directly to a different package instead of spending more and more time making changes completely unrelated to this feature at your request.

@devmotion
Copy link
Member

Sure, it's seemingly simple changes and of course it could be done in fewer PRs - but Distributions has > 300 direct dependents and > 700 indirect ones and hence I won't merge any PRs, in particular no refactorings, for which I feel I can't review them properly and guarantee with reasonable confidence that they don't break anything. Therefore I suggest breaking it up into small PRs that are easy to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants