Add Affine distributions #1442

ParadaCarleton · 2021-11-28T16:32:30Z

No description provided.

.gitignore

src/Distributions.jl

src/matrix/lkj.jl

src/univariate/affine.jl

Co-authored-by: David Widmann <[email protected]>

devmotion · 2021-11-28T20:10:34Z

Some more high-level comments:

I think we should not introduce any additional API. and hence in particular not add affine: We should continue to use * and + (and / and -) to construct affine transformations of random variables/distributions.
Maybe we should use the opportunity and not just rename LocationScale and remove the check for the scale but actually change the design and allow general array-like variates, i.e., make it a subtype of Distribution{ArrayLikeVariate{N},<:ValueSupport} and allow arrays as location and scale.
I wonder if it would be useful to split AffineDistribution into a ShiftedDistribution and a ScaledDistribution

ParadaCarleton · 2021-11-28T21:06:43Z

I agree on the array-like variates bit and actually started work on that yesterday.

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

I'm not sure what benefits a split would give, and it might make type-checking and conversions more of a chore since Scaled{Shifted} would be different from Shifted{Scaled}. In particular we'd have to add more methods to collapse Affine down when performing repeated affine transformations.

devmotion · 2021-11-28T21:16:39Z

I agree we should encourage and use *, +, /, and - over affine, but I think it makes sense to include affine for ease of implementation. This way anyone overloading affine only has to add 1 method instead of 4.

There are default fallbacks for / and -, so generally one just has to define x + distribution and x * distribution. I think this is simple enough for developers and not more difficult than defining affine. Additionally, having both affine and +, ... seems confusing - if possible, we should not provide multiple ways but one recommended one for constructing affine transformations.

ParadaCarleton · 2021-11-29T02:40:46Z

Unrelated to the previous comments, we might want some field name other than scale, which seems to be a bit too easy to confuse with the scale function. Greek letters work in the worst-case scenario but are annoying to type in the REPL.

ParadaCarleton · 2021-12-01T01:27:04Z

@devmotion it turns out that this is already implemented in MeasureTheory.jl, and quite nicely too. Chad offered to move it to MeasuresBase, since you mentioned you'd be interested in adding MeasuresBase as a dependency. Should we maybe just add MeasuresBase as a dependency and reexport this feature?

devmotion · 2021-12-01T07:28:47Z

No, this is not sufficient. We want to create and work with Distributions, * etc. should not suddenly lift Distributions to Measures.

mschauer · 2021-12-01T12:55:03Z

If we go with + and -, the docstring should mention ⊕ from #1445

devmotion · 2021-12-01T13:08:42Z

We have to support +, * etc. anyway if we want to avoid a breaking change since it is the current API.

To me it seems an additional function would not have any advantage but only extend the API, which is possibly confusing for users, developers, and an additional source for breaking changes in the future. The main argument above for it, it seems, was that it would be easier to implement. Actually, I think this is not correct:

As the PR for Normal etc. shows, one just has to implement two functions for univariate distributions currently
The more modular design makes it much easier to add specialized implementations for specific types: With affine one would have to handle all combinations of the types for scale and shift whereas otherwise one can implement a method for eg a specific type of scales without worrying about types of shifts.

ParadaCarleton · 2021-12-01T18:01:08Z

Got it; I'll rewrite this to handle + and * directly.

ParadaCarleton · 2021-12-01T18:02:35Z

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

src/matrix/lkj.jl

devmotion · 2021-12-01T18:53:51Z

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

ParadaCarleton · 2021-12-01T19:37:36Z

Should we memoize inv(σ) since there's a good chance it'll be frequently used?

No, I don't think it's worth it. In my experience it just causes problems, eg with AD and undesired promotions (e.g. if sigma is an Int, the cached inverse would be a Float64 and cause promotions in computations with Float32 that don't occur if we just divide by sigma).

Hmm; could we memoize in cases where base_dist is continuous, since everything has to be converted to eltype(base_dist) (after promoting with the scale+shift params) anyways?

devmotion · 2021-12-01T19:47:24Z

No, eltype should (and will) not match the parameters in general. If one wants to e.g. evaluate just the log density it is irrelevant what the eltype is - one should be able to provide e.g. a point in Float32 and obtain a result that is not promoted to Float64.

devmotion · 2021-12-01T19:49:14Z

I'm very certain memoization just causes problems here and is not worth it. As an example in a different package, similar issues occurred with memoization of inverses in PDMats and hence in the latest release it was removed.

Co-authored-by: David Widmann <[email protected]>

devmotion · 2021-12-05T19:35:05Z

I think some things should be designed a bit differently. E.g., it is completely fine to translate Distribution{<:ArrayLikeVariate} of arbitrary dimension but a scale of Diagonal(Ones(...)) or I only makes sense for MultivariateDistributions and MatrixDistributions. Therefore I think it would be better to split AffineDistribution in two different distributions, one that shifts and one that applies linear transformations.

Maybe we could use the following design, which seems to address also your concerns about complicated dispatches and constructions of affine transformations above:

Add a ShiftedDistribution/TranslatedDistribution that is the fallback for x::Real + dist::UnivariateDistribution and x::AbstractArray{<:Real,N} + dist::Distribution{ArrayLikeVariate{N}}, i.e., the distribution of x + Z where the law of Z is dist
Add a ScaledDistribution (not sure about the name) that is the fallback for x::Real * dist::Distribution{<:ArrayLikeVariate} and x::AbstractArray * dist::Distribution{<:ArrayLikeVariate} (for x of the correct last dimension), i.e., the distribution of x * Z where the law of Z is dist
Add an AffineDistribution that is the fallback for x::Real * dist::ShiftedDistribution, x::AbstractArray * dist::ShiftedDistribution, x::Real + dist::ScaledDistribution{Univariate}, and x::AbstractArray{<:Real,N} + dist::ScaledDistribution{ArrayLikeVariate{N}}

Then we could support all transformations of the form a + Z, b * Z, a + b * Z where a and b are of the correct sizes and we don't have to deal with neutral elements of addition and multiplication as in this PR. We could still dispatch on the fallback for affine transformations explicitly and there would be no problems with ShiftedDistribution of ScaledDistribution versus ScaledDistribution of ShiftedDistribution.

devmotion

An additional suggestion: Just copy the existing methods for LocationScale for univariate AffineDistributions. This makes it also less likely that we accidentally introduce bugs.

src/univariate/affine.jl

devmotion · 2021-12-06T01:06:24Z

src/univariate/affine.jl

+    if iszero(σ)
+        throw(ArgumentError("scale must be non-zero"))
+    elseif size(σ)[end] ≠ size(ρ)[1]
+        throw(DimensionMismatch(
+            "scaling matrix and distribution have incompatible dimensions" *
+            "$(size(σ)) and $(size(ρ)))"
+        ))
+    end


It's not possible to check iszero here, I assume? The second check only makes sense if the type is defined more generally and allows to increase or decrease the number of dimensions.

You're right about the second check. The first check is possible -- it returns true if σ is the zero matrix.

src/univariate/affine.jl

devmotion · 2021-12-06T01:28:42Z

src/univariate/affine.jl

+mean(d::AffineDistribution) = d.μ + d.σ * mean(d.ρ)
+median(d::AffineDistribution) = d.μ + d.σ * median(d.ρ)
+mode(d::AffineDistribution) = d.μ + d.σ * mode(d.ρ)
+modes(d::AffineDistribution) = d.μ .+ d.σ .* modes(d.ρ)


This is wrong e.g. for MultivariateDistributions.

I see how median might be wrong (since that's not uniquely defined in multiple dimensions), although that should just throw an error when it tries to call median on a multivariate distribution. But what's wrong with the rest?

The broadcast operations are not compatible, e.g.,

julia> rand(4) .+ rand(4, 3) .* [rand(3) for _ in 1:5] ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 4 and 5") Stacktrace: [1] _bcs1 @ ./broadcast.jl:516 [inlined] [2] _bcs @ ./broadcast.jl:510 [inlined] [3] broadcast_shape @ ./broadcast.jl:504 [inlined] [4] combine_axes @ ./broadcast.jl:499 [inlined] [5] _axes @ ./broadcast.jl:224 [inlined] [6] axes @ ./broadcast.jl:222 [inlined] [7] combine_axes @ ./broadcast.jl:499 [inlined] [8] instantiate @ ./broadcast.jl:281 [inlined] [9] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Vector{Float64}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(*), Tuple{Matrix{Float64}, Vector{Vector{Float64}}}}}}) @ Base.Broadcast ./broadcast.jl:860 [10] top-level scope @ REPL[1]:1 julia> rand(3) .+ rand() .* [rand(3) for _ in 1:5] ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 3 and 5") Stacktrace: [1] _bcs1 @ ./broadcast.jl:516 [inlined] [2] _bcs @ ./broadcast.jl:510 [inlined] [3] broadcast_shape @ ./broadcast.jl:504 [inlined] [4] combine_axes @ ./broadcast.jl:499 [inlined] [5] instantiate @ ./broadcast.jl:281 [inlined] [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(+), Tuple{Vector{Float64}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(*), Tuple{Float64, Vector{Vector{Float64}}}}}}) @ Base.Broadcast ./broadcast.jl:860 [7] top-level scope @ REPL[2]:1

Hmm; would map work?

devmotion · 2021-12-06T01:30:35Z

src/univariate/affine.jl

+skewness(d::UnivariateAffine) = Base.sign(d) * skewness(d.ρ)
+kurtosis(d::UnivariateAffine) = kurtosis(d.ρ)
+
+cov(d::AffineDistribution) = d.σ * cov(d.ρ) * d.σ'


Probably should be defined only for multivariate distributions, and we should have different implementations based on the type of the scale to reduce allocations.

We could do that, but do we want to deviate from Base's behavior (where cov returns the variance for univariate outcomes)?

devmotion · 2021-12-06T01:33:28Z

src/univariate/affine.jl

+    return d.μ + d.σ * quantile(d.ρ, q)
+end
+
+rand(rng::AbstractRNG, d::AffineDistribution) = d.μ + d.σ * rand(rng, d.ρ)


For higher-dimensional distributions we should define _rand!.

devmotion · 2021-12-06T01:33:56Z

src/univariate/affine.jl

+
+rand(rng::AbstractRNG, d::AffineDistribution) = d.μ + d.σ * rand(rng, d.ρ)
+
+function gradlogpdf(d::ContinuousAffine, x::Real)


We should restrict it to univariate distributions.

I thought it is only defined for univariate distributions. In fact it's completely undocumented but I learnt just that it is also defined for MvNormal and MvTDist. So maybe it's fine to keep it. Then the bigger issue is that the implementation is incorrect.

How so? I'm not sure I see the mistake.

Co-authored-by: David Widmann <[email protected]>

Co-authored-by: Moritz Schauer <[email protected]>

ParadaCarleton · 2021-12-08T17:19:21Z

All tests now pass, except for support when called on normal distributions that have ForwardDiff.Dual numbers as arguments. I'm going to be completely honest and say I'm not sure what support is supposed to do there.

devmotion · 2021-12-08T19:11:44Z

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;
make a separate PR that allows and tests negative scales for univariate distributions;
make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);
make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

ParadaCarleton · 2021-12-08T19:34:30Z

I hope you don't take this personal but I don't think we can merge the PR in its current state. In my opinion it is difficult to review and, in particular, it is difficult to assess if there are any breaking changes. Distributions is a central package and hence we have to be careful about not breaking downstream packages (since there are so many). To me it still seems there are too many changes and too general definitions, even for the univariate case currently handled by LocationScale (indicated by the test failures as well).

As a way forward, I think we should break up this PR (or maybe start from scratch) and

make an initial PR that only replaces LocationScale with AffineDistribution (with less restrictive type parameters), adds a deprecation warning for LocationScale, and adds type aliases for LocationScale - but without any further changes or generalizations e.g. to multivariate distributions: Tests should still pass and it would be easy to review if the files are not renamed;

make a separate PR that allows and tests negative scales for univariate distributions;

make a follow-up PR that adds implementations for multi- and possibly even more general arrayvariate distributions (but without changes of the univariate implementations);

make additional PRs that introduce TranslatedDistribution and ScaledDistribution to simplify and enable the handling of + and *, without changing the implementations for AffineDistribution but only changing dispatches of * and + and combining ScaledDistribution and TranslatedDistribution to AffineDistributions.

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

ParadaCarleton · 2021-12-08T23:03:03Z

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

Bug fixes for the location and scale methods, which are currently returning incorrect answers. (They always return locationscale.μ and locationscale.σ, regardless of the location and scale of the underlying distribution. For instance, LocationScale(0, 1, Normal(2, 4)) currently returns (0, 1).)
The implementations of new constructors and deprecation of the old LocationScale()-style constructor, as you requested.

All bugs were related to dealing with negative transformations of discrete distributions, where I forgot to account for the probability that x is exactly equal to X. There may be bugs regarding multivariate distributions that I haven't caught yet, since I was still writing the tests, but this PR should not contain any breaking changes.

If you'd like to rewrite the code for multivariate affine transformations from scratch, work out a new interface to replace the old LocationScale interface from scratch, etc. that's completely reasonable. For now, I'll gladly make a separate PR implementing the code I had ready a week and a half ago. This code made no changes except removing the positivity checks for LocationScale and throwing up a couple absolute value bars, before I'd rewritten all the constructors from scratch, deprecated or rewritten all of the old constructors to favor your own syntax using +, -, etc. instead of my original Affine which was a simple reskin of LocationScale, and implemented several features beyond the original scope of this PR at your request.

devmotion · 2021-12-08T23:37:34Z

If you'd like, I can break this into two PRs. The first can add a UnivariateAffine which is just LocationScale with a possibly negative scale; this would replace LocationScale. A separate PR can add a MultivariateAffine distribution.

Just to be sure there's no misunderstanding: We should not have separate distributions but at most type aliases (maybe they re not needed though). As mentioned above, it would be preferable to allow negative scales only in a second PR to keep changes in the different PRs minimal and make them easier to review.

I also would like to point out there have been essentially no changes to the behavior, of LocationScale, with the exception of:

Unfortunately, while the behaviour might not have changed (I don't know) the implementation has to a larger extent than what seemed necessary. During the different rounds of preliminary review I noticed and commented on methods that were modified slightly or generalized (sometimes incorrectly). Due to the renaming and other unrelated changes it is not possible for me to check with certainty if changes are needed, are bug fixes, are necessary or unnecessary generalizations, or are just style changes.

ParadaCarleton · 2021-12-08T23:46:47Z

Ok; if you'd like to implement renaming and all the different changes in a different PR, go ahead, then. I'll make a single PR that only changes LocationScale to allow negative scale values, or else (if you don't like this idea) I'll add reflections directly to a different package instead of spending more and more time making changes completely unrelated to this feature at your request.

devmotion · 2021-12-08T23:59:04Z

Sure, it's seemingly simple changes and of course it could be done in fewer PRs - but Distributions has > 300 direct dependents and > 700 indirect ones and hence I won't merge any PRs, in particular no refactorings, for which I feel I can't review them properly and guarantee with reasonable confidence that they don't break anything. Therefore I suggest breaking it up into small PRs that are easy to review.

Add Affine distributions

e8d191a

devmotion reviewed Nov 28, 2021

View reviewed changes

ParadaCarleton and others added 13 commits November 28, 2021 09:09

Bug fixes

7d2d5a7

Fixes

9106a9e

Don't export Affine

acb57f1

Co-authored-by: David Widmann <[email protected]>

Remove affine export

25766de

Co-authored-by: David Widmann <[email protected]>

replace affine

9d127b1

Co-authored-by: David Widmann <[email protected]>

Update src/matrix/matrixfdist.jl

22cb852

Co-authored-by: David Widmann <[email protected]>

Update src/matrix/matrixtdist.jl

756ddc1

Co-authored-by: David Widmann <[email protected]>

Update .gitignore

4dda88b

Co-authored-by: David Widmann <[email protected]>

suggestions

aefd9d2

Rename "Affine"

a71d351

documentation+exports

9f69cc6

LocationScale bug? + convenient affine

a969ae4

LocationScale bug?

ffad8aa

mschauer reviewed Dec 1, 2021

View reviewed changes

src/matrix/lkj.jl Outdated Show resolved Hide resolved

ParadaCarleton and others added 9 commits December 5, 2021 10:29

Update src/common.jl

18cd1cb

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

68a0bf2

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

3fe309b

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

10876c0

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

4409934

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

4884be5

Co-authored-by: David Widmann <[email protected]>

Apply suggestions

78e4ed1

Update src/univariate/affine.jl

50703e0

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/affine.jl

0a77c2e

Co-authored-by: David Widmann <[email protected]>

devmotion reviewed Dec 6, 2021

View reviewed changes

ParadaCarleton and others added 10 commits December 6, 2021 13:13

Update src/univariate/affine.jl

40b3dcf

Co-authored-by: David Widmann <[email protected]>

fixes

62dbce6

add rand!

c0f1143

fixes

4d191d1

Fixes

4b10ad3

additional test

5129c58

Update src/matrix/lkj.jl

82bc2ae

Co-authored-by: Moritz Schauer <[email protected]>

Doc improvements

2286772

Remove qualified "inv"

c4ad326

Fix method ambiguity

cd9dcf6

Fix CDF for univariate distributions

4d6d7a2

Merge remote-tracking branch 'upstream/master' into affine-transforms

775e2eb


		rand(rng::AbstractRNG, d::AffineDistribution) = d.μ + d.σ * rand(rng, d.ρ)

		function gradlogpdf(d::ContinuousAffine, x::Real)

Add Affine distributions #1442

Are you sure you want to change the base?

Add Affine distributions #1442

Conversation

ParadaCarleton commented Nov 28, 2021

devmotion commented Nov 28, 2021

ParadaCarleton commented Nov 28, 2021

devmotion commented Nov 28, 2021

ParadaCarleton commented Nov 29, 2021 • edited Loading

ParadaCarleton commented Dec 1, 2021 • edited Loading

devmotion commented Dec 1, 2021

mschauer commented Dec 1, 2021

devmotion commented Dec 1, 2021

ParadaCarleton commented Dec 1, 2021 • edited Loading

ParadaCarleton commented Dec 1, 2021 • edited Loading

devmotion commented Dec 1, 2021

ParadaCarleton commented Dec 1, 2021 • edited Loading

devmotion commented Dec 1, 2021

devmotion commented Dec 1, 2021

devmotion commented Dec 5, 2021

devmotion left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ParadaCarleton Dec 7, 2021 • edited Loading

Choose a reason for hiding this comment

ParadaCarleton commented Dec 8, 2021

devmotion commented Dec 8, 2021

ParadaCarleton commented Dec 8, 2021 • edited Loading

ParadaCarleton commented Dec 8, 2021 • edited Loading

devmotion commented Dec 8, 2021

ParadaCarleton commented Dec 8, 2021 • edited Loading

devmotion commented Dec 8, 2021

ParadaCarleton commented Nov 29, 2021 •

edited

Loading

ParadaCarleton commented Dec 1, 2021 •

edited

Loading

ParadaCarleton commented Dec 1, 2021 •

edited

Loading

ParadaCarleton commented Dec 1, 2021 •

edited

Loading

ParadaCarleton commented Dec 1, 2021 •

edited

Loading

ParadaCarleton Dec 7, 2021 •

edited

Loading

ParadaCarleton commented Dec 8, 2021 •

edited

Loading

ParadaCarleton commented Dec 8, 2021 •

edited

Loading

ParadaCarleton commented Dec 8, 2021 •

edited

Loading