Add EnzymeCore weakdep and an extension with a custom rule for the Levin transformation #97

Merged
merged 17 commits into from
May 18, 2023
Changes from 14 commits
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2021-2022 Michael Helton, Oscar Smith, and contributors
Copyright (c) 2021-2023 Michael Helton, Oscar Smith, Chris Geoga, and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
8 changes: 7 additions & 1 deletion Project.toml
@@ -6,11 +6,17 @@ version = "0.3.0-DEV"
SIMDMath = "5443be0b-e40a-4f70-a07e-dcd652efc383"

[compat]
julia = "1.8"
SIMDMath = "0.2.5"
julia = "1.8"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[weakdeps]
EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[extensions]
BesselsEnzymeCoreExt = "EnzymeCore"

[targets]
test = ["Test"]
51 changes: 51 additions & 0 deletions ext/BesselsEnzymeCoreExt.jl
@@ -0,0 +1,51 @@
module BesselsEnzymeCoreExt

# TODO (cg 2023/05/08 10:02): Compat of any kind.

using Bessels, EnzymeCore
using EnzymeCore.EnzymeRules
using Bessels.Math

# A manual method that separately transforms the `val` and `dval`, because
# sometimes the `val` can converge while the `dval` hasn't, so just using an
# early return or something can give incorrect derivatives in edge cases.
#
# https://github.com/JuliaMath/Bessels.jl/issues/96
#
# and the links within for discussion.
#
# TODO (cg 2023/05/08 10:00): I'm not entirely sure how best to "generalize"
# this to cases like a return type of DuplicatedNoNeed, or something being a
# `Enzyme.Const`. These shouldn't in principle affect the "point" of this
# function (which is just to check for convergence before applying a
# function), but on its face this approach would mean I need a lot of
# hand-written extra methods. I have an open issue on the Enzyme.jl repo at
#
# https://github.com/EnzymeAD/Enzyme.jl/issues/786
#
# that gets at this problem a bit. But it's a weird request and I'm sure Billy
# has a lot of asks on his time.
function EnzymeRules.forward(func::Const{typeof(levin_transform)},
::Type{<:Duplicated},
s::Duplicated,
w::Duplicated)
(sv, dv, N) = (s.val, s.dval, length(s.val))
ls = levin_transform(sv, w.val)
dls = levin_transform(dv, w.dval)
Duplicated(ls, dls)
end

# This is fixing a straight bug in Enzyme.
function EnzymeRules.forward(func::Const{typeof(sinpi)},
Copy link
Member

Any way we can get this fixed in enzyme itself?

Copy link
Member

Also, this should use sincospi

Copy link
Member

I poked over at EnzymeAD/Enzyme.jl#443
I don’t think I know exactly how to solve that though

Copy link
Contributor Author

As it turns out this is not the only problem. Something else in the generic power series is broken for Enzyme but not ForwardDiff. Leaving a summary comment below, one moment.

Copy link
Member

When you have time, could you post a specific example where this is broken? I will try to figure out which line is causing the issue even when separating out the sinpi.

These issues though are especially annoying......

Copy link

Ironically sincospi using Enzyme should be fine. I'm adding a pr for sinpi/cospi now which hopefully will be available in a few days.

Copy link

Also, though I'm solving this in a different way from this PR (internal to Enzyme proper rather than as an Enzyme.jl custom rule), rules like this are welcome as PRs to Enzyme.jl.

Copy link
Member

Perfect! Thanks for looking at this. I'll change it over here once that is available.

Copy link
Contributor Author

It occurs to me to mention here for people looking at this PR in the future that the problem was just the sinpi, but that I didn't understand how to properly write EnzymeRules and just needed to propagate the x.dval in the derivative part of the returned Duplicated object. I didn't include tests for power series accuracy here because that will probably be a little bit of a project to get the last few digits, but once I fixed my custom rule that worked fine.

@wsmoses, would you like me to make a PR with this rule in the meantime? If it is fixed and will be available in the next release, there may be no point unless you would make a more immediate release that has the custom rule. I'd be happy to try and make that PR if you want, but I understand if it isn't the most useful.

Sorry that this thread looks on a skim like Enzyme problems but was actually "Chris doesn't know how to write custom rules" problems.
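
A minimal sketch of the chain-rule point made in the comment above: a forward-mode rule for sinpi has to multiply by the incoming tangent. To stay self-contained it does not use EnzymeCore; `ValTangent` and `sinpi_forward` are hypothetical names invented for illustration, standing in for a `Duplicated` value and an `EnzymeRules.forward` method.

```julia
# Hypothetical stand-in for a forward-mode (value, tangent) pair.
# This is NOT EnzymeCore.Duplicated, only an illustration of the same idea.
struct ValTangent{T}
    val::T
    dval::T
end

# Chain rule: d/dt sinpi(x(t)) = pi * cospi(x) * x'(t).
# Dropping the x.dval factor is only correct when the tangent happens to be 1.
function sinpi_forward(x::ValTangent)
    (s, c) = sincospi(x.val)
    return ValTangent(s, pi * c * x.dval)
end

# x(t) = 2t evaluated at t = 0.15, so the value is 0.3 and the tangent is 2.0.
x = ValTangent(0.3, 2.0)
out = sinpi_forward(x)

# Central finite difference of t -> sinpi(2t) at t = 0.15 as a sanity check.
h = 1e-6
fd = (sinpi(2 * (0.15 + h)) - sinpi(2 * (0.15 - h))) / (2h)
println((out.dval, fd))
```

With a tangent other than 1, a rule that returns only `pi*cospi(x.val)` would disagree with the finite difference by exactly that factor.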

Copy link
Member

See EnzymeAD/Enzyme#1216. Hopefully that fixes this issue here and we can remove that part in the future.

P.s. I have the general besselk working now locally so hopefully we can get that merged soon and can test the general derivative cases.

::Type{<:Duplicated},
x::Duplicated)
# Chain rule: the tangent must carry the incoming x.dval (see the discussion above).
Duplicated(sinpi(x.val), pi*cospi(x.val)*x.dval)
end

function EnzymeRules.forward(func::Const{typeof(sinpi)},
::Type{<:Const},
x::Const)
sinpi(x.val)
end

end
127 changes: 89 additions & 38 deletions src/BesselFunctions/besselk.jl
@@ -499,38 +499,25 @@ besselk_power_series(v, x::Float32) = Float32(besselk_power_series(v, Float64(x)
besselk_power_series(v, x::ComplexF32) = ComplexF32(besselk_power_series(v, ComplexF64(x)))

function besselk_power_series(v, x::ComplexOrReal{T}) where T
MaxIter = 1000
S = eltype(x)
v, x = S(v), S(x)

z = x / 2
zz = z * z
logz = log(z)
xd2_v = exp(v*logz)
xd2_nv = inv(xd2_v)

# use the reflection identity to calculate gamma(-v)
# use relation gamma(v)*v = gamma(v+1) to avoid two gamma calls
gam_v = gamma(v)
gam_nv = π / (sinpi(-abs(v)) * gam_v * v)
gam_1mv = -gam_nv * v
gam_1mnv = gam_v * v

_t1 = gam_v * xd2_nv * gam_1mv
_t2 = gam_nv * xd2_v * gam_1mnv
(xd2_pow, fact_k, out) = (one(S), one(S), zero(S))
for k in 0:MaxIter
t1 = xd2_pow * T(0.5)
tmp = muladd(_t1, gam_1mnv, _t2 * gam_1mv)
tmp *= inv(gam_1mv * gam_1mnv * fact_k)
term = t1 * tmp
out += term
abs(term / out) < eps(T) && break
(gam_1mnv, gam_1mv) = (gam_1mnv*(one(S) + v + k), gam_1mv*(one(S) - v + k))
xd2_pow *= zz
fact_k *= k + one(S)
Math.isnearint(v) && return besselk_power_series_int(v, x)
MaxIter = 5000
gam = gamma(v)
ngam = π / (sinpi(-abs(v)) * gam * v)

s1, s2 = zero(T), zero(T)
t1, t2 = one(T), one(T)

for k in 1:MaxIter
s1 += t1
s2 += t2
t1 *= x^2 / (4k * (k - v))
t2 *= x^2 / (4k * (k + v))
abs(t1) < eps(T) && break
end
return out

xpv = (x/2)^v
s = gam * s1 + xpv^2 * ngam * s2
return s / (2*xpv)
end
besselk_power_series_cutoff(nu, x::Float64) = x < 2.0 || nu > 1.6x - 1.0
besselk_power_series_cutoff(nu, x::Float32) = x < 10.0f0 || nu > 1.65f0*x - 8.0f0
@@ -578,15 +565,16 @@ end
@generated function besselkx_levin(v, x::T, ::Val{N}) where {T <: FloatTypes, N}
:(
begin
s_0 = zero(T)
s = zero(T)
t = one(T)
@nexprs $N i -> begin
s_{i} = s_{i-1} + t
t *= (4*v^2 - (2i - 1)^2) / (8 * x * i)
w_{i} = 1 / t
end
sequence = @ntuple $N i -> s_{i}
weights = @ntuple $N i -> w_{i}
s += t
t *= (4*v^2 - (2i - 1)^2) / (8 * x * i)
s_{i} = s
w_{i} = t
end
sequence = @ntuple $N i -> s_{i}
weights = @ntuple $N i -> w_{i}
return levin_transform(sequence, weights) * sqrt(π / 2x)
end
)
@@ -614,3 +602,66 @@ end
end
)
end

# This is a version of Temme's proposed f_0 (1975 JCP, see reference above) that
# swaps in a bunch of local expansions for functions that are well-behaved but
# whose standard forms can't be naively evaluated by a computer at the origin.
@inline function f0_local_expansion_v0(v, x)
l2dx = log(2/x)
mu = v*l2dx
vv = v*v
sp = evalpoly(vv, (1.0, 1.6449340668482264, 1.8940656589944918, 1.9711021825948702))
g1 = evalpoly(vv, (-0.5772156649015329, 0.04200263503409518, 0.042197734555544306))
g2 = evalpoly(vv, (1.0, -0.6558780715202539, 0.16653861138229145))
sh = evalpoly(mu*mu, (1.0, 0.16666666666666666, 0.008333333333333333, 0.0001984126984126984, 2.7557319223985893e-6))
Copy link
Member

I'm kinda wondering how many terms we need for this expansion as mu slowly grows...

julia> v = 1e-4
0.0001

julia> v * log(2 / 24.0)
-0.0002484906649788

Five terms is probably OK? I did some quick checks adding another term (below), but it didn't seem to change much. This seems like a reasonable approximation. Just checking: is this f0 pretty accurate, and is it not contributing to the errors we are seeing?

SeriesData[x, 0, {
 1.`20., 0, 0.16666666666666666666666666666666666667`20., 0, 
  0.00833333333333333333333333333333333333`20., 0, 
  0.00019841269841269841269841269841269841`20., 0, 
  2.75573192239858906525573192239859`20.*^-6, 0, 
  2.505210838544171877505210838544`20.*^-8}, 0, 11, 1]
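
As a rough check of the question above (whether five terms of the sinh(mu)/mu series are enough), the sketch below compares the five-term polynomial used in the diff with a six-term version and a BigFloat reference at the mu value from the example. The variable names are illustrative and not part of Bessels.jl.

```julia
# Taylor coefficients of sinh(mu)/mu in powers of mu^2, i.e. 1/(2k+1)!.
c5 = (1.0, 1/6, 1/120, 1/5040, 1/362880)   # the five terms used in the diff
c6 = (c5..., 1/39916800)                   # one extra term, 1/11!

mu  = 1e-4 * log(2 / 24.0)                 # the value from the example above
sh5 = evalpoly(mu * mu, c5)
sh6 = evalpoly(mu * mu, c6)

# Extended-precision reference, rounded back to Float64.
ref = Float64(sinh(big(mu)) / big(mu))

println((sh5, sh6, ref))
```

For mu this small the extra term is many orders of magnitude below Float64 resolution, so any remaining error would have to come from elsewhere in the routine.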

Copy link
Member

Can't really say where the difference is coming from here ....

julia> Bessels.BesselFunctions.besselk_power_series_int(0.000, 10.0)
1.778006219756317e-5

julia> ArbNumerics.besselk(ArbFloat(0.0000), ArbFloat(10.0))
1.7780062316167651811301192799427833154e-5

julia> ArbNumerics.besselk(ArbFloat(12.0000), ArbFloat(10.0))
0.010278998056493335846252984780767697567

julia> Bessels.BesselFunctions.besselk_power_series_int(12.000, 10.0)
0.010278998068072438

Copy link
Contributor Author

well those are huge x values---weren't we only going to use this for x<1.5 or so?

In general, I'm sure more terms couldn't hurt---I can tinker with that. But it isn't obvious that it will help, because if abs(v)<1e-5 and these are polynomials in v^2, then the sixth order coefficient will be at most 1e-30. I know that the story is more complicated for derivatives though, so I'll see about putting in an extra term to see if it helps.
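
To put numbers on the truncation argument above, the sketch below compares the four-term polynomial for `sp` from the diff against a direct evaluation. It assumes, based only on the coefficients matching 1, π²/6, 7π⁴/360 and 31π⁶/15120, that `sp` is the series of πv/sin(πv); that identification is not stated in the PR.

```julia
# Coefficients of `sp` copied from f0_local_expansion_v0 in the diff.
sp_coefs = (1.0, 1.6449340668482264, 1.8940656589944918, 1.9711021825948702)

v = 1e-5                       # edge of the |v| < 1e-5 range the routine assumes
sp_poly   = evalpoly(v * v, sp_coefs)
sp_direct = pi * v / sinpi(v)  # assumed closed form, see the note above

# The first omitted term is of order v^8, far below Float64 resolution near 1.
omitted = (v * v)^4

println((sp_poly, sp_direct, omitted, eps(Float64)))
```

This only speaks to the function values; as the comment notes, the derivative with respect to v may behave differently.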

Copy link
Member

Ya that's right haha. I will need to adjust cutoffs and stuff accordingly. So what we probably should do is just have general routines for abs(v) < 1.5 and then use forward recurrence. It's actually much faster to do 2 Levin calculations and forward recurrence than it is to do the int power series. So let's adjust the whole routine. For v > 25 we will use the uniform expansion, so forward recurrence will be fast here.

But ya I think as long as we verify derivatives for v near 0.0 and x < 1.5 for this power series we should be ok. The tough thing about this though is that the derivatives are zero for v so something really close to v will be tricky to get right....

Copy link
Member

Dang sorry ya I need to rethink the whole routine now that the intermediate range using the Levin transform is so fast. I used to just try to avoid that completely by extending the power series range as much as possible, but now it makes sense to favor it and just fall back to forward recurrence when necessary.

It actually makes checking the accuracy of the whole routine much easier, because we essentially just have to check our scalar and derivative information for v < 1.0, and we know that forward recurrence is also stable and accurate for besselk. This should greatly reduce the number of points we need to explicitly check. Of course we should still do some scattered spot checks to make sure the derivatives are carried out OK with AD.
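
A small sketch of the forward recurrence mentioned above, K_{v+1}(x) = (2v/x) K_v(x) + K_{v-1}(x), in the same form as the loop in besselk_power_series_int. The helper name `besselk_up` is hypothetical and not part of Bessels.jl; the check uses the closed forms for half-integer orders so that no external package is needed.

```julia
# Hypothetical helper: given K_v(x) and K_{v+1}(x), step upward n times
# and return K_{v+n}(x).
function besselk_up(kv, kvp1, v, x, n)
    twodx = 2 / x
    for _ in 1:n
        v += 1
        # K_{v+1} = (2v/x) * K_v + K_{v-1}, written for the shifted order.
        (kv, kvp1) = (kvp1, muladd(twodx * v, kvp1, kv))
    end
    return kv
end

# Closed forms for half-integer orders, used only as a reference.
x   = 2.0
k12 = sqrt(pi / (2x)) * exp(-x)          # K_{1/2}(x)
k32 = k12 * (1 + 1 / x)                  # K_{3/2}(x)
k52 = k12 * (1 + 3 / x + 3 / x^2)        # K_{5/2}(x)

println((besselk_up(k12, k32, 0.5, x, 2), k52))
```

Because every term in the recurrence is positive for positive v and x, there is no cancellation in the upward direction, which is consistent with the stability remark in the comment.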

sp*(g1*cosh(mu) + g2*sh*l2dx)
end

# This function assumes |v|<1e-5!
function besselk_power_series_temme_basal(v::V, x::X) where{V,X}
max_iter = 50
T = promote_type(V,X)
z = x/2
zz = z*z
fk = f0_local_expansion_v0(v,x)
zv = z^v
znv = inv(zv)
gam_1_c = (1.0, -0.5772156649015329, 0.9890559953279725, -0.23263776388631713)
gam_1pv = evalpoly(v, gam_1_c)
gam_1nv = evalpoly(-v, gam_1_c)
(pk, qk, _ck, factk, vv) = (znv*gam_1pv/2, zv*gam_1nv/2, one(T), one(T), v*v)
(out_v, out_vp1) = (zero(T), zero(T))
for k in 1:max_iter
# add to the series:
ck = _ck/factk
term_v = ck*fk
term_vp1 = ck*(pk - (k-1)*fk)
out_v += term_v
out_vp1 += term_vp1
# check for convergence:
((abs(term_v) < eps(T)) && (abs(term_vp1) < eps(T))) && break
# otherwise, increment new quantities:
fk = (k*fk + pk + qk)/(k^2 - vv)
pk /= (k-v)
qk /= (k+v)
_ck *= zz
factk *= k
end
(out_v, out_vp1/z)
end

function besselk_power_series_int(v, x::Float64)
v = abs(v)
(_v, flv) = modf(v)
if _v > 1/2
(_v, flv) = (_v-one(_v), flv+1)
end
(kv, kvp1) = besselk_power_series_temme_basal(_v, x)
twodx = 2/x
for _ in 1:flv
_v += 1
(kv, kvp1) = (kvp1, muladd(twodx*_v, kvp1, kv))
end
kv
end

2 changes: 2 additions & 0 deletions src/GammaFunctions/gamma.jl
@@ -113,3 +113,5 @@ function gamma(n::Integer)
n > 20 && return gamma(float(n))
@inbounds return Float64(factorial(n-1))
end

gamma_near_1(x) = evalpoly(x-one(x), (1.0, -0.5772156649015329, 0.9890559953279725, -0.23263776388631713))
8 changes: 6 additions & 2 deletions src/Math/Math.jl
@@ -131,11 +131,12 @@ end
#@inline levin_scale(B::T, n, k) where T = -(B + n) * (B + n + k)^(k - one(T)) / (B + n + k + one(T))^k
@inline levin_scale(B::T, n, k) where T = -(B + n + k) * (B + n + k - 1) / ((B + n + 2k) * (B + n + 2k - 1))

@inline @generated function levin_transform(s::NTuple{N, T}, w::NTuple{N, T}) where {N, T <: FloatTypes}
@inline @generated function levin_transform(s::NTuple{N, T},
w::NTuple{N, T}) where {N, T <: FloatTypes}
len = N - 1
:(
begin
@nexprs $N i -> a_{i} = Vec{2, T}((s[i] * w[i], w[i]))
@nexprs $N i -> a_{i} = iszero(w[i]) ? (return s[i]) : Vec{2, T}((s[i] / w[i], 1 / w[i]))
@nexprs $len k -> (@nexprs ($len-k) i -> a_{i} = fmadd(a_{i}, levin_scale(one(T), i, k-1), a_{i+1}))
return (a_1[1] / a_1[2])
end
@@ -153,4 +154,7 @@ end
)
end

# TODO (cg 2023/05/16 18:09): dispute this cutoff.
isnearint(x) = abs(x-round(x)) < 1e-5

end