Besseli and Besselk for all integer and noninteger positive orders #29
Conversation
Some general things going forward.... Do we want to have another branch for large x that would be faster, at the cost of more branches? Unsure honestly. We can look more at @cgeoga's large argument expansion https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1197096678. I have one for
That looks great!
Hey, sorry to have disappeared for a few days! I'm about to enter a pretty swamped period and will probably be slower for the next week or so. Those times for the intermediate range are very impressive to me. I played a bit more with the Miller recurrence, but I think to get to the tolerances you are going for here it would just be too expensive.
With regard to very large, I like the idea of the code re-use with the
With regard to extra branches, I would think there are enough ns on the table where it would make sense to have several for various specialized cases, like when
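As a rough illustration of the Miller-style backward recurrence being discussed here (purely illustrative, not code from this PR; the function name and starting offset M are made up, and Float64 input is assumed):

```julia
# Backward (Miller-type) recurrence for the ratio I_{nu+1}(x)/I_nu(x):
# seed with arbitrary values far above the target order and recur downward with
# I_{k-1}(x) = I_{k+1}(x) + (2k/x) I_k(x); the unknown normalization cancels in the ratio.
# No overflow guard; fine for a sketch, not for production.
function besseli_ratio_miller(nu, x; M = 50)       # M chosen crudely, not a tuned cutoff
    fkp1, fk = 0.0, floatmin(Float64)              # trial values at orders nu+M+1 and nu+M
    for k in M:-1:1
        fkm1 = fkp1 + (2 * (nu + k) / x) * fk      # value at order nu+k-1
        fkp1, fk = fk, fkm1
    end
    # fk is now proportional to I_nu(x) and fkp1 to I_{nu+1}(x)
    return fkp1 / fk
end
```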
Yes - the Miller recurrence is definitely something else to consider! I think that is what Amos is using? So I would imagine that we might be able to be slightly faster than what they have, with similar accuracies. I do like this approach as the amount of code is much smaller and it is easier to optimize and diagnose branch cutoffs. But to your point, the continued fraction approach is much slower when x < 1, so we should use a power series there for sure. I'll do that now. For the large argument, I'm very happy to include that. Maybe once I merge this you could submit a PR with the large argument branch that you wrote? We can then integrate it, I think. I do like to lean on the asymptotic expansions just because they are so optimized (and I use them for every single routine), but definitely when x >> nu a different expansion will be better, so let's do that. I think for specific nu like 1/2 it would be worth having an optimized branch. Maybe as a separate PR after we sort this all out. I'm also going to work on nu = 1/3 because we can use those for Airy functions which are needed as well.... but in time.
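A minimal sketch of the kind of small-x power series branch mentioned above, using the textbook connection formula K_nu = (pi/2)(I_{-nu} - I_nu)/sin(nu*pi) for noninteger nu. Function names, the use of SpecialFunctions' gamma, and the term count are illustrative assumptions, not Bessels.jl internals:

```julia
using SpecialFunctions: gamma   # just to keep the sketch self-contained

# Defining power series I_nu(x) = sum_{k>=0} (x/2)^(nu+2k) / (k! * gamma(nu+k+1))
function besseli_series_sketch(nu, x; terms = 30)
    out = zero(float(x))
    t = (x / 2)^nu / gamma(nu + 1)                 # k = 0 term
    for k in 0:terms
        out += t
        t *= (x / 2)^2 / ((k + 1) * (nu + k + 1))  # ratio of consecutive terms
    end
    return out
end

# K_nu from the connection formula; valid for noninteger nu and small x,
# where only a handful of terms are needed.
besselk_series_sketch(nu, x) =
    (pi / 2) * (besseli_series_sketch(-nu, x) - besseli_series_sketch(nu, x)) / sinpi(nu)
```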
Here is a plot of the relative errors using the uniform asymptotic expansion (with 10 U polynomials). So it looks like it converges fairly well even when nu is small. Which is quite fast (not quite as fast as the asymptotic expansion would be for large x though):

julia> @benchmark Bessels.besselk_large_orders(120.0, 80.0)
BenchmarkTools.Trial: 10000 samples with 968 evaluations.
Range (min … max): 79.545 ns … 129.347 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 79.718 ns ┊ GC (median): 0.00%
Time (mean ± σ): 80.406 ns ± 3.194 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃ ▃ ▁
██▅▅▅▅▅▄▅██▃▄▄▆▇▄▄▂▄▂▄▂▃▄▃▃▄▅▄▄▃▄▃▃▃▂▃▄▄▄▃▄▃▅▄▄▅▄▂▄▅▅▄▄▅▃▄▂▄ █
79.5 ns Histogram: log(frequency) by time 98.4 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
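For context, the branch benchmarked above is the uniform (large-order) asymptotic expansion. A leading-order sketch of its structure, keeping only U_0 = 1 instead of the 10 U polynomials used above (the function name is illustrative, not the package's routine):

```julia
# Leading term of the uniform asymptotic expansion for K_nu(x) at large nu (DLMF 10.41.4):
#   K_nu(nu*z) ~ sqrt(pi/(2nu)) * exp(-nu*eta) / (1 + z^2)^(1/4),
# with z = x/nu and eta = sqrt(1 + z^2) + log(z / (1 + sqrt(1 + z^2))).
function besselk_uniform_leading(nu, x)
    z = x / nu
    s = sqrt(1 + z^2)
    eta = s + log(z / (1 + s))
    return sqrt(pi / (2 * nu)) * exp(-nu * eta) / sqrt(s)
end
```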
Fantastic. This code is sort of on a pedestal for me and I'm nervous to submit an un-clean PR, but I'm very happy to give it a shot. Do you have a series code already? If not, our series code is pretty decent, I think, and I could try to clean it up and submit it for consideration as well in a separate issue or something. Once this merges I'll definitely submit an asymptotic expansion PR. Do you want the exponential correction in that? The way I wrote it in our code it uses the maximum possible order, but maybe it would be faster for me to hard-code just using the leading term in it or something, if you wanted. Happy to submit it that way and potentially prune it off if the speed cost is too high. I could also add the half-integer order code from #25 in that same PR if you'd like. Also, funny about that plot---in my mind, an order of, like,
So I'd probably pose the following branches.
With the option of a 4th branch for asymptotic x >> nu. That sounds pretty reasonable to me...?
Definitely don't feel that way. The code is very good, so I think we can just drop it in and clean some stuff up! But let's plan on that: when you have time, submit the asymptotic expansion for large x and we can go through it. The exponential correction sounds reasonable, but we should have a pretty specific cutoff where we want to use it and set the order to a specific value for the entire branch. I do not have a power series for
But yes, if you want half orders please go ahead and submit that (preferably as a separate PR)! Haha about the orders! In my research I deal with a lot of infinite series over the order, so we need calculations for very high (nu>1e4) orders....
Co-authored-by: Chris Geoga <[email protected]>
I appreciate that! I see you committed the code from the repo directly, but here is a slightly cleaned up version that I think is a slightly better style fit:
Note that the modified version in our repo is not
The issue I'm having, though, is that my
EDIT: the code is a bit obfuscated, so for reference I'm using equation 3.2 from here.
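For readers without the reference at hand, the standard large-argument expansion of K_nu (DLMF 10.40.2) has the shape sketched below. This is an illustration only and not necessarily the "equation 3.2" form referenced above; the function name and term count are assumptions:

```julia
# K_nu(x) ~ sqrt(pi/(2x)) * exp(-x) * sum_{k>=0} a_k(nu)/x^k, where
# a_k(nu) = (4nu^2 - 1)(4nu^2 - 9)...(4nu^2 - (2k-1)^2) / (k! * 8^k).
# Truncated at a fixed number of terms; only useful when x is large relative to nu.
function besselk_large_argument_sketch(nu, x; terms = 10)
    mu = 4 * nu^2
    t = one(float(x))
    s = t
    for k in 1:terms
        t *= (mu - (2k - 1)^2) / (8 * k * x)   # a_k/x^k from a_{k-1}/x^(k-1)
        s += t
    end
    return sqrt(pi / (2 * x)) * exp(-x) * s
end
```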
Codecov Report
@@ Coverage Diff @@
## master #29 +/- ##
==========================================
+ Coverage 99.06% 99.10% +0.04%
==========================================
Files 14 14
Lines 1172 1231 +59
==========================================
+ Hits 1161 1220 +59
Misses 11 11
I think this is ready to merge now. The last thing is still the idea of potentially returning complex numbers for real (negative) inputs. Mathematica will return complex values for real inputs, but:

julia> using SpecialFunctions
julia> bessely(1.2, -0.2)
ERROR: DomainError with -0.2:
`x` must be nonnegative.
Stacktrace:
[1] bessely(nu::Float64, x::Float64)
@ SpecialFunctions ~/.julia/packages/SpecialFunctions/NBIqR/src/bessel.jl:556
[2] top-level scope
@ REPL[2]:1
julia> bessely(1.2, -Complex(0.2))
3.8701259814738727 - 2.904048804819615im

Matlab will output complex numbers for real inputs:

>> bessely(1.2, -0.2)
ans =
3.8701 - 2.9040i
>> bessely(1.2, -complex(0.2))
ans =
3.8701 - 2.9040i

I'm wondering what the best approach is here?
Julia's general practice is to throw a domain error in cases like this (and require the input to be complex).
That sounds good to me. One issue is that we don't actually support arguments with nonzero imaginary parts. So we would probably have to have additional checks for nonzero imaginary parts. Edit: I'm wondering if it would be a little confusing to allow complex arguments but require Im(x) = 0 for now...
Honestly, I think we can hold off on this until we start on more general complex support.
Ya that sounds good to me.... I'll make an issue about it just so people are aware and put it on the updated readme!
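A hypothetical sketch of the kind of checks discussed above (made-up function name, not the package's API): accept a complex argument but bail out on nonzero imaginary parts until full complex support exists, and throw a DomainError for negative real input.

```julia
# Accept Complex input, but only handle the Im(x) == 0 case for now.
function bessely_sketch(nu::Real, x::Complex)
    iszero(imag(x)) ||
        throw(ArgumentError("arguments with nonzero imaginary part are not supported yet"))
    # ... evaluate at real(x), returning a complex result when real(x) < 0 ...
end

# Negative real input follows Julia's convention: throw a DomainError and
# ask the caller to pass a complex argument instead.
function bessely_sketch(nu::Real, x::Real)
    x >= 0 || throw(DomainError(x, "`x` must be nonnegative; use a complex argument for negative `x`."))
    # ... real-argument evaluation ...
end
```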
Implementation of besselk and besseli for all real orders. Still have several things to finish and touch up, but these methods so far seem to work.

besseli is very straightforward. Use the uniform asymptotic expansion when either x or nu is large (x>35, nu>25) and then the power series elsewhere... now I also have the asymptotic expansion when x is large in there that is significantly faster, but I may just remove it so the whole function has just two branches... decision point for later.

besselk is obviously not straightforward. However, the same uniform asymptotic expansion applies when either x or nu is large (x>35, nu>25), so use that. Now we could also use an asymptotic expansion with respect to x here as well (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1197096678), but again at the cost of an extra branch.

So the rest comes down to what we do when both x < 35 and nu < 25... The power series form for besselk is accurate when x < 2 or nu > 1.6x - 1.0, so we could use that, but that doesn't help much for x > 2 and adds an extra branch. We could use numerical integration here for small orders and then use forward recurrence (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1177801831) and subsequent optimizations.... We could also use the hypergeometric series (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1196836506) for small orders, then use a continued fraction approach to generate K_{nu+1}, and then use forward recurrence to fill everything else..... We could also use some type of polynomial approximation (Chebyshev), but I found that this only worked well for a limited range.

I think the simplest method is to take advantage of the fact that the besseli power series is very accurate for nu < 25 for any x value.... The advantage is we can also generate I_{nu+1} very easily, so we can get very accurate estimates of I_{nu+1}/I_{nu} for any value of nu in this range. Now, we can also use a continued fraction approach to get K_{nu+1}/K_{nu} (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642). So using the two ratios we can use the Wronskian to get K_{nu} very simply with a lot of code reuse (a small sketch of this step is below).
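A minimal sketch of the Wronskian step just described, to make the combination explicit. It assumes I_nu(x), I_{nu+1}(x) (from the power series) and the ratio K_{nu+1}(x)/K_nu(x) (from the continued fraction) have already been computed; the function name is illustrative, not the actual implementation:

```julia
# From the Wronskian I_nu(x)*K_{nu+1}(x) + I_{nu+1}(x)*K_nu(x) = 1/x,
# dividing through by K_nu gives K_nu(x) = 1 / (x * (I_nu * Hk + I_{nu+1}))
# with Hk = K_{nu+1}(x)/K_nu(x).
function besselk_from_wronskian(x, Inu, Inup1, Hk)
    Knu = inv(x * (Inu * Hk + Inup1))
    return Knu, Knu * Hk          # K_nu(x) and K_{nu+1}(x)
end
```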
Here are some benchmarks for this method.
So almost 2.5x faster than SpecialFunctions....
Oh, and here is a plot of the relative errors compared to ArbNumerics.jl.
With a maximum error of..
For reference, this is more accurate than SpecialFunctions.jl, which had a maximum error of 5e-15.