libdivide-2.0

@kimwalisch kimwalisch released this 04 Jul 14:45

I am happy to announce the release of libdivide-2.0 🎉

Libdivide finally supports AVX2 and AVX512 vector division on x86 CPUs. Libdivide now also works with the clang-cl compiler and the Intel C++ compiler on Windows. There have been many small incremental improvements which should provide minor speedups for many use cases.

Since libdivide is now nearly 10 years old and many features have been added over the years, it has become necessary to remove some rarely used functionality. I have removed the unswitch functionality because it was a large amount of code that, as far as I am aware, nobody has ever used. Overall, even with the added support for AVX2 and AVX512, libdivide.h now contains fewer lines of code than the previous release and compiles faster in both C and C++.
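For readers new to the library, the reason vector support matters is that x86 SIMD instruction sets have no packed integer division instruction, so a division by a runtime constant has to be rewritten as a multiplication and a shift. The sketch below illustrates that general idea for 16-bit unsigned values only. It is a simplified illustration, not libdivide's actual algorithm or API: the names `MagicDivider16`, `make_divider` and `divide` are hypothetical and were chosen for this example.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; NOT part of libdivide.
// Holds ceil(2^32 / d) for a 16-bit divisor d.
struct MagicDivider16 {
    std::uint64_t magic;
};

// Precompute the magic number once per divisor.
// Assumption: d != 0 (division by zero is not handled in this sketch).
inline MagicDivider16 make_divider(std::uint16_t d) {
    // floor((2^32 - 1) / d) + 1 equals ceil(2^32 / d) for every d >= 1.
    return MagicDivider16{static_cast<std::uint64_t>(0xFFFFFFFFu) / d + 1};
}

// Compute n / d with one 64-bit multiplication and one shift.
// Exact for all 16-bit n because the rounding error of the magic
// number, scaled by n, stays below 2^32.
inline std::uint16_t divide(std::uint16_t n, const MagicDivider16& div) {
    return static_cast<std::uint16_t>((div.magic * n) >> 32);
}
```

The divider is built once and then reused for many numerators, for example `auto d = make_divider(7);` followed by `divide(100, d)` inside a loop. Because the hot path contains only a multiplication and a shift, the same pattern maps onto SIMD lanes, which is what makes vectorized division practical.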

  • BREAKING
    • Removed unswitch functionality (#46)
    • Renamed macro LIBDIVIDE_USE_SSE2 to LIBDIVIDE_SSE2
    • Renamed divider::recover_divisor() to divider::recover()
  • BUG FIXES
    • Remove _udiv128(), which is not yet supported by the clang-cl and icl compilers
    • Fix C++ linker issue caused by anonymous namespace (#54)
    • Fix clang-cl (Windows) linker issue (#56)
  • ENHANCEMENT
    • Add AVX2 & AVX512 vector division
    • Speed up SSE2 libdivide_mullhi_u64_vector()
    • Support +1 & -1 signed branchfree dividers (4a1d5a7)
    • Speed up unsigned branchfull power of 2 dividers (2422199)
    • Simplify C++ templates
    • Simplify more bit flags of the libdivide_*_t structs
    • Get rid of MAYBE_VECTOR() hack
  • TESTING
    • tester.cpp: Convert to modern C++
    • tester.cpp: Add more test cases
    • benchmark_branchfree.cpp: Convert to modern C++
    • benchmark.c: Prevent compilers from optimizing too much
  • BUILD
    • Automatically detect SSE2/AVX2/AVX512
  • DOCS