Commit
* Start tet1d module
* Update tet1d module
* Add CUDA support for tet1d module
* Add scalar input support in tet1d module
* simplify command line arguments for hatch.py
* Add support for generation of modules
* Make tet1d a module into the new system
* Add scalar version of some function from libm + reinterpret
* Fix refactoring
* Before merging master
* Fixes after merge
* For backup
* Fix CUDA
* Add forgotten files
* Working => backup
* Fixes
* All tests are passing f16 included
* For backup
* COVID-19
* Fixes
* Fixes
* Fixes: all test compile with nvcc
* ROCm support, addition on f32 and f16 are compiling
* TET1D tests are compiling with both nvcc and hipcc
* Merge CUDA and ROCm when code is the same
* Forgot files
* Now we can list generated files
* Forgot to merge nsimd.h
* Forgot to push
* Update .gitignore with the new file generated by the tet1d module.
* Return allocated arrays for tests
* Increase the minimum size of the tests array
* Fix segfault
* Fix segfault
* Add mask[oz]_load[zu] and mask_store[au] operators for CPU
* For backup
* For backup
* Fix for SSE
* Fix fma for C89
* Remove warning from GCC when using long long in C98 and C++98
* Fix warnings for C98 and C++98 and AVX512
* Add set1l, iota, mask_for_loop_tail for ARM
* Before merging master
* Fix ARM mask[oz]_load[au]
* Fixes for ARM SVE
* Fix warning when using __f16's
* Add alignment-templated masked loads/stores
* Rewrite friendly_but_not_optimized stuff
* Forgot file
* Fix ARM
* Fix ARM
* Cosmetic
* Backup
* Backup
* Backup
* Backup
* Forgot file
* For backup
* For backup
* Refactoring of documentation
* Add build.nsconfig + fix warning in fixed_point exp
* Fix warning in SPMd module
* Add forgotten file
* Fixes for CUDA
* Fixes for CPU
* Fixes
* Add gather/scatter for cpu and x86
* Add gather/scatter for arm (not tested yet)
* Fix gather/scatter for arm
* Deactivate tet1d module
* Cleanup
* Add scripts for building
* Fix setup and build script for Linux
* Changing computer
* Backup
* Fix script/setup.sh
* Fixes for fixed size SVE
* Fix Windows scripts
* Fix scripts for Linux
* Fix Makefile.nix for md2html
* Fix Makefile.win for md2html
* Fix generation of documentation
* Add mask scatter for cpu
* Add mask_scatter for x86
* Forgot a file
* Add mask_scatter for arm
* Add masked gather for cpu
* Add masked gather for x86
* Add masked gather for arm
* Fix masked gather for f16's
* Adapt SVE typedefs to new GCC 10
* Fixes for x86
* Fix tet1d tests for CUDA
* Fixes for HIP
* Fix warning for ROCm/HIP
* Various fixes
* Fix tests for rec11, rec8, rsqrt11 and rsqrt8
* Fix rec11, rec8, rsqrt11, rsqrt8 tests
* Improve gather/scatter for neon128 and aarch64
* Add gather_linear + scatter_linear and remove masked gather and scatter
* Add linear gather + scatter
* Fix gather_linear for neon128 + aarch64
* Improve gather on aarch64 + neon128
* Add documentation for module TET1d
* Update README
* Add documentation for module TET1d
* Improve README with nsconfig stuff
* Improve README
* Improve README
* Improve README
* Improve README
* Fix warning for armclang
* Fix warning when compiling with Clang and C++98/03
* Fix generation of benches
* For backup
* First version (not finished yet)
* Add support for non closed operators
* Improve doc
* Improve documentation
* More fixes
* Fix broken link in README
* Add CONTRIBUTING.md
* Improve documentation
* Improve documentation
* Improve documentation + simplify scoped_aligned_mem_for
* Fix scoped_aligned_mem
* Fixed errors in nsimd.h
* Improve documentation
* Improve documentation
* Improve documentation
* Replace some print left by common.myprint
* Fixed multiple declarations
* Let benches generate despite the new function set1l
* Add a module offering a vectorized random generator
* Only generate rand module if flags passed from hatch are correct
* Removed F-strings
* Fix build.nsconfig
* Fix generation of rand module
* Building the library does not require C++14 anymore, C++98 is more than sufficient
* Update README
* Update README
* Setup.sh clone nstools using the same protocol as nsimd
* Add possibility to ignore tests/benches/...
* Add C++20 concepts to nsimd.h
* Add C++20 concepts to cxx_adv_api.hpp
* Add C++20 concepts to Python-generated functions
* Fix C++20 concepts
* Prepare support for oneAPI
* Add C++20 concepts doc
* Modify the rand module to allow generation with python 3.5 and earlier
* Improve doc + rename module rand --> random
* Fix menu of doc of random module
* Fix availability of scoped_mem...
* Fix tests to_pack*
* Tests are dependent on the SIMD architecture
* Improvements for Intel + Fixes for KNL
* More fixes for KNL and C89
* More fixes
* Fix fms/fnms for aarch64
* Fixes for SVE
* Fix warning when compiling for 32-bits targets
* Cleaning in tests generation
* Fix ULP bounds for some operators
* Almost all tests are passing on 32-bits platform
* No more warning for 32-bits compilations
* Forgot a file
* Fix last errors in philox
* First version of quick'n'dirty CI
* Fix warnings
* Fix more warnings
* Fix Python generation for module/random
* Fix fnms for SSE2 and SSE42
* Try again to fix warnings for GCC
* Fix warnings for Clang
* Add variable to compile for a given CUDA GPU
* Fix warnings for ROCm/HIP
* Fix CUDA f16 implementation
* Fix CUDA f16 implementation
* Fix CUDA f16 implementation
* Reduce size of arrays for GPU testing
* Reduce size of arrays for GPU testing
* Compile .so with nvcc and hipcc for binary compatibility
* Fix build.nsconfig
* Fix build.nsconfig
* Fix build.nsconfig
* Fix build.nsconfig
* Improve CI script + add static in NSIMD_INLINE
* Fix build.nsconfig for HIP
* Last fixes
* Fix issue: __popcnt64 not available in 32-bits mode
* Fix DLL specifier of *logulps*
* Fix MSVC 32-bits related issues
* Cosmetic
* Add __vectorcall for MSVC 32-bits
* Update .gitignore

Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Paul Gannay <[email protected]>
Co-authored-by: c <[email protected]>
Co-authored-by: Adrien Arnaud <[email protected]>
Co-authored-by: Rodolphe Cargnello <[email protected]>