Merge for v2 (#77) · agenium-scale/nsimd@eb11949

Merge for v2 (#77)

* Start tet1d module

* Update tet1d module

* Add CUDA support for tet1d module

* Add scalar input support in tet1d module

* Simplify command line arguments for hatch.py

* Add support for generation of modules

* Make tet1d a module into the new system

* Add scalar version of some function from libm + reinterpret

* Fix refactoring

* Before merging master

* Fixes after merge

* For backup

* Fix CUDA

* Add forgotten files

* Working => backup

* Fixes

* All tests are passing f16 included

* For backup

* COVID-19

* Fixes

* Fixes

* Fixes: all test compile with nvcc

* ROCm support, addition on f32 and f16 are compiling

* TET1D tests are compiling with both nvcc and hipcc

* Merge CUDA and ROCm when code is the same

* Forgot files

* Now we can list generated files

* Forgot to merge nsimd.h

* Forgot to push

* Update .gitignore with the new file generated by the tet1d module.

* Return allocated arrays for tests

* Increase the minimum size of the tests array

* Fix segfault

* Fix segfault

* Add mask[oz]_load[zu] and mask_store[au] operators for CPU

* For backup

* For backup

* Fix for SSE

* Fix fma for C89

* Remove warning from GCC when using long long in C98 and C++98

* Fix warnings for C98 and C++98 and AVX512

* Add set1l, iota, mask_for_loop_tail for ARM

* Before merging master

* Fix ARM mask[oz]_load[au]

* Fixes for ARM SVE

* Fix warning when using __f16's

* Add alignment-templated masked loads/stores

* Rewrite friendly_but_not_optimized stuff

* Forgot file

* Fix ARM

* Fix ARM

* Cosmetic

* Backup

* Backup

* Backup

* Backup

* Forgot file

* For backup

* For backup

* Refactoring of documentation

* Add build.nsconfig + fix warning in fixed_point exp

* Fix warning in SPMD module

* Add forgotten file

* Fixes for CUDA

* Fixes for CPU

* Fixes

* Add gather/scatter for cpu and x86

* Add gather/scatter for arm (not tested yet)

* Fix gather/scatter for arm

* Deactivate tet1d module

* Cleanup

* Add scripts for building

* Fix setup and build script for Linux

* Changing computer

* Backup

* Fix script/setup.sh

* Fixes for fixed size SVE

* Fix Windows scripts

* Fix scripts for Linux

* Fix Makefile.nix for md2html

* Fix Makefile.win for md2html

* Fix generation of documentation

* Add mask scatter for cpu

* Add mask_scatter for x86

* Forgot a file

* Add mask_scatter for arm

* Add masked gather for cpu

* Add masked gather for x86

* Add masked gather for arm

* Fix masked gather for f16's

* Adapt SVE typedefs to new GCC 10

* Fixes for x86

* Fix tet1d tests for CUDA

* Fixes for HIP

* Fix warning for ROCm/HIP

* Various fixes

* Fix tests for rec11, rec8, rsqrt11 and rsqrt8

* Fix rec11, rec8, rsqrt11, rsqrt8 tests

* Improve gather/scatter for neon128 and aarch64

* Add gather_linear + scatter_linear and remove masked gather and scatter

* Add linear gather + scatter

* Fix gather_linear for neon128 + aarch64

* Improve gather on aarch64 + neon128

* Add documentation for module TET1d

* Update README

* Add documentation for module TET1d

* Improve README with nsconfig stuff

* Improve README

* Improve README

* Improve README

* Improve README

* Fix warning for armclang

* Fix warning when compiling with Clang and C++98/03

* Fix generation of benches

* For backup

* First version (not finished yet)

* Add support for non closed operators

* Improve doc

* Improve documentation

* More fixes

* Fix broken link in README

* Add CONTRIBUTING.md

* Improve documentation

* Improve documentation

* Improve documentation + simplify scoped_aligned_mem_for

* Fix scoped_aligned_mem

* Fixed errors in nsimd.h

* Improve documentation

* Improve documentation

* Improve documentation

* Replace some print left by common.myprint

* Fixed multiple declarations

* Let benches generate despite the new function set1l

* Add a module offering a vectorized random generator

* Only generate rand module if flags passed from hatch are correct

* Removed F-strings

* Fix build.nsconfig

* Fix generation of rand module

* Building the library does not require C++14 anymore, C++98 is more than sufficient

* Update README

* Update README

* Setup.sh clone nstools using the same protocol as nsimd

* Add possibility to ignore tests/benches/...

* Add C++20 concepts to nsimd.h

* Add C++20 concepts to cxx_adv_api.hpp

* Add C++20 concepts to Python-generated functions

* Fix C++20 concepts

* Prepare support for oneAPI

* Add C++20 concepts doc

* Modify the rand module to allow generation with python 3.5 and earlier

* Improve doc + rename module rand --> random

* Fix menu of doc of random module

* Fix availability of scoped_mem...

* Fix tests to_pack*

* Tests are dependent on the SIMD architecture

* Improvements for Intel + Fixes for KNL

* More fixes for KNL and C89

* More fixes

* Fix fms/fnms for aarch64

* Fixes for SVE

* Fix warning when compiling for 32-bits targets

* Cleaning in tests generation

* Fix ULP bounds for some operators

* Almost all tests are passing on 32-bits platform

* No more warning for 32-bits compilations

* Forgot a file

* Fix last errors in philox

* First version of quick'n'dirty CI

* Fix warnings

* Fix more warnings

* Fix Python generation for module/random

* Fix fnms for SSE2 and SSE42

* Try again to fix warnings for GCC

* Fix warnings for Clang

* Add variable to compile for a given CUDA GPU

* Fix warnings for ROCm/HIP

* Fix CUDA f16 implementation

* Fix CUDA f16 implementation

* Fix CUDA f16 implementation

* Reduce size of arrays for GPU testing

* Reduce size of arrays for GPU testing

* Compile .so with nvcc and hipcc for binary compatibility

* Fix build.nsconfig

* Fix build.nsconfig

* Fix build.nsconfig

* Fix build.nsconfig

* Improve CI script + add static in NSIMD_INLINE

* Fix build.nsconfig for HIP

* Last fixes

* Fix issue: __popcnt64 not available in 32-bits mode

* Fix DLL specifier of *logulps*

* Fix MSVC 32-bits related issues

* Cosmetic

* Add __vectorcall for MSVC 32-bits

* Update .gitignore

Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Paul Gannay <[email protected]>
Co-authored-by: c <[email protected]>
Co-authored-by: Adrien Arnaud <[email protected]>
Co-authored-by: Rodolphe Cargnello <[email protected]>

7 people authored Dec 11, 2020

1 parent df84e57 commit eb11949

.clang-format

@@ -0,0 +1,2 @@
Standard: Cpp03
ColumnLimit: 79

.gitignore

            
    @@ -1,34 +1,65 @@
  
    ## Build system

    build

    # Common build dirs

    build*/

    ## Auto-generated

    # Dependencies

    nstools/

    # Binaries

    *.o

    *.so

    *.pyc

    *.exe

    *.dll

    *.dylib

    # Generated files

    ## API

    src/api_*.cpp

    src/api_*

    ## Plateform specific code

    include/nsimd/arm

    include/nsimd/cpu

    include/nsimd/cxx_adv_api_functions.hpp

    include/nsimd/friendly_but_not_optimized.hpp

    include/nsimd/functions.h

    include/nsimd/ppc

    include/nsimd/x86

    src/api_*

    ## Tests

    tests/c_base

    tests/cxx_base

    tests/cxx_adv

    tests/modules/tet1d/

    tests/modules/fixed_point/

    tests/modules/rand/*.cpp

    tests/modules/spmd/

    tests/modules/random/

    ## Benches

    benches/cxx_adv

    _deps

    _install

    doc/html

    ## Modules

    include/nsimd/modules/tet1d/

    include/nsimd/modules/spmd/

    include/nsimd/modules/fixed_point/

    include/nsimd/scalar_utilities.h

    ## Doc

    doc/html

    doc/markdown/overview.md

    doc/markdown/api.md

    doc/markdown/api_*.md

    doc/markdown/module_fixed_point_api*.md

    doc/markdown/module_fixed_point_overview.md

    doc/markdown/module_spmd_api*.md

    doc/markdown/module_spmd_overview.md

    doc/markdown/module_memory_management_overview.md

    doc/md2html

    doc/tmp.html

    ## Ulps

    ulps/

    ## CI

    _ci/

CMakeLists.txt

This file was deleted.
