Release v2023.02.15

jfalcou · Feb 15, 2023 · 3d5821f
1 parent a9d1dcb
commit 3d5821f

Showing 1,110 changed files with 9,126 additions and 6,349 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -11,9 +11,9 @@ enable_testing()
 ## =================================================================================================
 project(eve LANGUAGES CXX)
 
-set(EVE_MAJOR_VERSION 2022)
-set(EVE_MINOR_VERSION 9)
-set(EVE_PATCH_VERSION 1)
+set(EVE_MAJOR_VERSION 2023)
+set(EVE_MINOR_VERSION 2)
+set(EVE_PATCH_VERSION 15)
 set(EVE_VERSION ${EVE_MAJOR_VERSION}.${EVE_MINOR_VERSION}.${EVE_PATCH_VERSION})
 
 set(PROJECT_VERSION   ${EVE_VERSION})

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 
 ## Purpose
 
-EVE is a reimplementation of the old EVE SIMD library by Falcou et al. which for a while was
+EVE is a re-implementation of the old EVE SIMD library by Falcou et al. which for a while was
 named Boost.SIMD. It's a C++20 and onward implementation of a type based wrapper around
 SIMD extensions sets for most current architectures. It aims at showing how C++20 can be used
 to design and implement efficient, low level, high abstraction library suited for high performances.
@@ -39,22 +39,26 @@ EVE is considered **operationnal**: it's usable, has a large feature sets for a
 
 ### Current roster of supported Instructions Sets
 
+In terms of SIMD extension sets, we actively support (i.e. code is optimized and regularly tested)
+the following:
+
 Full support with tests:
- - **Intel**
-   - SSE2, SSSE3, SSE3, SSE4.1, SSE4.2
-   - AVX, AVX2, FMA3
-   - AVX512 Skymake style (F,CD,DQ,BW,VL)
- - **ARM**
-   - NEON (64 & 128 bits)
-   - AARCH64
+  - **Intel**
+    - SSE2, SSSE3, SSE3, SSE4.1, SSE4.2
+    - AVX, AVX2, FMA3
+    - AVX512 in Skylake-AVX512 mode (F, CD, VL, DQ, BW)
+  - **ARM**
+    - NEON A32 (64 & 128 bits)
+    - NEON A64 (64 & 128 bits)
+    - ASIMD
+    - SVE with fixed-size registers of 128, 256 and 512 bits.
 
 Partial/In-progress support with minimal checks:
- - **PowerPC**
-   - VMX
-   - VSX
+  - **PowerPC**
+    - VMX
+    - VSX
 
- - We **do not support** ARM SVE with dynamic as the execution model makes no sense and the current compiler support is not adequate for us. **WOrk is in progress for fixed size SVE support**.
- - We **do not support** GPGPU, this is the job for another tool.
+We **do not support** ARM SVE with dynamic size or GPGPU; the latter is the job for another tool.
 
 ### Current roster of supported compiler
 

diff --git a/doc/Doxyfile b/doc/Doxyfile
@@ -5,7 +5,7 @@ DOXYFILE_ENCODING      = UTF-8
 
 PROJECT_NAME           = E.V.E
 PROJECT_NUMBER         =
-PROJECT_BRIEF          = v2022.09.01
+PROJECT_BRIEF          = v2023.02.15
 PROJECT_LOGO           = logo.png
 
 OUTPUT_DIRECTORY       = $(EVE_DOXYGEN_OUPUT)
@@ -846,8 +846,7 @@ HTML_EXTRA_STYLESHEET  += doxygen-awesome-sidebar-only.css
 # files will be copied as-is; there are no commands or markers available.
 # This tag requires that the tag GENERATE_HTML is set to YES.
 
-HTML_EXTRA_FILES   = doxystrap.js
-HTML_EXTRA_FILES  += godbolt.js
+HTML_EXTRA_FILES   = godbolt.js
 HTML_EXTRA_FILES  += fragment.js
 HTML_EXTRA_FILES  += paragraph.js
 HTML_EXTRA_FILES  += eve.bibtex

diff --git a/doc/internals/semantic.md b/doc/internals/semantic.md
@@ -10,8 +10,7 @@ behaviors **EVE** types and functions can exhibit.
 For any [value type](@ref eve::value), the **cardinal** is the number of elements it contains.
 This information is retrieved via the eve::cardinal type trait.
 
-  - For any [scalar type](@ref eve::scalar_value) `T`, `eve::cardinal<T>::type` evaluates to @ref eve::scalar_cardinal.
-  - For any [SIMD type](@ref eve::simd_value) `T`, `eve::cardinal<T>::type` evaluates to `eve::fixed<N>`, where `N` is the number of lanes of the underlying SIMD register.
+For any [SIMD type](@ref eve::simd_value) `T`, `eve::cardinal<T>::type` evaluates to `eve::fixed<N>`, where `N` is the number of lanes of the underlying SIMD register.
 
 Two types are said to be **cardinal compatible** if they have the same cardinal or at least one of them
 is a [scalar type](@ref eve::scalar_value).
@@ -55,12 +54,12 @@ For any [values](@ref eve::value) `x1`, ..., `xn` of types `T1`, ..., `Tn` , a C
 returning a [value](@ref eve::value) of type `R` is said to be **Element-wise** if the expression
 `R r = f(x1, ...,xn)` is semantically equivalent to:
 
-  - if `R` models @ref eve::simd_value:
+  - if `R` models @ref eve::simd_value :
     @code{.cpp}
     R r = [](auto i, auto) { return f(get(x1,i),  ..., get(xn,i)); };
     @endcode
 
-  - if `R` models @ref eve::scalar_value:
+  - if `R` models @ref eve::scalar_value :
     @code{.cpp}
     R r = f(x1,  ..., xn);
     @endcode
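
The element-wise definition in `doc/internals/semantic.md` can be illustrated with a small standalone sketch. It does not use EVE: `toy_wide`, `get` and `map_lanes` are hypothetical names introduced only for this illustration, and a plain `std::array` stands in for a SIMD register. The point is only that lane `i` of the result is `f` applied to lane `i` of every argument, with scalar arguments reused for every lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD value holding N lanes of T (not an EVE type).
template <typename T, std::size_t N>
struct toy_wide {
  std::array<T, N> lanes;
};

// Lane access for a scalar: the same value is seen by every lane.
template <typename T>
T get(T scalar, std::size_t) { return scalar; }

// Lane access for a toy_wide: returns the i-th lane.
template <typename T, std::size_t N>
T get(toy_wide<T, N> const& w, std::size_t i) { return w.lanes[i]; }

// Element-wise application: r[i] = f(get(x1, i), ..., get(xn, i)) for each lane i.
template <std::size_t N, typename F, typename... Xs>
auto map_lanes(F f, Xs const&... xs) {
  using R = decltype(f(get(xs, std::size_t{0})...));
  toy_wide<R, N> r{};
  for (std::size_t i = 0; i < N; ++i) r.lanes[i] = f(get(xs, i)...);
  return r;
}
```

Calling `map_lanes<4>(f, a, 2.f)` with a four-lane `a` applies `f` to each lane of `a` paired with the scalar `2.f`. This mirrors the mixed SIMD/scalar case that the cardinal-compatibility rule allows.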

diff --git a/doc/page01_info.md b/doc/page01_info.md
@@ -22,7 +22,15 @@ In term of SIMD extension sets, we actively supports (ie code is optimized and r
   - NEON A32 (64 & 128 bits)
   - NEON A64 (64 & 128 bits)
   - ASIMD
+  - SVE with fixed-size registers of 128, 256 and 512 bits.
 
+Partial/In-progress support with minimal checks:
+ - **PowerPC**
+   - VMX
+   - VSX
+
+ - We **do not support** ARM SVE with dynamic size.
+ - We **do not support** GPGPU, this is the job for another tool.
 The following instructions are tentatively supported (ie code is incomplete and not tested in depth):
 
 - **PowerPC**

diff --git a/doc/page10_changelog.md b/doc/page10_changelog.md
@@ -1,6 +1,44 @@
 Change Log {#changelog}
 ==========
 
+## Version 2023.02.15
+
+Codename: [Perdita Quiescent](https://en.wikipedia.org/wiki/Perdita_(The_Winter%27s_Tale))
+
+### What's Changed
+
+#### Removal and Deprecation
+  * The proba module has been removed (See #1490). It will be reworked as a separate project later on with a proper API.
+
+#### Architectures/Compilers Support & Fixes
+##### The One Big News for this release: SVE
+
+  * SVE with fixed-size, power-of-2 cardinals is now supported for most of our API. Some more optimizations are on the way but the support is functional. (See #1432, #1435, #1436, #1437, #1438, #1448, #1453, #1455, #1456, #1463, #1464, #1472, #1473, #1474, #1487, #1489, #1491, #1494, #1533, #1556)
+
+##### Other Fixes
+  * **EVE** now compiles on ARM M1 with Homebrew g++. (See #1471)
+  * **EVE** now compiles with Apple Clang (See #1479, #1485, #1502, #1530, #1531)
+  * Various efforts have been made to advance the MSVC situation. Help is still welcome.
+  * `all` and `any` are now more efficient on ARM (See #1500)
+  * Integral sign/signnz functions are now more efficient on x86 (See #1499)
+  * Implementations of gather and masked gather for x86 AVX2/AVX512 are now optimized. (See #1526)
+
+#### Features
+  * **jtlap** implemented a large number of new functions for `eve::complex`.
+  * Functions like exp, log or sqrt can now be called with a real input and a complex output. (See #1528)
+  * Binary functions for which an n-ary extension is available now support being called with a tuple-like parameter instead of a dynamic range (See #1422, #1509)
+  * The minmax function is now available. (See #1507)
+  * **DenisYaroshevskiy** implemented new traits for algorithms to take care of costly kernels and no alignment, and to support fused operations. (See #1535, #1543)
+
+#### Bug Fixes
+  * Convert is now more efficient and no longer generates piecewise evaluation in some scenarios involving logicals. (See #1447, #1428)
+  * `if_else` now uses the proper constant generator in optimized cases. (See #1529)
+  * Prevent constants from erroneously being callable with non-specific product types. (See #1540)
+  * A large cleanup of old traits and concepts has been done. The basic concepts around vectorizable and vectorized types have therefore been simplified and streamlined. (See #1468, #1477, #1527, #1488, #1545)
+  * Fix an issue with dynamic SIMD extension detection, which was broken by accident. (See #1504)
+
+**Full Changelog**: https://github.com/jfalcou/eve/compare/v2022.09.1...2023.02.15
+
 ## Version 2022.09.1
 
 Codename: [Rosalind Serendipitous](https://en.wikipedia.org/wiki/Rosalind_(As_You_Like_It))
@@ -57,13 +95,13 @@ Starting this fall, we will also try to provide more regular release.
   - Updated tests and infrastructure to use latest TTS to speedup compile times (See #1313)
   - Add find_package support (#1318)
   - Provide CMake machinery and example for multi-arch support (See #1321)
-  - Refactored EVE's exported CMake target and installation by @justend29 in (See #1336)
-  - Automated integration tests and correct their fetches by @justend29 in (See #1338)
+  - Refactored EVE's exported CMake target and installation by **justend29** (See #1336)
+  - Automated integration tests and corrected their fetches by **justend29** (See #1338)
 
 * Documentation
   - Add link to EVE bibtex (See #1282)
   - Documentation style and layout changed to become more readable (See #1299)
-  - README: Fix links to website by @Simran-B (See #1303)
+  - README: Fix links to website by **Simran-B** (See #1303)
   - Added more documentation for algorithms (See #1349)
   - Add a local doxygen generation target to simplify documentation works (See #1392)
 
@@ -76,8 +114,8 @@ Starting this fall, we will also try to provide more regular release.
 ### New Contributors
 Thanks to all our new contributor for this release!
 
-  - @Simran-B made their first contribution in (See #1303)
-  - @justend29 made their first contribution in (See #1338)
+  - **Simran-B** made their first contribution (See #1303)
+  - **justend29** made their first contribution (See #1338)
 
 ## Version 2022.03.0
 
@@ -98,16 +136,16 @@ including WASM and **gasp** fixed size SVE.
  - Revamped docs to add basic 101 tutorials
  - Fixed most documentation to provide Compiler-Explorer-aware samples
  - Correct tutorial example code for if_else
- - Various proofreading by @pauljurczak and @toughengineer
+ - Various proofreading by **pauljurczak** and **toughengineer**
 
 * Improvements on compress (#947, #1013, #1037, #1213)
   - compress_store is a very important function that has been reimplemented to simplify
     its implementation in term of support for iterators
   - better implementation for X86 architectures (SSE2 and BMI).
-  - provide an alternative implementation based on switch ()
+  - provide an alternative implementation based on switch
 
 * Improvements on algorithms
-`@DenisYaroshevskiy` did a wonderful job on this front.
+**DenisYaroshevskiy** did a wonderful job on this front.
   - New algorithm: `reverse` (#1066, #1068)
   - New algorithm: `reverse_copy` (#1060)
   - New algorithm: `iota` (#1016)
@@ -124,7 +162,7 @@ including WASM and **gasp** fixed size SVE.
 
 * Convert
   - Fixed missing code for `eve::convert`. All convert calls now produce optimal code.
-  - More specifically and addition to global fixes, @aguinet contributed:
+  - More specifically, and in addition to global fixes, **aguinet** contributed:
     - Improvement for u64 => u32 when using AVX2
     - Improvement for u16 => u8 when using AVX2
     - Improvement for u64=>u32 using AVX2 + clang
@@ -138,7 +176,7 @@ including WASM and **gasp** fixed size SVE.
   - Implement AVX512 logical pair interleave using BMI parallel bit deposit
 
 * Build systems
-  - Install directory fix (Thanks `@JPenuchot`)
+  - Install directory fix (Thanks **JPenuchot**)
   - Prevent CMake error if EVE_BUILD_TEST is set to OFF (#1032)
   - Fix bench compilation issues (#1136)
   - Add CI tests for clang++ with -std=libc++ (#614)
@@ -173,11 +211,11 @@ including WASM and **gasp** fixed size SVE.
 ### New Contributors
 Thanks to all our new contributor for this release!
 
-  * `@aguinet` made their first contribution in https://github.com/jfalcou/eve/pull/1049
-  * `@JPenuchot` made their first contribution in https://github.com/jfalcou/eve/pull/1028
-  * `@pauljurczak` made their first contribution in https://github.com/jfalcou/eve/pull/1123
-  * `@the-moisrex` made their first contribution in https://github.com/jfalcou/eve/pull/1025
-  * `@toughengineer` made their first contribution in https://github.com/jfalcou/eve/pull/1182
+  * **aguinet** made their first contribution in https://github.com/jfalcou/eve/pull/1049
+  * **JPenuchot** made their first contribution in https://github.com/jfalcou/eve/pull/1028
+  * **pauljurczak** made their first contribution in https://github.com/jfalcou/eve/pull/1123
+  * **the-moisrex** made their first contribution in https://github.com/jfalcou/eve/pull/1025
+  * **toughengineer** made their first contribution in https://github.com/jfalcou/eve/pull/1182
 
 ## Version 2021.10.0
 

diff --git a/doc/tutorial/frequency-scaling.hpp b/doc/tutorial/frequency-scaling.hpp
@@ -12,7 +12,7 @@ This is why, for example, if you look at libc, it at most uses 32 byte registers
 sure you might speed up the strlen somewhat but then all the code after will be
 slower.
 
-For big datasets the price of lower frequency is often outweighted by processing
+For big datasets the price of lower frequency is often outweighed by processing
 more numbers in open operation and seed ups of 15% are not unheard of.
 
 This lead to a dilemma in the API design for us: if the user is on the AVX512 system,
@@ -24,8 +24,8 @@ There are also a typedefs `nofs_wide`, `nofs_logical` where `nofs` stands for
 "no frequency scaling".
 
 @note: other than on avx512 on intel we always use the maximum width of the register,
-since we expect the compiler to do it anyways and it is usally accepted.
-If you want to set a specific cardianl for an algorithm, you can always use
+since we expect the compiler to do it anyways and it is usually accepted.
+If you want to set a specific cardinal for an algorithm, you can always use
 `eve::algo::force_cardinal`.
 
 **/
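
The width policy described in `doc/tutorial/frequency-scaling.hpp` can be summarised by a small standalone sketch. This is not EVE code: `algo_register_bytes` and `algo_cardinal` are hypothetical helpers, and the rule "cap at 32 bytes unless frequency scaling is allowed" is an assumption that approximates the text (64-byte AVX512 registers are the only case it calls out). The `forced` parameter loosely mirrors the idea behind `eve::algo::force_cardinal`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes per register an algorithm would use on a target
// whose widest register is native_bytes wide. Registers wider than 32 bytes
// are only used when the caller explicitly allows frequency scaling.
constexpr std::size_t algo_register_bytes(std::size_t native_bytes,
                                          bool allow_frequency_scaling) {
  return (native_bytes > 32 && !allow_frequency_scaling) ? 32 : native_bytes;
}

// Hypothetical helper: number of lanes of T an algorithm would process per step.
// A non-zero forced value overrides the computed cardinal.
template <typename T>
constexpr std::size_t algo_cardinal(std::size_t native_bytes,
                                    bool allow_frequency_scaling,
                                    std::size_t forced = 0) {
  return forced != 0
             ? forced
             : algo_register_bytes(native_bytes, allow_frequency_scaling) / sizeof(T);
}
```

Under these assumptions, a 64-byte target processes 32 bytes per step by default and 64 bytes only when the caller opts in. Targets with 16- or 32-byte registers are unaffected.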
diff --git a/docs/structeve_1_1scalar__cardinal.html → docs/Frequency.html b/docs/structeve_1_1scalar__cardinal.html → docs/Frequency.html
@@ -6,7 +6,7 @@
 <meta http-equiv="X-UA-Compatible" content="IE=11"/>
 <meta name="generator" content="Doxygen 1.9.5"/>
 <meta name="viewport" content="width=device-width, initial-scale=1"/>
-<title>E.V.E: eve::scalar_cardinal Struct Reference</title>
+<title>E.V.E: Scaling.</title>
 <link href="tabs.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="jquery.js"></script>
 <script type="text/javascript" src="dynsections.js"></script>
@@ -56,7 +56,7 @@
   <td id="projectalign">
    <div id="projectname">E.V.E
    </div>
-   <div id="projectbrief">v2022.09.01</div>
+   <div id="projectbrief">v2023.02.15</div>
   </td>
  </tr>
    <tr><td colspan="2" style="padding: 20px 0px 0px 0px;">        <div id="MSearchBox" class="MSearchBoxInactive">
@@ -98,7 +98,7 @@
 </div>
 <script type="text/javascript">
 /* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
-$(document).ready(function(){initNavTree('structeve_1_1scalar__cardinal.html',''); initResizable(); });
+$(document).ready(function(){initNavTree('Frequency.html',''); initResizable(); });
 /* @license-end */
 </script>
 <div id="doc-content">
@@ -116,26 +116,22 @@
 </iframe>
 </div>
 
-<div class="header">
-  <div class="summary">
-<a href="structeve_1_1scalar__cardinal-members.html">List of all members</a>  </div>
-  <div class="headertitle"><div class="title">eve::scalar_cardinal Struct Reference<div class="ingroups"><a class="el" href="group__simd.html">EVE</a> &raquo; <a class="el" href="group__simd__types.html">SIMD related types</a></div></div></div>
+<div><div class="header">
+  <div class="headertitle"><div class="title">Scaling. </div></div>
 </div><!--header-->
 <div class="contents">
-
-<p>Cardinal type for scalar values.  
- <a href="structeve_1_1scalar__cardinal.html#details">More...</a></p>
-
-<p><code>#include &lt;eve/arch/cardinals.hpp&gt;</code></p>
-<a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
-<div class="textblock"><p >Cardinal type for scalar values. </p>
+<div class="textblock"><p >In SIMD programming there is a known issue of processor frequency scaling: when working with wider registers, in order to avoid overheating, some processors limit their CPU frequency. There are a lot of situations where this can happen but it is a noticeable problem mostly for 64 byte registers on intel avx512 cpus.</p>
+<p >This is why, for example, if you look at libc, it at most uses 32 byte registers: sure you might speed up the strlen somewhat but then all the code after will be slower.</p>
+<p >For big datasets the price of lower frequency is often outweighed by processing more numbers in one operation and speed-ups of 15% are not unheard of.</p>
+<p >This led to a dilemma in the API design for us: if the user is on the AVX512 system, most likely they expect the register to be 64 bytes. But we suspect this is not what they actually want. So we decided that <code><a class="el" href="structeve_1_1wide.html" title="Wrapper for SIMD registers.">eve::wide</a></code> on avx512 is by default 64 bytes but algorithms by default use 32 bytes. If you want to get an algorithm to use 64 bytes you can pass the <code>eve::algo::allow_frequency_scaling</code> trait. There are also typedefs <code>nofs_wide</code>, <code>nofs_logical</code> where <code>nofs</code> stands for "no frequency scaling".</p>
+<dl class="section note"><dt>Note</dt><dd>Other than on avx512 on intel we always use the maximum width of the register, since we expect the compiler to do it anyways and it is usually accepted. If you want to set a specific cardinal for an algorithm, you can always use <code>eve::algo::force_cardinal</code>. </dd></dl>
 </div></div><!-- contents -->
+</div><!-- PageDoc -->
 </div><!-- doc-content -->
 <!-- HTML footer for doxygen 1.8.20-->
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="navelem"><a class="el" href="namespaceeve.html">eve</a></li><li class="navelem"><a class="el" href="structeve_1_1scalar__cardinal.html">scalar_cardinal</a></li>
   </ul>
 </div>
 </div> <!-- DOXYSTRAP RELATED -->