
Commit

Release v2023.02.15
jfalcou authored Feb 15, 2023
1 parent a9d1dcb commit 3d5821f
Showing 1,110 changed files with 9,126 additions and 6,349 deletions.
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -11,9 +11,9 @@ enable_testing()
## =================================================================================================
project(eve LANGUAGES CXX)

set(EVE_MAJOR_VERSION 2022)
set(EVE_MINOR_VERSION 9)
set(EVE_PATCH_VERSION 1)
set(EVE_MAJOR_VERSION 2023)
set(EVE_MINOR_VERSION 2)
set(EVE_PATCH_VERSION 15)
set(EVE_VERSION ${EVE_MAJOR_VERSION}.${EVE_MINOR_VERSION}.${EVE_PATCH_VERSION})

set(PROJECT_VERSION ${EVE_VERSION})
30 changes: 17 additions & 13 deletions README.md
@@ -7,7 +7,7 @@

## Purpose

EVE is a reimplementation of the old EVE SIMD library by Falcou et al. which for a while was
EVE is a re-implementation of the old EVE SIMD library by Falcou et al. which for a while was
named Boost.SIMD. It's a C++20-and-onward implementation of a type-based wrapper around
SIMD extension sets for most current architectures. It aims at showing how C++20 can be used
to design and implement an efficient, low-level, high-abstraction library suited for high performance.
@@ -39,22 +39,26 @@ EVE is considered **operationnal**: it's usable, has a large feature sets for a

### Current roster of supported Instructions Sets

In terms of SIMD extension sets, we actively support (i.e. code is optimized and regularly tested)
the following:

Full support with tests:
- **Intel**
- SSE2, SSSE3, SSE3, SSE4.1, SSE4.2
- AVX, AVX2, FMA3
- AVX512 Skymake style (F,CD,DQ,BW,VL)
- **ARM**
- NEON (64 & 128 bits)
- AARCH64
- **Intel**
- SSE2, SSSE3, SSE3, SSE4.1, SSE4.2
- AVX, AVX2, FMA3
- AVX512 in Skylake-AVX512 mode (F, CD, VL, DQ, BW)
- **ARM**
- NEON A32 (64 & 128 bits)
- NEON A64 (64 & 128 bits)
- ASIMD
- SVE with fixed sizes of 128, 256 and 512 bits registers.

Partial/In-progress support with minimal checks:
- **PowerPC**
- VMX
- VSX
- **PowerPC**
- VMX
- VSX

- We **do not support** ARM SVE with dynamic as the execution model makes no sense and the current compiler support is not adequate for us. **WOrk is in progress for fixed size SVE support**.
- We **do not support** GPGPU, this is the job for another tool.
We **do not support** ARM SVE with dynamic size, nor GPGPU (the latter is the job for another tool).

### Current roster of supported compiler

5 changes: 2 additions & 3 deletions doc/Doxyfile
@@ -5,7 +5,7 @@ DOXYFILE_ENCODING = UTF-8

PROJECT_NAME = E.V.E
PROJECT_NUMBER =
PROJECT_BRIEF = v2022.09.01
PROJECT_BRIEF = v2023.02.15
PROJECT_LOGO = logo.png

OUTPUT_DIRECTORY = $(EVE_DOXYGEN_OUPUT)
@@ -846,8 +846,7 @@ HTML_EXTRA_STYLESHEET += doxygen-awesome-sidebar-only.css
# files will be copied as-is; there are no commands or markers available.
# This tag requires that the tag GENERATE_HTML is set to YES.

HTML_EXTRA_FILES = doxystrap.js
HTML_EXTRA_FILES += godbolt.js
HTML_EXTRA_FILES = godbolt.js
HTML_EXTRA_FILES += fragment.js
HTML_EXTRA_FILES += paragraph.js
HTML_EXTRA_FILES += eve.bibtex
7 changes: 3 additions & 4 deletions doc/internals/semantic.md
@@ -10,8 +10,7 @@ behaviors **EVE** types and functions can exhibit.
For any [value type](@ref eve::value), the **cardinal** is the number of elements it contains.
This information is retrieved via the eve::cardinal type trait.

- For any [scalar type](@ref eve::scalar_value) `T`, `eve::cardinal<T>::type` evaluates to @ref eve::scalar_cardinal.
- For any [SIMD type](@ref eve::simd_value) `T`, `eve::cardinal<T>::type` evaluates to `eve::fixed<N>`, where `N` is the number of lanes of the underlying SIMD register.
For any [SIMD type](@ref eve::simd_value) `T`, `eve::cardinal<T>::type` evaluates to `eve::fixed<N>`, where `N` is the number of lanes of the underlying SIMD register.

Two types are said to be **cardinal compatible** if they have the same cardinal or at least one of them
is a [scalar type](@ref eve::scalar_value).
@@ -55,12 +54,12 @@ For any [values](@ref eve::value) `x1`, ..., `xn` of types `T1`, ..., `Tn` , a C
returning a [value](@ref eve::value) of type `R` is said to be **Element-wise** if the expression
`R r = f(x1, ...,xn)` is semantically equivalent to:

- if `R` models @ref eve::simd_value:
- if `R` models @ref eve::simd_value :
@code{.cpp}
R r = [](auto i, auto) { return f(get(x1,i), ..., get(xn,i)); };
@endcode

- if `R` models @ref eve::scalar_value:
- if `R` models @ref eve::scalar_value :
@code{.cpp}
R r = f(x1, ..., xn);
@endcode
8 changes: 8 additions & 0 deletions doc/page01_info.md
@@ -22,7 +22,15 @@ In term of SIMD extension sets, we actively supports (ie code is optimized and r
- NEON A32 (64 & 128 bits)
- NEON A64 (64 & 128 bits)
- ASIMD
- SVE with fixed sizes of 128, 256 and 512 bits registers.

Partial/In-progress support with minimal checks:
- **PowerPC**
- VMX
- VSX

- We **do not support** ARM SVE with dynamic size.
- We **do not support** GPGPU, this is the job for another tool.
The following instructions are tentatively supported (ie code is incomplete and not tested in depth):

- **PowerPC**
68 changes: 53 additions & 15 deletions doc/page10_changelog.md
@@ -1,6 +1,44 @@
Change Log {#changelog}
==========

## Version 2023.02.15

Codename: [Perdita Quiescent](https://en.wikipedia.org/wiki/Perdita_(The_Winter%27s_Tale))

### What's Changed

#### Removal and Deprecation
* The proba module has been removed. (See #1490) It will be reworked later on as a separate project with a proper API.

#### Architectures/Compilers Support & Fixes
##### The One Big News for this release: SVE

* SVE with fixed-size, power-of-2 cardinals is now supported for most of our API. Some more optimizations are on the way, but the support is functional. (See #1432, #1435, #1436, #1437, #1438, #1448, #1453, #1455, #1456, #1463, #1464, #1472, #1473, #1474, #1487, #1489, #1491, #1494, #1533, #1556)

##### Other Fixes
* **EVE** now compiles on ARM M1 with Homebrew g++. (See #1471)
* **EVE** now compiles with Apple Clang (See #1479, #1485, #1502, #1530, #1531)
* Various efforts have been made to advance the MSVC situation. Help is still welcome.
* ARM `all` and `any` are now more efficient. (See #1500)
* Integral `sign`/`signnz` functions are now more efficient on x86. (See #1499)
* Implementations of X86 AVX2/AVX512 gather and masked gather are now optimized. (See #1526)

#### Features
* **jtlap** implemented a large number of new functions for `eve::complex`.
* Functions like exp, log or sqrt can now be called with a real input and a complex output. (See #1528)
* Binary functions for which an n-ary extension is available can now be called with a tuple-like parameter instead of a dynamic range. (See #1422, #1509)
* The minmax function is now available. (See #1507)
* **DenisYaroshevskiy** implemented new algorithm traits to handle costly kernels, disable alignment, and support fused operations. (See #1535, #1543)

#### Bug Fixes
* Convert is now more efficient and doesn't generate piecewise evaluation in some scenarios involving logicals. (See #1447, #1428)
* `if_else` now uses the proper constant generator in optimized cases. (See #1529)
* Prevent constants from erroneously being callable with non-specific product types. (See #1540)
* A large cleanup of old traits and concepts has been done. The basic concepts around vectorizable and vectorized types have therefore been simplified and streamlined. (See #1468, #1477, #1527, #1488, #1545)
* Fix dynamic SIMD extension detection, which had been broken by accident. (See #1504)

**Full Changelog**: https://github.com/jfalcou/eve/compare/v2022.09.1...2023.02.15

## Version 2022.09.1

Codename: [Rosalind Serendipitous](https://en.wikipedia.org/wiki/Rosalind_(As_You_Like_It))
@@ -57,13 +95,13 @@ Starting this fall, we will also try to provide more regular release.
- Updated tests and infrastructure to use latest TTS to speedup compile times (See #1313)
- Add find_package support (#1318)
- Provide CMake machinery and example for multi-arch support (See #1321)
- Refactored EVE's exported CMake target and installation by @justend29 in (See #1336)
- Automated integration tests and correct their fetches by @justend29 in (See #1338)
- Refactored EVE's exported CMake target and installation by **justend29** in (See #1336)
- Automated integration tests and correct their fetches by **justend29** in (See #1338)

* Documentation
- Add link to EVE bibtex (See #1282)
- Documentation style and layout changed to become more readable (See #1299)
- README: Fix links to website by @Simran-B (See #1303)
- README: Fix links to website by **Simran-B** (See #1303)
- Added more documentation for algorithms (See #1349)
- Add a local doxygen generation target to simplify documentation works (See #1392)

@@ -76,8 +114,8 @@ Starting this fall, we will also try to provide more regular release.
### New Contributors
Thanks to all our new contributors for this release!

- @Simran-B made their first contribution in (See #1303)
- @justend29 made their first contribution in (See #1338)
- **Simran-B** made their first contribution in (See #1303)
- **justend29** made their first contribution in (See #1338)

## Version 2022.03.0

@@ -98,16 +136,16 @@ including WASM and **gasp** fixed size SVE.
- Revamped docs to add basic 101 tutorials
- Fixed most documentation to provide Compiler-Explorer-aware samples
- Correct tutorial example code for if_else
- Various proofreading by @pauljurczak and @toughengineer
- Various proofreading by **pauljurczak** and **toughengineer**

* Improvements on compress (#947, #1013, #1037, #1213)
- compress_store is a very important function that has been reimplemented to simplify
its implementation in term of support for iterators
- better implementation for X86 architectures (SSE2 and BMI).
- provide an alternative implementation based on switch ()
- provide an alternative implementation based on switch

* Improvements on algorithms
`@DenisYaroshevskiy` did a wonderful job on this front.
**DenisYaroshevskiy** did a wonderful job on this front.
- New algorithm: `reverse` (#1066, #1068)
- New algorithm: `reverse_copy` (#1060)
- New algorithm: `iota` (#1016)
@@ -124,7 +162,7 @@ including WASM and **gasp** fixed size SVE.

* Convert
- Fixed missing code for `eve::convert`. All convert calls now produce optimal code.
- More specifically and addition to global fixes, @aguinet contributed:
- More specifically, and in addition to global fixes, **aguinet** contributed:
- Improvement for u64 => u32 when using AVX2
- Improvement for u16 => u8 when using AVX2
- Improvement for u64=>u32 using AVX2 + clang
@@ -138,7 +176,7 @@ including WASM and **gasp** fixed size SVE.
- Implement AVX512 logical pair interleave using BMI parallel bit deposit

* Build systems
- Install directory fix (Thanks `@JPenuchot`)
- Install directory fix (Thanks **JPenuchot**)
- Prevent CMake error if EVE_BUILD_TEST is set to OFF (#1032)
- Fix bench compilation issues (#1136)
- Add CI tests for clang++ with -std=libc++ (#614)
@@ -173,11 +211,11 @@ including WASM and **gasp** fixed size SVE.
### New Contributors
Thanks to all our new contributors for this release!

* `@aguinet` made their first contribution in https://github.com/jfalcou/eve/pull/1049
* `@JPenuchot` made their first contribution in https://github.com/jfalcou/eve/pull/1028
* `@pauljurczak` made their first contribution in https://github.com/jfalcou/eve/pull/1123
* `@the-moisrex` made their first contribution in https://github.com/jfalcou/eve/pull/1025
* `@toughengineer` made their first contribution in https://github.com/jfalcou/eve/pull/1182
* **aguinet** made their first contribution in https://github.com/jfalcou/eve/pull/1049
* **JPenuchot** made their first contribution in https://github.com/jfalcou/eve/pull/1028
* **pauljurczak** made their first contribution in https://github.com/jfalcou/eve/pull/1123
* **the-moisrex** made their first contribution in https://github.com/jfalcou/eve/pull/1025
* **toughengineer** made their first contribution in https://github.com/jfalcou/eve/pull/1182

## Version 2021.10.0

6 changes: 3 additions & 3 deletions doc/tutorial/frequency-scaling.hpp
@@ -12,7 +12,7 @@ This is why, for example, if you look at libc, it at most uses 32 byte registers
sure you might speed up the strlen somewhat but then all the code after will be
slower.
For big datasets the price of lower frequency is often outweighted by processing
For big datasets the price of lower frequency is often outweighed by processing
more numbers in one operation, and speed-ups of 15% are not unheard of.
This led to a dilemma in the API design for us: if the user is on an AVX512 system,
@@ -24,8 +24,8 @@ There are also a typedefs `nofs_wide`, `nofs_logical` where `nofs` stands for
"no frequency scaling".
@note: other than on avx512 on intel we always use the maximum width of the register,
since we expect the compiler to do it anyways and it is usally accepted.
If you want to set a specific cardianl for an algorithm, you can always use
since we expect the compiler to do it anyways and it is usually accepted.
If you want to set a specific cardinal for an algorithm, you can always use
`eve::algo::force_cardinal`.
**/
26 changes: 11 additions & 15 deletions docs/structeve_1_1scalar__cardinal.html → docs/Frequency.html
@@ -6,7 +6,7 @@
<meta http-equiv="X-UA-Compatible" content="IE=11"/>
<meta name="generator" content="Doxygen 1.9.5"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>E.V.E: eve::scalar_cardinal Struct Reference</title>
<title>E.V.E: Scaling.</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
@@ -56,7 +56,7 @@
<td id="projectalign">
<div id="projectname">E.V.E
</div>
<div id="projectbrief">v2022.09.01</div>
<div id="projectbrief">v2023.02.15</div>
</td>
</tr>
<tr><td colspan="2" style="padding: 20px 0px 0px 0px;"> <div id="MSearchBox" class="MSearchBoxInactive">
@@ -98,7 +98,7 @@
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&amp;dn=expat.txt MIT */
$(document).ready(function(){initNavTree('structeve_1_1scalar__cardinal.html',''); initResizable(); });
$(document).ready(function(){initNavTree('Frequency.html',''); initResizable(); });
/* @license-end */
</script>
<div id="doc-content">
@@ -116,26 +116,22 @@
</iframe>
</div>

<div class="header">
<div class="summary">
<a href="structeve_1_1scalar__cardinal-members.html">List of all members</a> </div>
<div class="headertitle"><div class="title">eve::scalar_cardinal Struct Reference<div class="ingroups"><a class="el" href="group__simd.html">EVE</a> &raquo; <a class="el" href="group__simd__types.html">SIMD related types</a></div></div></div>
<div><div class="header">
<div class="headertitle"><div class="title">Scaling. </div></div>
</div><!--header-->
<div class="contents">

<p>Cardinal type for scalar values.
<a href="structeve_1_1scalar__cardinal.html#details">More...</a></p>

<p><code>#include &lt;eve/arch/cardinals.hpp&gt;</code></p>
<a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
<div class="textblock"><p >Cardinal type for scalar values. </p>
<div class="textblock"><p >In SIMD programming there is a known issue of processor frequency scaling: when working with wider registers, in order to avoid overheating, some processors limit their CPU frequency. There are a lot of situations where this can happen but it is a noticeable problem mostly for 64 byte registers on intel avx512 cpus.</p>
<p >This is why, for example, if you look at libc, it at most uses 32 byte registers: sure you might speed up the strlen somewhat but then all the code after will be slower.</p>
<p >For big datasets the price of lower frequency is often outweighed by processing more numbers in one operation, and speed-ups of 15% are not unheard of.</p>
<p >This led to a dilemma in the API design for us: if the user is on an AVX512 system, most likely they expect the register to be 64 bytes. But we suspect this is not what they actually want. So we decided that <code><a class="el" href="structeve_1_1wide.html" title="Wrapper for SIMD registers.">eve::wide</a></code> on avx512 is by default 64 bytes but algorithms by default use 32 bytes. If you want an algorithm to use 64 bytes, you can pass the <code>eve::algo::allow_frequency_scaling</code> trait. There are also typedefs <code>nofs_wide</code>, <code>nofs_logical</code>, where <code>nofs</code> stands for "no frequency scaling".</p>
<dl class="section note"><dt>Note</dt><dd>: other than on avx512 on intel we always use the maximum width of the register, since we expect the compiler to do it anyways and it is usually accepted. If you want to set a specific cardinal for an algorithm, you can always use <code>eve::algo::force_cardinal</code>. </dd></dl>
</div></div><!-- contents -->
</div><!-- PageDoc -->
</div><!-- doc-content -->
<!-- HTML footer for doxygen 1.8.20-->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="namespaceeve.html">eve</a></li><li class="navelem"><a class="el" href="structeve_1_1scalar__cardinal.html">scalar_cardinal</a></li>
</ul>
</div>
</div> <!-- DOXYSTRAP RELATED -->