Commit

Merge pull request #243 from ashvardanian/main-dev
ashvardanian authored Dec 21, 2024
2 parents e0bc213 + e2b1068 commit 3af7f08
Showing 3 changed files with 40 additions and 1 deletion.
21 changes: 20 additions & 1 deletion CONTRIBUTING.md
@@ -6,6 +6,26 @@ To keep the quality of the code high, we have a set of [guidelines](https://gith
- [How to organize branches?](https://github.com/unum-cloud/awesome/blob/main/Workflow.md#branches)
- [How to style commits?](https://github.com/unum-cloud/awesome/blob/main/Workflow.md#commits)

## Navigating the Codebase

Primary kernels are implemented in header files under `include/simsimd/`:

- `dot.h` - dot products for real and complex vectors.
- `spatial.h` - spatial distances: L2, cosine distance.
- `binary.h` - binary distances: Hamming, Jaccard, etc.
- `probability.h` - probability metrics: KL-divergence, Jensen-Shannon, etc.
- `sparse.h` - sparse distances: weighted and normal set intersections.
- `curved.h` - bilinear forms for real and complex vectors, and Mahalanobis distance.

Bindings to other languages are in the respective directories:

- `python/lib.c` - Python bindings.
- `javascript/lib.c` - JavaScript bindings.
- `rust/lib.rs` - Rust bindings.
- `swift/SimSIMD.swift` - Swift bindings.

All tests, benchmarks, and examples live in the `scripts/` directory, provided they are compatible with the toolchain of the implementation language.
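
As a rough illustration of the kernel shape found in headers like `dot.h`, here is a minimal serial sketch that mirrors the argument order of the declarations in this commit (two input pointers, a length, and an output pointer). The `sketch_` typedefs and the function name are hypothetical stand-ins and are not part of the library; the real types and backends are defined under `include/simsimd/`.

```c
#include <assert.h> /* assert, used for the sketch's own sanity check */
#include <stddef.h> /* size_t, NULL */

/* Hypothetical stand-ins for the library typedefs (assumption: 64-bit floats in and out). */
typedef double sketch_f64_t;
typedef size_t sketch_size_t;
typedef double sketch_distance_t;

/* Serial reference: accumulates a[i] * b[i] over n elements and writes the sum to *result. */
static void sketch_dot_f64_serial(sketch_f64_t const* a, sketch_f64_t const* b, sketch_size_t n,
                                  sketch_distance_t* result) {
    assert(result != NULL);
    sketch_distance_t sum = 0;
    for (sketch_size_t i = 0; i != n; ++i) sum += a[i] * b[i];
    *result = sum;
}
```

A serial routine like this can serve as a baseline when checking a SIMD backend: run both on the same inputs and compare the outputs within a tolerance.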

## C and C++

To rerun the experiments, use the following command:
@@ -277,4 +297,3 @@ cd golang
go test # To test
go test -run=^$ -bench=. -benchmem # To benchmark
```
10 changes: 10 additions & 0 deletions include/simsimd/dot.h
@@ -130,6 +130,16 @@ SIMSIMD_PUBLIC void simsimd_dot_u8_haswell(simsimd_u8_t const* a, simsimd_u8_t c
* Ice Lake added VNNI, VPOPCNTDQ, IFMA, VBMI, VAES, GFNI, VBMI2, BITALG, VPCLMULQDQ, and other extensions for integral operations.
* Genoa added only BF16.
* Sapphire Rapids added tiled matrix operations, but we are most interested in the new mixed-precision FMA instructions.
*
* Sadly, we can't effectively interleave different kinds of arithmetic instructions to utilize more ports:
*
* > Like Intel server architectures since Skylake-X, SPR cores feature two 512-bit FMA units, and organize them in a similar fashion.
* > One 512-bit FMA unit is created by fusing two 256-bit ones on port 0 and port 1. The other is added to port 5, as a server-specific
* > core extension. The FMA units on port 0 and 1 are configured into 2×256-bit or 1×512-bit mode depending on whether 512-bit FMA
* > instructions are present in the scheduler. That means a mix of 256-bit and 512-bit FMA instructions will not achieve higher IPC
* > than executing 512-bit instructions alone.
*
* Source: https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
*/
SIMSIMD_PUBLIC void simsimd_dot_f64_skylake(simsimd_f64_t const* a, simsimd_f64_t const* b, simsimd_size_t n, simsimd_distance_t* result);
SIMSIMD_PUBLIC void simsimd_dot_f64c_skylake(simsimd_f64c_t const* a, simsimd_f64c_t const* b, simsimd_size_t n, simsimd_distance_t* results);
10 changes: 10 additions & 0 deletions include/simsimd/spatial.h
@@ -139,6 +139,16 @@ SIMSIMD_PUBLIC void simsimd_cos_f64_haswell(simsimd_f64_t const* a, simsimd_f64_
/* SIMD-powered backends for AVX512 CPUs of Skylake generation and newer, using 32-bit arithmetic over 512-bit words.
* Skylake was launched in 2015, and discontinued in 2019. Skylake had support for F, CD, VL, DQ, and BW extensions,
* as well as masked operations. This is enough to supersede auto-vectorization on `f32` and `f64` types.
*
* Sadly, we can't effectively interleave different kinds of arithmetic instructions to utilize more ports:
*
* > Like Intel server architectures since Skylake-X, SPR cores feature two 512-bit FMA units, and organize them in a similar fashion.
* > One 512-bit FMA unit is created by fusing two 256-bit ones on port 0 and port 1. The other is added to port 5, as a server-specific
* > core extension. The FMA units on port 0 and 1 are configured into 2×256-bit or 1×512-bit mode depending on whether 512-bit FMA
* > instructions are present in the scheduler. That means a mix of 256-bit and 512-bit FMA instructions will not achieve higher IPC
* > than executing 512-bit instructions alone.
*
* Source: https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
*/
SIMSIMD_PUBLIC void simsimd_l2_f32_skylake(simsimd_f32_t const* a, simsimd_f32_t const* b, simsimd_size_t n, simsimd_distance_t* d);
SIMSIMD_PUBLIC void simsimd_l2sq_f32_skylake(simsimd_f32_t const* a, simsimd_f32_t const* b, simsimd_size_t n, simsimd_distance_t* d);
