Automate release process #124

Merged: 14 commits, Nov 4, 2024
8 changes: 8 additions & 0 deletions .github/release.yml
@@ -0,0 +1,8 @@
# .github/release.yml

# Configure automatic release note generation
# See https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes
changelog:
  exclude:
    labels:
      - ignore-for-release
98 changes: 98 additions & 0 deletions .github/workflows/prepare_release.yml
@@ -0,0 +1,98 @@
name: Create draft release

on:
  workflow_dispatch:
    inputs:
      release_type:
        description: 'Release Type'
        required: true
        type: choice
        default: 'Minor (E.g. 5.2.1 to 5.3.0)'
        options:
          - Major (E.g. 5.2.1 to 6.0.0)
          - Minor (E.g. 5.2.1 to 5.3.0)
          - Patch (E.g. 5.2.1 to 5.2.2)

jobs:
  Create-Release:
    runs-on: ubuntu-latest

    permissions:
      # Give the default GITHUB_TOKEN write permission to commit and push the
      # added or changed files to the repository.
      contents: write

    outputs:
      release_url: ${{ steps.create-draft-release.outputs.url }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}

      - name: Update the version
        id: bump_version
        shell: pwsh
        run: |
          # Extract current version
          $header = Get-Content ./libdivide.h
          $major_ver = [int](($header -match "LIBDIVIDE_VERSION_MAJOR")[0] -Split " ")[-1]
          $minor_ver = [int](($header -match "LIBDIVIDE_VERSION_MINOR")[0] -Split " ")[-1]
          $patch_ver = [int](($header -match "LIBDIVIDE_VERSION_PATCH")[0] -Split " ")[-1]
          $current_version=@($major_ver, $minor_ver, $patch_ver) -Join "."

          # Increment version
          if ("${{ github.event.inputs.release_type }}" -like "Patch*") {
            $patch_ver = $patch_ver + 1
          } elseif ("${{ github.event.inputs.release_type }}" -like "Minor*") {
            $minor_ver = $minor_ver + 1
            $patch_ver = 0
          } else { # Must be major version
            $major_ver = $major_ver + 1
            $minor_ver = 0
            $patch_ver = 0
          }
          $new_version=@($major_ver, $minor_ver, $patch_ver) -Join "."

          # Update header file
          $header = $header -replace "#define LIBDIVIDE_VERSION ""\d+\.\d+\.\d+""", "#define LIBDIVIDE_VERSION ""$new_version"""
          $header = $header -replace "#define LIBDIVIDE_VERSION_MAJOR \d+", "#define LIBDIVIDE_VERSION_MAJOR $major_ver"
          $header = $header -replace "#define LIBDIVIDE_VERSION_MINOR \d+", "#define LIBDIVIDE_VERSION_MINOR $minor_ver"
          $header = $header -replace "#define LIBDIVIDE_VERSION_PATCH \d+", "#define LIBDIVIDE_VERSION_PATCH $patch_ver"
          $header | Set-Content ./libdivide.h

          # Update other files
          $file="./library.properties"
          $regex = 'version=(\d+\.\d+(\.\d+)?)'
          (Get-Content $file) -replace $regex, "version=$new_version" | Set-Content $file

          $file="./CMakeLists.txt"
          $regex = "set\(LIBDIVIDE_VERSION ""\d+\.\d+(\.\d+)?""\)"
          (Get-Content $file) -replace $regex, "set(LIBDIVIDE_VERSION ""$new_version"")" | Set-Content $file

          Write-Output "previous_version=$current_version" >> $Env:GITHUB_OUTPUT
          Write-Output "version=$new_version" >> $Env:GITHUB_OUTPUT
          Write-Output "major=$major_ver" >> $Env:GITHUB_OUTPUT
          Write-Output "minor=$minor_ver" >> $Env:GITHUB_OUTPUT
          Write-Output "patch=$patch_ver" >> $Env:GITHUB_OUTPUT

      # Commit all changed files back to the repository
      - name: Commit updated versions
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: Auto increment version to ${{ steps.bump_version.outputs.version }}

      # Create draft release
      - name: Create draft release
        id: create-draft-release
        uses: softprops/action-gh-release@v2
        with:
          name: v${{ steps.bump_version.outputs.version }}
          draft: true
          generate_release_notes: true
          tag_name: v${{ steps.bump_version.outputs.version }}

      - name: Generate Summary
        run: |
          echo "Created [v${{ steps.bump_version.outputs.version }} draft release](${{ steps.create-draft-release.outputs.url }})" >> $GITHUB_STEP_SUMMARY
89 changes: 26 additions & 63 deletions README.md
@@ -1,29 +1,17 @@
# libdivide

[![Build Status](https://github.com/ridiculousfish/libdivide/actions/workflows/canary_build.yml/badge.svg)](https://github.com/ridiculousfish/libdivide/actions/workflows/canary_build.yml)
[![Github Releases](https://img.shields.io/github/release/ridiculousfish/libdivide.svg)](https://github.com/ridiculousfish/libdivide/releases)

```libdivide.h``` is a header-only C/C++ library for optimizing integer division.
Integer division is one of the slowest instructions on most CPUs e.g. on
current x64 CPUs a 64-bit integer division has a latency of up to 90 clock
cycles whereas a multiplication has a latency of only 3 clock cycles.
libdivide allows you to replace expensive integer division instructions by
a sequence of shift, add and multiply instructions that will calculate
the integer division much faster.

On current CPUs you can get a **speedup of up to 10x** for 64-bit integer division
and a speedup of up to to 5x for 32-bit integer division when using libdivide.
libdivide also supports [SSE2](https://en.wikipedia.org/wiki/SSE2),
[AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and
[AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
vector division which provides an even larger speedup. You can test how much
speedup you can achieve on your CPU using the [benchmark](#benchmark-program)
program.
```libdivide.h``` is a header-only C/C++ library for optimizing integer division. Integer division is one of the slowest instructions on most CPUs: on current x64 CPUs, a 64-bit integer division has a latency of up to 90 clock cycles, whereas a multiplication has a latency of only 3 clock cycles. libdivide allows you to replace expensive integer division instructions with a sequence of shift, add and multiply instructions that calculates the integer division much faster.

libdivide is compatible with 8-bit microcontrollers, such as the AVR series: [the CI build includes a AtMega2560 target](test/avr/readme.md). Since low end hardware such as this often do not include a hardware divider, libdivide is particularly useful. In addition to the runtime [C](https://github.com/ridiculousfish/libdivide/blob/master/doc/C-API.md) & [C++](https://github.com/ridiculousfish/libdivide/blob/master/doc/CPP-API.md) APIs, a set of [predefined macros](constant_fast_div.h) and [templates](constant_fast_div.hpp) is included to speed up division by 16-bit constants: division by a 16-bit constant is [not optimized by avr-gcc on 8-bit systems](https://stackoverflow.com/questions/47994933/why-doesnt-gcc-or-clang-on-arm-use-division-by-invariant-integers-using-multip).
On current CPUs you can get a **speedup of up to 10x** for 64-bit integer division and a speedup of up to 5x for 32-bit integer division when using libdivide. libdivide also supports [SSE2](https://en.wikipedia.org/wiki/SSE2), [AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and [AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) vector division, which provides an even larger speedup. You can test how much speedup you can achieve on your CPU using the [benchmark](#benchmark-program) program.

libdivide is compatible with 8-bit microcontrollers, such as the AVR series: [the CI build includes an ATmega2560 target](test/avr/readme.md). Since low-end hardware such as this often does not include a hardware divider, libdivide is particularly useful there. In addition to the runtime [C](doc/C-API.md) & [C++](doc/CPP-API.md) APIs, a set of [predefined macros](constant_fast_div.h) and [templates](constant_fast_div.hpp) is included to speed up division by 16-bit constants: division by a 16-bit constant is [not optimized by avr-gcc on 8-bit systems](https://stackoverflow.com/questions/47994933/why-doesnt-gcc-or-clang-on-arm-use-division-by-invariant-integers-using-multip).

See https://libdivide.com for more information on libdivide.
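
To make the "shift, add and multiply" idea above concrete, here is a minimal sketch for one fixed divisor. It is an illustration only, not libdivide's implementation: `div3_mul_shift` and `check_div3` are hypothetical names, and the constant is hard-coded for the divisor 3, whereas libdivide derives such constants at runtime for arbitrary divisors.

```C++
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper (not part of libdivide): divides a 32-bit unsigned
// integer by the constant 3 with one multiply and one shift.
// 0xAAAAAAAB == ceil(2^33 / 3), so (n * 0xAAAAAAAB) >> 33 == n / 3
// for every 32-bit unsigned n.
inline uint32_t div3_mul_shift(uint32_t n) {
    const uint64_t magic = 0xAAAAAAABull;
    return static_cast<uint32_t>((static_cast<uint64_t>(n) * magic) >> 33);
}

// Compares the multiply-and-shift result against ordinary division
// for a few values, including both ends of the 32-bit range.
inline void check_div3() {
    for (uint32_t n : {0u, 1u, 2u, 3u, 100u, 4294967295u}) {
        assert(div3_mul_shift(n) == n / 3);
    }
}
```

The multiplication is done in 64 bits so that the product cannot overflow before the shift.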

# C++ example
## C++ example

The first code snippet divides all integers in a vector using integer division.
This is slow as integer division is at least one order of magnitude slower than
@@ -60,7 +48,7 @@ Generally libdivide will give a significant speedup if:
* The divisor is only known at runtime
* The divisor is reused multiple times e.g. in a loop

# C example
## C example

You first need to generate a libdivide divider using one of the ```libdivide_*_gen``` functions (```*```: ```s32```, ```u32```, ```s64```, ```u64```)
which can then be used to compute the actual integer division using the
@@ -79,28 +67,19 @@ void divide(int64_t *array, size_t size, int64_t divisor)
}
```

# API reference
## API reference

* [C API](https://github.com/ridiculousfish/libdivide/blob/master/doc/C-API.md)
* [C++ API](https://github.com/ridiculousfish/libdivide/blob/master/doc/CPP-API.md)
* [C API](doc/C-API.md)
* [C++ API](doc/CPP-API.md)
* [Macro Invariant Division](constant_fast_div.h)
* [Template Based Invariant Division](constant_fast_div.hpp)

# Branchfull vs branchfree
## Branchfull vs branchfree

The default libdivide divider makes use of
[branches](https://en.wikipedia.org/wiki/Branch_(computer_science)) to compute the integer
division. When the same divider is used inside a hot loop as in the C++ example section the
CPU will accurately predict the branches and there will be no performance slowdown. Often
the compiler is even able to move the branches outside the body of the loop hence
completely eliminating the branches, this is called loop-invariant code motion.

libdivide also has a branchfree divider type which computes the integer division without
using any branch instructions. The branchfree divider generally uses a few more instructions
than the default branchfull divider. The main use case for the branchfree divider is when
you have an array of different divisors and you need to iterate over the divisors. In this
case the default branchfull divider would exhibit poor performance as the CPU won't be
able to correctly predict the branches.
[branches](https://en.wikipedia.org/wiki/Branch_(computer_science)) to compute the integer division. When the same divider is used inside a hot loop, as in the C++ example section, the CPU will accurately predict the branches and there will be no performance slowdown. Often the compiler is even able to move the branches outside the body of the loop, eliminating them completely; this is called loop-invariant code motion.

libdivide also has a branchfree divider type which computes the integer division without using any branch instructions. The branchfree divider generally uses a few more instructions than the default branchfull divider. The main use case for the branchfree divider is when you have an array of different divisors and you need to iterate over the divisors. In this case the default branchfull divider would exhibit poor performance as the CPU won't be able to correctly predict the branches.

```C++
#include "libdivide.h"
@@ -124,14 +103,12 @@ Caveats of branchfree divider:
* Unsigned branchfree divider cannot be ```1```
* Faster for unsigned types than for signed types

# Vector division
## Vector division

libdivide supports [SSE2](https://en.wikipedia.org/wiki/SSE2),
[AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and
[AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
vector division on x86 and x64 CPUs. In the example below we divide the packed 32-bit
integers inside an AVX512 vector using libdivide. libdivide supports 32-bit and 64-bit
vector division for both signed and unsigned integers.
vector division on x86 and x64 CPUs. In the example below we divide the packed 32-bit integers inside an AVX512 vector using libdivide. libdivide supports 32-bit and 64-bit vector division for both signed and unsigned integers.

```C++
#include "libdivide.h"
@@ -153,7 +130,7 @@ Note that you need to define one of macros below to enable vector division:
* ```LIBDIVIDE_AVX512```
* ```LIBDIVIDE_NEON```

# Performance tips
## Performance Tips

* If possible use unsigned integer types because libdivide's unsigned division is measurably
faster than its signed division. This is especially true for the branchfree divider.
@@ -165,34 +142,23 @@ Note that you need to define one of macros below to enable vector division:
currently no vector multiplication instructions on x86 to efficiently calculate
64-bit * 64-bit to 128-bit.

# Build instructions
## Build instructions

libdivide has one test program and two benchmark programs which can be built using cmake and
a recent C++ compiler that supports C++11 or later. Optionally ```libdivide.h``` can also be
installed to ```/usr/local/include```.
libdivide has one test program and two benchmark programs which can be built using cmake and a recent C++ compiler that supports C++11 or later. Optionally ```libdivide.h``` can also be installed to ```/usr/local/include```.

```bash
cmake .
make -j
sudo make install
```

# Tester program
## Tester program

You can pass the **tester** program one or more of the following arguments: ```u32```,
```s32```, ```u64```, ```s64``` to test the four cases (signed, unsigned, 32-bit, or 64-bit), or
run it with no arguments to test all four. The tester will verify the correctness of libdivide
via a set of randomly chosen numerators and denominators, by comparing the result of libdivide's
division to hardware division. It will stop with an error message as soon as it finds a
discrepancy.
You can pass the **tester** program one or more of the following arguments: ```u32```, ```s32```, ```u64```, ```s64``` to test the four cases (signed, unsigned, 32-bit, or 64-bit), or run it with no arguments to test all four. The tester will verify the correctness of libdivide via a set of randomly chosen numerators and denominators, by comparing the result of libdivide's division to hardware division. It will stop with an error message as soon as it finds a discrepancy.

# Benchmark program
## Benchmark program

You can pass the **benchmark** program one or more of the following arguments: ```u16```, ```s16```, ```u32```,
```s32```, ```u64```, ```s64``` to compare libdivide's speed against hardware division.
**benchmark** tests a simple function that inputs an array of random numerators and a single
divisor, and returns the sum of their quotients. It tests this using both hardware division, and
the various division approaches supported by libdivide, including vector division.
You can pass the **benchmark** program one or more of the following arguments: ```u16```, ```s16```, ```u32```, ```s32```, ```u64```, ```s64``` to compare libdivide's speed against hardware division. **benchmark** tests a simple function that inputs an array of random numerators and a single divisor, and returns the sum of their quotients. It tests this using both hardware division, and the various division approaches supported by libdivide, including vector division.

It will output data like this:

@@ -207,9 +173,7 @@ It will output data like this:
...
```

It will keep going as long as you let it, so it's best to stop it when you are happy with the
denominators tested. These columns have the following significance. All times are in
nanoseconds, lower is better.
It will keep going as long as you let it, so it's best to stop it when you are happy with the denominators tested. These columns have the following significance. All times are in nanoseconds, lower is better.

```
#: The divisor that is tested
@@ -222,10 +186,9 @@ vec_bf: libdivide time, using vector branchfree division
algo: The algorithm used.
```

The **benchmark** program will also verify that each function returns the same value,
so benchmark is valuable for its verification as well.
The **benchmark** program will also verify that each function returns the same value, so benchmark is valuable for its verification as well.

# Contributing
## Contributing

Although there are no individual unit tests, the supplied ```cmake``` builds do include several safety nets:

13 changes: 13 additions & 0 deletions doc/RELEASE.md
@@ -0,0 +1,13 @@
## How to do a new libdivide release

Releases are semi-automated using GitHub Actions:

1. Manually run the [Create draft release](https://github.com/ridiculousfish/libdivide/actions/workflows/prepare_release.yml) workflow/action.
* Choose the branch to release from (usually ```master```) and the release type (based on [Semantic Versioning](https://semver.org/))
* The action will do some codebase housekeeping and create a draft release:
* Creates a new commit with updated version numbers in ```libdivide.h```, ```CMakeLists.txt```, ```library.properties```.
* Creates a draft Git tag of format vX.Y.Z.
2. Once the action is complete, follow the output link in the action summary to the generated draft release. E.g. ![image](https://github.com/user-attachments/assets/7e8393f7-f204-4b3a-af37-de5e187479dc)
3. Edit the generated release notes as needed & publish

Note that PRs with the ```ignore-for-release``` label are excluded from the generated release notes.
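
The version bump applied by the workflow follows the usual Semantic Versioning rule: a patch release increments the patch number, a minor release increments the minor number and resets patch, and a major release increments the major number and resets both. The sketch below restates that rule in C++ purely as an illustration; the workflow itself does this in PowerShell, and `Version`, `ReleaseType`, `bump`, `to_string` and `check_bump` are hypothetical names that do not exist in libdivide.

```C++
#include <cassert>
#include <string>

// Hypothetical types for illustration only (not part of libdivide).
struct Version {
    int major_ver;
    int minor_ver;
    int patch_ver;
};

enum class ReleaseType { Major, Minor, Patch };

// Returns the next version for the chosen release type,
// resetting the lower-order components to zero.
inline Version bump(Version v, ReleaseType type) {
    switch (type) {
        case ReleaseType::Patch:
            v.patch_ver += 1;
            break;
        case ReleaseType::Minor:
            v.minor_ver += 1;
            v.patch_ver = 0;
            break;
        case ReleaseType::Major:
            v.major_ver += 1;
            v.minor_ver = 0;
            v.patch_ver = 0;
            break;
    }
    return v;
}

// Formats the version as "X.Y.Z", the form used in the vX.Y.Z tag.
inline std::string to_string(const Version& v) {
    return std::to_string(v.major_ver) + "." + std::to_string(v.minor_ver) +
           "." + std::to_string(v.patch_ver);
}

// Mirrors the examples listed in the workflow's release type options.
inline void check_bump() {
    const Version current{5, 2, 1};
    assert(to_string(bump(current, ReleaseType::Major)) == "6.0.0");
    assert(to_string(bump(current, ReleaseType::Minor)) == "5.3.0");
    assert(to_string(bump(current, ReleaseType::Patch)) == "5.2.2");
}
```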
4 changes: 3 additions & 1 deletion libdivide.h
@@ -11,9 +11,11 @@
#ifndef LIBDIVIDE_H
#define LIBDIVIDE_H

#define LIBDIVIDE_VERSION "5.1"
// *** Version numbers are auto generated - do not edit ***
#define LIBDIVIDE_VERSION "5.1.0"
#define LIBDIVIDE_VERSION_MAJOR 5
#define LIBDIVIDE_VERSION_MINOR 1
#define LIBDIVIDE_VERSION_PATCH 0

#include <stdint.h>

14 changes: 7 additions & 7 deletions test/avr/readme.md
@@ -10,12 +10,12 @@

## Running the Test program

The test program is in the 'megaatmega2560_Test' environment.
The test program is in the 'megaatmega2560_sim_unittest' environment.

To run the test program in a simulator:
1. On the activity bar, select PlatformIO
2. Run Project Tasks -> megaatmega2560_Test -> Custom -> Simulate
a. This will build the test program & launch it in the simulator (this might download )supporting packages)
b. **NOTE** Once running it can take a **long** time for ouput to appear in the terminal. **Be patient**
* Or copy the simavr command line from the terminal to a command prompt (or another vscode terminal)
To run the test program in a simulator (no hardware required!):

1. On the activity bar, select PlatformIO
2. Run Project Tasks -> megaatmega2560_sim_unittest -> Advanced -> Test
1. This will build the test program & launch it in the simulator (this might download supporting packages)
2. **NOTE** Once running it can take a **long** time for output to appear in the terminal. **Be patient**

1 change: 1 addition & 0 deletions test/tester.cpp
@@ -100,6 +100,7 @@ extern "C" int main(int argc, char *argv[]) {
}
}

std::cout << "Testing libdivide v" << LIBDIVIDE_VERSION << std::endl;
std::string vecTypes = "";
#if defined(LIBDIVIDE_SSE2)
vecTypes += "sse2 ";