v0.22.0
benbrandt committed Jan 17, 2025
1 parent 4a72472 commit 217fb50
Showing 3 changed files with 13 additions and 7 deletions.
14 changes: 10 additions & 4 deletions CHANGELOG.md
@@ -1,12 +1,18 @@
# Changelog

## v0.20.2
## v0.22.0

### What's New
### Breaking Changes

#### Python
- Revert the change to special token behavior made in v0.21.0. It had many unintended side effects, and encoding special tokens does not seem to be recommended for chunking.

## v0.21.0

- Minor release to include latest pyo3 and tree-sitter dependencies.
### Breaking Changes

- Special tokens are now also encoded by both the Huggingface and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and ensures that any tokens a model adds at the beginning or end of a sequence are counted as well. This is especially important for embedding models that add a special token to the beginning of the sequence: previously, the generated chunks could fail to fit within the context window because of this.
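
The sizing concern behind this v0.21.0 entry can be illustrated with a toy sketch. Everything here is hypothetical: `count_tokens` and `fits` are illustrative helpers built on a whitespace split, not the Huggingface, Tiktoken, or text-splitter APIs, and the `[CLS]`/`[SEP]` markers stand in for whatever special tokens a real model might add.

```python
# Toy illustration only: a whitespace "tokenizer", not Huggingface or Tiktoken.
def count_tokens(text: str, add_special_tokens: bool) -> int:
    """Count whitespace-separated tokens, optionally adding [CLS] and [SEP]."""
    tokens = text.split()
    if add_special_tokens:
        # Assumed markers; real models define their own special tokens.
        tokens = ["[CLS]", *tokens, "[SEP]"]
    return len(tokens)


def fits(text: str, capacity: int, add_special_tokens: bool) -> bool:
    """Return True if the chunk fits within `capacity` tokens."""
    return count_tokens(text, add_special_tokens) <= capacity


chunk = "split text into chunks"
print(fits(chunk, 4, add_special_tokens=False))  # True: 4 tokens fit in 4
print(fits(chunk, 4, add_special_tokens=True))   # False: 6 tokens exceed 4
```

A chunk that exactly fills the capacity when special tokens are ignored overflows once they are counted, which is the mismatch the entry describes.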

### What's New

#### Rust

4 changes: 2 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -2,7 +2,7 @@
members = ["bindings/*"]

[workspace.package]
version = "0.20.2"
version = "0.22.0"
authors = ["Ben Brandt <[email protected]>"]
edition = "2021"
description = "Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python."