Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add support for Contextual Chunking from Anthropic #105

Open
3 tasks
bhavnicksm opened this issue Dec 24, 2024 · 0 comments
Open
3 tasks

[FEAT] Add support for Contextual Chunking from Anthropic #105

bhavnicksm opened this issue Dec 24, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@bhavnicksm
Copy link
Collaborator

Hey there! 🦛

I want chonkie to support Contextual Retrieval that Anthropic talks about in their blog out of the box.

We can add the contextual retrieval from Anthropic via a ContextualRefinery class as a part of the Refinery module, such that given a particular generative module and original text, the ContextualRefinery can add appropriate context to the chunks, which can then be used for embeddings for vector search or in a bm25 index.

The API should look something like:

from chonkie import ContextualRefinery, TokenChunker, AnthropicGenie

genie = AnthropicGenie()
refinery = ContextualRefinery(genie)
chunker = TokenChunker()

text = ...

chunks = chunker(text) 
contextual_chunks = refinery(chunks) 

# use the chunks normally!

To get to this point, we need to complete the following:

  • Add support for the Genie classes, specifically AnthropicGenie
  • Plan for saving structured prompts for each genie inside the genie module
  • Create the ContextualRefinery that can take in a genie module
@bhavnicksm bhavnicksm added the enhancement New feature or request label Dec 24, 2024
@bhavnicksm bhavnicksm self-assigned this Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant