[FEAT] Add async support to SDPMChunker and to SemanticChunker #39

Open
rodion-m opened this issue Nov 18, 2024 · 7 comments
Labels: enhancement (New feature or request)

@rodion-m

Proper async support would make it possible to properly parallelize remote embedding computation.

rodion-m added the enhancement (New feature or request) label on Nov 18, 2024
@bhavnicksm
Collaborator

Hey @rodion-m!

Thanks for opening the issue 😊

Could you please elaborate on the benefits of adding async and the use case for it? Personally, I haven't used the async chunking support offered by other libraries before, nor am I aware of devs/companies who use it often. Maybe I just haven't been exposed to those use cases, so having some context would help here...

If you think this could be of wider utility, I'd be happy to accept PRs for it.

Thanks!

@rodion-m
Author

Sure, here is my use case:

Proper async support would make it possible to properly parallelize remote embedding computation.

So, we need speed :)

@bhavnicksm
Collaborator

bhavnicksm commented Nov 18, 2024

@rodion-m,

Correct me if I am wrong, but essentially this is about faster embedding computation with APIs like OpenAI, Anthropic, etc., i.e. passing a bunch of chunks and getting their embeddings back asynchronously?

By proper async support, you mean only for the embeddings, right? Or do you mean that chunks should also be generated asynchronously?

I plan to add async/batching support for embedding APIs like OpenAI, because we would need that capability, yes!

I'm not sure about the second one, i.e. async chunk generation, yet.

Sorry for all the questions, just trying to make sure I understand this correctly

Thanks!

@rodion-m
Author

I mean we should be able to call I/O operations (like API calls for embeddings) asynchronously, so that we can speed up embedding generation when we need to embed huge datasets and batching possibilities are exhausted. Here is an example of LiteLLM's implementation: https://docs.litellm.ai/docs/embedding/async_embedding
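
For illustration, a rough sketch of what this could look like with LiteLLM's aembedding (the model name and helper function here are just for the example):

```python
import asyncio
import litellm

async def embed_all(batches: list[list[str]]) -> list:
    # Kick off one embedding request per batch and await them concurrently,
    # instead of blocking on each request in turn.
    tasks = [
        litellm.aembedding(model="text-embedding-3-small", input=batch)
        for batch in batches
    ]
    return await asyncio.gather(*tasks)

# responses = asyncio.run(embed_all(batches))
```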

@bhavnicksm
Collaborator

Hey @rodion-m,

Yes, this makes a lot more sense to me now! Thanks for the link with the example... 😊

I'm thinking about adding this feature to the Embeddings API that will host all these providers (OpenAI, Anthropic, etc.), so from the Chunker's perspective it would call a sync batching method, while the Embeddings class for that API takes care of the embeddings asynchronously to speed things up properly. We could probably call asyncio.run inside the OpenAIEmbeddings class, for example, and also expose the async fns on these classes.
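
Something like this, as a rough sketch (not actual Chonkie code; the class and method names are placeholders, and it assumes the official openai Python client):

```python
import asyncio
from openai import AsyncOpenAI

class OpenAIEmbeddings:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.client = AsyncOpenAI()

    async def aembed_batch(self, texts: list[str]) -> list[list[float]]:
        # Async path, exposed for users who already run their own event loop.
        response = await self.client.embeddings.create(model=self.model, input=texts)
        return [item.embedding for item in response.data]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Sync facade the chunker would call; asyncio.run keeps the async
        # handling entirely inside the embeddings class.
        return asyncio.run(self.aembed_batch(texts))
```

That way the chunkers stay sync, but the network I/O still happens asynchronously inside the embeddings class.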

What do you think about this?

@rodion-m
Author

@bhavnicksm, thank you for your attention to this topic!

I think it's OK to call sync methods from async classes (let the problem of excessive parallelism fall on the developer). So we can just add new async methods to the base class, prefixed with "a", like achunk.
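
Roughly like this (just a sketch; asyncio.to_thread is one possible default, and chunkers that hit remote embedding APIs could instead await those calls directly):

```python
import asyncio

class BaseChunker:
    def chunk(self, text: str) -> list[str]:
        # Existing sync path.
        raise NotImplementedError

    async def achunk(self, text: str) -> list[str]:
        # Async counterpart, prefixed with "a". As a default it could run the
        # sync path in a worker thread so the event loop is not blocked;
        # subclasses would override it with a truly async implementation.
        return await asyncio.to_thread(self.chunk, text)
```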

Did I get your question right? If not, please provide code examples.

Also, I didn't get why you want to call asyncio.run inside OpenAIEmbeddings; could you please explain the need?

@bhavnicksm
Collaborator

bhavnicksm commented Nov 19, 2024

@rodion-m,

I think I finally understand...

Yes, so currently the semantic chunkers work on a "Scan and Look-back" style algorithm, which is why they require all the sentence embeddings before they begin. This is for one piece of text, which let's say has M sentences.

sync chunk -> sync batched_embed

But as we go from one piece of text to many, we currently use multiprocessing to tackle this for N texts.

Here's what I understand from our discussion:

  1. If we need to support async API calls for OpenAI etc., that support would live there. We can expose those methods and also use them inside a batch_embed fn.
sync batch_embed -> asyncio.run( async embed M times with await ) or asyncio.run( async batch_embed )

(NOTE: I'm assuming there's no separate sync batch_embed for these APIs; correct me if I am wrong here.)

  2. Then we also have the option of an async chunk, or achunk, which would depend on async batch_embed and could be exposed to the user as an option (rough pseudocode for both below).
async chunk -> await (async batch_embed) -> M times x await( async embed )
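
Rough pseudocode for the two options above (aembed_batch, split_sentences, and group_by_similarity are hypothetical helpers, just to show the shape of the call chain):

```python
import asyncio

# Option 1: the chunker stays sync; the async fan-out lives in the embeddings class.
def batch_embed(embeddings, sentences: list[str]) -> list[list[float]]:
    return asyncio.run(embeddings.aembed_batch(sentences))

# Option 2: a fully async achunk, awaiting the batched embedding call.
async def achunk(chunker, embeddings, text: str) -> list[str]:
    sentences = chunker.split_sentences(text)
    vectors = await embeddings.aembed_batch(sentences)
    return chunker.group_by_similarity(sentences, vectors)
```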

Did I understand this correctly?
