
Amazing project! What about Groq and fast inference on open source models? #1

Open · agnoldo opened this issue Aug 3, 2024 · 3 comments


agnoldo commented Aug 3, 2024

Congratulations on your achievements, @austin-starks ! I see a huge potential for this project!

I was wondering if you could implement support for Groq and open source fast models such as Llama 3.1 8B. Imagine improving prompts for such a fast model, running at 1200 tokens/second! And cheaply. Or even locally, for those who need complete privacy...

Do you think this is feasible?

Thanks!


lukmay commented Aug 23, 2024

Hi @agnoldo

I completely agree with you.
This is a very good idea!
I wanted to add that you can actually achieve something similar using the PrivateGPT project.
With PrivateGPT, you can run a model like Llama 3.1 8B locally and set up the same interfaces that OpenAI provides. This means you can easily swap out API calls to OpenAI with calls to your locally running model.

This approach could be a first workaround to get Promptimizer to work with local models.
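
For anyone who wants to try it, here is a rough sketch of what the swap could look like, assuming a local OpenAI-compatible server; the port (8001) and the model name are placeholders for whatever your local setup actually exposes, not values from Promptimizer itself:

// Minimal sketch: send an OpenAI-style chat completion request to a local
// OpenAI-compatible server instead of api.openai.com.
const LOCAL_BASE_URL = "http://localhost:8001/v1/chat/completions"; // placeholder port

async function localChatCompletion(prompt: string): Promise<string> {
  const response = await fetch(LOCAL_BASE_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Local servers typically ignore the key, but keeping the header makes
      // the request shape identical to a real OpenAI call.
      Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? "not-needed"}`,
    },
    body: JSON.stringify({
      model: "llama-3.1-8b", // placeholder: use whatever model name your server exposes
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const data = await response.json();
  return data.choices[0].message.content;
}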

Thanks for bringing this up!

austin-starks (Owner) commented

Hey @agnoldo! Sorry for responding late; I never saw your initial message!

It should absolutely be possible! I've never used Groq, but if we could get it to work, that would be game-changing. Ollama is unfortunately too slow, at least on my computer.

I don't have time to implement this, but I'm open to pull requests! It should be relatively straightforward to add.

pressdarling commented

> I was wondering if you could implement support for Groq and open source fast models such as Llama 3.1 8B. Imagine improving prompts for such a fast model, running at 1200 tokens/second! And cheaply. Or even locally, for those who need complete privacy...

@agnoldo For local inference, Ollama should work - and any other local inference engine that exposes a compatible API should work too, right? LlamaCpp, etc.

I haven't tried it myself yet in this project and don't have time to add a PR right now, but the breadcrumbs should all be here:

  • Groq API is largely OpenAI-compatible - that's how I'm using it in open-webui
  • Currently the OpenAI API is referenced in OpenAIServiceClient.ts:

baseUrl: string = "https://api.openai.com/v1/chat/completions";

  • changing that to the Groq API endpoint https://api.groq.com/openai/v1/chat/completions should help (see docs)
  • you will likely also need to change OPENAI_API_KEY=[...] to GROQ_API_KEY=[...] in both that file and your .env:

const apiKey = process.env.OPENAI_API_KEY;

  • run the new code past a decent LLM to figure out how to get an option to use both endpoints at once ;) (a rough sketch of that switch follows below)
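
Something like this rough sketch might be a starting point; LLM_PROVIDER is an environment variable I just made up for the switching, not existing Promptimizer config, and the request helper is illustrative rather than the project's actual client code:

// Sketch only (not the real OpenAIServiceClient.ts): choose the endpoint and
// API key from environment variables so the same client can talk to either
// OpenAI or Groq.
type Provider = "openai" | "groq";

const provider: Provider =
  process.env.LLM_PROVIDER === "groq" ? "groq" : "openai";

const baseUrl =
  provider === "groq"
    ? "https://api.groq.com/openai/v1/chat/completions"
    : "https://api.openai.com/v1/chat/completions";

const apiKey =
  provider === "groq" ? process.env.GROQ_API_KEY : process.env.OPENAI_API_KEY;

async function chatCompletion(model: string, prompt: string): Promise<string> {
  const response = await fetch(baseUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model, // e.g. a Groq-hosted Llama 3.1 8B model, or an OpenAI model name
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const data = await response.json();
  return data.choices[0].message.content;
}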

@austin-starks There's probably a much smarter way to do this but this looks enough like a nail to my hammer-minded approach...
