roadmap: Jan has revamped Remote Engines (e.g. OpenAI, Anthropic etc) #3786

dan-menlo opened this issue Oct 13, 2024 · 17 comments

dan-menlo commented Oct 13, 2024

Goal

Note: This Epic has changed multiple times, as our architecture has also changed. A lot of the early comments refer to a different context, e.g. "Provider Abstraction" in Jan.

  • Cortex is now an API Platform and needs to route /chat/completions requests to Remote APIs
    • This is intended to allow us to support Groq, Martian, OpenRouter etc
  • Remote API Extensions will need to support
    • Getting Remote API's model list
    • Enabling certain default models (e.g. we may not want to show every nightly model in Remote API's Model List)
    • Remote APIs may have specific model.yaml templates (e.g. context length)
    • Routing of /chat/completions
    • Extensions should cover both the UI layer and the "Backend" (we may need to modify Cortex to accept a Remote param)
    • Handling API Key Management
  • We may need an incremental path to Remote API Extensions
    • Cortex.cpp does not support Extensions for now
    • We may need to have Remote API Extensions define a specific payload that Cortex /chat/completions then routes conditionally (see the sketch below)
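
A minimal sketch of that conditional-routing idea, assuming a hypothetical engine field on the payload and an illustrative helper (none of these names are an agreed API):

interface ChatCompletionRequest {
  model: string
  messages: { role: string; content: string }[]
  // Hypothetical marker a Remote API Extension could set so that Cortex
  // knows to forward the request to a Remote API instead of running it locally.
  engine?: string
}

// Engines registered by Remote API Extensions (illustrative list).
const REMOTE_ENGINES = new Set(['openai', 'anthropic', 'groq', 'martian', 'openrouter'])

function routeChatCompletions(req: ChatCompletionRequest): 'remote' | 'local' {
  // Forward to the Remote API when the payload names a remote engine; otherwise run locally.
  return req.engine && REMOTE_ENGINES.has(req.engine) ? 'remote' : 'local'
}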

Tasklist

Jan

Backend

Remote APIs to Support

Popular

Deprioritized

@dan-menlo dan-menlo converted this from a draft issue Oct 13, 2024
@dan-menlo dan-menlo changed the title architecture: Local Provider Extension architecture: Provider Abstraction Oct 14, 2024

dan-menlo commented Oct 14, 2024

Goal: Clear Eng Spec for Providers

Scope

  • "Provider" Scope
    • Remote (Groq, NIM, etc)
    • Local = Hardware + APIs + Model Management (Ollama, Cortex)
    • It is possible that we don't need a differentiation between Remote and Local
    • Choose a better name (vs. "OAIEngine")
  • Provider Interface + Abstraction
    • A Provider registers certain things (e.g. UI, Models) and can be called by other extensions
    • Registers Settings Page
    • Registers Models Category + List
  • Each Provider Extension should be a separate repo?
    • I would like this -> add others to help maintain

Related

louis-jan commented Oct 14, 2024

Jan Providers

Local Provider

Currently, the local extension still has to manage processes itself, which involves using third-party APIs such as Node.js child_process to build these functions.

If we build Jan on mobile, we will have to cover extensions there as well. It would be better to move these parts into the Core module so that the frontend only needs to use its API.

A Local Provider needs to execute a command to run its program. Therefore, only the command and arguments are defined by the extension; the rest is delegated to the superclass.

Lifecycle:

  • A Local Provider is intended to run engines as an API server (potentially over HTTP, socket, or gRPC).
  • The Local Provider executes a command through the Core API (reducing the main-process implementation in extensions and making it easier to port to other platforms such as mobile).
  • The main-process core module runs a watchdog and maintains the process.
  • From then on, the app can make requests and proxy them through the Local Provider extension.
  • App terminates -> the watchdog terminates the process.

Examples

class CortexProvider extends LocalProvider {
  async onLoad() {
    // `run` is implemented by the core module;
    // the spawned process is then maintained by the watchdog.
    this.run("cortex", ["start", "--port", "39291"], { cwd: "./", env: {} })
  }

  async loadModel() {
    // Can be an HTTP request, socket, or gRPC call.
    this.post("/v1/model/start", { model: "llama3.2" })
  }
}

Diagram: https://drive.google.com/file/d/1lITgfqviqA5b0-etSGtU5wI8BS7_TXza/view?usp=sharing

Remote Provider

  • The same as discussions: Remote API Extension #3505
  • Remote extensions should work with auto-populated models, e.g. the /models list.
  • We cannot build hundreds of model.json files manually.
  • The current extension framework is actually designed to handle this; it's just an implementation issue in the extensions, which can be improved.
  • There was a hacky UI implementation where we pre-populated models, then disabled all of them until the API key was set. That should be part of the extension, not the Jan app.
  • Extension builders still ship default available models.
     // Before
     override async onLoad(): Promise<void> {
       super.onLoad()
       // Register Settings (API Key, Endpoints)
       this.registerSettings(SETTINGS)

       // Pre-populate models - persist model.json files
       // MODELS are model.json files that come with the extension.
       this.registerModels(MODELS)
     }

     // After
     override async onLoad(): Promise<void> {
       super.onLoad()
       // Register Settings (API Key, Endpoints)
       this.registerSettings(SETTINGS)

       // Fetch models from the provider's models endpoint - just a simple fetch.
       // Defaults to `/models`.
       get('/models')
         .then((models) => {
           // The model builder constructs the model template (aka preset) - see the sketch below.
           // This operation builds Model DTOs that work with the app.
           this.registerModels(this.modelBuilder.build(models))
         })
     }
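
A minimal sketch of what that model-builder step could look like; the ProviderModel and Model shapes here are illustrative, not the actual Jan types:

interface ProviderModel { id: string; owned_by?: string }   // e.g. one entry from GET /models
interface Model { id: string; name: string; engine: string; parameters: Record<string, unknown> }

class ModelBuilder {
  constructor(private engine: string) {}

  // Turn a provider's /models listing into Model DTOs the app can work with,
  // applying a default parameter template (aka preset) to each entry.
  build(models: ProviderModel[]): Model[] {
    return models.map((m) => ({
      id: m.id,
      name: m.id,
      engine: this.engine,
      parameters: { temperature: 0.7, stream: true },
    }))
  }
}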
Remote Provider Extension

Diagram (Draw.io): https://drive.google.com/file/d/1pl9WjCzKl519keva85aHqUhx2u0onVf4/view?usp=sharing
  1. Supported parameters?
  • Each provider works with different parameters, but they all share the same basic set as the ones currently defined.
  • We already support transformPayload and transformResponse to adapt to these cases.
  • Users still see consistent parameters from model to model; the transformations happen behind the scenes.
    /**
     * transformPayload example.
     * Transform the payload before sending it to the inference endpoint.
     * The new preview models such as o1-mini and o1-preview replaced the
     * max_tokens parameter with max_completion_tokens. Others did not.
     */
    transformPayload = (payload: OpenAIPayloadType): OpenAIPayloadType => {
      // Transform the payload for preview models
      if (this.previewModels.includes(payload.model)) {
        const { max_tokens, ...params } = payload
        return { ...params, max_completion_tokens: max_tokens }
      }
      // Pass through for official models
      return payload
    }
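    /**
     * transformResponse example (an illustrative sketch, not existing Jan code).
     * Normalizes a provider-specific response back into the OpenAI-style shape
     * the app expects. The `output_text` field is a hypothetical example of a
     * provider-specific field; OpenAIResponseType is assumed to be the
     * OpenAI-compatible response type used elsewhere in the extension.
     */
    transformResponse = (response: any): OpenAIResponseType => {
      if (response.output_text !== undefined) {
        return {
          choices: [
            { index: 0, message: { role: 'assistant', content: response.output_text } },
          ],
        } as OpenAIResponseType
      }
      // Already OpenAI-compatible: pass through unchanged.
      return response
    }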
  2. Decoration?
    {
      "name": "openai-extension",
      "displayName": "OpenAI Extension Provider",
      "icon": "https://openai.com/logo.png"
    }
  3. Just remove the hacky parts
  • Model Dropdown: it currently checks whether the engine is nitro or something else to split the local and cloud sections, so a new local engine (e.g. cortex.cpp) would be treated as a remote engine. -> Filter by extension type instead (class name or type, e.g. LocalOAIEngine vs RemoteOAIEngine).
  • All models from a cloud provider are disabled by default if no API key is set. But what if I use a self-hosted endpoint without API-key restrictions? Whether models are available should be determined by the extension: with no credentials that meet the requirements, the result is simply an empty section, indicating no available models. When users input the API key on the extension settings page, the extension fetches the model list automatically and caches it. Users can also refresh the model list from there (we should not fetch too often, since we are building a local-first application). See the sketch after this list.
  • Application settings can be a bit confusing, with Model Providers and Core Extensions listed separately. Where do other extensions fit in? Extension settings do not have a community or "others" section.
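
A minimal sketch of the fetch-and-cache behaviour described above; all helper and field names here are assumptions, not existing Jan APIs:

type ModelInfo = { id: string }

async function refreshRemoteModels(opts: {
  apiKey?: string
  force?: boolean
  fetchModels: () => Promise<ModelInfo[]>        // e.g. GET <baseURL>/models
  readCache: () => Promise<ModelInfo[]>
  writeCache: (models: ModelInfo[]) => Promise<void>
}): Promise<ModelInfo[]> {
  // No credentials that meet the requirements: an empty list -> empty section in the UI.
  if (!opts.apiKey) return []

  // Local-first: prefer the cached list unless the user explicitly refreshes.
  const cached = await opts.readCache()
  if (cached.length > 0 && !opts.force) return cached

  const models = await opts.fetchModels()
  await opts.writeCache(models)
  return models
}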

Provider Interface and Abstraction

  • Providers are scoped to engine operations, such as running engines and loading models (see the sketch after this list):
    • registerModels(models)
    • run(commands, arguments, options)
    • loadModel(model)
    • unloadModel(model)
  • Core functions can be extended through extensions and are not confined to providers, e.g. Hardware and UI:
    • systemStatus()
    • registerSettings()
    • registerRibbon()
    • registerView()
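
A minimal TypeScript sketch of that split; the method names mirror the bullets above, while the exact signatures and types are assumptions:

interface Model { id: string; engine: string }

// Provider scope: engine operations only.
abstract class Provider {
  abstract loadModel(model: Model): Promise<void>
  abstract unloadModel(model: Model): Promise<void>

  registerModels(models: Model[]): void {
    // Would push into the shared in-memory store (ModelManager) described below.
  }

  protected run(command: string, args: string[], options?: { cwd?: string; env?: Record<string, string> }): void {
    // Delegated to the core module, which spawns the process and keeps it under the watchdog.
  }
}

// Cross-cutting concerns live in core / other extensions, not inside providers.
interface CoreAPI {
  systemStatus(): Promise<Record<string, unknown>>
  registerSettings(settings: unknown[]): void
  registerRibbon(item: unknown): void
  registerView(view: unknown): void
}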

Registered models will be stored in an in-memory store, accessible from other extensions (ModelManager.instance().models). The same goes for settings. The app and extensions can perform chat/completions requests with just a model name, which means registered model names must be unique across extensions.

The core module also exposes extension APIs, such as systemStatus, so other extensions can access them; there should be just one implementation of such logic, supplied by an extension. Otherwise it will merely be used within the extension that registered it, first come, first served.

The model UI should be aligned with the model object, minimize decorations (e.g. model icons), and avoid introducing various types of model DTOs.

Each Provider Extension should be a separate repo?

Extension installation should be a straightforward process that requires minimal effort.

  • There is no official way to install extensions from a GitHub repository URL. Users typically don't know how to package and install software from sources.
  • There should be a shortcut from the settings page that allows users to input the URL, pop up the extension repository details, and then install from there.
    It would be helpful to provide a list of community extensions, allowing users to easily find the right extension for their specific use case without having to search.

@dan-menlo

@louis-jan We can start working on this refactor, and make adjustments on the edges. Thank you for the clear spec!

@dan-menlo dan-menlo changed the title architecture: Provider Abstraction discussion: Provider Abstraction Oct 17, 2024
@freelerobot freelerobot changed the title discussion: Provider Abstraction planning: Provider Abstraction Oct 17, 2024
@dan-menlo dan-menlo changed the title planning: Provider Abstraction planning: Remote API Extensions for Jan & Cortex Oct 29, 2024

louis-jan commented Nov 5, 2024

@nguyenhoangthuan99

  • GetModels will call remote API and return info (model ID, name,...)
  • Remote and local models will be handled consistently using model.yml files
  • When call /v1/models/list, also return remote models and create model.yml if not exists.

Would this cause significant latency for /models? It would result in a poor user experience for clients. Also, it's /v1/models; there is no /list path component.

  • Remote models are treated as extensions and will not be stored in the database, remote engine will manage its own models

This would result in duplicate implementations between extensions. The current code-sharing mechanism between engine implementations is quite bad. My naive reading is that you mean to scan through the folder, but that performs badly; we introduced the db file precisely to optimize that. Otherwise, open interpreter would result in hundreds of model entries, which causes a noticeable problem.

  1. HandleChatCompletion (will transform request) and forward to remote provider

I think there should be a transformer for parameters to map to Jan UI and consistently persist model.yml.

  1. Each remote provider's models will be stored under their respective engine folder

There was an interesting case where applications like Jan wanted to prepackage engines, making those engine folders read-only. Moving them to the data folder is costly and provides poor UX because the app bundle is compressed; decompressing and copying them over can take more than 5 minutes on some computers we have seen so far.

  1. POST /v1/auth/token
    {
    "provider": "huggingface",
    "token": "your_token_here"
    }

This creates poor engine isolation, where each extension can access the others' credentials, or the application has to map them once again. Many parameters can be configured at the engine level, such as the API key, URL, and settings for remote engines. For the local llama.cpp engine, options include caching, flash attention, and more. Would it be better to create a generic engine-configuration endpoint for scalability?

nguyenhoangthuan99 commented Nov 5, 2024

Updated cortex.cpp implementation, based on Louis's recommendations:

  • We will implement a separate endpoint to get the models for each engine (/v1/engine/models), with support for filtering by model name for remote engines.

  • Remote models will be saved in the DB in a separate RemoteModels table with the following fields: model, engine, path_to_model_yml -> we need to provide an API endpoint for adding remote models; only added models are saved in the DB.

  • Remote engine settings (API key, URL, ...) will be saved under the models/remote/ data folder, e.g. models/remote/openai.json, models/remote/anthropic.json, models/remote/openai-compatible.json... -> we need to provide an API for engine settings (/v1/engine/setting).

  • model.yml also contains parameter-mapping fields for the params transformer (see the sketch below).
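
A rough sketch of what such an engine settings file (e.g. models/remote/openai.json) might hold, written here as a TypeScript shape; the field names are assumptions, not the final schema:

// Illustrative only: field names are assumptions, not the final schema.
interface RemoteEngineSettings {
  engine: string            // e.g. "openai"
  api_key: string
  url: string               // base URL, e.g. "https://api.openai.com/v1"
  // Parameter mapping consumed by the params transformer declared in model.yml,
  // e.g. renaming max_tokens for models that expect max_completion_tokens.
  parameter_mappings?: Record<string, string>
}

const openaiSettings: RemoteEngineSettings = {
  engine: 'openai',
  api_key: '<YOUR_API_KEY>',
  url: 'https://api.openai.com/v1',
  parameter_mappings: { max_tokens: 'max_completion_tokens' },
}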

@dan-menlo

(Quoting @nguyenhoangthuan99's updated cortex.cpp implementation above.)

@nguyenhoangthuan99 @louis-jan I am not sure about this implementation and would like us to brainstorm and think this through more:

Overall

  • I would like to explore @louis-jan's idea of more code-sharing between engine implementations
  • One path I would like to explore is building a generic "Remote OpenAI-compatible Engine", and then letting users create instances of it, each with its own:
    • URL
    • API Key
    • Transform Params
  • We can probably incorporate elements of @louis-jan's proposal from last week into the Engines abstraction

Models

We should have a clear Models abstraction, which can be either local or remote.

cortex.db Models Table

  • I don't think a separate RemoteModels table in cortex.db makes sense; we should use the existing models table
  • We should add a remote column; if true, the row is a remote model
  • engine should be 1:1 with the remote engine
  • Calling /models should return both local and remote models, with a field indicating whether each is remote or local

Note: This will require us to implement a DB migrator as part of the updater (an important App Shell primitive), as cortex.db does not currently have a remote column.
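
A hedged sketch of that migration step; cortex.db is owned by cortex.cpp, so this only illustrates the idea via a generic exec(sql) callback:

type Exec = (sql: string) => void

// Bump the models table from the current schema (no remote column) to the proposed one.
// The engine column assumes the 1:1 engine relation described above.
function migrateModelsTable(exec: Exec, version: number): number {
  if (version >= 2) return version
  exec('ALTER TABLE models ADD COLUMN remote INTEGER NOT NULL DEFAULT 0')
  exec('ALTER TABLE models ADD COLUMN engine TEXT')
  return 2
}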

getModels

One big question on my mind is whether the Models table should contain all remote models. What if OpenRouter returns all 700 models? What if Claude returns every claude-sonnet- version? This would clog up the Models table and make it impossible to use.

  • We should only show the "main" models (e.g. the 3-4 main ones)
  • We should still make it possible for the user to "define" a model they want to use, e.g. claude-sonnet-10082024

model.yaml has the params transformer

  • I really like this idea, and think this is a very elegant way to deal with this
  • We can host these in the Cortex Huggingface Org, as model.yaml for Remote models

Engines

  • I think remote engine settings should be stored in the /engines folder, not /models
  • e.g. /engines/anthropic, /engines/openai, /engines/<name>
  • Models belong to Engines, and there should be a SQLite relation between them

API Key and URL

  • I think Engines can have a generic settings.json in their /engines/<name> folder
  • Local and Remote Engines can share this abstraction

Generic OpenAI API-compatible Engine?

This is riffing off @louis-jan's idea from last week of just having a Transform

  • We should allow users to create a new generic OpenAI-compatible API
    • Creates /engines/<name> folder
    • Creates engine entry in cortex.db's Engines table
    • Creates /engines/<name>/settings
      • Takes in URL, API Key
      • Takes in Transform params (this can be overridden if model has model.yaml)

This would allow us to provision generic OpenAI-equivalent API Engines.
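
A rough sketch of provisioning such a generic engine; the endpoint, port, and payload shape are illustrative assumptions, not an agreed Cortex API:

interface GenericRemoteEngineConfig {
  name: string        // creates /engines/<name> and an entry in the Engines table
  url: string         // OpenAI-compatible base URL
  api_key: string
  // Engine-level transform params; a model's model.yaml can override these.
  transform?: { request?: Record<string, string>; response?: Record<string, string> }
}

async function createRemoteEngine(config: GenericRemoteEngineConfig): Promise<void> {
  // Hypothetical endpoint, shown for illustration only.
  await fetch('http://127.0.0.1:39291/v1/engines', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(config),
  })
}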

dan-menlo commented Nov 6, 2024

Concept:

  • cortex.db Engine
  • We "seed" Engines table with Remote Engines
  • Some of these Remote Engines have Models (e.g. OpenAI has o1 model)
    • The o1 model has a model.yaml that overrides the Engine's TransformParams (see the sketch after this list)
  • Note: Engine Metadata should be versioned
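
A one-function sketch of that override rule, with illustrative shapes:

interface TransformParams { request?: Record<string, string>; response?: Record<string, string> }
interface EngineEntry { name: string; version: string; transform_params: TransformParams }
interface ModelEntry { id: string; transform_params?: TransformParams }

// A model-level model.yaml (e.g. o1) overrides the engine's defaults when present.
function resolveTransformParams(engine: EngineEntry, model: ModelEntry): TransformParams {
  return model.transform_params ?? engine.transform_params
}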


@dan-menlo dan-menlo changed the title planning: Remote API Extensions for Jan & Cortex planning: Remote Engine Extensions for Jan & Cortex Nov 28, 2024
@gabrielle-ong

Cortex Implementation [WIP]: janhq/cortex.cpp#1662

@dan-menlo dan-menlo changed the title planning: Remote Engine Extensions for Jan & Cortex roadmap: Remote Engine Extensions for Jan & Cortex Nov 28, 2024
@dan-menlo dan-menlo changed the title roadmap: Remote Engine Extensions for Jan & Cortex roadmap: Jan has revamped Remote Engines (e.g. OpenAI, Anthropic etc) Nov 28, 2024

dan-menlo commented Dec 12, 2024

12 Dec

  • Anthropic should not require a separate C++ file (we should amend the transformReq abstraction)
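
One way to read this (a sketch of the idea, not the actual cortex.cpp design, and written in TypeScript for brevity): make transformReq data-driven, so a provider like Anthropic only needs a per-engine template rather than its own source file. All names below are illustrative:

interface RequestTransform {
  url: string
  headers: Record<string, string>
  renameParams?: Record<string, string>   // e.g. { max_tokens: 'max_completion_tokens' }
  dropParams?: string[]
}

// Apply a declarative per-engine transform to an OpenAI-style payload.
function transformReq(payload: Record<string, unknown>, t: RequestTransform) {
  const body: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(payload)) {
    if (t.dropParams?.includes(key)) continue
    body[t.renameParams?.[key] ?? key] = value
  }
  return { url: t.url, headers: t.headers, body }
}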

@vansangpfiev

Engines -> Models List

  • Status quo
    • Jan
      • We define/bundle everything in codebase
    • Cortex
      • We are currently defining remote models in Cortex Model Hub
      • Very manual, not ideal
      • This defines API routes and model.yml

All the fields in model.yml can be generated with default values. Cortex can generate those model.yml files for each engine when the user requests setup (see the sketch below).
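
A small sketch of that default generation; the field names are assumptions about the model.yml schema:

interface RemoteModelYml {
  model: string
  engine: string
  version: string
  inference_params: { temperature: number; top_p: number; stream: boolean; max_tokens: number }
}

// Generate a model.yml with default values when the user sets up an engine;
// only the model id and engine come from the provider's /models listing.
function defaultModelYml(model: string, engine: string): RemoteModelYml {
  return {
    model,
    engine,
    version: '1',
    inference_params: { temperature: 0.7, top_p: 0.95, stream: true, max_tokens: 4096 },
  }
}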


cc: @dan-homebrew @nguyenhoangthuan99
