v0.2.12
News
OpenLLM now releases a base container containing all compiled kernels, removing the need to build kernels with openllm build
when using vLLM or auto-gptq.
vLLM support (experimental)
Currently, only OPT and Llama 2 support vLLM. Set OPENLLM_LLAMA_FRAMEWORK=vllm
to start openllm runners with vLLM.
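For example, to serve Llama 2 on the vLLM backend (a minimal sketch; the model ID below is illustrative, substitute any supported Llama 2 checkpoint):

OPENLLM_LLAMA_FRAMEWORK=vllm openllm start llama --model-id meta-llama/Llama-2-7b-hf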
Installation
pip install openllm==0.2.12
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.2.12
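To verify which version is installed:

pip show openllm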
Usage
To list all available models: openllm models
To start an LLM: python -m openllm start opt (see the query example after this list)
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.12 openllm --help
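Once a server is running (it listens on port 3000 by default), you can send a test prompt from another terminal. A minimal sketch, assuming the openllm query command and its --endpoint option behave as in other 0.2.x releases; adjust the endpoint and prompt to your setup:

openllm query --endpoint http://localhost:3000 "Explain what vLLM is in one sentence."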
Find more information about this release in CHANGELOG.md.
New Contributors
- @RichardScottOZ made their first contribution in #155
Full Changelog: v0.2.11...v0.2.12