A Python-based CLI tool for tagging images with LLaVA models.
I made this repo because I want to caption images across platforms (my old MBP, my Windows gaming PC, or a Docker-based Linux cloud server such as Google Colab).
But I don't want to install a huge web UI just for this small task, and some cloud services are unfriendly to Gradio-based UIs.
So this repo was born.
Hugging Face hosts the original sources; the ModelScope copies are pure forks from Hugging Face (because Hugging Face is blocked in some places).
Model | HuggingFace Link | ModelScope Link |
---|---|---|
llava-v1.6-34B-gguf | HuggingFace | ModelScope |
ggml_llava-v1.5-13b | HuggingFace | ModelScope |
ggml_llava-v1.5-7b | HuggingFace | ModelScope |
Make a simple UI with Jupyter widgets (once my laziness is cured😊)
Python 3.10 works fine.
Open a shell terminal and follow the steps below:
# Clone this repo
git clone https://github.com/fireicewolf/llava-caption-cli.git
cd llava-caption-cli
# Create a Python venv
python -m venv .venv
# Activate it (Windows; on Linux/macOS use: source .venv/bin/activate)
.\.venv\Scripts\activate
# Install dependencies
# Base dependencies; models for inference will be downloaded via Python request libs.
pip install -U -r requirements.txt
# If you want to download or cache models via the huggingface hub, install this.
pip install -U -r huggingface-requirements.txt
# If you want to download or cache models via the modelscope hub, install this.
pip install -U -r modelscope-requirements.txt
This project uses llama-cpp-python as its base library, which needs to be compiled.
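Because the compiled llama-cpp-python build is the step most likely to go wrong, it can help to confirm that the package is importable from the active venv before captioning. The snippet below is only a minimal sketch: the helper name `is_module_available` is hypothetical and not part of this repo, and the only assumption about llama-cpp-python is that it is imported under the name `llama_cpp`.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # llama-cpp-python is imported as `llama_cpp`.
    if is_module_available("llama_cpp"):
        print("llama-cpp-python is installed in this environment.")
    else:
        print("llama-cpp-python was not found; install it before running caption.py.")
```

If the check reports the package as missing, the venv is probably not activated or the build did not complete.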
Make sure your Python venv is activated first!
python caption.py your_datasets_path
To run with more options, view the built-in help with the command below, or see the advanced options section.
python caption.py -h
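When captioning several datasets from a script or notebook, the same command can be assembled in Python. This is a rough sketch rather than part of the repo: `build_caption_command` is a hypothetical helper, it only uses flags documented in this README, and it assumes it is run from the repository root with the venv activated.

```python
import subprocess
import sys
from pathlib import Path


def build_caption_command(data_path: str, recursive: bool = False,
                          not_overwrite: bool = False,
                          caption_extension: str = ".txt") -> list[str]:
    """Assemble the argv list for caption.py from a few documented options."""
    command = [sys.executable, "caption.py", data_path]
    if recursive:
        command.append("--recursive")
    if not_overwrite:
        command.append("--not_overwrite")
    command += ["--caption_extension", caption_extension]
    return command


if __name__ == "__main__" and Path("caption.py").exists():
    # Only runs when caption.py is present in the current directory.
    subprocess.run(build_caption_command("your_datasets_path", recursive=True), check=True)
```

Passing the arguments as a list avoids shell quoting problems with dataset paths that contain spaces.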
Advanced options
`data_path` Path to the data.
--recursive
Include all images in supported formats under your input dataset path and its subdirectories.
config
Config JSON for LLaVA models, default is "default.json".
--use_cpu
Use CPU for inference.
--gpus N
Number of GPUs used for inference, default is 1.
--split_in_gpus weights
Weights for splitting the model across multiple GPUs for inference, e.g. "0.5, 0.5" to balance across 2 GPUs.
--n_ctx TEXT CONTEXT
Text context size; set it larger if your image is large. Default is 2048.
--model_name MODEL_NAME
Model name for inference, default is "llava-v1.6-34b.Q4_K_M" (please check configs/default.json).
--model_site MODEL_SITE
Site the model is downloaded from (huggingface or modelscope), default is huggingface.
--models_save_path MODEL_SAVE_PATH
Path where models are saved, default is models (under the project folder).
--download_method SDK
Download models via SDK or URL, default is sdk.
If the huggingface_hub or modelscope SDK is not installed, or the download fails, it will automatically retry with URL download.
--use_sdk_cache
Use the huggingface or modelscope SDK cache to store models; this option requires the huggingface_hub or modelscope SDK to be installed.
If this is enabled, --models_save_path will be ignored.
--custom_model_path CUSTOM_MODEL_PATH
--custom_mmproj_path CUSTOM_MMPROJ_PATH
These two arguments must be used together. They let you use a model you already have.
--custom_caption_save_path CUSTOM_CAPTION_SAVE_PATH
Save caption files to a custom path instead of alongside the images (their directory structure is kept).
--log_level LOG_LEVEL
Log level for terminal console and log file, default is INFO (DEBUG, INFO, WARNING, ERROR, CRITICAL).
--save_logs
Save logs to a file; the log is saved at the same level as data_dir_path.
--caption_extension CAPTION_EXTENSION
Caption file extension, default is .txt.
--not_overwrite
Do not overwrite a caption file if it already exists.
--system_message SYSTEM_MESSAGE
System message for the LLaVA model.
--user_prompt USER_PROMPT
User prompt for the caption.
--temperature TEMPERATURE
Temperature for the LLaVA model, default is 0.4.
--max_tokens MAX_TOKENS
Max tokens for output.
--verbose
llama-cpp-python verbose mode.
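To illustrate how `--custom_caption_save_path`, `--caption_extension` and `--not_overwrite` relate to each other, here is a small sketch of one way the caption file location could be resolved. It reflects a reading of the option descriptions above, not the repo's actual implementation; the function names `resolve_caption_path` and `should_write` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_caption_path(image_path: Path, data_path: Path,
                         custom_save_path: Optional[Path] = None,
                         caption_extension: str = ".txt") -> Path:
    """Return where the caption for `image_path` would be written."""
    if custom_save_path is None:
        # Default: the caption sits next to the image.
        return image_path.with_suffix(caption_extension)
    # Custom path: mirror the image's location relative to the dataset root.
    relative = image_path.relative_to(data_path)
    return (custom_save_path / relative).with_suffix(caption_extension)


def should_write(caption_path: Path, not_overwrite: bool) -> bool:
    """Skip existing caption files only when not_overwrite is set."""
    return not (not_overwrite and caption_path.exists())
```

For example, with `data_path` set to `data` and a custom save path of `captions`, the image `data/cats/a.png` would map to `captions/cats/a.txt`, so the `cats` subdirectory is preserved.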
Based on llama-cpp-python
Without their work (👏👏), this repo wouldn't exist.