This is the official implementation for VideoTree
Authors: Ziyang Wang*, Shoubin Yu*, Elias Stengel-Eskin*, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
We introduce VideoTree, a query-adaptive and hierarchical framework for long-video understanding with LLMs. Specifically, VideoTree dynamically extracts query-related information from the input video and builds a tree-based video representation for LLM reasoning.
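As a rough orientation (not the actual implementation), the pipeline can be pictured as: cluster frame features into first-level nodes (breadth expansion), rank the clusters by query relevance, and expand only the relevant ones into finer sub-clusters (depth expansion) before captioning the selected frames for the LLM. Below is a minimal, self-contained Python sketch of this idea; the relevance scorer is a hypothetical stub, and the real logic lives in the scripts described later in this README.
import torch

def score_relevance(cluster_features, query):
    # Hypothetical stub: the real pipeline rates cluster relevance to the query
    # with captions and an LLM, not random numbers.
    return torch.rand(1).item()

def build_videotree(frame_features, query, num_clusters=8, expand_top_k=3):
    # Breadth expansion: a few plain Lloyd iterations stand in for k-means here.
    centers = frame_features[torch.randperm(len(frame_features))[:num_clusters]].clone()
    for _ in range(10):
        assign = torch.cdist(frame_features, centers).argmin(dim=1)
        for c in range(num_clusters):
            members = frame_features[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(dim=0)
    # Depth expansion: keep only the most query-relevant clusters for finer splitting.
    relevance = [score_relevance(frame_features[assign == c], query) for c in range(num_clusters)]
    expand = sorted(range(num_clusters), key=lambda c: -relevance[c])[:expand_top_k]
    return {c: {"frames": (assign == c).nonzero(as_tuple=True)[0].tolist(),
                "expand": c in expand} for c in range(num_clusters)}

tree = build_videotree(torch.randn(256, 512), "What is the person cooking?")  # 256 frames x 512-dim features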
Install environment.
Python 3.8 or above is required.
git clone https://github.com/Ziyang412/VideoTree.git
cd VideoTree
python3 -m venv videetree_env
source videetree_env/bin/activate
pip install openai
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pandas
pip install transformers==4.28.1
pip install accelerate
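Optionally, a quick check that the CUDA build of PyTorch and the pinned transformers version installed correctly:
python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"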
Download dataset annotations and extracted captions.
Download data.zip from the file provided by LLoVi.
unzip data.zip
You will find the extracted captions for EgoSchema under ./data, which also contains the dataset annotations.
Specifically, the LaViLa base model is used to extract EgoSchema captions at 1 FPS.
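If you want to inspect the captions before running anything, a small sketch like the following should work, assuming the unzipped data stores captions as a JSON file keyed by video ID (the exact filename and layout may differ, so check the unzipped ./data folder):
import json

# Hypothetical path: adjust to the actual caption file inside ./data.
with open("./data/egoschema/lavila_captions.json") as f:
    captions = json.load(f)
video_id = next(iter(captions))
print(video_id, str(captions[video_id])[:200])  # peek at one video's 1 FPS captions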
Download EgoSchema Videos.
Please follow EgoSchema to download the original EgoSchema videos. After downloading, please extract the videos into 1 FPS video frames (saved in image format for faster loading) under ./data/egoschema_frames/{video_id}/{frame_id}.jpg. Then, to further speed up the tree-building process, we extract visual features for each frame using EVA-CLIP-8B and save them to ./data/egoschema_features/{video_id}.pt.
python data_extraction/extract_images.py
python data_extraction/extract_features.py
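For reference, here is a minimal sketch of what these two steps roughly do, assuming OpenCV is available for frame decoding and using a generic Hugging Face CLIP model as a stand-in for EVA-CLIP-8B; the provided scripts remain the authoritative versions.
import os, cv2, torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def extract_frames(video_path, out_dir):
    # Keep roughly one frame per second and save as JPEG for fast loading.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % int(round(fps)) == 0:
            cv2.imwrite(os.path.join(out_dir, f"{saved:04d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()

def extract_features(frame_dir, out_path, model_name="openai/clip-vit-large-patch14"):
    # Stand-in encoder: swap in EVA-CLIP-8B for the actual pipeline.
    model, proc = CLIPModel.from_pretrained(model_name), CLIPProcessor.from_pretrained(model_name)
    images = [Image.open(os.path.join(frame_dir, f)) for f in sorted(os.listdir(frame_dir))]
    with torch.no_grad():
        feats = model.get_image_features(**proc(images=images, return_tensors="pt"))
    torch.save(feats, out_path)  # one feature vector per frame

extract_frames("video.mp4", "./data/egoschema_frames/video")
extract_features("./data/egoschema_frames/video", "./data/egoschema_features/video.pt")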
Since the original kmeans-pytorch package doesn't set an iteration limit and can get stuck in an infinite loop, we update the __init__ file of the original kmeans-pytorch package.
git clone https://github.com/subhadarship/kmeans_pytorch
cd kmeans_pytorch
Please replace the __init__ file in the cloned "kmeans_pytorch" folder with the file we provide in the "./kmeans_pytorch" folder of this repo, then run the following command.
pip install --editable .
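A minimal usage sketch after installing the patched package; the iter_limit argument name is an assumption about what the patched __init__ exposes, so check ./kmeans_pytorch/__init__.py in this repo for the actual signature.
import torch
from kmeans_pytorch import kmeans

x = torch.randn(1000, 256)  # e.g. per-frame visual features
# iter_limit (assumed name from the patched init) caps the Lloyd iterations so
# clustering cannot loop forever.
cluster_ids, cluster_centers = kmeans(
    X=x, num_clusters=16, distance='euclidean', iter_limit=100,
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
print(cluster_ids.shape, cluster_centers.shape)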
Due to time constraints, we are still updating the codebase. We will also incorporate the scripts/captions for NeXT-QA and IntentQA in the future.
Please update the feature path, the args (in util.py), and the output path before running the code.
sh scripts/breath_expansion.sh
Please update the feature path, the outputs of the last step (the relevance output path and the first-level cluster information), and the output path before running the code.
python depth_expansion.py
Please update the tree node index file (the output of the last step), the data files, and the output path before running the code.
sh scripts/egoschema_qa.sh
--save_info: save additional information, e.g. token usage and detailed prompts.
--num_examples_to_run: how many examples to run. -1 (default) to run all.
--start_from_scratch: ignore existing output files. Start from scratch.
We thank the developers of LLoVi, LifelongMemory, EVA-CLIP, kmeans-pytorch, and scikit-learn clustering for their public code releases. We also thank the authors of VideoAgent for the helpful discussion.
Please cite our paper if you use our models in your work:
@article{wang2024videotree,
title={VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos},
author={Wang, Ziyang and Yu, Shoubin and Stengel-Eskin, Elias and Yoon, Jaehong and Cheng, Feng and Bertasius, Gedas and Bansal, Mohit},
journal={arXiv preprint arXiv:2405.19209},
year={2024}
}