Sili Chen · Hengkai Guo† · Shengnan Zhu · Feihu Zhang
Zilong Huang · Jiashi Feng · Bingyi Kang†
ByteDance
†Corresponding author
This work presents Video Depth Anything, built on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with diffusion-based models, it offers faster inference, fewer parameters, and more accurate, temporally consistent depth.
- 2025-01-21: Paper, project page, code, models, and demo are all released.
We provide two models of varying scales for robust and consistent video depth estimation:
| Model | Params | Checkpoint |
|---|---|---|
| Video-Depth-Anything-V2-Small | 28.4M | Download |
| Video-Depth-Anything-V2-Large | 381.8M | Download |
```shell
git clone https://github.com/DepthAnything/Video-Depth-Anything
cd Video-Depth-Anything
pip install -r requirements.txt
```
Download the checkpoints listed here and put them under the `checkpoints` directory, or run the download script:

```shell
bash get_weights.sh
```
```shell
python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl
```
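To process several clips in one go, the command above can be wrapped in a simple loop. This is a sketch, not part of the repo: the file names are placeholders, and the `echo` prints each command as a dry run (drop it to actually execute). Swap `--encoder vitl` for `vits` to use the Small model.

```shell
# Dry-run sketch: print one run.py invocation per clip (hypothetical file names).
# Remove `echo` to run the commands for real.
for f in clip1.mp4 clip2.mp4; do
  echo python3 run.py --input_video "$f" --output_dir ./outputs --encoder vitl
done
```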
If you find this project useful, please consider citing:
```bibtex
@article{video_depth_anything,
  title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
  author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
  journal={arXiv:2501.12375},
  year={2025}
}
```
The Video-Depth-Anything-Small model is released under the Apache-2.0 license; the Video-Depth-Anything-Large model is released under the CC-BY-NC-4.0 license.