Official implementation of Annotation-free Audio-Visual Segmentation.
The paper has been accepted at WACV 2024. Project page: https://jinxiang-liu.github.io/anno-free-AVS/
Create a conda environment and install dependencies:
```bash
conda create -n sama python=3.10.11
conda activate sama
pip install -r requirements.txt
```
- AVSBench
  - Please refer to https://github.com/OpenNLPLab/AVSBench to download the AVSBench dataset.
  - Optionally, download the re-organized split files via the OneDrive link.
- AVS-Synthetic
  - Please download the dataset from https://zenodo.org/record/8125822.
After downloading the datasets and their annotations, set the corresponding directory and file paths in `configs/sam_avs_adapter.yaml`.
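To find the entries that need editing, it can help to list the path-like lines in the config first. The bash function below is a minimal sketch; its name `list_path_entries` is hypothetical, and it assumes the relevant keys contain words such as `root`, `dir`, `path`, or `file`. The real key names in `configs/sam_avs_adapter.yaml` may differ, so treat its output as a starting point and check the file itself.

```bash
# Sketch: print (with line numbers) config lines that look like path settings.
# ASSUMPTION: relevant keys contain "root", "dir", "path" or "file";
# the actual key names in configs/sam_avs_adapter.yaml may differ.
list_path_entries() {
  local cfg="${1:-configs/sam_avs_adapter.yaml}"
  if [[ ! -f "$cfg" ]]; then
    echo "config not found: $cfg" >&2
    return 1
  fi
  # Case-insensitive match on a key fragment followed by a YAML colon.
  grep -nEi '(root|dir|path|file)[^:]*:' "$cfg"
}
```

Run `list_path_entries` from the repository root, then edit the reported lines to point at your local copies of AVSBench and AVS-Synthetic.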
Model weights: all weights, including the SAM image backbone, the VGGish audio backbone, and our pretrained models, can be downloaded via the OneDrive link.
- Please place `vggish-10086976.pth` and `sam_vit_h_4b8939.pth` in the `assets` sub-folder.
- Please place the pretrained model weights in the `ckpts` sub-folder.
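A quick way to confirm the weight layout before running any script is sketched below. The two backbone file names and the `assets`/`ckpts` folders come from this README; the function name `check_weights` is hypothetical. Because the file names of the pretrained checkpoints are not listed here, the sketch only checks that the `ckpts` folder exists.

```bash
# Sketch: verify the backbone weights and checkpoint folder are in place.
# Returns 0 if everything is found, 1 otherwise (missing items go to stderr).
check_weights() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in assets/vggish-10086976.pth assets/sam_vit_h_4b8939.pth; do
    if [[ ! -f "$root/$f" ]]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  # Pretrained checkpoint names are not given in the README,
  # so only the folder itself is checked.
  if [[ ! -d "$root/ckpts" ]]; then
    echo "missing folder: $root/ckpts" >&2
    missing=1
  fi
  return "$missing"
}
```

Calling `check_weights` from the repository root prints nothing and returns 0 when the layout matches the instructions above.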
- Test on AVS-Synthetic test set
  ```bash
  bash scripts/synthetic_test.sh
  ```
- Test on AVSBench S4 test set
  ```bash
  bash scripts/s4_test.sh
  ```
- Test on AVSBench MS3 test set
  ```bash
  bash scripts/ms3_test.sh
  ```
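To evaluate on all three test sets in one go, the scripts above can be chained. This is a minimal sketch: the script paths are the ones listed above, while the function name `run_all_tests` and the stop-at-first-failure behaviour are our own choices, not part of the repository.

```bash
# Sketch: run the three evaluation scripts in order, stopping at the first failure.
run_all_tests() {
  local split
  for split in synthetic s4 ms3; do
    echo "== ${split} test =="
    bash "scripts/${split}_test.sh" || {
      echo "${split} test failed" >&2
      return 1
    }
  done
}
```

Stopping early avoids spending time on later splits when, for example, a checkpoint path in the config is wrong.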
- Train on AVS-Synthetic
  ```bash
  bash scripts/synthetic_train.sh
  ```
- Train on AVSBench S4
  ```bash
  bash scripts/s4_train.sh
  ```
- Train on AVSBench MS3
  ```bash
  bash scripts/ms3_train.sh
  ```
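Training runs can be long, so keeping a log of each run may be useful. The sketch below wraps one of the training scripts above and copies its output into a timestamped file. The function name `train_with_log` and the `logs/` folder are hypothetical; only the `scripts/<split>_train.sh` paths come from this README.

```bash
# Sketch: launch one training script and keep a timestamped log of its output.
# Usage: train_with_log synthetic|s4|ms3
train_with_log() {
  local split="$1"
  case "$split" in
    synthetic|s4|ms3) ;;
    *) echo "unknown split: $split (expected synthetic, s4 or ms3)" >&2; return 2 ;;
  esac
  mkdir -p logs
  local log="logs/${split}_train_$(date +%Y%m%d_%H%M%S).log"
  bash "scripts/${split}_train.sh" 2>&1 | tee "$log"
  # Propagate the training script's exit code rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

Using `PIPESTATUS` keeps a failed training run from being reported as a success just because `tee` exited cleanly.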
If you use this work, please cite:
```bibtex
@inproceedings{liu2024annotation,
  title={Annotation-free audio-visual segmentation},
  author={Liu, Jinxiang and Wang, Yu and Ju, Chen and Ma, Chaofan and Zhang, Ya and Xie, Weidi},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5604--5614},
  year={2024}
}
```
If you have any questions, feel free to contact jinxliu#sjtu.edu.cn (replace `#` with `@`).