Please download the DINOv2 pretrained weights into the `pretrained/` folder:
| model | # of params | ImageNet k-NN | ImageNet linear | download |
| --- | --- | --- | --- | --- |
| ViT-S/14 distilled | 21 M | 79.0% | 81.1% | backbone only |
| ViT-B/14 distilled | 86 M | 82.1% | 84.5% | backbone only |
| ViT-L/14 distilled | 300 M | 83.5% | 86.3% | backbone only |
| ViT-g/14 | 1,100 M | 83.5% | 86.5% | backbone only |
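The four backbone checkpoints can also be fetched programmatically. A minimal sketch, assuming the URLs follow the official DINOv2 release naming scheme on `dl.fbaipublicfiles.com`; verify them against the DINOv2 repository before use:

```python
# Sketch: download the four DINOv2 backbone checkpoints into pretrained/.
# The URLs assume the official DINOv2 release layout; double-check them.
import os
import torch

os.makedirs("pretrained", exist_ok=True)
for arch in ("vits14", "vitb14", "vitl14", "vitg14"):
    name = f"dinov2_{arch}_pretrain.pth"
    url = f"https://dl.fbaipublicfiles.com/dinov2/dinov2_{arch}/{name}"
    torch.hub.download_url_to_file(url, os.path.join("pretrained", name))
```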
Then convert these models to patch size 16:

```shell
python convert_14to16.py pretrained/dinov2_vits14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitb14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitl14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitg14_pretrain.pth
```
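`convert_14to16.py` ships with this repo; purely as an illustration of what such a conversion typically involves, here is a minimal sketch that bicubically resizes the patch-embedding kernels from 14×14 to 16×16. The actual script may handle more keys or use a different resizing scheme:

```python
# Illustrative sketch only; the repo's convert_14to16.py may differ.
import sys

import torch
import torch.nn.functional as F

src = sys.argv[1]  # e.g. pretrained/dinov2_vits14_pretrain.pth
dst = src.replace(".pth", "_14to16.pth")

# Assumption: DINOv2 *_pretrain.pth files are plain state dicts.
state = torch.load(src, map_location="cpu")

# Resize the patch-embedding conv kernels: (embed_dim, 3, 14, 14) -> (embed_dim, 3, 16, 16).
w = state["patch_embed.proj.weight"].float()
w = F.interpolate(w, size=(16, 16), mode="bicubic", align_corners=False)
# Heuristic rescaling so a 16x16 patch yields activations of roughly the
# same magnitude as the original 14x14 patch (196 vs. 256 input pixels).
state["patch_embed.proj.weight"] = w * (14 / 16) ** 2

torch.save(state, dst)
print(f"saved {dst}")
```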
After that, the directory structure is:

```
detection
├── pretrained
│   ├── dinov2_vits14_pretrain.pth
│   ├── dinov2_vitb14_pretrain.pth
│   ├── dinov2_vitl14_pretrain.pth
│   ├── dinov2_vitg14_pretrain.pth
│   ├── dinov2_vits14_pretrain_14to16.pth
│   ├── dinov2_vitb14_pretrain_14to16.pth
│   ├── dinov2_vitl14_pretrain_14to16.pth
│   └── dinov2_vitg14_pretrain_14to16.pth
└── convert_14to16.py
```
| Backbone | Pretrain | Lr schd | box AP | mask AP | #Param | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-Adapter-S | DeiT-S | 3x+MS | 48.2 | 42.8 | 48M | config | ckpt |
| ViT-Adapter-S | DINOv2-S | 3x+MS | 51.5 (+3.3) | 45.6 (+2.8) | 48M | config | ckpt \| log |
| ViT-Adapter-B | DeiT-B | 3x+MS | 49.6 | 43.6 | 120M | config | ckpt |
| ViT-Adapter-B | DINOv2-B | 3x+MS | 54.1 (+4.5) | 47.8 (+4.2) | 120M | config | ckpt \| log |
| ViT-Adapter-L | AugReg-L | 3x+MS | 52.1 | 46.0 | 348M | config | ckpt \| log |
| ViT-Adapter-L | DINOv2-L | 3x+MS | 55.3 (+3.2) | 49.0 (+3.0) | 348M | config | ckpt \| log |
Note that the hyper-parameter `layer_decay_rate` significantly impacts the performance of DINOv2 backbones. For example, for ViT-Adapter-S with DINOv2-S, the box AP at different values of `layer_decay_rate` is:
| Backbone | Pretrain | 0.70 | 0.75 | 0.80 | 0.90 | 0.95 |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-Adapter-S | DINOv2-S | 51.5 | 51.0 | 50.8 | 49.4 | 48.8 |
Perhaps further reducing `layer_decay_rate` would continue to improve performance.
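In mmdetection-style configs, `layer_decay_rate` is typically passed through the optimizer's `paramwise_cfg` and expanded by a layer-wise decay optimizer constructor, so that earlier blocks train with smaller learning rates. A hedged sketch of how the per-layer rates fall out (the exact config keys and layer indexing in this repo's constructor may differ):

```python
# Sketch of layer-wise lr decay: deeper (later) blocks get larger lr.
# In a config this typically looks something like (illustrative, not verbatim):
#   optimizer = dict(..., constructor='LayerDecayOptimizerConstructor',
#                    paramwise_cfg=dict(num_layers=12, layer_decay_rate=0.70))
base_lr, num_layers, layer_decay_rate = 1e-4, 12, 0.70  # illustrative values

for i in range(num_layers):
    # Block i (0-indexed) is scaled by rate^(num_layers - i): the last block
    # is closest to base_lr, the patch embedding decays the most.
    lr_i = base_lr * layer_decay_rate ** (num_layers - i)
    print(f"block {i:2d}: lr = {lr_i:.2e}")
```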