Please download the DINOv2 pretrained weights into the `pretrained/` folder:
| model | # of params | ImageNet k-NN | ImageNet linear | download |
| --- | --- | --- | --- | --- |
| ViT-S/14 distilled | 21 M | 79.0% | 81.1% | backbone only |
| ViT-B/14 distilled | 86 M | 82.1% | 84.5% | backbone only |
| ViT-L/14 distilled | 300 M | 83.5% | 86.3% | backbone only |
| ViT-g/14 | 1,100 M | 83.5% | 86.5% | backbone only |
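The four backbone checkpoints can also be fetched programmatically. A minimal sketch, assuming the URLs follow the official DINOv2 release naming scheme on `dl.fbaipublicfiles.com`; verify them against the DINOv2 repository before use:

```python
# Sketch: download the four DINOv2 backbone checkpoints into pretrained/.
# The URLs assume the official DINOv2 release layout; double-check them.
import os
import torch

os.makedirs("pretrained", exist_ok=True)
for arch in ("vits14", "vitb14", "vitl14", "vitg14"):
    name = f"dinov2_{arch}_pretrain.pth"
    url = f"https://dl.fbaipublicfiles.com/dinov2/dinov2_{arch}/{name}"
    torch.hub.download_url_to_file(url, os.path.join("pretrained", name))
```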
Then convert these models to patch size 16:

```shell
python convert_14to16.py pretrained/dinov2_vits14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitb14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitl14_pretrain.pth
python convert_14to16.py pretrained/dinov2_vitg14_pretrain.pth
```
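`convert_14to16.py` ships with this repo; purely as an illustration of what such a conversion typically involves, here is a minimal sketch that bicubically resizes the patch-embedding kernels from 14×14 to 16×16. The actual script may handle more keys or use a different resizing scheme:

```python
# Illustrative sketch only; the repo's convert_14to16.py may differ.
import sys

import torch
import torch.nn.functional as F

src = sys.argv[1]  # e.g. pretrained/dinov2_vits14_pretrain.pth
dst = src.replace(".pth", "_14to16.pth")

# Assumption: DINOv2 *_pretrain.pth files are plain state dicts.
state = torch.load(src, map_location="cpu")

# Resize the patch-embedding conv kernels: (embed_dim, 3, 14, 14) -> (embed_dim, 3, 16, 16).
w = state["patch_embed.proj.weight"].float()
w = F.interpolate(w, size=(16, 16), mode="bicubic", align_corners=False)
# Heuristic rescaling so a 16x16 patch yields activations of roughly the
# same magnitude as the original 14x14 patch (196 vs. 256 input pixels).
state["patch_embed.proj.weight"] = w * (14 / 16) ** 2

torch.save(state, dst)
print(f"saved {dst}")
```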
After that, the directory structure is:

```
detection
├── pretrained
│   ├── dinov2_vits14_pretrain.pth
│   ├── dinov2_vitb14_pretrain.pth
│   ├── dinov2_vitl14_pretrain.pth
│   ├── dinov2_vitg14_pretrain.pth
│   ├── dinov2_vits14_pretrain_14to16.pth
│   ├── dinov2_vitb14_pretrain_14to16.pth
│   ├── dinov2_vitl14_pretrain_14to16.pth
│   └── dinov2_vitg14_pretrain_14to16.pth
└── convert_14to16.py
```
| Backbone | Pretrain | Lr schd | box AP | mask AP | #Param | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViT-Adapter-S | DeiT-S | 3x+MS | 48.2 | 42.8 | 48M | config | ckpt |
| ViT-Adapter-S | DINOv2-S | 3x+MS | 51.5 (+3.3) | 45.6 (+2.8) | 48M | config | ckpt \| log |
| ViT-Adapter-B | DeiT-B | 3x+MS | 49.6 | 43.6 | 120M | config | ckpt |
| ViT-Adapter-B | DINOv2-B | 3x+MS | 54.1 (+4.5) | 47.8 (+4.2) | 120M | config | ckpt \| log |
| ViT-Adapter-L | AugReg-L | 3x+MS | 52.1 | 46.0 | 348M | config | ckpt \| log |
| ViT-Adapter-L | DINOv2-L | 3x+MS | 55.3 (+3.2) | 49.0 (+3.0) | 348M | config | ckpt \| log |
Note that the hyper-parameter `layer_decay_rate` significantly impacts the performance of DINOv2 backbones. For example, for ViT-Adapter-S with DINOv2-S, the box AP at different values of `layer_decay_rate` is:
| Backbone | Pretrain | 0.70 | 0.75 | 0.80 | 0.90 | 0.95 |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-Adapter-S | DINOv2-S | 51.5 | 51.0 | 50.8 | 49.4 | 48.8 |
Perhaps further reducing `layer_decay_rate` would continue to improve performance.
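In mmdetection-style configs, `layer_decay_rate` is typically passed through the optimizer's `paramwise_cfg` and expanded by a layer-wise decay optimizer constructor, so that earlier blocks train with smaller learning rates. A hedged sketch of how the per-layer rates fall out (the exact config keys and layer indexing in this repo's constructor may differ):

```python
# Sketch of layer-wise lr decay: deeper (later) blocks get larger lr.
# In a config this typically looks something like (illustrative, not verbatim):
#   optimizer = dict(..., constructor='LayerDecayOptimizerConstructor',
#                    paramwise_cfg=dict(num_layers=12, layer_decay_rate=0.70))
base_lr, num_layers, layer_decay_rate = 1e-4, 12, 0.70  # illustrative values

for i in range(num_layers):
    # Block i (0-indexed) is scaled by rate^(num_layers - i): the last block
    # is closest to base_lr, the patch embedding decays the most.
    lr_i = base_lr * layer_decay_rate ** (num_layers - i)
    print(f"block {i:2d}: lr = {lr_i:.2e}")
```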