$ python obj_detect_tracking.py \
--model_path obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb \
--video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
--version 2 --is_coco_model --use_partial_classes --frame_gap 8 \
--is_load_from_pb --get_tracking \
--tracking_objs Person,Vehicle --min_confidence 0.85 \
--resnet50 --rpn_test_post_nms_topk 300 --max_size 1280 --short_edge_size 720 \
--use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
--max_cosine_distance 0.5 --nn_budget 5
This is for processing AVI videos. For MP4 videos, run without --use_lijun.
Add --log_time_and_gpu to get GPU utilization and a time profile.
EfficientDet-D7 (CVPR 2020) is reported to be more than 12 mAP better on COCO than the ResNet-50 FPN model we used.
I have made the following changes based on the code from early May:
- Added multi-level ROI align on the final detection boxes, since we need the FPN box features for deep-SORT tracking. One-stage object detection models predict boxes at every feature level, so I added a level index variable that records each box's feature level. At the end, each box can be efficiently traced back to its original feature map to crop its features.
- Similar to the MaskRCNN model, I modified EfficientDet to run NMS on only some of the COCO classes (currently we only care about person and vehicle), which saves computation.
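The two changes above can be illustrated with a minimal NumPy sketch. It is not the code in this repo: the function name `select_and_crop`, its arguments, the class ids, and the nearest-neighbor sampling (standing in for bilinear ROI align) are all assumptions made for illustration, and a plain top-k stands in for NMS. It only shows the idea of restricting candidates to the wanted classes and carrying a per-box level index so that features are cropped from the right feature map.

```python
import numpy as np


def select_and_crop(level_feats, level_boxes, level_scores, level_classes,
                    strides, wanted_classes, topk=100, crop=7):
    """Hypothetical helper: pick top-k boxes of the wanted classes across
    all feature levels and crop each box from the level that predicted it."""
    # Flatten per-level predictions, remembering each box's source level.
    boxes = np.concatenate(level_boxes, axis=0)      # [N, 4] (x1, y1, x2, y2), image pixels
    scores = np.concatenate(level_scores, axis=0)    # [N]
    classes = np.concatenate(level_classes, axis=0)  # [N]
    level_idx = np.concatenate(
        [np.full(len(b), i, dtype=np.int64) for i, b in enumerate(level_boxes)])

    # Drop unwanted classes first, so later steps only see person/vehicle-like boxes.
    candidates = np.flatnonzero(np.isin(classes, list(wanted_classes)))
    # Plain top-k by score; a real pipeline would run NMS here.
    keep = candidates[np.argsort(-scores[candidates])][:topk]

    feats = []
    for box, lvl in zip(boxes[keep], level_idx[keep]):
        fmap = level_feats[lvl]                       # [H, W, C]
        h, w = fmap.shape[:2]
        x1, y1, x2, y2 = box / strides[lvl]           # image coords -> feature-map coords
        # Sample the centers of a crop x crop grid of bins (nearest neighbor).
        centers = np.arange(crop) + 0.5
        xs = np.clip((x1 + centers * (x2 - x1) / crop).astype(int), 0, w - 1)
        ys = np.clip((y1 + centers * (y2 - y1) / crop).astype(int), 0, h - 1)
        feats.append(fmap[np.ix_(ys, xs)])            # [crop, crop, C]

    channels = level_feats[0].shape[-1]
    feats = np.stack(feats) if feats else np.zeros((0, crop, crop, channels))
    return boxes[keep], level_idx[keep], feats
```

The returned features have shape `[K, crop, crop, C]` and could then be pooled into one embedding per box for the tracker's appearance matching.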
Example command [d0 model from early May]:
$ python obj_detect_tracking.py \
--model_path efficientdet-d0 \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--efficientdet_max_detection_topk 5000 \
--video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
--version 2 --is_coco_model --use_partial_classes --frame_gap 8 \
--get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
--max_size 1280 --short_edge_size 720 \
--use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
--max_cosine_distance 0.5 --nn_budget 5
This is for processing AVI videos. I have tested it with PyAV 6.2.0 (av==6.2.0), which can be installed with:
$ sudo apt-get install -y \
libavformat-dev libavcodec-dev libavdevice-dev \
libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
$ sudo pip install av==6.2.0
For MP4 videos, run without --use_lijun.
Add --log_time_and_gpu to get GPU utilization and a time profile.
Example command with a partial frozen graph [d0-TFv1.15] (slightly faster):
$ python obj_detect_tracking.py \
--model_path efficientd0_tfv1.15_1280x720.pb --is_load_from_pb \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--efficientdet_max_detection_topk 5000 \
--video_dir videos --tracking_dir output/ --video_lst_file videos.lst \
--version 2 --is_coco_model --use_partial_classes --frame_gap 8 \
--get_tracking --tracking_objs Person,Vehicle --min_confidence 0.6 \
--max_size 1280 --short_edge_size 720 \
--use_lijun_video_loader --nms_max_overlap 0.85 --max_iou_distance 0.5 \
--max_cosine_distance 0.5 --nn_budget 5
[05/04/2020] Tried to optimize the frozen model with TensorRT by:
$ python tensorrt_optimize_tf1.15.py efficientd0_tfv1.15_1280x720.pb \
efficientd0_tfv1.15_1280x720_trt_fp16.pb --precision_mode FP16
But it does not work:
2020-05-04 22:11:48.850233: F tensorflow/core/framework/op_kernel.cc:875] Check failed: mutable_output(index) == nullptr (0x7f82d4244ff0 vs. nullptr)
Aborted (core dumped)
Run object detection and visualization on images. This could be used to reproduce the official repo's tutorial output:
$ python obj_detect_imgs.py --model_path efficientdet-d0 \
--efficientdet_modelname efficientdet-d0 --is_efficientdet \
--img_lst imgs.lst --out_dir test_d0_json \
--visualize --vis_path test_d0_vis --vis_thres 0.4 \
--max_size 1920 --short_edge_size 1080 \
--efficientdet_max_detection_topk 5000
- VIRAT
| Models | COCO-validation-AP-80classes | VIRAT Person-Val-AP | VIRAT Vehicle-Val-AP | VIRAT Bike-Val-AP |
| --- | --- | --- | --- | --- |
| MaskRCNN, R50-FPN | 0.389 | 0.374 | 0.943 | 0.367 |
| MaskRCNN, R101-FPN | 0.407 | 0.378 | 0.947 | 0.399 |
| EfficientDet-d2 | 0.425 | 0.371 | 0.949 | 0.293 |
| EfficientDet-d6 | 0.513 | 0.422 | 0.947 | 0.355 |
- AVA-Kinetics
| Models | COCO-validation-AP-80classes | AVA-Kinetics Train-Person-AP | AVA-Kinetics Val-Person-AP |
| --- | --- | --- | --- |
| MaskRCNN, R101-FPN | 0.407 | 0.664 | 0.682 |
| EfficientDet-d2 | 0.425 | 0.650 | 0.680 |
| EfficientDet-d6 | 0.513 | 0.623 | 0.658 |
VIRAT consists mostly of small person boxes, while AVA-Kinetics has much bigger ones. So it seems EfficientDet-d6 is better at detecting small persons (0.422 vs. 0.378 Person-Val-AP on VIRAT), while EfficientDet-d2 is roughly on par with MaskRCNN there. However, EfficientDet-d6 takes about 2.4x the inference time of MaskRCNN-R101-FPN.
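To make the comparison concrete, the short script below computes the person-AP gap of each EfficientDet model against MaskRCNN R101-FPN. The numbers are copied from the two tables above; the helper name `ap_gap` is only for illustration.

```python
# Person AP values copied from the VIRAT and AVA-Kinetics tables.
virat_person = {
    "MaskRCNN, R101-FPN": 0.378,
    "EfficientDet-d2": 0.371,
    "EfficientDet-d6": 0.422,
}
ava_val_person = {
    "MaskRCNN, R101-FPN": 0.682,
    "EfficientDet-d2": 0.680,
    "EfficientDet-d6": 0.658,
}


def ap_gap(table, model, baseline="MaskRCNN, R101-FPN"):
    """AP difference of `model` relative to `baseline` (positive = better)."""
    return round(table[model] - table[baseline], 3)


for model in ("EfficientDet-d2", "EfficientDet-d6"):
    print(model,
          "VIRAT:", ap_gap(virat_person, model),
          "AVA-Kinetics val:", ap_gap(ava_val_person, model))
```

On these numbers, d6 gains 0.044 AP on VIRAT persons but loses 0.024 AP on AVA-Kinetics validation, while d2 is within 0.01 AP of the baseline on both.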