Skip to content

Latest commit

 

History

History
173 lines (148 loc) · 9 KB

README.md

File metadata and controls

173 lines (148 loc) · 9 KB

TSMF Model

PWC PWC

This repository holds the codebase, dataset and models for the paper:

Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition Bruce X.B. Yu, Yan Liu, Keith C.C. Chan, AAAI 2021 (PDF)

Presentation is available on SlidesLive

Abstract

Indoor action recognition plays an important role in modern society, such as intelligent healthcare in large mobile cabin hospital. With the wide usage of depth sensors like Kinect, multimodal information including skeleton and RGB modalities brings a promising way to improve the performance. However, existing methods are either focusing on single data modality or failed to take the advantage of multiple data modalities. In this paper, we propose a Teacher-Student Multimodal Fusion (TSMF) model that fuses the skeleton and RGB modalities at model level for indoor action recognition. In our TSMF, we utilize a teacher network to transfer the structural knowledge of the skeleton modality to a student network for the RGB modality. With extensively experiments on two benchmarking datasets: NTU RGB+D and PKU-MMD, results show that the proposed TSMF consistently performs better than state-of-the-art single modal and multimodal methods. It also indicates that our TSMF could not only improve the accuracy of the student network but also significantly improve the ensemble accuracy.

Prerequisites

  • Python3 (>3.5)
  • PyTorch
  • Openpose with Python API. (Optional: for evaluation purpose)
  • Other Python libraries can be installed by pip install -r requirements.txt

Installation

cd torchlight; python setup.py install; cd ..

Get pretrained models

Download pretrained models from GoogleDrive, and manually put them into ./trained_models.

Data Preparation

Datasets

NTU RGB+D

NTU RGB+D can be downloaded from their website. The 3D skeletons(5.8GB) modality and the RGB modality are required in our experiments. After that, this command should be used to build the database for training or evaluation:

python tools/ntu_gendata.py --data_path <path to nturgbd+d_skeletons>

where the <path to nturgbd+d_skeletons> points to the 3D skeletons modality of NTU RGB+D dataset you download.

For evaluation, the processed data includes: val_data and val_label are available from GoogleDrive. Please manually put it in folder: ./data/NTU_RGBD

PKU-MMD

The dataset can be found in PKU-MMD. PKU-MMD is a large action recognition dataset that contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. We transfer the 3D skeleton modality to seperate action repetition files with the command:

python tools/utils/skeleton_to_ntu_format.py

After that, this command should be used to build the database for training or evaluation:

python tools/pku_gendata.py --data_path <path to pku_mmd_skeletons>

where the <path to nturgbd+d_skeletons> points to the 3D skeletons modality of PKU-MMD dataset you processed with the above command.

For evaluation, the processed data includes: val_data and val_label are available from GoogleDrive. Please manually put it in folder: ./data/PKU_MMD

2D Skeleton Retrieval from the RGB Video Input

After installed the Openpose tool, run

su
sh tools/2D_Retrieve_<dataset>.sh

where the <dataset> must be ntu_rgbd or pku_mmd, depending on the dataset you want to use. We also provide the retrieved OpenPose 2D skeleton data for both datasets, which could downloaded from GoogleDrive (OpenPose NTU RGB+D) and GoogleDrive (OpenPose PKU-MMD).

Generate Region of Interest

python tools/gen_fivefs_<dataset>

where the <dataset> must be ntu_rgbd or pku_mmd, depending on the dataset you want to use.

The processed ROI of NTU-RGB+D is available from GoogleDrive; The processed ROI of PKU-MMD is available from GoogleDrive.

Testing Pretrained Models

Evaluate on NTU-RGB+D

For cross-subject evaluation in NTU RGB+D, run

python main_student.py recognition -c config/ntu_rgbd/xsub/student_test.yaml

Check the emsemble:

python ensemble.py --datasets ntu_xsub

For cross-view evaluation in NTU RGB+D, run

python main_student.py recognition -c config/ntu_rgbd/xview/student_test.yaml

Check the ensemble:

python ensemble.py --datasets ntu_xview

Evaluate on PKU-MMD

For cross-subject evaluation in PKU-MMD, run

python main_student.py recognition -c config/pku_mmd/xsub/student_test.yaml

Check the emsemble:

python ensemble.py --datasets pku_xsub

For cross-view evaluation in PKU-MMD, run

python main_student.py recognition -c config/pku_mmd/xview/student_test.yaml

Check the emsemble:

python ensemble.py --datasets pku_xview

Training

To train a new TSMF model, run

python main_student.py recognition -c config/<dataset>/train_student.yaml [--work_dir <work folder>]

where the <dataset> must be ntu_rgbd/xsub, ntu_rgbd/xview, pku_mmd/xsub or pku_mmd/xview, depending on the dataset you want to use. The training results, including model weights, configurations and logging files, will be saved under the ./work_dir by default or <work folder> if you appoint it.

You can modify the training parameters such as work_dir, batch_size, step, base_lr and device in the command line or configuration files. The order of priority is: command line > config file > default parameter. For more information, use main_student.py -h.

Evaluation

Finally, custom model evaluation can be performed by this command as we mentioned above:

python main_student.py recognition -c config/<dataset>/student_test.yaml --weights <path to model weights>

Acknowledgements

This repo is based on

Thanks to the original authors for their work!

Citation

Please cite the following paper if you use this repository in your reseach.

@inproceedings{bruce2021multimodal,
  title={Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition},
  author={Bruce, XB and Liu, Yan and Chan, Keith CC},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={3199--3207},
  year={2021}
}
@ARTICLE{9782511,
  author={Yu, Bruce X.B. and Liu, Yan and Zhang, Xiang and Zhong, Sheng-hua and Chan, Keith C.C.},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={MMNet: A Model-based Multimodal Network for Human Action Recognition in RGB-D Videos}, 
  year={2022},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TPAMI.2022.3177813}
  }

Contact

For any question, feel free to contact Bruce Yu: b r u c e y o 2011 AT gmail.com(remove space)