This repository contains the code and data for my "Machine Learning for Handling Missing Data in Wearable Electromyographic Systems" final project.
The project can be summarized as follows. First, missing data are simulated under two different assumptions: Missing Completely at Random, and Missing Under Gait Assumptions. For each assumption, the project applies five imputation methods to the raw EMG signal (zero-imputation, mean-imputation, K-Nearest Neighbors, Gaussian Process, and Predictive Mean Matching) and studies their effectiveness.
The final report is available here: https://drive.google.com/file/d/1opO8s8WlVKeHu3qXj9SBhkTk5SYA-hKq/view?usp=sharing
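The pipeline summarized above (simulate missingness, impute, compare against the original signal) can be illustrated with a minimal sketch. This is not the code from this repository: the synthetic signal, the 20% missing rate, the RMSE metric, and the function names `simulate_mcar`, `zero_impute`, `mean_impute`, and `rmse_on_missing` are all assumptions made for illustration, and only the two simplest methods (zero- and mean-imputation) under the Missing Completely at Random assumption are shown.

```python
import numpy as np


def simulate_mcar(signal, missing_rate, rng):
    """Drop each entry independently with probability `missing_rate` (MCAR)."""
    mask = rng.random(signal.shape) < missing_rate  # True where a value is removed
    corrupted = signal.astype(float)  # astype returns a copy, so `signal` is untouched
    corrupted[mask] = np.nan
    return corrupted, mask


def zero_impute(corrupted):
    """Replace every missing entry with 0."""
    return np.where(np.isnan(corrupted), 0.0, corrupted)


def mean_impute(corrupted):
    """Replace every missing entry with the mean of the observed values in its channel."""
    channel_means = np.nanmean(corrupted, axis=0)
    return np.where(np.isnan(corrupted), channel_means, corrupted)


def rmse_on_missing(original, imputed, mask):
    """Root-mean-square error, computed only on the entries that were removed."""
    return float(np.sqrt(np.mean((original[mask] - imputed[mask]) ** 2)))


rng = np.random.default_rng(0)
# Synthetic stand-in for a raw EMG recording: 1000 samples x 4 channels (hypothetical data).
emg = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
corrupted, mask = simulate_mcar(emg, missing_rate=0.2, rng=rng)

for name, impute in [("zero", zero_impute), ("mean", mean_impute)]:
    print(name, rmse_on_missing(emg, impute(corrupted), mask))
```

Scoring only the removed entries keeps the comparison focused on what each imputation method actually filled in, since the observed entries are left unchanged by both methods.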
-
FuncAndParams.py - This file contains functions and parameters needed to run the rest of the files. Please run this file first.
-
GPboundssim.py - This file contains the simulation to find the optimal bounds for Gaussian Process length scale hyperparameters. Please run this file second.
-
MainSimulation.py - This file contains the code that generates results when data are missing under the missing-completely-at-random assumption.
-
GaitMISim.py - This file contains the code that generates results when data are missing under the gait assumptions.
-
visknndistanalysis.py - This file contains the visualizations and the distributional analysis in the Results section.
-
PCAvsLDA.py - This file contains the simulation used to choose between LDA and PCA.
-
CompAna.py - This file generates the running-time analysis.
-
The rest are data files. Please run the code from the same folder as the data files.