Skip to content

Latest commit

 

History

History
12 lines (8 loc) · 959 Bytes

README.md

File metadata and controls

12 lines (8 loc) · 959 Bytes

wav442letter

Fully convolutional speech-to-text model based on Facebook's Wav2Letter. Developed alongside Andrew Schallwig and Matt Palazzolo for EECS 442 at the University of Michigan.

The original paper can be found here.

Our results are summarized below, with Facebook's original results on the left and ours on the right. Our goal was to try to replicate Facebook's results with far fewer computational resources; although clearly not successful, we certainly achieved a decent approximation given that we used 0.3% of the training data and 30% of the trainable parameters of the original model.

Screen Shot 2022-12-18 at 00 20 53

The model was built in PyTorch and trained on the dev-clean subset of the LibriSpeech ASR corpus, available here.