Authors: Henrik Høiness (henrhoi), Axel Harstad (axeloh) and Marius Landsverk Cervera (mariusmcl)
Deep learning models trained on visual data have changed the field of computer vision and is now finding more and more ways into consumer products and business applications.
Humans however are heavily dependent on audio data to navigate their lives, and being able to create powerful models for raw audio data has numerous applications ranging from the arts to new business opportunities. In this work we expand upon recent architectural advances (Aaron van den Oord et. al.: WaveNet: A Generative Model for Raw Audio (2016) ) and propose a WaveNet-like autoencoder with a shared encoder and multiple decoders to perform style transfer between multiple musical instruments. In addition, we explore this models' potential of producing automatic composition of music, by reducing the latent space used in the network.
Our method consists in training multiple decoders, one for each domain, together with a shared encoder. In order to enforce a disentangled latent representation, a domain classifier is trained to classify the latent representations' domain. Also, the audio input is augmented to further enforce the encoder to encode in a high-level, semantic way.
The architecture is heavily inspired by Noem Mor et al.: A Universal Music Translation Network (2018) (repo).
All samples in this repo are from a WaveNet Autoencoder trained 4 days on two Tesla V100 with two decoders for the two domains Bach Solo Cello and Beethoven Solo Piano.
You can listen to the samples and see the architecture used here.
Dataset used in experiments is Musicnet
, which can be found here
To preprocess and extract domains from the dataset, run the scripts below in the following order:
$ python preprocessing/seperate_domains.py
$ python preprocessing/train_test_val_split.py
$ python preprocessing/preprocess.py
Pre-trained models can be downloaded here and place it under checkpoints/trained_models
To train on a single GPU-node run:
$ python train_music_translation.py
In order to generate samples from dataset/samples/input
, run:
$ python generate_style_and_instrument_transfer.py