-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SpecAugment Layer added #93
base: kapre-0.3.3
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the initiative. Because the target branch already had keras.augmentation
, there's a merge conflict but it wouldn't be anything too annoying.
Ok, let me know if you need something else, I created a branch from master and there was no augmentation :D I didn't see 0.3.3 branch hh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the following up. I requested some changes. I acknowledge that this could be not a very little amount of work, but please understand that I have to be responsible for the consistency of the package.
Besides the comments, some unit tests for the expected behaviors of the function and layer + save/load test of the layer is a mandatory, as done for other layers.. so that we can trust Kapre
!
"""Apply masking to a spectrogram in the freq domain. | ||
TensorFlow/io | ||
|
||
def random_masking_along_axis(input_f, param, axis, name='masking'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in the example i prototyped (
def random_masking_along_axis(x, axis, x_min=0, x_max=None, max_width=None):
if x_max is None:
x_max = x.shape[axis]
if max_width is None:
max_width = (x_max - x_min) // 4 # 4 is an arbitrary choice though
# do the work..
```)
I think this `param` should be unpacked and listed in this API.
|
||
def random_masking_along_axis(input_f, param, axis, name='masking'): | ||
""" | ||
Apply masking to a spectrogram in the time/freq domain. | ||
Args: | ||
input: An audio spectogram. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name mistmatch. also, the format would be..
input (`Tensor`): Audio input spectrogram
Returns: | ||
A tensor of spectrogram. | ||
""" | ||
# TODO: Support audio with channel > 1. | ||
freq_max = tf.shape(input_f)[1] | ||
_max = tf.shape(input_f)[axis + 1] | ||
_shape = [-1, 1, 1, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to reshape the input tensor. We can instead keep the code cleaner/shorter by just passing the right axis
argument in the layer implementation. At the same time, this function would just blindly do masking over the specified axis
.
) | ||
indices = tf.reshape(tf.range(freq_max), (-1,freq_max,1,1)) | ||
indices = tf.reshape(tf.range(_max), tuple(_shape)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now because we're taking an arbitrary shaped input tensor, the reshaping should be taken care well. and that should be verified with unittest
.
@@ -40,10 +45,9 @@ def __init__( | |||
|
|||
self.freq_param = freq_param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the backend API changes, the API of this layer should follow. Maybe it should be like
def __init__(self, time_min=0, time_max=None, time_max_width=None, freq_min=0, freq_max=None, freq_max_width=None, data_format='default', **kwargs):
where if time_min
is None
, it doesn't mask over time (and same for frequency axis).
Please note that all Kapre
layers are compatible with both channels_first
and channels_last
data format, so this one should be, too.
5f3f956
to
6f45b89
Compare
Any progress on this PR? Having SpecAugment as Keras layer would be very useful for the DL-based audio processing community... |
Hi bagustris, unfortunately I was working in a project at that moment and I didn't have time to do proper test and documentation as per requested, however there is a functional version, I would really appreciate if you can continue with the work and make a pull request in the repository, please check my code https://github.com/Toku11/kapre/blob/d64e19d4917a5c7f8f109b2cfe5b7e06d118b8e2/kapre/augmentation.py#L21 |
Hi, it's a pity I hadn't seen this pull request before. A couple of days ago I uploaded a package to Pypi where I do just this, that is, a custom layer of tensorflow.keras that implements the SpecAugment technique. This is the repo if you want to take a look: https://github.com/MichaelisTrofficus/spec_augment If you see that it can be useful I can adapt it to kapre and make a new pull request with proper testing and documentation or simply continue this one but with my own implementation. |
I didn't add white noise layer because it has been removed