Multichannel Audio Signal Augmentation Python Module Wavaugmentate
Introduction
In recent years, neural networks have become indispensable tools for solving audio processing problems such as speech recognition, sound classification, and audio signal enhancement. However, these models require a large amount of high-quality training data to work well. Unfortunately, real data is often incomplete, noisy, or unbalanced, which can hurt model accuracy. To overcome this problem, a technique called data augmentation is used: artificially increasing the diversity of the training set by applying various modifications to the original data.
There are many Python libraries that help with audio data augmentation. For example, popular tools such as librosa and torchaudio provide a wide range of functions for working with audio, including tempo change, pitch change, noise addition, and other transformations. These libraries have proven themselves among developers and researchers thanks to their flexibility and ease of use.
A newer module, Wavaugmentate, takes a different approach to audio data augmentation. Unlike its predecessors, it is built around multichannel audio, which opens up new possibilities for creating realistic training data. Thanks to its support for multichannel formats, Wavaugmentate can produce more varied audio transformations than traditional single-channel approaches.
The right tool for audio data augmentation therefore depends on the specific requirements of the project. In this article, we will look at the augmentation methods available in Wavaugmentate.
About Wavaugmentate
The module performs audio signal augmentation transformations. It provides the MultiChannelSignal and SignalAugmentation classes and the wavaug-cli console utility.
- MultiChannelSignal provides basic operations on multichannel signals.
- SignalAugmentation performs augmentation of multichannel signals for AI model training purposes.
PyPi: https://pypi.org/project/wavaugmentate
GitHub: https://github.com/chetverovod/wavaugmentate
Installation
pip install wavaugmentate
Input Data
WAV-file or NumPy array.
Array shape: (num_channels, num_samples).
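The (num_channels, num_samples) layout means each row of the array is one channel. A quick NumPy illustration of the convention (plain NumPy, independent of the library):

```python
import numpy as np

# A stereo (2-channel) signal, one second long at 44100 Hz,
# stored in the (num_channels, num_samples) layout.
sample_rate = 44100
data = np.zeros((2, sample_rate), dtype=np.float32)
print(data.shape)  # (2, 44100)

# Row i is channel i: data[0] is the first channel.
first_channel = data[0]
print(first_channel.shape)  # (44100,)
```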
Output Data
The same types as in the Input Data section.
Supported Augmentation Methods
- Amplitude (volume change, inversion).
- Time shift.
- Echo.
- Adding noise.
- Time stretching. (not implemented yet)
- Tempo change. (not implemented yet)
- Pitch shift. (not implemented yet)
- Adding silence.
- Frequency masking. (not implemented yet)
- Time masking. (not implemented yet)
- Combinations of methods.
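To make two of these methods concrete, here is a minimal NumPy sketch of per-channel amplitude change and time shift applied directly to a (num_channels, num_samples) array. This only illustrates the underlying idea and is not the library's implementation; the function names are made up.

```python
import numpy as np

def amplitude_change(signal: np.ndarray, gains: list) -> np.ndarray:
    """Scale each channel by its own gain; a gain of -1 inverts the channel."""
    return signal * np.asarray(gains)[:, np.newaxis]

def time_shift(signal: np.ndarray, shifts: list) -> np.ndarray:
    """Delay each channel by its own number of samples, zero-padding the start."""
    out = np.zeros_like(signal)
    for ch, shift in enumerate(shifts):
        if shift > 0:
            out[ch, shift:] = signal[ch, : signal.shape[1] - shift]
        else:
            out[ch] = signal[ch]
    return out

sig = np.ones((2, 5))                       # two channels, five samples each
quiet = amplitude_change(sig, [0.5, -1.0])  # halve channel 0, invert channel 1
shifted = time_shift(sig, [0, 2])           # delay channel 1 by two samples
print(shifted[1])  # [0. 0. 1. 1. 1.]
```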
Additional Functionality
- Generation of multichannel tonal signals with the desired frequency, amplitude, and duration.
- Generation of multichannel speech-like signals with the desired formant frequencies, amplitude, and duration.
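As an illustration of what a multichannel tone generator produces, here is a small NumPy sketch that builds a (num_channels, num_samples) array of sine tones with a per-channel frequency and amplitude. It mimics the idea, not the library's API; the function name is hypothetical.

```python
import numpy as np

def make_multichannel_tone(freqs, amps, duration_s, sample_rate=44100):
    """One sine tone per channel: channel i has frequency freqs[i] and
    amplitude amps[i]; result is shaped (num_channels, num_samples)."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return np.stack([a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps)])

# A 10 ms, two-channel tone: 440 Hz at full scale and 880 Hz at half scale.
tones = make_multichannel_tone([440.0, 880.0], [1.0, 0.5], duration_s=0.01)
print(tones.shape)  # (2, 441)
```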
Interfaces
Signal augmentation can be applied in two ways:
- As methods of the Python classes Mcs (MultiChannelSignal) and Aug (SignalAugmentation).
- As the wavaug-cli console application with CLI options.
Python Module
from wavaugmentate.mcs import MultiChannelSignal as Mcs
from wavaugmentate.aug import SignalAugmentation as Aug
# File name of original sound.
file_name = "./outputwav/sound.wav"
# Create Mcs-object.
mcs = Mcs()
# Read WAV-file to Mcs-object.
mcs.read(file_name)
# Change the number of channels to 7.
mcs.split(7)
# Create augmentation object.
aug = Aug(mcs)
# Apply a delay to each channel.
# The list length matches the number of channels.
delay_list = [0, 150, 200, 250, 300, 350, 400]
aug.delay_ctrl(delay_list)
# Apply an amplitude change to each channel.
# The list length matches the number of channels.
amplitude_list = [1, 0.17, 0.2, 0.23, 0.3, 0.37, 0.4]
aug.amplitude_ctrl(amplitude_list)
# Path of the output file (not defined earlier; added here so the example runs).
sound_aug_file_path = "./outputwav/sound_augmented.wav"
# Save the augmentation result as a single file containing 7 channels.
aug.get().write(sound_aug_file_path)
# Save the augmentation result as 7 files, one per channel:
# ./outputwav/sound_augmented_1.wav
# ./outputwav/sound_augmented_2.wav and so on.
aug.get().write_by_channel(sound_aug_file_path)
The original signal is shown in the picture below:
The output signal with augmented data (channel 1 contains the original signal unchanged):
The same code written as a chain of operations, with the same result (Example 2):
from wavaugmentate.mcs import MultiChannelSignal as Mcs
from wavaugmentate.aug import SignalAugmentation as Aug
# File name of original sound.
file_name = "./outputwav/sound.wav"
delay_list = [0, 150, 200, 250, 300, 350, 400]
amplitude_list = [1, 0.17, 0.2, 0.23, 0.3, 0.37, 0.4]
# Apply all transformations of Example 1 in chain.
ao_obj = Aug(Mcs().rd(file_name))
ao_obj.splt(7).dly(delay_list).amp(amplitude_list).get().wr(
    "sound_augmented_by_chain.wav"
)
# Save the augmentation result as 7 files, one per channel.
ao_obj.get().wrbc("sound_augmented_by_chain.wav")
Command Line Interface
For help, run:
wavaug-cli -h
The command-line interface provides the same functionality.
Example 3 (procedural approach):
wavaug-cli -i ./test_sounds/test_sound_1.wav -o ./outputwav/out.wav -d "100, 200, 300, 400"
wavaug-cli -i ./outputwav/out.wav -o ./outputwav/out.wav -a "0.1, 0.2, 0.3, 0.4"
Example 4 (OOP approach, chained operations):
wavaug-cli -c 'rd("./test_sounds/test_sound_1.wav").dly([100, 200, 300, 400]).amp([0.1, 0.2, 0.3, 0.4]).wr("./outputwav/sound_delayed.wav")'
How To
In the next example, amplitudes and delays are randomized to produce several augmented copies of a single file, as shown in Example 5.
Example 5 (single file augmentation):
from wavaugmentate.mcs import MultiChannelSignal as Mcs
from wavaugmentate.aug import SignalAugmentation as Aug
file_name = "./outputwav/sound.wav"
mcs = Mcs()
mcs.rd(file_name) # Read original file with single channel.
file_name_head = "sound_augmented"
# Suppose we need 15 augmented files.
aug_count = 15
for i in range(aug_count):
    signal = Aug(mcs.copy())
    # Apply random amplitude [0.3..1.7) and delay [70..130)
    # microseconds changes to each copy of the original signal.
    signal.amp([1], [0.7]).dly([100], [30])
    name = file_name_head + f"_{i + 1}.wav"
    signal.get().write(name)
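The two-argument calls amp([1], [0.7]) and dly([100], [30]) appear to draw one random value per channel around a center value, matching the [0.3..1.7) and [70..130) ranges given in the comment. A minimal sketch of that "center plus or minus deviation" idea (my reading of the semantics, not the library's code):

```python
import random

def randomized(centers, deviations):
    """Draw one uniform value per channel from [center - dev, center + dev]."""
    return [random.uniform(c - d, c + d) for c, d in zip(centers, deviations)]

amps = randomized([1.0], [0.7])   # a value around 1.0, within [0.3, 1.7]
delays = randomized([100], [30])  # a value around 100, within [70, 130]
print(amps, delays)
```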
Detailed documentation for Wavaugmentate is available on Read the Docs.