Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning

Authors: Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi, Robert A. J. Clark

#Speech processing
#Speech synthesis

2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)

We investigate two wavelet-based strategies for decomposing the F0 signal and their usefulness as a secondary task for speech synthesis using multi-task deep neural networks (MTL-DNNs). The first strategy uses a static set of scales for all utterances in the training data. We propose a second strategy, in which the scale of the mother wavelet is dynamically adjusted to the rate of each utterance. This approach captures F0 variations related to the syllable, word, clitic-group, and phrase units. It also constrains the wavelet components to lie within the frequency range that previous experiments have shown to be more natural. Both strategies are evaluated as a secondary task in MTL-DNNs. Results indicate that, on an expressive dataset, there is a strong preference for the systems using multi-task learning over the baseline system.
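The difference between the two scale-selection strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the Mexican hat mother wavelet, the octave spacing, the base scale of 4 frames, and the use of mean syllable duration as the measure of rate are all assumptions made for illustration, and the function names (`mexican_hat`, `decompose`, `static_scales`, `dynamic_scales`) are hypothetical.

```python
import numpy as np


def mexican_hat(length, scale):
    """Sample a Mexican hat (Ricker) wavelet of width `scale` at `length` points."""
    t = np.arange(length) - (length - 1) / 2.0
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)


def decompose(contour, scales):
    """Return one wavelet component per scale, shape (len(scales), len(contour))."""
    contour = np.asarray(contour, dtype=float)
    components = np.empty((len(scales), len(contour)))
    for i, scale in enumerate(scales):
        # Cap the kernel length so that mode="same" keeps the contour's length.
        length = min(int(10 * scale) | 1, len(contour))
        components[i] = np.convolve(contour, mexican_hat(length, scale), mode="same")
    return components


def static_scales(n_scales=5, base=4.0):
    """Assumed static strategy: the same octave-spaced scales (in frames) everywhere."""
    return base * 2.0 ** np.arange(n_scales)


def dynamic_scales(n_frames, n_syllables, n_scales=5):
    """Assumed dynamic strategy: anchor the smallest scale to this utterance's
    mean syllable duration in frames, then space the rest one octave apart."""
    return (n_frames / n_syllables) * 2.0 ** np.arange(n_scales)


# Synthetic stand-in for an interpolated, mean-normalised log-F0 contour.
frames = np.arange(400)
contour = np.sin(2 * np.pi * frames / 40.0) + 0.5 * np.sin(2 * np.pi * frames / 200.0)

static_parts = decompose(contour, static_scales())
dynamic_parts = decompose(contour, dynamic_scales(len(contour), n_syllables=20))
```

With static scales every utterance is analysed at the same temporal resolutions, whereas in this sketch the dynamic scales stretch or shrink with the utterance's own rate. In a multi-task setup of the kind the abstract describes, components like these would presumably serve as additional training targets alongside the main acoustic outputs.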
