Research
Research projects, papers, books, etc.
- Technical reports
[Domestic conference] Do prosodic manual annotations matter for Japanese speech synthesis systems with WaveNet vocoder?
- #SpeechProcessing
- #SpeechSynthesis
IEICE Technical Committee on Speech (SP)
We investigated the impact of noisy linguistic features on the performance of a Japanese neural-network-based speech synthesis system using a WaveNet vocoder. This investigation compared the ideal system, which used manually corrected linguistic features in both the training and test sets, against a few other systems that used corrupted linguistic features. Both subjective and objective results demonstrate that corrupted linguistic features, especially those in the test set, affected the system's performance in a statistically significant way because of the mismatched conditions between the training and test sets. Interestingly, while an utterance-level Turing test shows that listeners had difficulty distinguishing synthetic speech from natural speech, it further indicates that partially adding noise to the training-set linguistic features can reduce the mismatch effect, regularize the model, and help the system perform better when the test-set linguistic features are noisy.
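The abstract does not say how the training-set features were corrupted. The following is only a minimal sketch under our own assumptions, not the authors' implementation. It treats the linguistic features as integer-coded categorical labels (for example, accent types) and models noise as uniform random label substitution. Noise is applied only to a randomly chosen fraction of the training utterances, so the rest stay clean. The function names and the `fraction` and `rate` parameters are hypothetical.

```python
import random
from typing import List, Sequence


def corrupt_labels(labels: Sequence[int], num_classes: int, rate: float,
                   rng: random.Random) -> List[int]:
    """Replace each label, with probability `rate`, by a different class
    drawn uniformly at random (a hypothetical noise model)."""
    corrupted = []
    for lab in labels:
        if rng.random() < rate:
            # Pick any class except the correct one, so the label is truly wrong.
            choices = [c for c in range(num_classes) if c != lab]
            corrupted.append(rng.choice(choices))
        else:
            corrupted.append(lab)
    return corrupted


def corrupt_training_subset(utterances: Sequence[Sequence[int]], num_classes: int,
                            fraction: float, rate: float,
                            seed: int = 0) -> List[List[int]]:
    """Corrupt labels in a random `fraction` of utterances and copy the rest
    unchanged. The input is never mutated."""
    if num_classes < 2:
        raise ValueError("need at least two classes to substitute a wrong label")
    if not (0.0 <= fraction <= 1.0 and 0.0 <= rate <= 1.0):
        raise ValueError("fraction and rate must lie in [0, 1]")
    rng = random.Random(seed)
    n_noisy = round(fraction * len(utterances))
    noisy_idx = set(rng.sample(range(len(utterances)), n_noisy))
    return [corrupt_labels(u, num_classes, rate, rng) if i in noisy_idx else list(u)
            for i, u in enumerate(utterances)]


if __name__ == "__main__":
    # Toy "accent-type" label sequences for three utterances (made-up data).
    train = [[0, 1, 2, 1], [2, 2, 0, 0], [1, 0, 1, 2]]
    print(corrupt_training_subset(train, num_classes=3, fraction=0.5, rate=0.3, seed=42))
```

Corrupting only a subset keeps some clean examples in the training data. The model therefore sees both clean and noisy inputs, which mirrors the abstract's point about reducing the train/test mismatch when test-time features are noisy. Whether uniform substitution resembles real annotation errors is an open assumption here.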