[Domestic Conference] Statistical Parametric Speech Synthesis Based on Multiple Feed-Forward Deep Neural Networks

Authors: Shinji Takaki (高木信二), SangJin Kim, Junichi Yamagishi (山岸順一), JongJin Kim

#Speech processing
#Speech synthesis

IEICE Technical Committee on Speech (電子情報通信学会音声研究会)

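In recent years, methods based on deep neural networks (DNNs) have shown high performance in a variety of fields and are being actively studied. DNN-based methods have also attracted attention in statistical parametric speech synthesis, where performance improvements have been reported for, among others, feature extraction from spectra, acoustic modeling, and post-filtering. In this paper, we investigate building a high-performance statistical parametric speech synthesis system using three types of feed-forward DNNs. In the proposed system, the processing steps that are standard in statistical parametric speech synthesis systems, namely acoustic feature extraction, acoustic modeling, post-filtering, and smoothing of the generated parameters, are all realized by DNNs. In text-to-speech synthesis experiments, we compare and evaluate the proposed system against an HMM-based speech synthesis system, a conventional DNN-based speech synthesis system, and a unit selection speech synthesis system.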

In this paper, we investigate a combination of several feed-forward deep neural networks (DNNs) for a high-quality statistical parametric speech synthesis system. Recently, DNNs have significantly improved the performance of essential components of statistical parametric speech synthesis, e.g. spectral feature extraction, acoustic modeling, and spectral post-filtering. Our proposed technique combines these feed-forward DNNs so that they perform all standard steps of statistical speech synthesis from end to end, including feature extraction from STRAIGHT spectral amplitudes, acoustic modeling, smooth trajectory generation, and spectral post-filtering. The proposed DNN-based speech synthesis system is then compared with state-of-the-art speech synthesis systems, i.e. conventional HMM-based, DNN-based, and unit selection systems.
