Research

研究プロジェクト・論文・書籍等

Share

  • 論文

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

Author:Shinji Takaki, Junichi Yamagishi

  • #音声処理
  • #音声合成

2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)

In the state-of-the-art statistical parametric speech synthesis system, a speech analysis module, e.g. STRAIGHT spectral analysis, is generally used for obtaining accurate and stable spectral envelopes, and then low-dimensional acoustic features extracted from obtained spectral envelopes are used for training acoustic models. However, a spectral envelope estimation algorithm used in such a speech analysis module includes various processing derived from human knowledge. In this paper, we present our investigation of deep autoencoder based, non-linear, data-driven and unsupervised low-dimensional feature extraction using FFT spectral envelopes for statistical parametric speech synthesis. Experimental results showed that a text-to-speech synthesis system using deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes is indeed a promising approach.