[Domestic conference] Generating Segment-Level Foreign-Accented Synthetic Speech with Natural Speech Prosody

Authors: Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi

#SpeechProcessing
#SpeechSynthesis

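Information Processing Society of Japan (IPSJ), joint research meeting of the 118th Special Interest Group on Music and Computer and the 120th Special Interest Group on Spoken Language Processing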

We present a new application of deep-learning-based TTS: multilingual speech synthesis for generating controllable foreign accent. We train an acoustic model on non-accented multilingual speech recordings from a single speaker and interpolate quinphone linguistic features between languages to generate segment-level foreign accent. By copying pitch and durations from a pre-recorded utterance of the desired prompt, the synthetic speech retains natural prosody. We call this paradigm “cyborg speech” because it combines human and machine speech parameters. Experiments on synthesizing Japanese with an American English accent confirm the success of the approach.
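The abstract does not give implementation details. As a rough illustration only, the Python sketch below shows the two ideas it names: linearly interpolating quinphone one-hot features between a native phone and a non-native substitute, and building "cyborg" inputs by reusing durations and F0 taken from a natural recording. The phone inventory, the `ja_r` to `en_r` substitution, the function names and the use of plain one-hot codes are all hypothetical assumptions, not the authors' feature set or model.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical merged phone inventory covering both languages (illustration only).
PHONES = ["sil", "ja_a", "ja_r", "ja_u", "en_ae", "en_r", "en_uw"]
INDEX = {p: i for i, p in enumerate(PHONES)}
CONTEXT = 5  # quinphone window: two left neighbours, centre, two right neighbours


def one_hot(phone: str) -> List[float]:
    """One-hot code for a phone over the (hypothetical) merged inventory."""
    code = [0.0] * len(PHONES)
    code[INDEX[phone]] = 1.0
    return code


def quinphone_features(
    phones: Sequence[str],
    i: int,
    substitutions: Optional[Dict[str, str]] = None,
    alpha: float = 0.0,
) -> List[float]:
    """Concatenate one-hot codes for phones i-2..i+2, padding edges with 'sil'.

    Phones listed in `substitutions` are blended towards their hypothetical
    non-native replacement: alpha = 0 keeps the native phone, alpha = 1 uses
    the replacement, and values in between interpolate linearly.
    """
    substitutions = substitutions or {}
    features: List[float] = []
    for j in range(i - 2, i + 3):
        p = phones[j] if 0 <= j < len(phones) else "sil"
        code = one_hot(p)
        if p in substitutions:
            target = one_hot(substitutions[p])
            code = [(1.0 - alpha) * a + alpha * b for a, b in zip(code, target)]
        features.extend(code)
    return features


def cyborg_inputs(
    phones: Sequence[str],
    durations: Sequence[int],
    natural_f0: Sequence[float],
    substitutions: Optional[Dict[str, str]] = None,
    alpha: float = 0.0,
) -> Tuple[List[List[float]], List[float]]:
    """Expand phone-level features to frames using durations copied from a
    natural recording, and pass the natural per-frame F0 through unchanged."""
    if len(durations) != len(phones):
        raise ValueError("need one duration (in frames) per phone")
    frames: List[List[float]] = []
    for i, n_frames in enumerate(durations):
        feat = quinphone_features(phones, i, substitutions, alpha)
        frames.extend([list(feat) for _ in range(n_frames)])
    if len(natural_f0) != len(frames):
        raise ValueError("natural F0 must have one value per frame")
    return frames, list(natural_f0)


if __name__ == "__main__":
    phones = ["ja_a", "ja_r", "ja_u"]
    frames, f0 = cyborg_inputs(phones, [2, 3, 1], [120.0] * 6, {"ja_r": "en_r"}, 0.5)
    print(len(frames), len(frames[0]))
```

In a full system, a trained acoustic model (not shown) would map these frame-level inputs to the remaining speech parameters, and a vocoder would combine them with the copied F0. Varying `alpha` is one simple way to make the degree of accent controllable under these assumptions.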
