Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation

Authors: Tomi Kinnunen, Lauri Juvela, Paavo Alku, Junichi Yamagishi

#Speech processing
#Speech synthesis

2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)

Text-independent speaker verification (recognizing speakers regardless of content) and non-parallel voice conversion (transforming voice identities without requiring content-matched training utterances) are related problems. We apply the i-vector method to voice conversion. An i-vector is a fixed-dimensional representation of a speech utterance, which allows voice conversion to be treated in the utterance domain rather than the frame domain. The high dimensionality (800) and the small number of training utterances (24) necessitate the use of prior information about speakers. We adopt probabilistic linear discriminant analysis (PLDA) for voice conversion. The proposed approach requires no parallel utterances, transcriptions, or time alignment procedures at any stage.
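
To illustrate why a speaker prior helps when only 24 utterances are available in an 800-dimensional i-vector space, the following is a minimal sketch under a simplified two-covariance PLDA model, w = mu + y + e with y ~ N(0, B) and e ~ N(0, W). It is not the paper's implementation: the function names, the isotropic covariances, the synthetic i-vectors, and the "swap the speaker component" conversion rule are all assumptions made for illustration.

```python
import numpy as np

def speaker_posterior_mean(ivectors, mu, B, W):
    """Posterior mean of the speaker variable y given n i-vectors of one speaker.

    Assumed model (illustrative): w = mu + y + e, y ~ N(0, B), e ~ N(0, W).
    """
    n = ivectors.shape[0]
    centered_mean = ivectors.mean(axis=0) - mu
    # E[y | w_1..w_n] = B (B + W/n)^{-1} (mean(w) - mu)
    # The prior covariance B shrinks the noisy sample mean toward zero.
    return B @ np.linalg.solve(B + W / n, centered_mean)

def convert_ivector(w_src, y_src, y_tgt):
    """Hypothetical conversion rule: replace the source speaker component
    with the target one, keeping the utterance-specific residual."""
    return w_src - y_src + y_tgt

# Synthetic stand-ins for real i-vectors (random data, assumed covariances).
rng = np.random.default_rng(0)
dim, n_utts = 800, 24
mu = np.zeros(dim)
B = 1.0 * np.eye(dim)   # assumed between-speaker covariance
W = 4.0 * np.eye(dim)   # assumed within-speaker covariance
src = rng.normal(size=(n_utts, dim))
tgt = rng.normal(size=(n_utts, dim)) + 0.5

y_src = speaker_posterior_mean(src, mu, B, W)
y_tgt = speaker_posterior_mean(tgt, mu, B, W)
converted = convert_ivector(src[0], y_src, y_tgt)
print(converted.shape)
```

In this sketch, each converted i-vector is produced at the utterance level with no frame alignment between source and target, which mirrors the non-parallel setting described in the abstract. A real system would estimate mu, B, and W from a large background corpus rather than fixing them by hand.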
