[Keynote Talk] The use of speaker embeddings in neural audio generation

#Generative models
#Speaker vectors

Speaker: Junichi Yamagishi
Conference: The VoxSRC Workshop 2022
Organizer: Interspeech 2022
Location: Incheon, Korea
Date: September 22, 2022
URL: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2022.html
Video: http://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/interspeech2022.html

Neural speaker embedding vectors are becoming an essential technology not only in speaker recognition but also in speech synthesis. In this talk, I will first outline how speaker embedding vectors are used in voice conversion, where one speaker’s voice is converted to another speaker’s voice, and in multi-speaker TTS systems, where a single model synthesizes natural-sounding voices of multiple speakers from input text. I will then explain how the performance of speaker vectors in the speaker recognition task relates to the speaker similarity of the synthesized voices. I will also present the latest performance of voice conversion systems, based on the results of the Voice Conversion Challenge 2020.

I will then introduce “speaker anonymization” as a new example of the use of speaker embeddings in the field of speech privacy. Speaker anonymization aims to convert only the speaker characteristics of the input speech, so that an automatic speaker verification (ASV) system cannot identify the original speaker, while preserving the usefulness of the anonymized audio for the downstream tasks the user wishes to perform. As an example of speaker anonymization based on speaker embedding vectors, I will present a language-independent speaker anonymization system built on ECAPA-TDNN, HuBERT, and HiFi-GAN, and show its evaluation results on the VoicePrivacy challenge metrics.
