The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

Authors: Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

#Speech processing
#Quality assessment

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seven different countries participated. Surprisingly, we found that the two sub-tracks of French text-to-speech synthesis had large differences in their predictability, and that singing voice-converted samples were not as difficult to predict as we had expected. Use of diverse datasets and listener information during training appeared to be successful approaches.

