Research

Research projects, papers, books, etc.

  • Paper

Hot topics in speech synthesis evaluation

Authors: Gérard Bailly, Elisabeth André, Erica Cooper, Esther Klabbers, Benjamin Cowan, Jens Edlund, Naomi Harte, Simon King, Sébastien Le Maguer, Roger K. Moore, Bernd Möbius, Sebastian Möller, Ayushi Pandey, Olivier Perrotin, Fritz Seebauer, Sofia Strömbergsson, David R. Traum, Christina Tånnander, Petra Wagner, Junichi Yamagishi, Yusuke Yasuda

  • #Speech processing
  • #Speech synthesis
  • #Quality evaluation

13th edition of the Speech Synthesis Workshop

Speech synthesis is advancing rapidly, often reaching levels that challenge the distinction between synthetic and human speech. Its capabilities are increasingly diverse, and it can no longer be treated as a one-size-fits-all technology. Consequently, one-size-fits-all evaluations based solely on Mean Opinion Score (MOS) fail to reflect the specific requirements, conditions, and success criteria across the wide and ever-evolving landscape of applications and usage contexts. As evaluation necessarily becomes more task-oriented, determining what to evaluate and how to evaluate it must be an integral part of the evaluation process. In this overview of the current speech synthesis evaluation landscape, we begin by revisiting a range of existing evaluation methodologies that are often overlooked in favour of MOS, despite offering valuable insights for specific tasks. We then highlight a set of emerging “hot topics” in speech synthesis, examining their unique demands and proposing directions for their evaluation. The hot topics are structured around specific use cases, which serve as examples to highlight the speech synthesis capabilities that are critical in each context. The use case framing also facilitates a dual perspective, capturing both the evaluation of speech synthesis as an integrated part of an application and the assessment of its standalone capabilities.
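The abstract's argument centres on the limits of the Mean Opinion Score. As a point of reference only, the sketch below shows how a MOS is conventionally computed: listeners rate each stimulus on a 1-5 absolute category rating scale and the ratings are averaged. The ratings shown and the normal-approximation confidence interval are illustrative assumptions, not data or methodology from the paper.

```python
import math

def mean_opinion_score(ratings):
    """Compute the Mean Opinion Score and a 95% confidence interval
    from listener ratings on the usual 1-5 ACR scale."""
    n = len(ratings)
    mos = sum(ratings) / n
    # Sample standard deviation (assumes n > 1).
    sd = math.sqrt(sum((r - mos) ** 2 for r in ratings) / (n - 1))
    # Normal-approximation 95% CI; reasonable for large n, rough otherwise.
    half_width = 1.96 * sd / math.sqrt(n)
    return mos, (mos - half_width, mos + half_width)

# Hypothetical ratings for one synthetic utterance from 10 listeners.
ratings = [4, 5, 3, 4, 4, 5, 4, 3, 4, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A single number of this kind summarises overall impression but, as the abstract notes, says nothing about whether a system meets the specific requirements of a given application or usage context.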