Research

Research projects, papers, books, etc.

  • Paper

Experimental evaluation of MOS, AB and BWS listening test designs

Authors: Dan Wells, Andrea Aldana, Cassia Valentini, Erica Cooper, Aidan Pine, Junichi Yamagishi, Korin Richmond

  • #Speech processing
  • #Quality assessment

Interspeech 2024

Mean Opinion Score (MOS) tests are the most widely used test type for subjective evaluation of speech samples. However, their use has been questioned, as results can vary significantly depending on the test material included. Forced-choice tests such as AB or Best Worst Scaling (BWS) can in principle mitigate some of these issues. Our aim here is to compare MOS, AB and BWS tests in three respects: 1) Which test type do listeners prefer in terms of ease, engagement and overall likeability? 2) How fast are listeners at each test type? 3) Does each test type provide the same pattern of results? To answer these questions, we re-use a subset of stimuli from the Blizzard Challenge 2013 and conduct new MOS, AB and BWS tests. Overall, we conclude that each test type is broadly equally valid. MOS may not in fact be the fastest or easiest test type for listeners, but the theoretical advantages of BWS are counterbalanced by it appearing to be less liked by our listeners.