Research

Research projects, papers, books, etc.


  • Paper

Speaker Detection by the Individual Listener and the Crowd: Parametric Models Applicable to Bonafide and Deepfake Speech

Authors: Tomi Kinnunen, Rosa Gonzalez Hautamäki, Xin Wang, Junichi Yamagishi

  • #Speech processing
  • #Quality assessment

Interspeech 2024

Subjective speaker detection, whether for bonafide (real) or spoofed (fake) speech, is often implemented through crowdsourcing to facilitate comparison of systems, with less attention paid to the source of the ratings: the listener. We characterize speaker detection at the level of both the individual listener and the crowd. Each listener possesses a certain sensitivity and bias for observing speaker differences. By combining a detection model with random between-listener effects, we obtain a generalized linear mixed effects (GLME) model, demonstrated here for two different tasks. The first involves bonafide data from VoxCeleb1 under a biased set-up containing varied role-play instructions; the second, focused on spoofing, presents a re-analysis of the ASVspoof 2019 subjective data. Our GLME enables sampling listeners and obtaining parametric detection error trade-off (DET) profiles and equal error rates (EERs).
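
As a rough illustration of what "sampling listeners" and reading off a parametric EER might look like, the sketch below assumes an equal-variance Gaussian (probit) detection model in which each listener has a sensitivity d' and a bias c, both drawn from independent Gaussian random effects. This is an assumption made for illustration only; the abstract does not specify the link function, the random-effects structure, or any parameter values, and the function names and numbers here are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used here as an assumed probit link


def sample_listeners(n, mu_d, sd_d, mu_c, sd_c, seed=0):
    """Draw n hypothetical listeners as (sensitivity, bias) pairs.

    Assumes independent Gaussian random effects; the paper's model may differ.
    """
    rng = random.Random(seed)
    return [(rng.gauss(mu_d, sd_d), rng.gauss(mu_c, sd_c)) for _ in range(n)]


def error_rates(d_prime, c):
    """Miss and false-alarm rates of one listener under the assumed model."""
    miss = STD_NORMAL.cdf(c - d_prime / 2)
    false_alarm = STD_NORMAL.cdf(-c - d_prime / 2)
    return miss, false_alarm


def listener_eer(d_prime):
    """EER under the assumed model: miss equals false alarm at c = 0."""
    return STD_NORMAL.cdf(-d_prime / 2)


# Illustrative population parameters (not taken from the paper).
listeners = sample_listeners(1000, mu_d=1.5, sd_d=0.5, mu_c=0.0, sd_c=0.3)
eers = [listener_eer(d) for d, _ in listeners]
print(f"mean listener EER: {sum(eers) / len(eers):.3f}")
```

Under these assumptions, a listener's bias only moves the operating point along that listener's DET profile, while the sensitivity alone determines the EER. In probit coordinates the profile is a straight line, since probit(miss) + probit(false alarm) = -d'.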