Towards a Transparent and Interpretable Strategy for Spoofed Speech Detection

Authors: Carolina Lins Machado, Xin Wang, and Junichi Yamagishi

#Speech processing
#Deepfake detection

The 33rd Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA)

The rapid spread of artificially generated (henceforth spoofed) speech used for malicious purposes poses unprecedented challenges for forensic investigators and legal systems (Gambin et al., 2024; Verdoliva, 2020). The “black box” character of many spoofed speech detection systems is a problem in forensic contexts, where the interpretability of the conclusions drawn by such methods is crucial (Mitchell, 2010; Hall et al., 2022). Therefore, to ensure a fair justice outcome, emerging regulations governing the use of complex systems in the detection of artificially generated media require that the decisions made by these systems be understandable and justified to all parties involved in the process (Hall et al., 2022; Siegel et al., 2024). This suggests that interpretability and transparency are necessary for a method to be forensically valid. This work aims to address this need by examining how acoustic-phonetic features and explainable machine learning approaches may provide clarity on the process of spoofed speech detection. Moreover, this exploratory study attempts to (i) understand how acoustic-phonetic features perform across various spoofing types and (ii) provide a baseline against which future state-of-the-art attacks can be compared, by showing how these features and their discriminative performance change across various voice spoofing attack types.
