AfriHuBERT: A self-supervised speech representation model for African languages

Author：Jesujoba O. Alabi, Xuechen Liu, Dietrich Klakow, Junichi Yamagishi

#音声処理
#低資源言語

Interspeech 2025

In this work, we present AfriHuBERT, an extension of mHuBERT-147, a compact self-supervised learning (SSL) model pretrained on 147 languages. While mHuBERT-147 covered 16 African languages, we expand this to 1,226 through continued pretraining on 10K+ hours of speech data from diverse sources, benefiting an African population of over 600M. We evaluate AfriHuBERT on two key speech tasks, Spoken Language Identification (SLID) and Automatic Speech Recognition (ASR), using the FLEURS benchmark. Our results show a +3.6% F1 score improvement for SLID and a -2.1% average Word Error Rate (WER) reduction for ASR over mHuBERT-147, and demonstrates competitiveness with larger SSL models such as MMS and XEUS. Further analysis shows that ASR models trained on AfriHuBERT exhibit improved cross-corpus generalization and are competitive in extremely low-resource ASR scenarios.

学習済みモデル＠Hugging Face: https://huggingface.co/ajesujoba/AfriHuBERT
学習済みモデル＠Zenodo: https://doi.org/10.5281/zenodo.15531766
ソースコード: https://github.com/nii-yamagishilab/AfriHuBERT

一覧へ戻る