Member

Research Fields
Information and Communication / Perceptual Information Processing / Biometrics
Information and Communication / Perceptual Information Processing / Speech Information Processing
Junichi Yamagishi received a Ph.D. from the Tokyo Institute of Technology in 2006 for a thesis that pioneered speaker-adaptive speech synthesis. He is a Professor at the National Institute of Informatics, Tokyo, Japan. Previously, he held an EPSRC Career Acceleration Fellowship in the Centre for Speech Technology Research (CSTR) at the University of Edinburgh, U.K., from 2011 to 2016. He has authored and co-authored more than 400 refereed papers in international journals and conferences.
He served as an elected member of the IEEE Speech and Language Technical Committee from 2013 to 2019, as an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing from 2014 to 2017, as the chairperson of the ISCA Speech Synthesis Special Interest Group (SynSIG) from 2017 to 2021, as a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing from 2019 to 2023, and as a member of the Asia-Pacific Signal and Information Processing Association (APSIPA) Multimedia Security and Forensics Technical Committee from 2018 to 2023. He has been a member of the IEEE Signal Processing Society Education Board since 2021.
He received the Itakura Prize from the Acoustical Society of Japan, the Kiyasu Special Industrial Achievement Award from the Information Processing Society of Japan, the Young Scientists’ Prize from the Minister of Education, Culture, Sports, Science and Technology, the JSPS Prize from the Japan Society for the Promotion of Science, the Best Paper Award at the IEEE International Workshop on Information Forensics and Security (WIFS), the DoCoMo Mobile Science Award, and the IEEE Biometrics Council’s BTAS/IJCB 5-Year Highest Impact Award in 2010, 2013, 2014, 2016, 2017, 2018, and 2023, respectively.
Career
- 2006: Completed the doctoral program at the Tokyo Institute of Technology (Doctor of Engineering)
- 2006–2007: Visiting researcher at the University of Edinburgh, U.K. (JSPS Postdoctoral Research Fellow)
- 2007–2011: Research Fellow at the University of Edinburgh
- 2011–2013: EPSRC Career Acceleration Fellow at the University of Edinburgh
- 2013–2019: Associate Professor, Digital Content and Media Sciences Research Division, National Institute of Informatics (NII)
- 2013–2020: Concurrently Senior Research Fellow at the University of Edinburgh
- 2019–present: Professor, Digital Content and Media Sciences Research Division, National Institute of Informatics (NII)
- 2020: Awarded the title of Honorary Professor by the University of Edinburgh
- 2021–present: Deputy Director, Synthetic Media International Research Center, National Institute of Informatics (NII)
Awards
- 2007: Tejima Doctoral Dissertation Award
- 2010: Itakura Memorial Research Award, Acoustical Society of Japan
- 2014: Young Scientists’ Prize, Commendation by the Minister of Education, Culture, Sports, Science and Technology
- 2016: JSPS Prize, Japan Society for the Promotion of Science
- 2018: DoCoMo Mobile Science Award, Excellence Award in the Advanced Technology Category
- 2023: Telecommunications Advancement Foundation Award, Telecom Interdisciplinary Research Award (Special Commendation)
- 2023: IEICE ISS Best Paper Award and IEICE Best Paper Award
- 2023 and 2024: IEEE Biometrics Council BTAS/IJCB 5-Year Highest Impact Award
Professional Activities
- 2013–2019: IEEE Signal Processing Society (SPS), Speech & Language Technical Committee
- 2013–present: IEEE Senior Member
- 2014–2017: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Associate Editor
- 2017–2021: ISCA Special Interest Group on Speech Synthesis (SynSIG), Chairperson
- 2018–2023: Asia-Pacific Signal and Information Processing Association (APSIPA), Multimedia Security and Forensics Technical Committee
- 2019–present: Information Processing Society of Japan, Senior Member
- 2019–2023: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Senior Area Editor
- 2021–present: IEEE Signal Processing Society, Education Board Member
- Also a member of ISCA, IEICE, and the Acoustical Society of Japan
Recent Research
- Towards An Integrated Approach for Expressive Piano Performance Synthesis from Music Scores
- Explaining Speaker and Spoof Embeddings via Probing
- Speech Generation for Indigenous Language Education
- Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
- It Takes Two: Real-time Co-Speech Two-person’s Interaction Generation via Reactive Auto-regressive Diffusion Model
- The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
- Improving curriculum learning for target speaker extraction with synthetic speakers
- Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
- Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
- AfriHuBERT: A self-supervised speech representation model for African languages
- A Preliminary Study on Long-Form In-the-Wild Audio Spoofing Detection
- Exploring Active Data Selection Strategies for Continuous Training in Deepfake Detection
- Quantifying Source Speaker Leakage in One-to-One Voice Conversion
- Experimental evaluation of MOS, AB and BWS listening test designs
- Speaker Detection by the Individual Listener and the Crowd: Parametric Models Applicable to Bonafide and Deepfake Speech
- Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
- Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
- Target Speaker Extraction with Curriculum Learning
- Spoof Diarization: “What Spoofed When” in Partially Spoofed Audio
- To what extent can ASV systems naturally defend against spoofing attacks?
- Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
- DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input
- ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
- The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation
- A review on subjective and objective evaluation of synthetic speech
- Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances
- Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model
- Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction
- Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
- SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
- Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
- Analysis of Fine-grained Counting Methods for Masked Face Counting: A Comparative Study
- eKYC-DF: A Large-Scale Deepfake Dataset for Developing and Evaluating eKYC Systems
- Speaker-Text Retrieval via Contrastive Learning
- The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains
- Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
- Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music
- Cyber Vaccine for Deepfake Immunity
- XFEVER: Exploring Fact Verification across Languages
- How Close are Other Computer Vision Tasks to Deepfake Detection?
- Speaker Anonymization using Orthogonal Householder Neural Network
- Towards single integrated spoofing-aware speaker verification embeddings
- Range-Based Equal Error Rate for Spoof Localization
- Controlling Multi-Class Human Vocalization Generation via a Simple Scheme of Segment-based Labeling
- Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms
- Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech
- BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
- Revisiting Pathologies of Neural Models under Input Reduction
- Hiding speaker’s sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
- Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
- Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
- Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement
- ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
- Analysis of Master Vein Attacks on Finger Vein Recognition Systems
- Investigating Active-learning-based Training Data Selection for Speech Spoofing Countermeasure
- The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
- Outlier-Aware Training for Improving Group Accuracy Disparities
- Mitigating the Diminishing Effect of Elastic Weight Consolidation
- Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022
- Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
- The VoiceMOS Challenge 2022
- DDS: A new device-degraded speech dataset for speech enhancement
- Privacy and utility of x-vector based speaker anonymization
- The VoicePrivacy 2020 Challenge: Results and findings
- Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
- Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
- Investigating self-supervised front ends for speech spoofing countermeasures
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech
- Estimating the Confidence of Speech Spoofing Countermeasure
- Generalization Ability of MOS Prediction Networks
- On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
- Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances
- Master Face Attacks on Face Recognition Systems
- SVSNet: An End-to-end Speaker Voice Similarity Assessment Model
- Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds
- Effects of Image Processing Operations on Adversarial Noise and Their Use in Detecting and Correcting Adversarial Images
- Optimizing Tandem Speaker Verification and Anti-Spoofing Systems
- Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio
- Revisiting Speech Content Privacy
- Benchmarking and challenges in security and privacy for voice biometrics
- OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
- Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement
- ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection
- Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection
- An Initial Investigation for Detecting Partially Spoofed Audio
- A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection
- Preliminary study on using vector quantization latent spaces for consistent performance TTS/VC systems
- How do Voices from Past Speech Synthesis Challenges Compare Today?
- Exploring Disentanglement with Multilingual and Monolingual VQ-VAE
- Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
- A Multi-Level Attention Model for Evidence-Based Fact Checking
- How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers?
- Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm
- End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
- Fashion-Guided Adversarial Attack on Person Segmentation
- Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
- ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
- Generation and Detection of Media Clones
- Preventing Fake Information Generation Against Media Clone Attacks
- Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation
- Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
- Color Transfer to Anonymized Gait Images While Maintaining Anonymization
- A Method for Identifying Origin of Digital Images Using a Convolution Neural Network
- ASVspoof 2019: a large-scale public database of synthesized, converted and replayed speech
- Viable Threat on News Reading: Generating Biased News Using Natural Language Models
- An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
- Security of Facial Forensics Models Against Adversarial Attacks
- Latent linguistic embedding for cross-lingual text-to-speech and voice conversion
- Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions
- Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
- NAUTILUS: a Versatile Voice Cloning System
- The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment
- Design Choices for X-vector Based Speaker Anonymization
- Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction
- Reverberation Modeling for Source-Filter-based Neural Vocoder
- Introducing the VoicePrivacy Initiative
- Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
- Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement
- Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model
- iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
- Generating Master Faces for Use in Performing Wolf Attacks on Face Recognition Systems
- Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences
- Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals
- Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment
- Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation
- Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings
- An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning
- Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection
- Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech
- An RGB Gait Anonymization Model for Low Quality Silhouette
- Neural source-filter waveform models for statistical parametric speech synthesis
- A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
- Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences
- Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments
- Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis
- Speaker Anonymization Using X-vector and Neural Waveform Models
- Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
- MOSNet: Deep Learning based Objective Assessment for Voice Conversion
- GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram
- ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
- Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
- Does the Lombard Effect Improve Emotional Communication in Noise? – Analysis of Emotional Speech Acted in Noise –
- Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
- Spatio-Temporal Generative Adversarial Network for Gait Anonymization
- Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion
- Attentive Filtering Networks for Audio Replay Attack Detection
- Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics
- Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks
- Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
- Neural source-filter-based waveform model for statistical parametric speech synthesis
- STFT spectral loss for training a neural speech waveform model
- Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos
- Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra
- Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems
- Identifying Computer-Translated Paragraphs using Coherence Features
- Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems
- MesoNet: a Compact Facial Video Forgery Detection Network
- Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Neural Vocoder
- Unsupervised speaker adaptation for DNN-based speech synthesis using input codes
- A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis
- Multimodal speech synthesis architecture for unsupervised speaker adaptation
- Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects
- Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
- Speaker-independent raw waveform model for glottal excitation
- Expressive Speech Synthesis Using Sentiment Embeddings
- Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech
- Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
- Modular Convolutional Neural Network for Discriminating between Computer-Generated Images and Photographic Images
- Transformation on Computer-Generated Facial Image to Avoid Detection by Spoofing Detector
- t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification
- A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
- ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements
- Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data
- The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods
- Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis
- A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis
- Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks
- Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody
- High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network
- Investigating very deep highway networks for parametric speech synthesis
- Identifying Computer-Generated Text Using Statistical Analysis
- Distinguishing Computer Graphics from Natural Images Using Convolution Neural Networks
- An Approach for Gait Anonymization Using Deep Learning
- Influence of speaker familiarity on blind and visually impaired children’s and young adults’ perception of synthetic voices
- Investigating different representations for modeling multiple emotions in DNN-based speech synthesis
- Learning word vector representations based on acoustic counts
- The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection
- Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system
- Complex-valued restricted Boltzmann machine for direct learning of frequency spectra
- Misperceptions of the emotional content of natural and vocoded speech in a car
- Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis
- An RNN-based quantized f0 model with multi-tier feedback links for text-to-speech synthesis
- Principles for learning controllable TTS from annotated and latent variation
- Speech intelligibility in cars: The effect of speaking style, noise and listener age
- A simple RNN-plus-highway network for statistical parametric speech synthesis
- Introduction to the Issue on Spoofing and Countermeasures for Automatic Speaker Verification
- ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge
- An autoregressive recurrent mixture density network for parametric speech synthesis
- Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation
- Adapting and controlling DNN-based speech synthesis using input codes
- Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM
- Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis
- The NII speech synthesis entry for Blizzard Challenge 2016
- Multidimensional scaling of systems in the Voice Conversion Challenge 2016
- Investigating Very Deep Highway Networks for Parametric Speech Synthesis
- A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora
- Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
- Parallel and cascaded deep neural networks for text-to-speech synthesis
- Development and evaluation of a statistical parametric synthesis system for operatic singing in German
- Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
- Analysis of the Voice Conversion Challenge 2016 Evaluation Results
- The Voice Conversion Challenge 2016
- Syllable-level representations of suprasegmental features for DNN-based text-to-speech synthesis
- The SIWIS database: a multilingual speech database with acted emphasis
- A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
- Majorisation-minimisation based optimisation of the composite autoregressive system with application to glottal inverse filtering
- Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
- Enhance the word vector with prosodic information for the recurrent neural network based TTS system
- Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016
- Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks
- Voice Liveness Detection for Speaker Verification based on a Tandem Single/Double-channel Pop Noise Detector
- Privacy-preserving sound to degrade automatic speaker verification performance
- A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
- Initial investigation of speech synthesis based on complex-valued neural networks
- Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis
- Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning
- Deep neural network-guided unit selection synthesis
- Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance
- ALISA: An automatic lightly supervised speech segmentation and alignment tool
- Intelligibility of time-compressed synthetic speech: Compression method and speaking style
- The use of articulatory movement data in speech synthesis applications: an overview –Application of articulatory movements using machine learning algorithms–
- A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis
- Emotion transplantation through adaptation in HMM-based speech synthesis
- A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis
- Reconstructing Voices within the Multiple-Average-Voice-Model Framework
- Influence of speaker familiarity on blind and visually impaired children’s perception of synthetic voices in audio games
- Deep neural network context embeddings for model selection in rich-context HMM synthesis
- Multiple Feed-forward Deep Neural Networks for Statistical Parametric Speech Synthesis
- Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning
- Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification
- ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge
- Human vs Machine Spoofing Detection on Wideband and Narrowband Data
- Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis
- SAS: A speaker verification spoofing database containing diverse attacks
- Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis