Member

Research Field
Xin Wang is a Project Associate Professor at the National Institute of Informatics and a PRESTO researcher at the Japan Science and Technology Agency. He has been a member of the organizing teams of the ASVspoof Challenges since 2019 and of the VoicePrivacy initiatives in 2020 and 2022. His research focuses on speech synthesis, anti-spoofing, and other speech security- and privacy-related tasks. He is a member of IEEE, ISCA, and ASJ. He was a guest editor of the Computer Speech & Language special issue on Advances in Automatic Speaker Verification Anti-spoofing, and served on the appointed team of the ISCA Special Interest Group on Security and Privacy in Speech Communication from 2022 to 2024.
Career
- 2015–2018: Ph.D. student, SOKENDAI (The Graduate University for Advanced Studies)
- 2018–2019: Project Researcher, National Institute of Informatics
- 2019–2023: Project Assistant Professor, National Institute of Informatics
- 2023–present: Project Associate Professor, National Institute of Informatics
- 2023–present: PRESTO Researcher, Japan Science and Technology Agency
Awards
- 2017: 11th IEEE Signal Processing Society Japan Student Best Paper Award
- 2018: SOKENDAI Award
- 2019: SSW Best Paper Award (test-of-time award, SSW9 2016)
Academic Activities
- 2022–2024: ISCA Special Interest Group on Security and Privacy in Speech Communication, appointed team member
- Member of IEEE, ISCA, and the Acoustical Society of Japan
Recent Research
- Towards An Integrated Approach for Expressive Piano Performance Synthesis from Music Scores
- Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
- A Preliminary Study on Long-Form In-the-Wild Audio Spoofing Detection
- Exploring Active Data Selection Strategies for Continuous Training in Deepfake Detection
- Speaker Detection by the Individual Listener and the Crowd: Parametric Models Applicable to Bonafide and Deepfake Speech
- Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
- Spoof Diarization: “What Spoofed When” in Partially Spoofed Audio
- To what extent can ASV systems naturally defend against spoofing attacks?
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
- DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input
- ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
- The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation
- Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances
- Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
- SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
- Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
- Collaborative Watermarking for Adversarial Speech Synthesis
- Speaker-Text Retrieval via Contrastive Learning
- Speaker Anonymization using Orthogonal Householder Neural Network
- Towards single integrated spoofing-aware speaker verification embeddings
- Range-Based Equal Error Rate for Spoof Localization
- Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms
- Hiding speaker’s sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
- Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
- Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
- ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
- Investigating Active-learning-based Training Data Selection for Speech Spoofing Countermeasure
- The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
- Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
- Privacy and utility of x-vector based speaker anonymization
- The VoicePrivacy 2020 Challenge: Results and findings
- Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
- Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
- Investigating self-supervised front ends for speech spoofing countermeasures
- Estimating the Confidence of Speech Spoofing Countermeasure
- Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances
- Benchmarking and challenges in security and privacy for voice biometrics
- ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection
- Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection
- An Initial Investigation for Detecting Partially Spoofed Audio
- A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection
- Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
- A Multi-Level Attention Model for Evidence-Based Fact Checking
- How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers?
- End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
- Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
- ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
- Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation
- ASVspoof 2019: a large-scale public database of synthesized, converted and replayed speech
- Design Choices for X-vector Based Speaker Anonymization
- Reverberation Modeling for Source-Filter-based Neural Vocoder
- Introducing the VoicePrivacy Initiative
- Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model
- Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences
- Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals
- Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment
- Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation
- Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings
- Neural source-filter waveform models for statistical parametric speech synthesis
- A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
- Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences
- Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments
- Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis
- Speaker Anonymization Using X-vector and Neural Waveform Models
- ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
- Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
- Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
- Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics
- Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
- Neural source-filter-based waveform model for statistical parametric speech synthesis
- STFT spectral loss for training a neural speech waveform model
- Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects
- Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
- Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data
- A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis
- Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks
- Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody
- Investigating very deep highway networks for parametric speech synthesis
- An RNN-based quantized f0 model with multi-tier feedback links for text-to-speech synthesis
- A simple RNN-plus-highway network for statistical parametric speech synthesis
- An autoregressive recurrent mixture density network for parametric speech synthesis
- Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis
- The NII speech synthesis entry for Blizzard Challenge 2016
- Investigating Very Deep Highway Networks for Parametric Speech Synthesis
- A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora
- Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
- Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
- Enhance the word vector with prosodic information for the recurrent neural network based TTS system
- Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks