Audio samples from "Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder"

Paper: arXiv

Authors: Qiao Tian, Xucheng Wan, Shan Liu.

Abstract: Although the state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, problems remain. First, because the model's input signal is noisy, there is still a gap between the quality of generated and natural waveforms. Second, parallel WaveNet is trained under a distillation framework, which makes it tedious to adapt a well-trained model to a new speaker. To address these two problems, we propose an end-to-end adaptation method based on the generative adversarial network (GAN), which can reduce the computational cost of training for new-speaker adaptation. Our subjective experiments show that the proposed training method can further reduce the quality gap between generated and natural waveforms.

Experiment: Adaptation on ground-truth mel-spectrograms

Ground-truth | AGAN (λ = 0.05) | AGAN (λ = 1.5) | Parallel WaveNet
1: For while Roman Catholics and Dissenters were encouraged to see ministers of their own persuasion,
2: Pieman and ballad-monger did their usual roaring trade amidst the dense throng.
3: He was lying on his face, his legs tied up to his hips so as to allow of the body fitting into the hole.
4: Brennan saw a man in the window who closely resembled Lee Harvey Oswald, and that Brennan believes the man he saw was in fact.
5: Such ideas of grandeur were apparently accompanied by notions of oppression.
6: This assignment was given to Agent James P. Hosty, Jr. of the Dallas office upon Fain's retirement.

Extension experiment: Retraining on predicted mel-spectrograms

(Conducted on our internal dataset, which is not clean; all mel-spectrograms were predicted by a Tacotron-like model.)

Parallel WaveNet | AGAN
1: I never saw anything like her in my life.
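2: Can you teach me how to download videos from YouTube? (你可以教我怎么在YouTube上下载视频吗?)
3: Nowadays, as social media becomes increasingly important, the number of instagram followers is a key measure of a celebrity's influence. (话说在越来越重视社交媒体的当下,instagram的粉丝数量, 是衡量一个明星影响力的关键。)
4: November 2: sunny, 22 to 29 degrees; November 3: light rain turning overcast, 22 to 30 degrees; (11月2号晴,22到29度;11月3号小雨转阴,22到30度;)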