Audio samples for the paper: Voice Conversion Across Arbitrary Speakers based on a Single Target-Speaker Utterance.
The models are trained on the VCTK corpus. All audio is resampled to 16 kHz.
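As a rough illustration of the resampling step, the sketch below downsamples by an integer factor with a windowed-sinc low-pass filter, using only the Python standard library. It assumes a 48 kHz source (so a factor of 3 gives 16 kHz); the source rate, the filter length, and the function names `lowpass_fir` and `decimate` are illustrative assumptions, not details taken from the paper, which may well have used a standard audio toolkit instead.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    num_taps is assumed to be odd so the filter has a centre tap.
    """
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)  # normalise so a constant signal passes unchanged
    return [t / gain for t in taps]


def decimate(samples, factor, num_taps=63):
    """Low-pass filter, then keep every `factor`-th sample.

    Assumption: 48 kHz input with factor=3 yields 16 kHz output.
    Samples outside the signal are treated as zero.
    """
    # Cut off slightly below the new Nyquist frequency to limit aliasing.
    taps = lowpass_fir(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at an assumed 48 kHz source rate.
    tone = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(4800)]
    resampled = decimate(tone, 3)
    print(len(tone), "->", len(resampled), "samples")
```

Filtering before discarding samples matters: without it, any content above 8 kHz in the source would fold back into the 16 kHz signal as aliasing.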
Below, "1 sample" means that the i-vector or speaker embedding of a new target speaker is computed from a single utterance of that speaker; larger sample counts mean correspondingly more utterances. "IVC Converted" denotes speech converted with the i-vector-based VC model (please refer to the paper for details), while "SEVC converted" denotes speech converted with the speaker-encoder-based VC model.
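To make the "N samples" idea concrete, here is a minimal sketch of one common way to turn several per-utterance embeddings into a single speaker representation: average them and L2-normalize the result. This is an assumption for illustration only; this page does not say how the paper combines multiple utterances, and i-vector extraction in particular usually pools statistics rather than averaging vectors. The function name `average_embedding` and the toy vectors are hypothetical.

```python
import math


def average_embedding(utterance_embeddings):
    """Average per-utterance embeddings and L2-normalize the mean.

    Illustrative assumption, not the paper's confirmed procedure.
    With a single utterance ("1 sample") this just normalizes that
    utterance's embedding.
    """
    if not utterance_embeddings:
        raise ValueError("need at least one utterance embedding")
    dim = len(utterance_embeddings[0])
    if any(len(e) != dim for e in utterance_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(utterance_embeddings)
    mean = [sum(e[d] for e in utterance_embeddings) / count for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        return mean  # avoid dividing by zero for an all-zero mean
    return [v / norm for v in mean]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real ones have far more dimensions.
    one_sample = average_embedding([[3.0, 4.0]])
    two_samples = average_embedding([[1.0, 0.0], [0.0, 1.0]])
    print(one_sample, two_samples)
```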