Subband Based Voice Conversion
Oytun Turk and Levent M. Arslan
SESTEK Inc., R&D Dept., Istanbul, Turkey
Bogazici University, Electrical-Electronics Eng. Dept., Istanbul, Turkey
Overview
- Definitions
- Applications
- Fullband Approach
- Subband Approach
- Evaluations
- Demonstration
Original Looping Sicilian Code :
What Is Voice Conversion (VC)?
Applications of VC
1. Film Industry
2. TTS (text-to-speech): adaptive systems enabling TTS with any user’s voice
3. Healthcare/Voice Disorders
4. Speech Recognition, Speaker Identification and Verification
5. Multimedia
Fullband Approach (STASC)
Method: Speaker Transformation Algorithm using Segmental Codebooks
Steps:
1. The same utterances are recorded from the source and target speakers
2. Sentence HMM based alignment
3. Codebook generation
4. Transformation
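The transformation step can be illustrated with a weighted codebook mapping: each input feature vector is compared with every source codeword, the distances are turned into weights, and the same weights combine the paired target codewords. This is a minimal sketch assuming that form for STASC's transformation step; the slide only names the step. Plain Euclidean distance, the `gamma` sharpness parameter, and the function name `stasc_like_map` are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def stasc_like_map(x: Sequence[float],
                   src_codebook: Sequence[Sequence[float]],
                   tgt_codebook: Sequence[Sequence[float]],
                   gamma: float = 1.0) -> List[float]:
    """Map one source feature vector to an estimated target vector.

    Hypothetical sketch: weights are a softmax over negative distances to the
    source codewords; the output is the same weighted sum of target codewords.
    """
    if len(src_codebook) != len(tgt_codebook):
        raise ValueError("codebooks must be paired entry by entry")
    # Assumed distance measure: plain Euclidean (STASC's own may differ).
    dists = [math.dist(x, s) for s in src_codebook]
    # Shift by the smallest distance so exp() cannot underflow to all zeros.
    d_min = min(dists)
    raw = [math.exp(-gamma * (d - d_min)) for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(tgt_codebook[0])
    # Weighted combination of the paired target codewords, dimension by dimension.
    return [sum(w * t[k] for w, t in zip(weights, tgt_codebook))
            for k in range(dim)]
```

A large `gamma` approaches a hard nearest-codeword lookup, while a small one blends many codewords and smooths the output.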
Subband Approach (1)
Subband decomposition using the Discrete Wavelet Transform (DWT)
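The slides do not name the wavelet, the tree structure, or the sampling rate. As a minimal sketch, assuming the orthonormal Haar wavelet and a two-level full tree (wavelet packet) so that the 4 subbands have equal width, the decomposition and its inverse can be written in pure Python; all function names here are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_analysis(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: lowpass and highpass halves."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low: List[float], high: List[float]) -> List[float]:
    """Inverse of haar_analysis (perfect reconstruction)."""
    out: List[float] = []
    for a, d in zip(low, high):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def split_into_4_subbands(x: List[float]) -> List[List[float]]:
    """Two-level full-tree split into [LL, LH, HL, HH], each at fs/4.

    LL is the lowest band (0 to fs/8). Decimating the highpass branch flips
    its spectrum, so in frequency HH lies below HL.
    """
    low, high = haar_analysis(x)
    ll, lh = haar_analysis(low)
    hl, hh = haar_analysis(high)
    return [ll, lh, hl, hh]


def merge_4_subbands(bands: List[List[float]]) -> List[float]:
    """Rebuild the full-rate signal from the four subbands."""
    ll, lh, hl, hh = bands
    return haar_synthesis(haar_synthesis(ll, lh), haar_synthesis(hl, hh))
```

The input length must be a multiple of 4 here; a real system would pad the utterance and would likely use longer wavelet filters for sharper band separation.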
Subband Approach (2)
Advantages of DWT:
1. Perfect reconstruction with orthonormal filters
2. FIR filters
3. Computational efficiency
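Taking the two-tap Haar wavelet as an assumed example (the slides do not name the wavelet), all three advantages can be checked for one level of analysis and synthesis. The synthesis recovers each input sample exactly, energy is preserved, and each output sample costs a fixed number of operations, so a level over N samples is O(N).

```latex
\begin{aligned}
a_k &= \tfrac{1}{\sqrt{2}}\,(x_{2k} + x_{2k+1}), &
d_k &= \tfrac{1}{\sqrt{2}}\,(x_{2k} - x_{2k+1}), \\
\tfrac{1}{\sqrt{2}}\,(a_k + d_k) &= \tfrac{1}{2}\,(2x_{2k}) = x_{2k}, &
\tfrac{1}{\sqrt{2}}\,(a_k - d_k) &= \tfrac{1}{2}\,(2x_{2k+1}) = x_{2k+1}, \\
a_k^2 + d_k^2 &= x_{2k}^2 + x_{2k+1}^2 .
\end{aligned}
```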
Subband Training
1. Subband decomposition of source and target utterances
2. fs = Hz, 4 subbands
3. Alignment using sentence HMMs
4. Generation of subband codebooks
5. Alignment performance is satisfactory using the lower subbands
6. Training takes much less time
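Codebook generation can be sketched under one assumption: after the sentence HMM alignment, every frame of each speaker's utterance carries the HMM state it was aligned to, and a codebook entry is the mean feature vector per state. The alignment itself is not shown, and the input format and function names are hypothetical. In the subband setting the frames would come from the lower subband only.

```python
from typing import Dict, List, Sequence, Tuple

# Each item is (hmm_state_id, feature_vector), e.g. one LSF vector per frame.
Labelled = Sequence[Tuple[int, Sequence[float]]]


def average_per_state(frames: Labelled) -> Dict[int, List[float]]:
    """Mean feature vector for each HMM state in a state-labelled frame list."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for state, vec in frames:
        acc = sums.setdefault(state, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[state] = counts.get(state, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def build_paired_codebooks(src_frames: Labelled, tgt_frames: Labelled):
    """Paired source/target codebooks, one entry per state seen in both."""
    src_avg = average_per_state(src_frames)
    tgt_avg = average_per_state(tgt_frames)
    # Entry i of the source codebook pairs with entry i of the target codebook.
    states = sorted(set(src_avg) & set(tgt_avg))
    return [src_avg[s] for s in states], [tgt_avg[s] for s in states]
```

Because both speakers are aligned to the same sentence HMM, the two utterances may have different frame counts per state; averaging per state sidesteps frame-by-frame pairing.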
Subband Transformation (1)
1. Subband decomposition of the input utterance(s) from the source speaker
2. fs = Hz, 4 subbands
3. Only the first subband is converted
4. The 5.5 kHz-22.05 kHz band is bandpass filtered
5. FD-PSOLA (frequency-domain pitch-synchronous overlap-add) is applied to the whole spectrum
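The band edges follow from simple arithmetic. Assuming fs = 44.1 kHz (suggested by the 22.05 kHz upper edge, but not stated on the slide) and 4 equal-width subbands, the first subband spans 0 to 5.5125 kHz, and the remaining three together cover the 5.5 kHz to 22.05 kHz range that is bandpass filtered instead of converted. The helper names are hypothetical, and FD-PSOLA is not shown.

```python
from typing import List, Tuple

Band = Tuple[float, float]


def equal_width_band_edges(fs: float, n_bands: int) -> List[Band]:
    """Edges in Hz of n_bands equal-width subbands covering 0..fs/2, low to high."""
    width = fs / 2 / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]


def split_converted_and_bypassed(fs: float, n_bands: int) -> Tuple[Band, Band]:
    """Band that is converted (first subband) and band that is only filtered."""
    edges = equal_width_band_edges(fs, n_bands)
    converted = edges[0]
    bypassed = (edges[1][0], edges[-1][1])
    return converted, bypassed


converted, bypassed = split_converted_and_bypassed(44100.0, 4)
print(converted, bypassed)
```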
Subband Transformation (2)
Evaluations (1)
ABX Listening Test:
1. 5 female (F) and 5 male (M) speakers as source and target
2. M→F, F→M, M→M, F→F conversions
3. 20 subjects
4. (A) and (B): fullband/subband output
5. (X): target recording; subjects judge whether (A) or (B) is closer to (X)
6. Subband output is preferred by 92.1%
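Scoring such a test is a matter of counting how often the subband output was the one picked as closer to X, with the A/B order randomized per trial. The sketch below also computes a one-sided exact binomial p-value against chance (0.5); the trial format and function name are hypothetical, and the sample trials are made-up toy data, not results from this study.

```python
import math
from typing import Sequence, Tuple


def abx_preference(trials: Sequence[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (share preferring subband, one-sided binomial p-value).

    Each trial is (a_is_subband, chose_a); the subband output was preferred
    exactly when the listener picked the slot that held it.
    """
    n = len(trials)
    if n == 0:
        raise ValueError("no trials")
    k = sum(1 for a_is_subband, chose_a in trials if a_is_subband == chose_a)
    # P(at least k subband choices out of n) if listeners guessed at random.
    p_value = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k / n, p_value


toy = [(True, True), (False, False), (True, False), (False, False)]
print(abx_preference(toy))
```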
Evaluations (2)
Perceptual Experiments:
1. Assessment of frequency bands for the perception of speaker identity
kHz-1.8 kHz range is the dominant region
Evaluations (3)
Advantages:
1. Solves root-finding problems for LSFs (line spectral frequencies)
2. Prevents distortion in non-speech regions
3. Faster training
4. Faster codebook search and transformation
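The slide does not explain the root-finding problem, so the following is one plausible reading. LSFs are the angles of the unit-circle roots of P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), where A(z) is the LPC polynomial of order p. A fullband model at a high sampling rate needs a large p, which packs many roots closely together and makes them easy to miss or mis-order. Converting only a low subband allows a smaller p. The sketch uses a grid search with bisection, one standard way to find these roots and not necessarily the authors' method; the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def _bisect(f: Callable[[float], float], lo: float, hi: float,
            iters: int = 60) -> float:
    """Refine a sign change of f inside [lo, hi]."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lpc_to_lsf(a: Sequence[float], grid: int = 512) -> List[float]:
    """LSFs in radians, ascending, for A(z) = sum a[k] z^-k with a[0] = 1."""
    n = len(a)                      # n = p + 1, the degree of P and Q
    ext = list(a) + [0.0]           # a_0..a_p followed by a_{p+1} = 0
    pc = [ext[k] + ext[n - k] for k in range(n + 1)]   # symmetric P
    qc = [ext[k] - ext[n - k] for k in range(n + 1)]   # antisymmetric Q

    # On z = e^{jw}, removing the linear-phase factor leaves real functions.
    def p_real(w: float) -> float:
        return sum(c * math.cos(w * (k - n / 2)) for k, c in enumerate(pc))

    def q_real(w: float) -> float:
        return sum(c * math.sin(w * (k - n / 2)) for k, c in enumerate(qc))

    roots: List[float] = []
    ws = [math.pi * i / grid for i in range(1, grid)]  # open interval (0, pi)
    for f in (p_real, q_real):
        vals = [f(w) for w in ws]
        for i in range(len(ws) - 1):
            if vals[i] == 0.0:
                roots.append(ws[i])
            elif vals[i] * vals[i + 1] < 0:
                roots.append(_bisect(f, ws[i], ws[i + 1]))
    return sorted(roots)


print(lpc_to_lsf([1.0, -1.2, 0.5]))
```

A stable order-p model should yield exactly p LSFs. If a coarse `grid` steps over two closely spaced roots, `len(lpc_to_lsf(a))` drops below p, which is one concrete form of the root-finding problem.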
Voice Conversion System (VCS)
1. A software tool for voice conversion incorporating:
- the voice conversion algorithm
- tools for pre- and post-processing, recording, analysis and testing
2. VOX is a VCS developed by SESTEK Inc.
Demonstration Fullband : Subband : (1) (2)
Future Work
1. Modifications based on the experimental results
2. Better prosody conversion
3. Modifications for TTS applications