Oytun Turk and Levent M.Arslan Subband Based Voice Conversion SESTEK Inc., R&D Dept. Istanbul, Turkey Bogazici University, Electrical-Electronics Eng.


1 Subband Based Voice Conversion
Oytun Turk and Levent M. Arslan
SESTEK Inc., R&D Dept., Istanbul, Turkey
Bogazici University, Electrical-Electronics Eng. Dept., Istanbul, Turkey

2 Overview
- Definitions
- Applications
- Fullband Approach
- Subband Approach
- Evaluations
- Demonstration

3 Original Looping Sicilian Code :

4 What Is Voice Conversion (VC)?

5 Applications of VC
1. Film industry
2. TTS: adaptive systems enabling TTS with any user’s voice
3. Healthcare / voice disorders
4. Speech recognition, speaker identification and verification
5. Multimedia

6 Fullband Approach (STASC)
Method: Speaker Transformation Algorithm using Segmental Codebooks
Steps:
1. Same utterances from source and target speakers recorded
2. Sentence HMM based alignment
3. Codebook generation
4. Transformation
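The slide names codebook generation and transformation but does not give the mapping itself. As a rough illustration only, the sketch below shows a generic weighted codebook mapping: an input frame is compared against every source codebook entry, and the output is the matching weighted average of the target entries. The function names, the exponential weighting, the `gamma` parameter, and the two-dimensional toy vectors are all assumptions made for this example, not details taken from the presentation.

```python
import math

def codebook_weights(frame, source_codebook, gamma=1.0):
    """Turn distances to the source entries into weights that sum to 1.

    The exponential weighting and gamma are assumptions for this sketch.
    """
    distances = [math.dist(frame, entry) for entry in source_codebook]
    scores = [math.exp(-gamma * d) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def convert_frame(frame, source_codebook, target_codebook, gamma=1.0):
    """Weighted average of target entries, weighted by closeness in source space."""
    weights = codebook_weights(frame, source_codebook, gamma)
    dim = len(target_codebook[0])
    return [sum(w * entry[k] for w, entry in zip(weights, target_codebook))
            for k in range(dim)]

# Toy, hypothetical codebooks: entry i of the source pairs with entry i of the target.
source_codebook = [[0.0, 0.0], [1.0, 1.0]]
target_codebook = [[10.0, 10.0], [20.0, 20.0]]

# A frame halfway between the two source entries maps halfway between the targets.
converted = convert_frame([0.5, 0.5], source_codebook, target_codebook)
print(converted)
```

A larger `gamma` makes the mapping behave more like a nearest-entry lookup; a smaller one blends more entries together.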

7 Subband Approach (1)
Subband decomposition using the Discrete Wavelet Transform (DWT)

8 Subband Approach (2)
Advantages of DWT:
1. Perfect reconstruction with orthonormal filters
2. FIR filters
3. Computational efficiency
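The perfect-reconstruction property can be checked on a small example. The sketch below uses the Haar wavelet, the shortest orthonormal FIR filter pair, purely because it fits in a few lines; the slides do not say which wavelet the authors used, so the filter choice and the test signal are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(x):
    """One DWT level: split an even-length signal into low and high half-rate subbands."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: interleave the reconstructed sample pairs."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

# Arbitrary test signal (not speech data).
signal = [0.5, -1.0, 2.0, 3.5, -0.25, 0.0, 1.0, 4.0]
low, high = haar_analysis(signal)
rebuilt = haar_synthesis(low, high)

# Reconstruction error should be at floating-point rounding level.
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(max_error)
```

Applying `haar_analysis` again to a subband splits it further, which is how a multi-band decomposition is built up from this two-band step.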

9 Subband Training
1. Subband decomposition of source and target utterances
2. fs = 44100 Hz → 4 subbands
3. Alignment using sentence HMMs
4. Generation of subband codebooks
5. Satisfactory alignment performance with lower subbands
6. Training takes much shorter time
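The slides do not state how the four subbands are laid out in frequency. One reading that agrees with the 5.5 kHz boundary quoted on the Subband Transformation slide is four equal-width bands between 0 Hz and the Nyquist frequency; a dyadic DWT tree would give different edges. The helper below, with a hypothetical name, just works out the arithmetic under that equal-width assumption.

```python
def subband_edges(fs_hz, n_bands):
    """Band edges in Hz for n_bands equal-width subbands up to Nyquist (an assumption)."""
    nyquist = fs_hz / 2.0
    width = nyquist / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]

edges = subband_edges(44100, 4)
for index, (low_hz, high_hz) in enumerate(edges, start=1):
    print(f"subband {index}: {low_hz:.1f} Hz to {high_hz:.1f} Hz")
```

Under this assumption the first subband spans 0 Hz to 5512.5 Hz, and the remaining three together cover 5512.5 Hz to 22050 Hz.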

10 Subband Transformation (1)
1. Subband decomposition of input utterance(s) from source speaker
2. fs = 44100 Hz → 4 subbands
3. Only first subband converted
4. 5.5 kHz-22.05 kHz bandpass filtered
5. FD-PSOLA applied to whole spectrum

11 Subband Transformation (2)

12 Evaluations (1)
ABX Listening Test:
1. 5 female (F) and 5 male (M) speakers as source and target
2. M → F, F → M, M → M, F → F conversions
3. 20 subjects
4. (A) and (B): fullband/subband output
5. (X): target recording
6. Subband output is preferred by 92.1%.
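A preference figure of this kind is a simple tally: each trial records which of (A) or (B) the listener judged closer to (X), and the rate is the share of trials won by each system. The sketch below shows only that arithmetic. The four toy responses are made up for illustration and are not the study's data, and the function name is hypothetical.

```python
from collections import Counter

def preference_rates(choices):
    """Percentage of trials in which each system was chosen as closer to X."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up responses for illustration only (not results from the presentation).
toy_choices = ["subband", "subband", "fullband", "subband"]
rates = preference_rates(toy_choices)
print(rates)
```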

13 Evaluations (2)
Perceptual Experiments:
1. Assessment of frequency bands for perception of speaker identity
2. 1.0 kHz-1.8 kHz range is the dominant region

14 Evaluations (3)
Advantages:
1. Solution to root finding problems for LSFs
2. Distortion at non-speech regions prevented
3. Faster training
4. Faster codebook search and transformation

15 Voice Conversion System (VCS)
1. A software tool for voice conversion incorporating:
   - the voice conversion algorithm
   - tools for pre- and post-processing, recording, analysis and testing
2. VOX is a VCS developed by SESTEK Inc.

16 Demonstration
Audio examples (1) and (2), each as fullband and subband output

17 Future Work
1. Modifications related to experimental results
2. Better prosody conversion
3. Modifications related to TTS applications

