Oytun Turk and Levent M. Arslan
Subband Based Voice Conversion
SESTEK Inc., R&D Dept., Istanbul, Turkey
Bogazici University, Electrical-Electronics Eng. Dept., Istanbul, Turkey

Overview
- Definitions
- Applications
- Fullband Approach
- Subband Approach
- Evaluations
- Demonstration

Original Looping Sicilian Code :

What Is Voice Conversion (VC)?

Applications of VC
1. Film Industry
2. TTS: Adaptive systems enabling TTS with any user’s voice
3. Healthcare/Voice Disorders
4. Speech Recognition, Speaker Identification and Verification
5. Multimedia

Fullband Approach (STASC)
Method: Speaker Transformation Algorithm Using Segmental Codebooks
Steps:
1. Same utterances recorded from source and target speakers
2. Sentence HMM based alignment
3. Codebook generation
4. Transformation
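The slides do not spell out how the transformation step uses the codebooks. As a rough illustration only, one common way to apply paired source/target codebooks is to blend target entries with weights that decay with the distance between the input frame and each source entry. The function name `convert_frame`, the exponential weighting, the `gamma` parameter, and the toy 2-D codebooks below are all assumptions made for this sketch, not details taken from the presentation.

```python
import numpy as np

def convert_frame(src_frame, src_codebook, tgt_codebook, gamma=1.0):
    """Map one source feature vector into the target speaker's space.

    Hypothetical sketch: rows of src_codebook and tgt_codebook are assumed
    to be aligned pairs (entry i of one corresponds to entry i of the other).
    """
    # Euclidean distance from the input frame to every source entry.
    dist = np.linalg.norm(src_codebook - src_frame, axis=1)
    # Closer entries get larger weights; subtracting the minimum keeps
    # the exponentials numerically stable without changing the ratios.
    weights = np.exp(-gamma * (dist - dist.min()))
    weights /= weights.sum()
    # Converted frame = weighted average of the paired target entries.
    return weights @ tgt_codebook

# Toy 2-D codebooks with three aligned entries (illustrative values only).
src_cb = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
tgt_cb = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 10.0]])

sharp = convert_frame(np.array([1.0, 1.0]), src_cb, tgt_cb, gamma=50.0)
flat = convert_frame(np.array([1.0, 1.0]), src_cb, tgt_cb, gamma=0.0)
```

With a large `gamma` the result snaps to the target entry paired with the nearest source entry; with `gamma=0` every entry counts equally and the result is the plain mean of the target codebook.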

Subband Approach (1)
Subband decomposition using Discrete Wavelet Transform (DWT)

Subband Approach (2)
Advantages of DWT:
1. Perfect reconstruction with orthonormal filters
2. FIR filters
3. Computational efficiency
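The perfect-reconstruction property can be checked numerically with a minimal sketch. The slides do not say which wavelet was used; the Haar wavelet is chosen here only because it is the shortest orthonormal FIR filter pair, and the helper names `haar_analysis` and `haar_synthesis` are made up for this example.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_analysis(x):
    """One DWT level: split an even-length signal into low and high bands."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / SQRT2    # 2-tap lowpass filter, downsampled by 2
    high = (even - odd) / SQRT2   # 2-tap highpass filter, downsampled by 2
    return low, high

def haar_synthesis(low, high):
    """Invert haar_analysis by re-interleaving the recovered samples."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / SQRT2
    x[1::2] = (low - high) / SQRT2
    return x

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)

low, high = haar_analysis(signal)
reconstructed = haar_synthesis(low, high)
max_error = np.max(np.abs(signal - reconstructed))
```

Each band holds half as many samples as the input, the total energy is preserved because the filters are orthonormal, and `max_error` stays at floating-point rounding level. Applying `haar_analysis` again to a band yields a further split.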

Subband Training
1. Subband decomposition of source and target utterances
2. fs = 44100 Hz → 4 subbands
3. Alignment using Sentence HMMs
4. Generation of subband codebooks
5. Satisfactory alignment performance with lower subbands
6. Training takes much shorter time

Subband Transformation (1)
1. Subband decomposition of input utterance(s) from source speaker
2. fs = 44100 Hz → 4 subbands
3. Only first subband converted
4. 5.5 kHz-22.05 kHz bandpass filtered
5. FD-PSOLA applied to whole spectrum
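The band limits quoted above can be reproduced with simple arithmetic. The slides do not state how the four subbands are laid out; the sketch below assumes a uniform split of the range from 0 Hz to the Nyquist frequency, which happens to agree with the 5.5 kHz boundary and the 22.05 kHz upper limit. The function name `uniform_band_edges` is hypothetical.

```python
def uniform_band_edges(fs_hz, n_bands):
    """Return (low, high) edges in Hz for n_bands equal-width subbands.

    Assumption: the bands evenly divide 0 Hz .. fs/2 (the Nyquist frequency).
    """
    nyquist = fs_hz / 2.0
    width = nyquist / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]

bands = uniform_band_edges(44100, 4)
first_band = bands[0]                       # the only band that is converted
untouched = (bands[1][0], bands[-1][1])     # range that is bandpass filtered
```

Under this assumption the converted band spans 0 Hz to 5512.5 Hz, and the remaining range runs from 5512.5 Hz to 22050 Hz.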

Subband Transformation (2)

Evaluations (1)
ABX Listening Test:
1. 5 female (F) and 5 male (M) speakers as source and target
2. M → F, F → M, M → M, F → F conversions
3. 20 subjects
4. (A) and (B): fullband/subband output
5. (X): target recording
6. Subband output is preferred by 92.1%.

Evaluations (2)
Perceptual Experiments:
1. Assessment of frequency bands for perception of speaker identity
2. 1.0 kHz-1.8 kHz range is the dominant region

Evaluations (3)
Advantages:
1. Solution to root finding problems for LSFs
2. Distortion at non-speech regions prevented
3. Faster training
4. Faster codebook search and transformation

Voice Conversion System (VCS)
1. A software tool for voice conversion incorporating:
- the voice conversion algorithm
- tools for pre- and post-processing, recording, analysis and testing
2. VOX is a VCS developed by SESTEK Inc.

Demonstration
[Audio samples: fullband and subband outputs, examples (1) and (2)]

Future Work
1. Modifications related to experimental results
2. Better prosody conversion
3. Modifications related to TTS applications
