1
Subband Based Voice Conversion
Oytun Turk and Levent M. Arslan
SESTEK Inc., R&D Dept., Istanbul, Turkey
Bogazici University, Electrical-Electronics Eng. Dept., Istanbul, Turkey
2
Overview
Definitions
Applications
Fullband Approach
Subband Approach
Evaluations
Demonstration
3
Original Looping Sicilian Code :
4
What Is Voice Conversion (VC)?
5
Applications of VC
1. Film Industry
2. TTS: Adaptive systems enabling TTS with any user's voice
3. Healthcare/Voice Disorders
4. Speech Recognition, Speaker Identification and Verification
5. Multimedia
6
Fullband Approach (STASC)
Method: Speaker Transformation Algorithm Using Segmental Codebooks
Steps:
1. Same utterances from source & target speakers recorded
2. Sentence HMM based alignment
3. Codebook generation
4. Transformation
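The codebook generation and transformation steps above can be illustrated with a generic weighted codebook mapping. This is a minimal sketch and not the STASC implementation itself: it assumes each aligned source codebook entry is paired with a target entry, that features are plain numeric vectors, and that weights decay exponentially with Euclidean distance. The function name `convert_frame` and the sharpness parameter `gamma` are hypothetical.

```python
import math


def convert_frame(frame, source_codebook, target_codebook, gamma=1.0):
    """Map one source feature vector into the target speaker's space.

    Assumption: source_codebook[i] and target_codebook[i] are paired
    entries obtained from aligned training data. gamma is a hypothetical
    parameter controlling how sharply the weights favor near entries.
    """
    # Distance from the input frame to every source codebook entry.
    distances = [math.dist(frame, entry) for entry in source_codebook]

    # Exponentially decaying weights, shifted by the minimum distance
    # for numerical stability, then normalized to sum to one.
    d_min = min(distances)
    raw = [math.exp(-gamma * (d - d_min)) for d in distances]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Output is the weighted average of the paired target entries.
    dim = len(target_codebook[0])
    return [
        sum(w * entry[k] for w, entry in zip(weights, target_codebook))
        for k in range(dim)
    ]
```

A frame that coincides with a source entry is mapped (almost) onto the paired target entry when `gamma` is large, while a frame equidistant from two entries yields the average of their targets.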
7
Subband Approach (1)
Subband decomposition using Discrete Wavelet Transform (DWT)
8
Subband Approach (2)
Advantages of DWT:
1. Perfect reconstruction with orthonormal filters
2. FIR filters
3. Computational efficiency
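The perfect reconstruction property listed above can be checked with a one-level orthonormal DWT. The sketch below uses the Haar wavelet purely because it is the simplest orthonormal FIR filter pair; the slides do not say which wavelet the authors used, so that choice is an assumption, as are the function names.

```python
import math


def haar_analysis(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail), each half the input length.
    Assumes len(signal) is even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: interleave the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append(s * (a + d))
        out.append(s * (a - d))
    return out


x = [4.0, 2.0, -1.0, 3.0, 0.5, 0.5, 7.0, -2.0]
low, high = haar_analysis(x)
x_rec = haar_synthesis(low, high)
```

Here `x_rec` matches `x` up to floating-point rounding, and the energy of `low` plus `high` equals the energy of `x`, which is what orthonormality implies.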
9
Subband Training
1. Subband decomposition of source and target utterances
2. fs = 44100 Hz, 4 subbands
3. Alignment using Sentence HMMs
4. Generation of subband codebooks
5. Satisfactory alignment performance with lower subbands
6. Training takes much shorter time
10
Subband Transformation (1)
1. Subband decomposition of input utterance(s) from source speaker
2. fs = 44100 Hz, 4 subbands
3. Only first subband converted
4. 5.5 kHz-22.05 kHz bandpass filtered
5. FD-PSOLA applied to whole spectrum
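The band limits on this slide follow from the sampling rate if the four subbands are assumed to have equal width, which is consistent with the 5.5 kHz boundary quoted above but is not stated explicitly in the slides. A small sketch under that assumption (the helper `subband_edges` is hypothetical):

```python
def subband_edges(fs_hz, n_bands):
    """Return (low, high) edges in Hz for n_bands equal-width subbands.

    Assumption: the bands split the range 0..fs/2 uniformly.
    """
    nyquist = fs_hz / 2.0
    width = nyquist / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]


edges = subband_edges(44100, 4)
first_band = edges[0]                   # the only band that is converted
remainder = (edges[1][0], edges[-1][1])  # passed through the bandpass filter
```

With fs = 44100 Hz this gives a first subband of 0 Hz to 5512.5 Hz and a remainder of 5512.5 Hz to 22050 Hz, matching the 5.5 kHz-22.05 kHz range on the slide.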
11
Subband Transformation (2)
12
Evaluations (1)
ABX Listening Test:
1. 5 female (F) and 5 male (M) speakers as source and target
2. M→F, F→M, M→M, F→F conversions
3. 20 subjects
4. (A) and (B): fullband/subband output
5. (X): target recording
6. Subband output is preferred by 92.1%.
13
Evaluations (2)
Perceptual Experiments:
1. Assessment of frequency bands for perception of speaker identity
2. 1.0 kHz-1.8 kHz range is the dominant region
14
Evaluations (3)
Advantages:
1. Solution to root finding problems for LSFs
2. Distortion at non-speech regions prevented
3. Faster training
4. Faster codebook search & transformation
15
Voice Conversion System (VCS)
1. A software tool for voice conversion incorporating:
 - the voice conversion algorithm
 - tools for pre- and post-processing, recording, analysis and testing
2. VOX is a VCS developed by SESTEK Inc.
16
Demonstration Fullband : Subband : (1) (2)
17
Future Work
1. Modifications related to experimental results
2. Better prosody conversion
3. Modifications related to TTS applications