Oytun Turk and Levent M. Arslan
Subband Based Voice Conversion
SESTEK Inc., R&D Dept., Istanbul, Turkey
Bogazici University, Electrical-Electronics Eng. Dept., Istanbul, Turkey

Overview
- Definitions
- Applications
- Fullband Approach
- Subband Approach
- Evaluations
- Demonstration


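What Is Voice Conversion (VC)?
Voice conversion modifies the speech of a source speaker so that it is perceived as the speech of a target speaker.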

Applications of VC
1. Film Industry
2. TTS: adaptive systems enabling TTS with any user's voice
3. Healthcare/Voice Disorders
4. Speech Recognition, Speaker Identification and Verification
5. Multimedia

Fullband Approach (STASC)
Method: Speaker Transformation Algorithm using Segmental Codebooks (STASC)
Steps:
1. Same utterances recorded from source & target speakers
2. Sentence HMM based alignment
3. Codebook generation
4. Transformation
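The transformation step can be illustrated with a minimal sketch of codebook-weighted mapping in the spirit of STASC. It assumes that each codebook entry is a feature vector (for example, LSFs) averaged over one aligned segment, that the weights decay exponentially with an unweighted L1 distance to the source entries, and that `gamma` is a hypothetical tuning constant. The perceptual per-dimension weighting of the published algorithm is omitted, so this is an illustration and not the authors' implementation.

```python
import math
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def codebook_weights(x: Sequence[float], source_cb: Codebook, gamma: float) -> List[float]:
    """Soft-assignment weights of x to each source codebook entry (they sum to 1)."""
    # Unweighted L1 distance to every source entry (a simplification, see lead-in).
    dists = [sum(abs(a - b) for a, b in zip(x, entry)) for entry in source_cb]
    # Shift by the smallest distance so exp() cannot underflow to all zeros.
    d_min = min(dists)
    scores = [math.exp(-gamma * (d - d_min)) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]


def map_to_target(x: Sequence[float], source_cb: Codebook, target_cb: Codebook,
                  gamma: float = 50.0) -> List[float]:
    """Estimate the target-speaker vector as a weighted sum of target codebook entries."""
    if len(source_cb) != len(target_cb):
        raise ValueError("source and target codebooks must be paired entry by entry")
    w = codebook_weights(x, source_cb, gamma)
    dim = len(target_cb[0])
    return [sum(w_i * entry[k] for w_i, entry in zip(w, target_cb)) for k in range(dim)]


# Toy 2-D codebooks (hypothetical values, not from the slides).
source_cb = [[0.1, 0.3], [0.5, 0.7]]
target_cb = [[0.2, 0.4], [0.6, 0.9]]
print(map_to_target([0.1, 0.3], source_cb, target_cb))  # close to [0.2, 0.4]
print(map_to_target([0.3, 0.5], source_cb, target_cb))  # close to [0.4, 0.65]
```

An input that matches a source entry maps almost exactly to its paired target entry. An input equidistant from two entries receives an equal blend of both, which is the smoothing effect that a soft weighting gives over a hard nearest-entry lookup.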

Subband Approach (1)
Subband decomposition using the Discrete Wavelet Transform (DWT)

Subband Approach (2)
Advantages of DWT:
1. Perfect reconstruction with orthonormal filters
2. FIR filters
3. Computational efficiency
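The perfect-reconstruction property can be checked with a minimal sketch that uses the orthonormal Haar wavelet, the shortest FIR choice. The slides do not name the wavelet, so Haar is an assumption. The sketch also assumes that the four subbands come from a two-level full (wavelet-packet) tree. A three-level dyadic DWT would also yield four subbands, but with different band edges.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_analysis(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One DWT stage with the orthonormal Haar filter pair, followed by downsampling by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(x), 2)
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in pairs]   # 2-tap FIR lowpass
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in pairs]  # 2-tap FIR highpass
    return low, high


def haar_synthesis(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Inverse of haar_analysis: upsample and filter with the same orthonormal pair."""
    out: List[float] = []
    for a, d in zip(low, high):
        out.extend(((a + d) * INV_SQRT2, (a - d) * INV_SQRT2))
    return out


def decompose_4_subbands(x: Sequence[float]) -> List[List[float]]:
    """Two-level full tree: split the signal, then split both halves again."""
    low, high = haar_analysis(x)
    return [*haar_analysis(low), *haar_analysis(high)]


def reconstruct_4_subbands(bands: Sequence[Sequence[float]]) -> List[float]:
    """Undo decompose_4_subbands by merging pairs of subbands level by level."""
    b0, b1, b2, b3 = bands
    return haar_synthesis(haar_synthesis(b0, b1), haar_synthesis(b2, b3))


x = [math.sin(0.3 * n) + 0.01 * n for n in range(64)]
bands = decompose_4_subbands(x)
y = reconstruct_4_subbands(bands)
print(max(abs(a - b) for a, b in zip(x, y)))  # floating-point rounding error only
```

Because the filters are orthonormal, the total energy of the four subbands also equals the energy of `x`. This provides a quick sanity check that nothing is lost or duplicated by the decomposition.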

Subband Training
1. Subband decomposition of source and target utterances
2. fs = Hz → 4 subbands
3. Alignment using sentence HMMs
4. Generation of subband codebooks
5. Satisfactory alignment performance with the lower subbands
6. Training takes much less time
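Codebook generation (step 4) can be sketched as averaging aligned frames. This minimal sketch assumes that the sentence-HMM alignment has already tagged every source and target frame with a shared segment index (the hypothetical `src_labels` and `tgt_labels`). It then averages the feature vectors that share a label. The slides do not give the exact procedure, so treat this as an illustration. In the subband setting, the frames would presumably come from the lower subband(s).

```python
from typing import Dict, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]


def mean_per_label(frames: Frames, labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average all feature vectors that share the same alignment label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for frame, label in zip(frames, labels):
        if label not in sums:
            sums[label] = [0.0] * len(frame)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], frame)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def build_paired_codebooks(src_frames: Frames, src_labels: Sequence[int],
                           tgt_frames: Frames, tgt_labels: Sequence[int]
                           ) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (source_codebook, target_codebook); entry i of both belongs to the same segment."""
    src_means = mean_per_label(src_frames, src_labels)
    tgt_means = mean_per_label(tgt_frames, tgt_labels)
    shared = sorted(set(src_means) & set(tgt_means))  # segments seen for both speakers
    return [src_means[s] for s in shared], [tgt_means[s] for s in shared]


# Toy frames and labels (hypothetical).
src_cb, tgt_cb = build_paired_codebooks(
    [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]], [0, 0, 1],
    [[2.0, 2.0], [5.0, 5.0], [6.0, 6.0]], [0, 1, 1],
)
print(src_cb)  # [[2.0, 3.0], [10.0, 10.0]]
print(tgt_cb)  # [[2.0, 2.0], [5.5, 5.5]]
```

Pairing entries by segment index is what allows a mapping step to treat entry i of the target codebook as the counterpart of entry i of the source codebook.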

Subband Transformation (1)
1. Subband decomposition of input utterance(s) from the source speaker
2. fs = Hz → 4 subbands
3. Only the first subband is converted
4. 5.5 kHz-22.05 kHz bandpass filtered
5. FD-PSOLA applied to the whole spectrum

Subband Transformation (2)

Evaluations (1)
ABX Listening Test:
1. 5 female (F) and 5 male (M) speakers as source and target
2. M → F, F → M, M → M, F → F conversions
3. 20 subjects
4. (A) and (B): fullband/subband output
5. (X): target recording
6. Subband output is preferred by 92.1%
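The preference figure is a simple proportion of ABX responses. The slides report only the 92.1% figure and not the number of trials, so the counts below are hypothetical. The one-sided binomial (sign) test against 50% chance is an added illustration of how a preference could be checked for significance. It is not an analysis reported in the study.

```python
from math import comb


def preference_percent(prefer_subband: int, total: int) -> float:
    """Percentage of ABX trials in which the subband output was judged closer to X."""
    if total <= 0 or not 0 <= prefer_subband <= total:
        raise ValueError("need total > 0 and 0 <= prefer_subband <= total")
    return 100.0 * prefer_subband / total


def one_sided_binomial_p(successes: int, trials: int, p: float = 0.5) -> float:
    """P(K >= successes) for K ~ Binomial(trials, p): the chance of a preference this strong."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(successes, trials + 1))


# Hypothetical counts for illustration only.
print(preference_percent(8, 10))     # 80.0
print(one_sided_binomial_p(8, 10))   # 0.0546875
```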

Evaluations (2)
Perceptual Experiments:
1. Assessment of frequency bands for perception of speaker identity: kHz-1.8 kHz range is the dominant region

Evaluations (3)
Advantages:
1. Solution to root-finding problems for LSFs (line spectral frequencies)
2. Distortion in non-speech regions prevented
3. Faster training
4. Faster codebook search & transformation
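The slide does not explain the root-finding problem. One plausible reading, stated here as an assumption, is that fullband analysis at a high sampling rate calls for a high LPC order p. A high order packs the p LSFs closely together, so a fixed-resolution grid search can land two roots in one cell and miss both. A lower-rate subband needs a lower order.

The minimal sketch below finds LSFs as sign changes of the symmetric (P) and antisymmetric (Q) LSF polynomials on a uniform grid, which is one common approach. It uses the flat predictor A(z) = 1, whose LSFs sit at kπ/(p+1), to show roots being missed when the grid is too coarse. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _lsf_polynomial(a: Sequence[float], sign: int) -> List[float]:
    """Coefficients of A(z) + sign * z^-(p+1) A(1/z): +1 gives P (symmetric), -1 gives Q."""
    p = len(a) - 1
    ext = list(a) + [0.0]  # a_0..a_p followed by a_{p+1} = 0
    return [ext[k] + sign * ext[p + 1 - k] for k in range(p + 2)]


def _zero_phase(coeffs: Sequence[float], w: float, symmetric: bool) -> float:
    """Real-valued function of w whose zeros are the polynomial's roots on the unit circle."""
    half = (len(coeffs) - 1) / 2.0
    trig = math.cos if symmetric else math.sin
    return sum(c * trig(w * (k - half)) for k, c in enumerate(coeffs))


def lsf_grid_search(a: Sequence[float], grid_points: int, iters: int = 60) -> List[float]:
    """LSFs in (0, pi) from sign changes on a uniform grid, refined by bisection.

    Assumes a = [1, a_1, ..., a_p] with p even and A(z) minimum phase.
    Two roots in the same grid cell give no sign change and are both missed.
    """
    # Half-step offset keeps the trivial roots at w = 0 and w = pi off the grid.
    grid = [math.pi * (i + 0.5) / grid_points for i in range(grid_points)]
    roots: List[float] = []
    for sign, symmetric in ((+1, True), (-1, False)):
        coeffs = _lsf_polynomial(a, sign)
        values = [_zero_phase(coeffs, w, symmetric) for w in grid]
        for i in range(grid_points - 1):
            if values[i] * values[i + 1] >= 0:
                continue  # no sign change in this cell
            lo, hi, f_lo = grid[i], grid[i + 1], values[i]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = _zero_phase(coeffs, mid, symmetric)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return sorted(roots)


flat_p40 = [1.0] + [0.0] * 40                 # order 40: LSFs at k*pi/41
print(len(lsf_grid_search(flat_p40, 2000)))   # 40, fine grid finds every root
print(len(lsf_grid_search(flat_p40, 16)))     # fewer than 40, coarse grid misses pairs
```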

Voice Conversion System (VCS)
1. A software tool for voice conversion incorporating:
- the voice conversion algorithm
- tools for pre- and post-processing, recording, analysis and testing
2. VOX is a VCS developed by SESTEK Inc.

Demonstration: fullband vs. subband outputs, examples (1) and (2)

Future Work
1. Modifications related to experimental results
2. Better prosody conversion
3. Modifications related to TTS applications