Vocal Tract & Lip Shape Estimation
MS Shah & Vikash Sethia
Supervisor: Prof. PC Pandey
EE Dept, IIT Bombay
AIM-2003, EE Dept, IIT Bombay, 27th June 2003

2 ABSTRACT: The display of intensity, pitch, and vocal tract shape is considered to be helpful in speech training of the hearing impaired. A speech analysis package has been developed in MATLAB for displaying speech waveforms, pitch and energy contours, the spectrogram, and the areagram (a two-dimensional plot of the cross-sectional area of the vocal tract as a function of time and of position along the tract length). While vocal tract shape estimation works satisfactorily for vowels, during stop closures the place of closure cannot be estimated due to very low signal energy. There is a need to investigate methods for predicting the vocal tract shape during stop closure from the shapes estimated on either side of the closure. Work is in progress on lip shape estimation, which may find application in video telephony.

3 Introduction
■ Hearing impairment → lack of auditory feedback during speech production → speech impairment
■ Speech training of hearing-impaired children by visual (using a mirror) & tactile feedback: some important features and efforts are not distinguishable
■ Speech training aids: display of articulatory efforts and acoustic parameters: vocal tract and lip shape, pitch, and energy variations

4 Vocal tract shape estimation
General model for the speech production system:
s(n) = u(n) ∗ g(n) ∗ v(n) ∗ r(n)
where s(n) = speech signal, u(n) = glottal excitation, g(n) = glottal impulse response, v(n) = impulse response of the vocal tract, r(n) = impulse response of radiation from the lips.
Cont..
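The formula image on this slide is not reproduced in the transcript. As standard background (a textbook restatement, not copied from the slide), the same cascade in the z-domain is

    S(z) = U(z) G(z) V(z) R(z)

LPC analysis estimates a single all-pole filter from s(n); with pre-emphasis compensating the combined glottal and radiation spectral tilt, the estimated all-pole filter approximates the vocal tract response V(z) (cf. slides 6 and 7).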

5 Acoustic tube model of the vocal tract
At the mth section: volume velocity, pressure, and reflection coefficient [formulas shown as images on the slide].
Cont..
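The volume velocity, pressure, and reflection coefficient formulas appear only as images on the slide. The standard lossless acoustic-tube relations used in Wakita-type analysis (a textbook reconstruction, not copied from the slide) are

    u_m(x, t) = u_m^+(t − x/c) − u_m^−(t + x/c)
    p_m(x, t) = (ρ c / A_m) [u_m^+(t − x/c) + u_m^−(t + x/c)]
    μ_m = (A_{m+1} − A_m) / (A_{m+1} + A_m)

where A_m is the cross-sectional area of the mth section, u_m^+ and u_m^− are the forward and backward travelling volume-velocity waves, c is the speed of sound, and ρ is the air density.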

6 Speech analysis model (Wakita, 1973)
Assumption: vocal tract represented as an all-pole filter.
Algorithmic steps:
 inverse filtering for the error signal using the LMS technique
 set of simultaneous equations solved with Robinson's algorithm for the reflection coefficients & relative area values
Cont..
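The all-pole transfer function and the coefficient-to-area relation appear only as images on the slide. In the usual Wakita-style formulation (a textbook reconstruction; the sign convention is an assumption) they take the form

    H(z) = G / (1 − Σ_{k=1}^{p} a_k z^{−k})
    A_{m+1} = A_m (1 − μ_m) / (1 + μ_m)

so the p reflection coefficients obtained from the normal equations yield p + 1 relative section areas, determined only up to an overall scale.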

7 Implementation
■ Set-up: PC with sound card for signal acquisition (sampling rate used: k Sa/s)
■ "VTAG-1" developed for speech processing & display
 Pre-emphasis for 6 dB/octave equalization; analysis window: 256-sample Hamming with 50% overlap
 Robinson's algorithm for obtaining reflection coefficients & area values
 Bezier-form algorithm for interpolation of area values
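A minimal Python sketch of the analysis chain listed above: pre-emphasis, 256-sample Hamming windowing with 50% overlap, reflection coefficients via a Levinson-Durbin recursion (used here as a stand-in for Robinson's algorithm), and conversion to relative areas. This is an illustration under textbook assumptions, not the authors' MATLAB VTAG-1 code; the 0.95 pre-emphasis constant, 12th analysis order, and area sign convention are assumptions.

```python
import numpy as np

def reflection_coeffs(frame, order=12):
    """Reflection (PARCOR) coefficients of one windowed frame via the
    Levinson-Durbin recursion on its autocorrelation sequence."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-10                    # guard against silent frames
    ks = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        ks[i - 1] = k
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return ks

def area_function(ks, area_at_lips=1.0):
    """Relative cross-sectional areas, A_{m+1} = A_m (1 - k_m)/(1 + k_m);
    scale and sign convention are assumptions (Wakita-style)."""
    areas = [area_at_lips]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return np.array(areas)

def vocal_tract_areagram(x, frame_len=256, hop=128, order=12):
    """Relative area profile per frame (rows: frames, columns: tube sections)."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - 0.95 * x[:-1])   # pre-emphasis, ~6 dB/octave
    win = np.hamming(frame_len)                  # 256-sample Hamming window
    n_frames = 1 + (len(x) - frame_len) // hop   # 50% frame overlap
    rows = []
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * win
        rows.append(area_function(reflection_coeffs(frame, order)))
    return np.stack(rows)
```

For a speech array x, vocal_tract_areagram(x) returns the data for an areagram-style display, with frames along one axis and tube sections along the other.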

8 VTAG-1 result for all-vowel word /aIje/

9 Synthesized vowels /a/, /u/, /i/

10 Amplitude/pitch modulated synthesized vowel /a/: amplitude modulated, pitch modulated, and amplitude & pitch modulated

11 Spectrograms for V-C-V sequences /aka/, /aga/, /ata/, /ada/

12 Areagrams for V-C-V sequences /aka/, /aga/, /ata/, /ada/

13 Lip shape estimation
 Mouth parameters [shown as image on the slide]
 Parameter estimation:
Pitch tracking: odd harmonics absent when the analysis window length = 2 × pitch period
Magnitude spectrum above 4000 Hz clipped to zero
Mean & variance used for generation of predictor surfaces
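One way to realize the parameter-extraction steps above in Python. This is an illustrative reading of the slide, not the authors' implementation: the F0 search range, the small guard constant, and the interpretation of "mean & variance" as moments of the clipped magnitude spectrum are assumptions.

```python
import numpy as np

def pitch_period_odd_harmonic(frame, fs, f0_min=80.0, f0_max=400.0):
    """Choose the candidate pitch period T for which a window of length 2*T
    places the pitch harmonics on even DFT bins, i.e. odd-bin energy is
    minimal (the criterion stated on the slide)."""
    best_T, best_ratio = None, np.inf
    for T in range(int(fs / f0_max), int(fs / f0_min) + 1):
        if 2 * T > len(frame):
            break
        spec = np.abs(np.fft.rfft(frame[: 2 * T]))
        odd_energy = np.sum(spec[1::2] ** 2)
        even_energy = np.sum(spec[2::2] ** 2) + 1e-12   # skip DC, avoid /0
        ratio = odd_energy / even_energy
        if ratio < best_ratio:
            best_T, best_ratio = T, ratio
    return best_T   # pitch period in samples (None if frame is too short)

def clipped_spectrum_moments(frame, fs, f_cut=4000.0):
    """Mean and variance of the magnitude spectrum after zeroing components
    above 4 kHz; these feed the lip-shape predictor surfaces."""
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    spec[freqs > f_cut] = 0.0
    return float(spec.mean()), float(spec.var())
```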

14 Lip shape estimation results
Pitch and mean vs. variance result (1): synthesized amplitude-modulated vowel /u/

15 Pitch and mean vs. variance result (2): synthesized pitch/amplitude-modulated vowel /a/

16 Pitch and mean vs. variance result (3): synthesized pitch-modulated vowel /i/

17 Summary
■ Analysis & display package VTAG-1 developed for displaying pitch/energy variations, the spectrogram, & the areagram (2-D plot of vocal tract area), to investigate the problems in estimating the vocal tract shape for use in speech training aids for hearing-impaired children.
Cont.

18 ■ Area estimation for vowels: not affected by amplitude & pitch variations
■ Area estimation during stop closure: the place of closure cannot be estimated from the analysis results during the closure
■ Further work:
 Investigate methods for predicting the vocal tract area during stop closure from the areas estimated on either side of the closure
 Implement an algorithm for generation of predictor surfaces for extraction of the lip shape parameters