Improving the Consistency of Vocal Tract Shape Estimation
K. S. Nataraj, Jagbandhu, P. C. Pandey, M. S. Shah
IIT Bombay
{natarajks, jagbandhu,
NCC 2011: 17th National Conference on Communications, Jan. 2011, Bangalore, India (Sp Pr. II, P4)

OVERVIEW
1. Introduction
2. Variation in Vocal Tract Shape
3. Method
4. Results
5. Conclusion

1. INTRODUCTION
Vocal tract shape: the cross-sectional area of the vocal tract as a function of distance along its length, from the glottis towards the lips.
Applications
• Articulatory synthesis
• Speech recognition
• Speech-training aids
Visual Speech-Training Aids
• Visual feedback of articulatory effort for teaching the production of vowels and lingual consonants

Estimation of Vocal Tract Shape from the Speech Signal
• Linear predictive coding (LPC)
• Formant analysis
• Articulatory codebook mapping
LPC-Based Estimation of Vocal Tract Shape
• The vocal tract is modeled as a lossless acoustic tube made of sections of equal length and varying cross-sectional area, equivalent to an all-pole filter.
• Reflection coefficients (the fraction of the volume-velocity wave reflected at each interface between sections) are obtained from LPC analysis of the speech signal.
• The area ratios of adjacent sections are calculated from the reflection coefficients.
• The area values are obtained by successively multiplying the area ratios by an assumed area at the glottis end.
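
The LPC-to-area steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the reflection coefficients come from the Levinson-Durbin recursion (autocorrelation method), and the area ratio is taken as A[m+1]/A[m] = (1 - k[m])/(1 + k[m]) with sections numbered from the glottis. Sign and ordering conventions for reflection coefficients differ between textbooks, and the function names are hypothetical.

```python
import numpy as np


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns the predictor polynomial a (a[0] = 1, error e[n] = sum_j a[j] s[n-j]),
    the reflection coefficients k[0..order-1], and the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) prediction error and the signal at lag i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        # Order update: a_j <- a_j + k_i * a_(i-j) for j < i, then a_i = k_i
        a[1:i] = a[1:i] + k[i - 1] * a[i - 1:0:-1]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return a, k, err


def areas_from_reflection(k, glottal_area=1.0):
    """Cross-sectional areas from reflection coefficients (assumed convention).

    The first value is the assumed glottal-end area; each further value is the
    previous one multiplied by the area ratio (1 - k) / (1 + k).
    """
    ratios = (1.0 - k) / (1.0 + k)
    return glottal_area * np.concatenate(([1.0], np.cumprod(ratios)))
```

For a non-silent frame the autocorrelation method gives |k| < 1, so every ratio and every area is positive. With order 12 this sketch returns 13 values (the reference glottal value plus one per reflection coefficient); how they map onto the 12 tube sections is a further modeling choice.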

Features of LPC-Based VT Shape Estimation
• Usable for estimating both fixed and transitional vocal tract configurations.
• Real-time processing is feasible.
Limitations of LPC-Based VT Shape Estimation
• Inaccurate estimation during nasalized vowels, nasal stops, and fricatives, owing to deviations from the all-pole filter model.
• Inaccurate estimation during stop closures, owing to very low signal energy.
• Errors due to the band-limited speech signal.
• Errors due to uncertain glottal-source characteristics.
• Errors during varying tract configurations, due to the assumption of a fixed area at the glottal end.
• Variability in the estimated shape during a fixed tract configuration, caused by variation in the position of the analysis window with respect to the glottal pulse.

Objective of the Investigation
A method for improving the consistency of LPC-based estimation of the cross-sectional areas of the vocal tract, without smearing the variations during speech segments with transitional vocal tract configurations.

2. VARIATION IN VOCAL TRACT SHAPE
VT Shape Estimation by LPC Analysis
▪ Fs = 10 kHz, pre-emphasis: 6 dB/octave, LPC order = 12
▪ Analysis frame length: twice the average pitch period
▪ Analysis window: Hamming
• The area values estimated with a window shift of 5 ms vary even for vowel segments with fixed tract configurations.
• The variability can be reduced by low-pass filtering the estimated area values along time, or by using a longer analysis window, but at the expense of smearing the transitions in segments with transitional tract configurations, e.g. diphthongs and VC and CV transitions.
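
The analysis settings listed above can be strung together into a frame-by-frame estimator. The sketch below relies on assumptions not given on the slide: pre-emphasis is a first difference with coefficient 0.95 (roughly a 6 dB/octave rise over much of the speech band), the average F0 is supplied by the caller, and the area conversion uses the assumed convention A[m+1]/A[m] = (1 - k[m])/(1 + k[m]). `estimate_areas` and its helpers are hypothetical names.

```python
import numpy as np

FS = 10_000       # sampling rate (Hz), from the slide
ORDER = 12        # LPC order, from the slide
PRE_EMPH = 0.95   # assumed pre-emphasis coefficient (not given on the slide)


def reflection_coeffs(frame, order=ORDER):
    """Autocorrelation-method LPC (Levinson-Durbin); returns reflection coefficients."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - l], frame[l:]) for l in range(order + 1)])
    if r[0] <= 0.0:                      # silent frame: no meaningful estimate
        return np.zeros(order)
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        k[i - 1] = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k[i - 1] * a[i - 1:0:-1]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return k


def areas_from_reflection(k, glottal_area=1.0):
    ratios = (1.0 - k) / (1.0 + k)       # assumed sign/ordering convention
    return glottal_area * np.concatenate(([1.0], np.cumprod(ratios)))


def estimate_areas(x, f0_mean, hop_s=0.005, fs=FS):
    """Area values for Hamming-windowed frames of two average pitch periods.

    Returns an array of shape (n_frames, ORDER + 1); frames start every hop_s seconds.
    """
    x = np.asarray(x, dtype=float)
    y = np.append(x[0], x[1:] - PRE_EMPH * x[:-1])   # first-difference pre-emphasis
    frame_len = int(round(2 * fs / f0_mean))          # twice the average pitch period
    hop = int(round(hop_s * fs))                      # 5 ms -> 50 samples at 10 kHz
    w = np.hamming(frame_len)
    starts = range(0, len(y) - frame_len + 1, hop)
    return np.array([areas_from_reflection(reflection_coeffs(w * y[s:s + frame_len]))
                     for s in starts])
```

Setting `hop_s=1 / FS` gives a 1-sample shift. Low-pass filtering the returned array along its first (time) axis would reduce the frame-to-frame spread, at the cost of smearing transitions as noted above.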

Example: Synthesized /-a-i-u-/: (a) speech waveform, (b) spectrogram, (c) areagram
Effect of analysis-frame position (window shift: 1 sample)
Areagram: 2-D plot of the square root of the area values as a function of time and of distance from the glottis towards the lips (40 values obtained by interpolating the 12 section values)
▪ Large variation in the area values as a function of time
▪ The variations are related to the position of the analysis frame with respect to the glottal pulse
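
An areagram of the kind described above can be formed from a matrix of per-frame area values. In this sketch the 12 section values are assumed to be equally spaced between the glottis and the lips, the square root is taken before linear interpolation to 40 points, and `areagram` is a hypothetical helper; the slide states neither the interpolation method nor the order of these two steps.

```python
import numpy as np


def areagram(area_frames, n_points=40):
    """Square-root-area image: rows are frames (time), columns run glottis -> lips.

    area_frames: array of shape (n_frames, n_sections) with positive area values.
    Returns an array of shape (n_frames, n_points).
    """
    n_sections = area_frames.shape[1]
    src = np.linspace(0.0, 1.0, n_sections)   # assumed uniform section positions
    dst = np.linspace(0.0, 1.0, n_points)     # interpolation grid
    return np.array([np.interp(dst, src, np.sqrt(frame)) for frame in area_frames])
```

The result can be displayed as an image with time on one axis and distance from the glottis on the other, for example with Matplotlib's `imshow` applied to the transposed array.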

Earlier Studies
Rabiner et al. (1977)
▪ Substantial variation in the LPC prediction error with changes in the position of the analysis frame.
▪ The variability in the prediction error could be reduced by all-pass filtering and pre-emphasis of the speech signal, but at the expense of an increase in the error.
Mezzalama (1979)
▪ Large variation in the formants estimated by LPC analysis with changes in the position of the analysis frame with respect to the glottal pulse.
▪ The variation could be reduced by choosing a frame length equal to a multiple of the pitch period and by repeatedly concatenating the frame before applying the analysis window.
Mizoguchi et al. (1982)
▪ "Selective LP in time domain": rejection of speech segments whose prediction error exceeds a threshold, to reduce the variation in the prediction coefficients across frames for steady-state vowel segments.
Ma et al. (1993)
▪ Selection of speech samples on the basis of short-time energy was found to be more robust than selection based on the LPC prediction error for reducing the variation in the prediction coefficients.

Selection of Frames for Reducing Variability in VT Shape
• The RMS value of the LPC prediction error varies with the position of the analysis frame.
• Frame positions at the minima of the prediction error were found to correspond to the smallest estimation errors in the vocal tract parameters.
• The peaks and valleys of the LPC prediction error are difficult to locate consistently.
• The variation in the prediction error was found to be related to the glottal closure instants (GCIs), but the frame positions giving minimum error relative to the GCIs differ from vowel to vowel.
• For steady-state vowel segments, the minima of the prediction error coincide with the minima of the windowed energy.
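
The dependence of the RMS prediction error on frame position can be examined with a 1-sample sliding analysis. This sketch assumes the autocorrelation method on Hamming-windowed frames, solves the normal equations directly with NumPy instead of a recursion, and defines the RMS error over the full inverse-filtered output (frame length plus order samples). `rms_prediction_error` is a hypothetical name, and the slide does not specify these details.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Predictor polynomial a (a[0] = 1) from the autocorrelation normal equations."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - l], frame[l:]) for l in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]                                   # Toeplitz autocorrelation matrix
    return np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))


def rms_prediction_error(x, frame_len, order=12):
    """RMS LPC prediction error for every frame position (1-sample shift)."""
    w = np.hamming(frame_len)
    out = np.empty(len(x) - frame_len + 1)
    for s in range(len(out)):
        frame = w * x[s:s + frame_len]
        e = np.convolve(frame, lpc_autocorr(frame, order))   # inverse filtering by A(z)
        out[s] = np.sqrt(np.mean(e ** 2))
    return out
```

An all-zero frame makes the normal equations singular, so silent stretches need to be skipped before calling this sketch.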

3. METHOD
Windowed Energy Index
Frames are selected automatically using the "windowed energy index", calculated as the ratio of the energy of the windowed frame to the energy of the frame:
$$E_w(n) = \frac{\sum_{m=0}^{N-1} \left[ w(m)\, s_n(m) \right]^2}{\sum_{m=0}^{N-1} s_n^2(m)}$$
where
E_w(n) = windowed energy index for frame position n
w(m) = Hamming window of length N
s_n(m) = speech segment for frame position n
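
The windowed energy index follows directly from its definition as the ratio of windowed-frame energy to frame energy. The sketch below computes E_w(n) for every frame position n with a 1-sample shift. It uses NumPy's `sliding_window_view` (NumPy 1.20 or later), and `windowed_energy_index` is a hypothetical name. Because the Hamming window never exceeds 1, E_w(n) lies between 0 and 1; it is undefined for an all-zero frame.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_energy_index(x, frame_len):
    """E_w(n) = sum_m [w(m) s_n(m)]^2 / sum_m s_n(m)^2 for n = 0 .. len(x) - frame_len.

    x: 1-D speech signal; frame_len: N, the analysis frame length in samples.
    """
    w2 = np.hamming(frame_len) ** 2
    frames_sq = sliding_window_view(np.asarray(x, dtype=float) ** 2, frame_len)  # s_n(m)^2
    return frames_sq @ w2 / frames_sq.sum(axis=1)
```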

Windowed Energy Index E_w for Synthesized Vowels /-a-i-u-/
Plots of the signal waveform, the prediction error, and the windowed energy index for different frame lengths:
a) Frame length = 2 (1/F0)
  ▪ Periodic, with period equal to the pitch period
  ▪ Distinct minima, corresponding to low values of the prediction error
b) Frame length = 2 (0.9/F0)
  ▪ Distinct minima, corresponding to low values of the prediction error
  ▪ Different shapes for the three vowels
c) Frame length = 2 (1.1/F0)
  ▪ Indistinct minima
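
The three frame lengths can be compared on a synthetic, exactly periodic signal. The signal here is a hypothetical stand-in (a damped 700 Hz oscillation repeated every pitch period at F0 = 100 Hz, Fs = 10 kHz), not the synthesized vowels used in the study. It illustrates one property of the length 2/F0: when the frame spans a whole number of pitch periods, the frame energy in the denominator of E_w is the same at every frame position, so E_w varies only through the window weighting. For 0.9 and 1.1 times that length, the denominator also varies with position. For an exactly periodic signal, E_w repeats with the pitch period at any frame length.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

FS = 10_000                      # sampling rate (Hz)
F0 = 100.0                       # assumed pitch (Hz)
T0 = int(round(FS / F0))         # pitch period: 100 samples

# Hypothetical periodic test signal: one damped 700 Hz oscillation per pitch period
n = np.arange(T0)
x = np.tile(np.exp(-n / 15.0) * np.sin(2 * np.pi * 700.0 * n / FS), 20)


def frame_energy(x, frame_len):
    """Sum of s_n(m)^2 at each frame position (1-sample shift)."""
    return sliding_window_view(x ** 2, frame_len).sum(axis=1)


def windowed_energy_index(x, frame_len):
    frames_sq = sliding_window_view(x ** 2, frame_len)
    return frames_sq @ np.hamming(frame_len) ** 2 / frames_sq.sum(axis=1)


results = {}
for factor in (1.0, 0.9, 1.1):
    N = int(round(2 * factor * FS / F0))    # 200, 180, 220 samples
    results[factor] = (N, windowed_energy_index(x, N), frame_energy(x, N))

for factor, (N, ew, energy) in results.items():
    spread = (energy.max() - energy.min()) / energy.mean()
    print(f"factor {factor}: N = {N}, relative frame-energy spread = {spread:.3g}")
```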

Observations from E_w for Synthesized Vowels
Variability in the estimated area values can be reduced by selecting the frame positions at the minima of E_w, calculated with analysis frames of length equal to two pitch periods or slightly shorter.
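
Frame selection at the E_w minima can be sketched as below. The valley-picking rule (a value lower than its left neighbour and not higher than its right neighbour), the default factor of 1.0 (two pitch periods; e.g. 0.9 for "slightly shorter"), and the helper names are assumptions for illustration; the slide does not specify how the valleys are detected. The returned positions are the frame starts at which the LPC area estimation would then be carried out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_energy_index(x, frame_len):
    frames_sq = sliding_window_view(np.asarray(x, dtype=float) ** 2, frame_len)
    return frames_sq @ np.hamming(frame_len) ** 2 / frames_sq.sum(axis=1)


def valley_positions(ew):
    """Indices of local minima: lower than the left neighbour, not higher than the right."""
    i = np.arange(1, len(ew) - 1)
    return i[(ew[i] < ew[i - 1]) & (ew[i] <= ew[i + 1])]


def select_frames(x, f0, fs=10_000, factor=1.0):
    """Frame length and start positions of the frames at the E_w minima.

    factor scales the frame length 2 / f0 (1.0 = two pitch periods; slightly
    below 1.0 for slightly shorter frames).
    """
    frame_len = int(round(2 * factor * fs / f0))
    return frame_len, valley_positions(windowed_energy_index(x, frame_len))
```

For natural speech with varying pitch, f0 would have to be updated along the utterance; this sketch uses a single value.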

4. RESULTS
Areagrams for Synthesized /-a-i-u-/
(a) Analysis frames with 1-sample shift
(b) Analysis frames at positions corresponding to the E_w minima (detected by valley picking)
The E_w-minima-based areagram shows much smaller variations for all three vowels.

Plot of Variation in the Square-Root Area Values for Synthesized Vowels
• Values for analysis frames with 1-sample shift (light lines): large spread.
• Values for E_w-minima-selected frames (dark lines): smaller spread.
For all three vowels, the max-min deviation of the values decreased by more than an order of magnitude, with no significant change in the mean values.
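
The comparison above rests on two per-section statistics over a steady vowel segment: the max-min deviation (spread) and the mean of the square-root area values. A minimal sketch, with hypothetical function names and inputs assumed to be arrays of shape (frames, sections):

```python
import numpy as np


def spread_and_mean(sqrt_areas):
    """Per-section max-min deviation and mean over the frames of one segment."""
    return sqrt_areas.max(axis=0) - sqrt_areas.min(axis=0), sqrt_areas.mean(axis=0)


def compare_selection(all_frames, selected_frames):
    """Ratio of spreads (all / selected) and change in mean (selected - all)."""
    spread_all, mean_all = spread_and_mean(all_frames)
    spread_sel, mean_sel = spread_and_mean(selected_frames)
    return spread_all / spread_sel, mean_sel - mean_all
```

A spread ratio above 10 in every section corresponds to a reduction of more than an order of magnitude. If the selected frames have zero spread in some section the ratio becomes infinite, so a small floor on the denominator may be needed in practice.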

Example: Vowel-Semivowel-Vowel, synthesized /aja/ and natural /aja/ (speaker S1)
(a) speech waveform, (b) spectrogram, (c) 1-sample-shift areagram, (d) E_w-minima areagram
Areagram with E_w-minima-selected frames: reduced variation during the fixed-tract configuration, without smearing during the transitional configuration.

5. CONCLUSION
• Analysis frames positioned at the E_w minima resulted in
  ▪ low prediction error in LPC analysis, and
  ▪ significantly reduced variability in the area values estimated by LP analysis during vowel segments with fixed-tract configurations.
• The consistency of vocal tract shape estimation improved without smearing the variations in shape during semivowel segments with transitional-tract configurations.
• The method may be used to estimate the area values during the VC and CV transitions of vowel-oral stop-vowel utterances, for improving
  ▪ the accuracy of the vocal tract shape during stop closures as estimated by bivariate surface modeling, and
  ▪ vocal tract shape estimation for speech-training aids.

Thank You

Example 1: Vowel-Semivowel-Vowel, synthesized /awa/ and natural /awa/ (speaker S2)
(a) speech waveform, (b) spectrogram, (c) 1-sample-shift areagram, (d) E_w-minima areagram

Example 2: Vowel-Semivowel-Vowel, natural /aja/ (speaker S2)
(a) speech waveform, (b) spectrogram, (c) 1-sample-shift areagram, (d) E_w-minima areagram

Example 3: Synthesized Vowel Sequence /-a-i-u-/ with Pitch Variation (F0: Hz)
(a) speech waveform, (b) spectrogram, (c) 1-sample-shift areagram, (d) E_w-minima areagram

Example 4: Natural Vowel Sequence /-a-i-u-/ (speaker S1)
(a) speech waveform, (b) spectrogram, (c) 1-sample-shift areagram, (d) E_w-minima areagram