ICVGIP 2012: Speech training aids. Visual feedback of the articulatory efforts during acquisition of speech production by a hearing-impaired child.

Presentation transcript:

Introduction

Speech training aids: visual feedback of the articulatory efforts during acquisition of speech production by a hearing-impaired child.

Display of articulatory effort using LPC-based analysis of the speech signal. The oral cavity is represented as fixed-length tubular sections. Processing chain: LPC analysis of windowed speech frames >> LPC reflection coefficients >> section area ratios >> section areas, assuming constant glottis-end area >> vocal tract shape [Wakita, 1973] >> display of the articulatory efforts not visible on the speaker's face.
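The chain from windowed frame to section areas can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the autocorrelation method with a Hamming window, a Levinson-Durbin recursion, and the area ratio (1 - k) / (1 + k) between adjacent sections. The sign convention for the reflection coefficients and the direction of section numbering differ between texts, so both are assumptions here, as are all function names.

```python
import numpy as np

def levinson_reflection(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> reflection coefficients."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)   # predictor polynomial, a[0] = 1
    a[0] = 1.0
    err = r[0]                # prediction error energy
    k = np.zeros(order)
    for i in range(1, order + 1):
        if err <= 0.0:        # degenerate (e.g. silent) frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k_i = -acc / err
        k[i - 1] = k_i
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k_i * prev[i - j]
        a[i] = k_i
        err *= 1.0 - k_i ** 2
    return k

def frame_reflection(frame, order=10):
    """Window one speech frame and return its LPC reflection coefficients."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(order + 1)])
    return levinson_reflection(r, order)

def section_areas(k, reference_area=1.0):
    """Section areas from reflection coefficients.

    Assumed convention: area[i + 1] / area[i] = (1 - k[i]) / (1 + k[i]),
    with area[0] fixed at reference_area (the constant glottis-end area).
    With the opposite sign convention for k, the ratio inverts.
    """
    k = np.asarray(k, dtype=float)
    ratios = (1.0 - k) / (1.0 + k)
    return reference_area * np.concatenate(([1.0], np.cumprod(ratios)))
```

Calling `section_areas(frame_reflection(frame))` on successive frames gives one relative vocal tract shape per frame, all anchored to the same assumed glottis-end area.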

Problem: Errors due to variation in glottis-end area during speech production [Wakita, 1979].

Proposed solution: Acquisition of speech as audio and of the facial image as video. The mouth opening area estimated from the video is used as the reference area of the lip-end section, for scaling the area ratios obtained from LPC analysis of the simultaneously acquired speech signal [Nayak et al., 2012].

Investigation: A technique for estimation of the mouth opening, without errors caused by teeth and tongue between the lips.
  Contrast enhancement with multi-threshold binarization
  Connected component detection
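The rescaling step in the proposed solution amounts to anchoring the relative area function at the lip end instead of the glottis end. A minimal sketch, assuming the relative areas are stored glottis-first so that the last element is the lip-end section; the function name, the storage order, and the example numbers are hypothetical:

```python
import numpy as np

def scale_to_lip_area(relative_areas, mouth_opening_area, lip_index=-1):
    """Rescale a relative area function so the lip-end section equals the measured area.

    relative_areas: section areas known only up to a common scale factor.
    mouth_opening_area: area estimated from the video frame (any consistent unit).
    lip_index: position of the lip-end section (assumed to be the last element).
    """
    rel = np.asarray(relative_areas, dtype=float)
    return rel * (mouth_opening_area / rel[lip_index])

# Hypothetical values for illustration only.
scaled = scale_to_lip_area([1.0, 2.0, 4.0], mouth_opening_area=2.0)
```

Because only ratios come out of the LPC analysis, the shape of the area function is unchanged; the video measurement supplies the absolute scale frame by frame.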

Processing steps
i) Input frame
ii) Face sub-image
iii) Mouth sub-image [Viola & Jones, 2004] [Hsu et al., 2002]
iv) Horizontal opening
v) Vertical opening: segmentation, multi-threshold binarization, connected component detection
vi) Det. of inner lip boundaries
vii) Mouth opening area calculation
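Step v can be illustrated with a small sketch of multi-threshold binarization followed by connected component detection on a grayscale mouth sub-image. The slides do not give the thresholds, the connectivity, or how the results at different thresholds are combined, so everything below is an assumption: dark pixels are taken as candidate opening, 4-connectivity is used, and the largest component is kept at each threshold. Function names and the toy image are hypothetical.

```python
from collections import deque
import numpy as np

def largest_component(mask):
    """Return a boolean mask of the largest 4-connected True region in mask."""
    rows, cols = mask.shape
    visited = np.zeros((rows, cols), dtype=bool)
    best = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            comp = []
            while queue:
                r, c = queue.popleft()
                comp.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr, cc] and not visited[rr, cc]):
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            if len(comp) > len(best):
                best = comp
    out = np.zeros((rows, cols), dtype=bool)
    for r, c in best:
        out[r, c] = True
    return out

def opening_area_per_threshold(gray, thresholds):
    """Binarize at each threshold and report the largest dark region's pixel count."""
    return {t: int(largest_component(gray < t).sum()) for t in thresholds}

# Toy mouth sub-image: bright background, a dark 2x3 opening, one mid-gray
# pixel touching it (tongue-like), and one isolated dark pixel (noise).
gray = np.full((5, 6), 200)
gray[1:3, 1:4] = 10
gray[3, 2] = 100
gray[4, 5] = 10
areas = opening_area_per_threshold(gray, thresholds=(50, 150))
```

In the toy image the lower threshold isolates only the darkest region, while the higher threshold also absorbs the adjacent mid-gray pixel; comparing the regions across thresholds is one way teeth or tongue pixels inside the lips could be told apart from the opening itself.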

Test results
Test material: video recordings of vowels /a i u/ from 12 male speakers.
Scatter plot of estimated values versus values obtained manually.
Correlation coefficient: 0.91
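The agreement figure is a correlation between the automatically estimated and the manually obtained values. The slide does not say which coefficient was used; the sketch below assumes Pearson's r, and the sample numbers are placeholders rather than data from the study.

```python
import numpy as np

def pearson_r(estimated, manual):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(estimated, dtype=float)
    y = np.asarray(manual, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

# Placeholder values for illustration only (not measurements from the slides).
estimated_areas = [1.0, 2.1, 2.9, 4.2]
manual_areas = [1.1, 2.0, 3.1, 4.0]
r = pearson_r(estimated_areas, manual_areas)
```

A value near 1 means the estimates track the manual measurements closely, although a high correlation alone does not rule out a constant scale or offset error.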