Factor Analysis of MRI-Derived Tongue Shapes. Mark Hasegawa-Johnson, ECE Department and Beckman Institute, University of Illinois at Urbana-Champaign.

Presentation transcript:

Factor Analysis of MRI-Derived Tongue Shapes. Mark Hasegawa-Johnson, ECE Department and Beckman Institute, University of Illinois at Urbana-Champaign

Background: The vowel sounds of English are classified in two dimensions: “high/low” and “front/back.” [Figure: vowel chart with Front/Back and High/Low axes, showing the vowels i, u, a, ae, e, o.]

Background: The tongue is composed of about 9 muscles (4 intrinsic, 5 extrinsic). [Figure labels: Styloglossus, Superior Pharyngeal Constrictor, Genioglossus, Hyoglossus, Transversus, Verticalis, Superior Longitudinalis, Inferior Longitudinalis, Palatoglossus.]

Theories of Motor Control
- Theory 1: Direct Control
- Theory 2: Hierarchical Control

Factor Analysis of X-Ray Images (Harshman, Ladefoged, & Goldstein, 1977)

Finding: Two factors account for 92% of variance.

Factor loadings seem to represent distinctive features: v1 = [±front], v2 = [±high]

Can Three-Dimensional Tongue Shape be Explained Using Shape Factors?
- Hypothesis 1: 3D tongue shape during speech = weighted sum of 2-3 factors.
- Hypothesis 2: Shape of the factors t1(i), t2(i) is speaker-dependent. (??)
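Hypothesis 1 can be written compactly as a linear model. The notation below is a sketch: only the factor shapes t_k(i) appear on the slide, while the shape x_v(i) for vowel v at surface point i, the mean shape, the weights a_{vk}, and the factor count K are symbols assumed here for illustration. Including a mean shape is also an assumption.

```latex
x_v(i) \;\approx\; \bar{x}(i) + \sum_{k=1}^{K} a_{vk}\, t_k(i),
\qquad K = 2 \text{ or } 3
```

Under Hypothesis 2, each speaker would have a separate set of factor shapes t_k(i).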

Why is 3D Different from 2D?
Linear Source-Filter Theory:
- Vowel quality is determined by areas
- Area correlated with midsagittal width

Do Shape Factors Exist in 3D?
- If inter-speaker shape similarity is governed by desire for acoustic similarity, and...
- If acoustic similarity depends on cross-sectional area, not cross-sectional shape...
- Then variation in 3D shape may not have a shape factor basis.

Factor Analysis of MRI-Derived Tongue Shapes: Methodology
1. Recruit Subjects
2. Collect MRI Images
3. Segment the Images
4. Interpolate ROI to Create 3D Tongue Shapes for Each Vowel
5. Speaker-Dependent Factor Analysis
6. Speaker-Independent Factor Analysis

Subject Recruitment:
- Ten subjects recruited; five successfully imaged (3 male, 2 female).
- Subjects were college undergraduate and graduate students with no metal fillings and no claustrophobia.
- Subjects were trained to sustain vowel sounds with little variation.
- Human subjects approval: both UCLA and Cedars-Sinai Medical Center.

MRI Image Collection
- GE Signa 1.5T
- T1-weighted
- 3 mm slices
- 24 cm FOV
- 256 x 256 pixels
- Coronal, Axial
- Sounds per Subject.
- Breath-hold in vowel position for 25 seconds

Image Viewing and Segmentation: the CTMRedit GUI and toolbox
- Display series of CT or MR image slices
- Segment ROI manually or automatically
- Interpolate and reconstruct ROI in 3D space

Calibration: Segmentation of Phantom (J. Cha)
- Test tubes of 3 sizes
- Radius estimated from manual segmentation has an absolute error of:
  - typical case: 0.1 mm
  - worst case: 0.4 mm

Calibration: Articulatory Speech Synthesis (J. Cha)
- /a,i,u/ synthesized using Maeda articulatory synthesizer
- F1-F4 errors:
  - worst case: +/- 30%
  - mean error: +2.8%
  - std dev: 19.5%
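The error summary above (worst case, mean, standard deviation) can be reproduced in form with a few lines of Python. This is a minimal sketch under assumptions: the slide does not say how the percentages were computed, so the code assumes a signed percent error of synthesized formants relative to measured ones. The function name and the formant values are hypothetical, not data from the study.

```python
import numpy as np

def formant_error_stats(synthesized_hz, measured_hz):
    """Signed percent error of synthesized formants relative to measured ones."""
    synthesized_hz = np.asarray(synthesized_hz, dtype=float)
    measured_hz = np.asarray(measured_hz, dtype=float)
    pct = 100.0 * (synthesized_hz - measured_hz) / measured_hz
    return {
        # Error with the largest magnitude, sign preserved.
        "worst_case": float(pct[np.argmax(np.abs(pct))]),
        "mean": float(pct.mean()),
        # Sample standard deviation (ddof=1) is an assumption.
        "std": float(pct.std(ddof=1)),
    }

# Hypothetical F1-F4 values in Hz for one vowel (illustration only).
measured = [700.0, 1200.0, 2500.0, 3500.0]
synthesized = [770.0, 1140.0, 2500.0, 3675.0]
stats = formant_error_stats(synthesized, measured)
print(stats)
```

A signed mean keeps the direction of the bias visible, which is why a small mean error can coexist with a much larger standard deviation.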

Reconstruction of ROI
- Interpolate between image slices to create 3D object.
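As a rough illustration of interpolating between slices, the sketch below linearly resamples a stack of slices along the slice axis. It is not the CTMRedit implementation. The function name, array layout, and spacing parameters are assumptions made for the example, and linear interpolation is only one of several possible schemes.

```python
import numpy as np

def interpolate_slices(stack, slice_spacing_mm, target_spacing_mm):
    """Linearly resample a (n_slices, rows, cols) stack along the slice axis.

    Requires at least two slices.
    """
    stack = np.asarray(stack, dtype=float)
    n = stack.shape[0]
    z_old = np.arange(n) * slice_spacing_mm
    z_new = np.arange(0.0, z_old[-1] + 1e-9, target_spacing_mm)
    # Slice at or below each new position (clamped so lower + 1 is valid).
    lower = np.minimum((z_new // slice_spacing_mm).astype(int), n - 2)
    frac = (z_new - z_old[lower]) / slice_spacing_mm
    w = frac[:, None, None]
    return (1.0 - w) * stack[lower] + w * stack[lower + 1]

# Three hypothetical 2 x 2 slices, 3 mm apart, resampled to 1 mm spacing.
slices = np.stack([np.full((2, 2), v) for v in (0.0, 3.0, 6.0)])
volume = interpolate_slices(slices, slice_spacing_mm=3.0, target_spacing_mm=1.0)
print(volume.shape)
```

The interpolated planes are blends of their two neighbors, so any segmentation error in a slice propagates into the reconstructed volume.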

Tongue Shape During /ae/

Speaker Normalization: VT Length, Inter-Molar Width (S. Pizza)
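One simple way to normalize by vocal tract length and inter-molar width is to rescale each coordinate axis. The slide does not describe the actual procedure, so the sketch below is purely an assumption: the axis ordering, the choice of which axis each measurement scales, the reference dimensions, and the function name are all hypothetical.

```python
import numpy as np

def normalize_speaker(points_mm, vt_length_mm, molar_width_mm,
                      ref_vt_length_mm=170.0, ref_molar_width_mm=50.0):
    """Scale (N, 3) points ordered as (lateral, front-back, vertical), in mm."""
    points_mm = np.asarray(points_mm, dtype=float)
    lateral_scale = ref_molar_width_mm / molar_width_mm
    length_scale = ref_vt_length_mm / vt_length_mm
    # Assumption: the lateral axis scales with inter-molar width, and the
    # two midsagittal-plane axes scale with vocal tract length.
    return points_mm * np.array([lateral_scale, length_scale, length_scale])

# Hypothetical speaker whose dimensions are exactly half the reference.
normalized = normalize_speaker([[20.0, 10.0, 5.0]],
                               vt_length_mm=85.0, molar_width_mm=25.0)
print(normalized)
```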

Speaker-Dependent Factor Analysis
- 12 tongue shapes from one speaker:
  - Each tongue shape modeled as a 25 point x 40 point rubber sheet.
- Principal Components Analysis:
  - 11 non-zero factors (12 vowels - 1 mean vector = 11 degrees of freedom).
  - 2 factors: 78% of variance
  - 3 factors: 88% of variance
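The bookkeeping on this slide (12 shapes, one mean removed, 11 non-zero factors, cumulative variance) can be sketched with a singular value decomposition. This is a minimal illustration, not the analysis code used in the study: the data are random placeholders and all variable names are hypothetical. With random data the cumulative variance fractions will not match the 78% and 88% reported above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vowels, n_rows, n_cols = 12, 25, 40

# Placeholder data: random heights stand in for measured tongue surfaces.
shapes = rng.normal(size=(n_vowels, n_rows, n_cols))

X = shapes.reshape(n_vowels, -1)   # one 1000-dimensional vector per vowel
mean_shape = X.mean(axis=0)
Xc = X - mean_shape                # removing the mean costs one degree of freedom

# Rows of Vt are the factors (principal components); s holds singular values.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
variance = s ** 2
n_nonzero = int(np.sum(s > 1e-10 * s[0]))
cumulative = np.cumsum(variance) / variance.sum()

# Approximate every shape as the mean plus a weighted sum of 3 factors.
scores = U * s
approx = mean_shape + scores[:, :3] @ Vt[:3]

print("non-zero factors:", n_nonzero)
print("variance explained by 2 and 3 factors:", cumulative[1], cumulative[2])
```

Centering 12 vectors leaves at most 11 linearly independent directions, which is why exactly 11 factors carry variance regardless of the 1000-point resolution.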

“Excuses:” Why Didn’t it Work?
- Tongue length changes from /ao/ to /iy/.
- Human transcriber error?
- Interpolation to form 3D image causes error:
  - Spline and sinc interpolation: very large errors
  - Linear interpolation: smaller errors, but still too large.

New Approaches: Avoid Interpolation
General method: avoid interpolation by modeling the measured data directly.
- J. Huang: Control factor shape using an a priori probability distribution.
- Y. Zheng: Limit factor to the set of polynomial surfaces.

Polynomial Smoothing (Y. Zheng)
- Polynomial Surface Modeling
  - Tongue shape = polynomial surface
  - 4D surface model enforces smoothness constraints.
- Hybrid Polynomial/Factor model
  - Midsagittal tongue shape is as predicted by Harshman et al.
  - 3D shape = (midsag. shape) × (polynomial)
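To illustrate how a polynomial surface can be fitted directly to measured points without interpolating onto a grid first, here is a least-squares sketch. It is not Y. Zheng's method: the bivariate polynomial form, the degree, the function names, and the synthetic test surface are all assumptions for the example.

```python
import numpy as np

def fit_polynomial_surface(x, y, z, degree=2):
    """Least-squares fit of z ~ sum over p+q<=degree of c_pq * x**p * y**q."""
    x, y, z = (np.asarray(a, dtype=float).ravel() for a in (x, y, z))
    powers = [(p, q) for p in range(degree + 1) for q in range(degree + 1 - p)]
    A = np.column_stack([x**p * y**q for p, q in powers])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return powers, coeffs

def eval_polynomial_surface(powers, coeffs, x, y):
    """Evaluate the fitted surface at the given coordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return sum(c * x**p * y**q for (p, q), c in zip(powers, coeffs))

# Synthetic scattered samples of a known surface (illustration only).
rng = np.random.default_rng(1)
xs = rng.uniform(-1.0, 1.0, size=200)
ys = rng.uniform(-1.0, 1.0, size=200)
zs = 1.0 + 2.0 * xs - 3.0 * ys + 0.5 * xs * ys

powers, coeffs = fit_polynomial_surface(xs, ys, zs, degree=2)
z_fit = eval_polynomial_surface(powers, coeffs, xs, ys)
print(float(np.max(np.abs(z_fit - zs))))
```

Because the sample points may be scattered, the fit uses whatever points were actually measured, and a low polynomial degree acts as the smoothness constraint.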

Conclusions
- X-ray analysis suggests hierarchical motor control, but...
- “Hierarchical control” might reflect structure of the acoustic space.
- MRI analysis does not find hierarchical control (yet), but...
- Negative finding might be result of methodological weakness.

Speaker-Dependent Factor Analysis