GEPPETO: A modeling approach to study the production of speech gestures. Pascal Perrier (ICP – Grenoble) with Stéphanie Buchaillard (PhD), Matthieu Chabanas.


GEPPETO (GEstures shaped by the Physics and by a PErceptually oriented Targets Optimization): A modeling approach to study the production of speech gestures
Pascal Perrier (ICP – Grenoble) with Stéphanie Buchaillard (PhD), Matthieu Chabanas (ICP), Ma Liang (PhD), Yohan Payan (TIMC – Grenoble)

Outline
– Introduction
– Current hypotheses implemented in GEPPETO
– Some results obtained with a 2D biomechanical tongue model
– New issues raised by the use of a 3D biomechanical tongue model

Basic issues in Speech Production Research: Phonology/Phonetics Interface
– Link between discrete representations and continuous physical signals
– Nature of physical correlates of speech units

Basic issues in Speech Production Research: Control and Production of Speech Gestures
– Control variables
– Central representations of physical characteristics of the speech production apparatus
– Perception-action interaction

Basic issues in Speech Production Research: From Gestures to Speech Sounds
– Nature of acoustic sources
– Relations between motor commands and acoustics
– Interaction between airflow and articulatory gestures

What is GEPPETO?
An evolving modeling framework to quantitatively test hypotheses about the control and the production of speech gestures. It includes:
– Hypotheses about the physical correlates of phonological units
– Models of motor control
– Physical models of the speech production apparatus

Current Hypotheses: Phonology/Phonetics Interface
– The smallest phonological unit is the phoneme
– Phonemes are associated with target regions in the auditory domain
– Larger phonological units are associated with speech sequences for which specific constraints exist for target optimization or for the sequencing of motor commands

Current Hypotheses: Control of speech gestures
– Control variables: λ commands (EP Hypothesis, Feldman, 1966)
– No online use of feedback going through the cortex
– Short-delay orosensory and proprioceptive feedback is taken into account
– Existence in the brain of internal representations of the speech apparatus (internal models)
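To make the EP hypothesis named on this slide more concrete, here is a minimal sketch of a commonly cited form of Feldman's λ model: a muscle is recruited only when its length exceeds a centrally specified threshold λ. The exponential force law and the parameter values (`mu`, `rho`, `c`) are illustrative assumptions, not values taken from these slides.

```python
import math


def muscle_activation(length, velocity, lam, mu=0.01):
    """Activation A = [l - lambda + mu * dl/dt]^+ (zero below threshold).

    length and lam are in mm, velocity in mm/s; mu is an assumed damping gain.
    """
    return max(0.0, length - lam + mu * velocity)


def muscle_force(activation, rho=1.0, c=0.112):
    """Assumed exponential force law F = rho * (exp(c * A) - 1)."""
    return rho * (math.exp(c * activation) - 1.0)


# A muscle held at 50 mm, at rest, for three threshold settings.
# Lowering lambda below the current length recruits the muscle.
for lam in (55.0, 50.0, 45.0):
    a = muscle_activation(50.0, 0.0, lam)
    print(f"lambda={lam:.0f} mm  activation={a:.1f}  force={muscle_force(a):.3f}")
```

In this reading, the central command never specifies a force or a trajectory directly: it shifts λ, and the force results from the difference between the actual muscle length and that threshold.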

Current Hypotheses: Control of speech gestures
– Internal representations do not account for the whole physical complexity of the speech production apparatus
– Kinematic characteristics are not directly controlled. They result from the interaction between motor control setups and the physical phenomena of speech production
Which characteristics of speech signals are specifically controlled?

Application to the generation of speech gestures with a 2D biomechanical tongue model
– Implementation of the model of control
– Inversion from desired perceptual objectives to motor commands
– Generation of gestures

2D Biomechanical Model
– Finite element structure
– Linear elasticity (small deformations)
– Gravity is not taken into account

2D Biomechanical Model (figure panels: posterior genioglossus, anterior genioglossus, hyoglossus)

2D Biomechanical Model (figure panels: verticalis, styloglossus, inferior longitudinalis)

Learning a static internal model: from λ commands to formants
Step 1:
– Uniform sampling of the command space
– Generation of the corresponding tongue shapes (simulations)

Learning a static internal model: from commands to formants
Step 2: Computation of the area function

Learning a static internal model: from commands to formants
Step 3: Formant computation for 2 lip apertures (red dots: spread lips; blue dots: rounded lips)

Step 4: Learning and generalizing with radial basis functions (figure labels: 1st layer, 2nd layer)
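Steps 1 to 4 can be sketched end to end with a small two-layer radial basis function network. In this sketch `toy_forward` is a hypothetical stand-in for the biomechanical simulation, area function and formant computation; the Gaussian kernel, the choice of centres and the `width` value are likewise assumptions, since the slides do not specify them.

```python
import numpy as np


def toy_forward(commands):
    """Hypothetical stand-in for tongue model + formant computation (Hz)."""
    x, y = commands[:, 0], commands[:, 1]
    f1 = 300.0 + 400.0 * x * y
    f2 = 800.0 + 1200.0 * np.sin(1.5 * x) * (1.0 - 0.5 * y)
    return np.column_stack([f1, f2])


def rbf_features(commands, centers, width):
    """First layer: Gaussian response to each centre, plus a bias column."""
    d2 = ((commands[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((commands.shape[0], 1))])


def fit_rbf(commands, formants, centers, width):
    """Second layer: linear output weights fitted by least squares."""
    phi = rbf_features(commands, centers, width)
    weights, *_ = np.linalg.lstsq(phi, formants, rcond=None)
    return weights


def predict_rbf(commands, centers, width, weights):
    """Predicted formants for new commands."""
    return rbf_features(commands, centers, width) @ weights


# Step 1: uniform sampling of a 2-D command space (8 x 8 grid).
grid = np.linspace(0.0, 1.0, 8)
train_x = np.array([[a, b] for a in grid for b in grid])
# Steps 2-3: formants for each sampled command (toy model here).
train_y = toy_forward(train_x)
# Step 4: fit the network, using every other sample as a centre.
centers = train_x[::2]
weights = fit_rbf(train_x, train_y, centers, width=0.3)

# Generalization check on commands that were not in the training set.
rng = np.random.default_rng(0)
test_x = rng.uniform(0.0, 1.0, size=(200, 2))
error = np.abs(predict_rbf(test_x, centers, 0.3, weights) - toy_forward(test_x))
print("max abs error on unseen commands (Hz):", error.max())
```

Once fitted, the network is a cheap static map from commands to formants, which is what makes the later inversion tractable: the optimizer can query it many times without rerunning the physical simulation.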

Inversion: from target regions to commands
Target regions
– Dispersion ellipses in the (F1, F2, F3) space
– Currently defined by Fc1, Fc2, Fc3 and σF1, σF2, σF3
(figure: target regions for some unrounded French phonemes)

Inversion: from target regions to commands
Target regions
– Dispersion ellipses in the (F1, F2, F3) space
– Currently defined by Fc1, Fc2, Fc3 and σF1, σF2, σF3
(figure: target regions for some unrounded French phonemes)
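A target region defined by a centre and one dispersion value per formant can be tested with a normalized distance. The sketch below assumes an axis-aligned ellipsoid and a `scale` factor setting its size in units of σ; the numerical values are hypothetical and are not read from the slides.

```python
import numpy as np


def normalized_distance(formants, center, sigma):
    """Distance to the region centre, each formant scaled by its sigma."""
    f = np.asarray(formants, dtype=float)
    return float(np.sqrt((((f - center) / sigma) ** 2).sum()))


def in_target_region(formants, center, sigma, scale=1.0):
    """True if the formant triple lies inside the (assumed) ellipsoid."""
    return normalized_distance(formants, center, sigma) <= scale


# Hypothetical target region (Hz): centre (Fc1, Fc2, Fc3) and dispersions.
center = np.array([300.0, 2100.0, 3000.0])
sigma = np.array([30.0, 100.0, 150.0])

for candidate in ([310.0, 2050.0, 3100.0], [400.0, 2100.0, 3000.0]):
    d = normalized_distance(candidate, center, sigma)
    print(candidate, round(d, 3), in_target_region(candidate, center, sigma))
```

Scaling by σ means that a 100 Hz deviation counts for much more on F1 than on F3, which is the point of describing targets as regions rather than points.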

Inversion: from target regions to commands
– Cost for a sequence made of N phonemes: speaker-oriented term + listener-oriented term (equation not preserved in the transcript)
– Optimization: cost minimization (gradient descent technique)
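The cost equation itself did not survive in this transcript, so the following is only a sketch of what a two-term cost of this kind could look like. It assumes a listener-oriented term (normalized distance of the predicted formants to each target centre) and a speaker-oriented term (squared change in commands between successive phonemes), minimized by plain gradient descent. The linear `forward` model, the weighting `speaker_weight` and all numerical values are hypothetical.

```python
import numpy as np


def forward(cmd):
    """Hypothetical static internal model: 2 commands -> 2 formants (Hz)."""
    return np.array([300.0 + 400.0 * cmd[0], 900.0 + 1200.0 * cmd[1]])


def cost(flat_cmds, centers, sigmas, speaker_weight):
    """Assumed cost: listener-oriented term + weighted speaker-oriented term."""
    cmds = flat_cmds.reshape(len(centers), -1)
    # Listener-oriented: normalized squared distance to each target centre.
    listener = sum((((forward(c) - fc) / s) ** 2).sum()
                   for c, fc, s in zip(cmds, centers, sigmas))
    # Speaker-oriented: squared command change between successive phonemes.
    speaker = (np.diff(cmds, axis=0) ** 2).sum()
    return listener + speaker_weight * speaker


def numerical_gradient(fn, x, eps=1e-6):
    """Central-difference gradient of fn at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2.0 * eps)
    return grad


def invert(centers, sigmas, speaker_weight=1.0, rate=0.005, n_iter=2000):
    """Gradient descent on the commands of the whole phoneme sequence."""
    x = np.full(2 * len(centers), 0.5)
    for _ in range(n_iter):
        x = x - rate * numerical_gradient(
            lambda v: cost(v, centers, sigmas, speaker_weight), x)
    return x.reshape(len(centers), -1)


# Three hypothetical targets (Hz) with identical dispersions.
centers = np.array([[380.0, 1860.0], [500.0, 1500.0], [620.0, 1140.0]])
sigmas = np.array([[50.0, 150.0]] * 3)
print(invert(centers, sigmas, speaker_weight=10.0))
```

Because the commands of all phonemes in the sequence are optimized together, each target is reached at a point of its region that depends on its neighbours, which is one way planning alone can produce context-dependent targets. The step size `rate` is tuned to this toy cost and would need adjusting for any other model.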

Inversion: from target regions to commands
Example 1: Sequence [ œ-e-k-i ]

Inversion: from target regions to commands
Example 2: Sequence [ œ-e-k-a ]

Production of tongue movements from inferred commands
– Serial command patterns
– No difference between vowels and consonants
(figure: [oe] [e] [k] [a])

Execution of tongue movements from inferred commands
– Öhman’s model: Vowel-to-Vowel basis
– Consonants are seen as perturbations of the V-V movement
(figure: [oe] [e] [k] [a])

Execution of tongue movements from inferred commands (figure: observed flesh point)

Production of tongue movements from inferred commands Serial command patterns [a] [i]

Production of tongue movements from inferred commands Öhman’s command patterns [a] [i]

Interaction control / physics: influence on the shapes of the articulatory paths
Example: the articulatory loops, [aka] and [ika] (R. Houde, 1969)

Fluid-Wall Interaction (diagram labels: imposed pressure difference; flow model; forces; mechanics of the tissues (finite element model); deformation)

Interaction control / physics: influence on the shapes of the articulatory paths
Example: the articulatory loops, [aka]

Interaction control / physics: influence on the shapes of the articulatory paths
Example: the articulatory loops, [aka] (figure panels: no aerodynamics; with aerodynamics)

Interaction control / physics: influence on the shapes of the articulatory paths
Example: the articulatory loops, [ika] (plot of X-Y displacement, both axes in mm; dotted line: PS = 1600 Pa; dashed line: no aerodynamics)

Interaction control / physics: influence on the shapes of the articulatory paths
Example: the articulatory loops, [ika] (figure panels: no aerodynamics; with aerodynamics)

A 3D biomechanical tongue model: for a better account of physics
– Visible Human Project® data (Wilhelms-Tricarico, 2003)
– Finite element mesh made of hexahedra
– Adaptation of the mesh to a specific speaker (PB)
(figure credits: Wilhelms-Tricarico R., 1995; Gerard et al., ICP Grenoble)

Inner muscle structure of the tongue (figure panels: genioglossus (medium), genioglossus (anterior), styloglossus, geniohyoid, genioglossus (posterior), hyoglossus, verticalis, transversus, inferior longitudinalis, mylohyoid, superior longitudinalis)

Vocal tract structure (figure labels: hyoid bone, mandible, palate, other muscles, tongue body)

Elastic properties of tongue muscles
Hyperelastic material (2nd order Yeoh model) with large deformation hypothesis
(figure: force versus displacement for a tongue indenter, with linear and nonlinear curves)
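For readers unfamiliar with the Yeoh model, the sketch below evaluates the standard second-order Yeoh strain energy, W = C10 (I1 - 3) + C20 (I1 - 3)^2, for incompressible uniaxial loading and compares it with its small-strain linear limit. Incompressibility and the constants `C10` and `C20` are assumptions for illustration, not the values used in the tongue model.

```python
def yeoh_uniaxial_stress(stretch, c10, c20):
    """Nominal stress (Pa) for incompressible uniaxial loading.

    With W = c10*(I1 - 3) + c20*(I1 - 3)**2 and I1 = s**2 + 2/s,
    the nominal stress is 2 * (s - s**-2) * dW/dI1.
    """
    i1 = stretch ** 2 + 2.0 / stretch
    dw_di1 = c10 + 2.0 * c20 * (i1 - 3.0)
    return 2.0 * (stretch - stretch ** -2) * dw_di1


def linear_uniaxial_stress(stretch, c10):
    """Small-strain limit: Young's modulus E = 6 * c10 when incompressible."""
    return 6.0 * c10 * (stretch - 1.0)


# Hypothetical material constants (Pa).
C10, C20 = 1000.0, 500.0

for s in (0.7, 0.9, 1.0, 1.1, 1.5):
    print(f"stretch={s:.1f}  yeoh={yeoh_uniaxial_stress(s, C10, C20):9.1f} Pa"
          f"  linear={linear_uniaxial_stress(s, C10):9.1f} Pa")
```

The two laws agree near a stretch of 1 and diverge as the deformation grows, which is why a linear law tuned on small indentations can misestimate forces for the large deformations seen in speech.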

Effect of gravity [1s]

Dealing with gravity with the EP hypothesis [300ms]

Activation of GGp and MH → increase of reflex activity [300ms]

Dealing with gravity with the EP hypothesis: GGp activation

Dealing with gravity with the EP hypothesis: example of a good choice of control parameters [300ms]

Conclusions
A model of control based on perceptual objectives, specified in terms of formant target regions associated with motor commands, and on an optimization process using a static model of the motor-perception relations can generate realistic speech movements if it is applied to a realistic physical model of speech production.

Conclusions
It supports our hypothesis that there is no need to assume the existence of a central optimization process that would apply to the articulatory trajectories as a whole (e.g. minimum jerk, minimum torque…)

Conclusions
– It gives an interesting account of coarticulation phenomena by separating the effects of planning from those of physics.
– It allows hypotheses about the phonological units to be tested (see serial model versus Öhman’s model).

Conclusions
– However, a systematic comparison with data is required (currently in progress for French, German, Chinese, Japanese)
– No account of time control, or of hypo/hyperspeech
– No account of gravity

Conclusions
Necessity to work on more complex internal representations that would integrate some aspects of articulatory dynamics.

Influence of elasticity modeling: activation of the hyoglossus (2 N) (figure panels: hyperelastic; linear, small deformations; linear, large deformations)

EP Hypothesis (Feldman, 1966) Perrier, Ostry, Laboissière, 1996

EP Hypothesis (Feldman, 1966) Perrier, Ostry, Laboissière, 1996

Static Internal Models (diagram labels: central nervous system; desired formants; inverse model; peripheral motor system; direct model; formants)