A speech-gifted android (Un androïde doué de parole). Institut de la Communication Parlée, Laplace.

The goal of the project: apply robotic tools (theories, algorithms, paradigms) to a human cognitive system (speech) rather than to a human “artefact” (a “robot”). Or: study speech as a robotic system (a speaking android).

Speech is not an information-processing system but a sensori-motor system plugged into language. This system deals with control, learning, inversion, adaptation, multisensoriality and communication… hence robotics!

“In studying human intelligence, three common conceptual errors often occur: reliance on monolithic internal models, on monolithic control, and on general-purpose processing. A modern understanding of cognitive science and neuroscience refutes these assumptions.” (R. Brooks, « Cog » at MIT)

Our alternative methodology is based on evidence from cognitive science and neuroscience which focus on four alternative attributes which we believe are critical attributes of human intelligence: embodiment and physical coupling, multimodal integration, developmental organization, and social interaction.

Talking Cog, a speaking android. ICP: speech modelling, speech robotics. Laplace: Bayesian robotics. Austin: speech ontogenesis.

« Talking Cog » articulatory model, with seven parameters: jaw height, lip protrusion, larynx height, tongue tip, tongue body, tongue dorsum, lip separation.

[Figure: model configurations for the vowels [u], [i] and [a].]

« Talking Cog » sensors: audition (formants F1, F2, F3, F4), vision, touch.

« Talking Cog » growth

Learning: Bayesian inference. A sensori-motor agent (M, P), with M its motor variables and P its perceptual variables, learns sensori-motor relationships through active exploration, i.e. it learns the joint distribution p(M, P).
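To make the learning step concrete, here is a minimal Python sketch of estimating p(M, P) by active exploration. It assumes discretised motor and perceptual variables and replaces the articulatory model with a hypothetical `toy_forward_model`; `learn_joint` and its parameters are likewise illustrative names, not the project's code. The agent babbles random motor commands, observes the percepts they produce and counts co-occurrences, with one pseudo-count per cell (add-one smoothing) so that unexplored pairs keep a small probability.

```python
import random
from collections import Counter


def toy_forward_model(m: int, n_p: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the articulatory model: the percept bin
    usually equals the motor bin, occasionally shifted by one."""
    p = m + rng.choice([-1, 0, 0, 0, 1])
    return max(0, min(n_p - 1, p))


def learn_joint(n_trials: int, n_m: int = 5, n_p: int = 5, seed: int = 0) -> dict:
    """Estimate p(M, P) from random motor babbling, with add-one smoothing."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        m = rng.randrange(n_m)              # explore: pick a motor command
        p = toy_forward_model(m, n_p, rng)  # observe its perceptual outcome
        counts[(m, p)] += 1
    total = n_trials + n_m * n_p            # one pseudo-count per (m, p) cell
    return {(m, p): (counts[(m, p)] + 1) / total
            for m in range(n_m) for p in range(n_p)}


joint = learn_joint(5000)
print(round(sum(joint.values()), 6))  # 1.0
```

Smoothing matters here because exploration is finite: a pair the agent never tried still gets a non-zero probability instead of being declared impossible.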

Acquire controls from percepts: p(M | P). Given a perceptual input (a target?), which motor command produces it?

Regularise percepts from actions: p(P | M), e.g. to complete an incomplete perceptual input.
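Once p(M, P) is available, inversion and regularisation are both conditionals of the same table. The sketch below assumes a small hypothetical joint table `JOINT` (three motor bins, three percept bins, values invented for illustration) and reads p(M | P) off a normalised column and p(P | M) off a normalised row.

```python
# Hypothetical joint table p(M, P): rows are motor bins, columns percept bins.
JOINT = [
    [0.20, 0.05, 0.00],
    [0.05, 0.30, 0.05],
    [0.00, 0.05, 0.30],
]


def motor_given_percept(joint, p):
    """Inversion: p(M | P = p) = p(M, p) / sum over M of p(M, p)."""
    column = [row[p] for row in joint]
    z = sum(column)
    return [v / z for v in column]


def percept_given_motor(joint, m):
    """Regularisation: p(P | M = m), the percepts expected from action m."""
    row = joint[m]
    z = sum(row)
    return [v / z for v in row]


print(motor_given_percept(JOINT, 0))  # ≈ [0.8, 0.2, 0.0]
print(percept_given_motor(JOINT, 1))  # ≈ [0.125, 0.75, 0.125]
```

Here percept 0 is most probably produced by motor bin 0, which is the command an inversion would select.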

Predict one modality from another: p(P2 | P1), with P1 orosensory and P2 audio.

Coherently fuse two or more modalities: p(M | P1, P2), combining the perceptual modalities P1 and P2 (sensors s1, s2).
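Cross-modal prediction and fusion follow from the same joint model. This sketch assumes, which the slides do not state, that the orosensory percept P1 and the audio percept P2 are conditionally independent given the motor state M; the probability tables are hypothetical.

```python
# Hypothetical model: M has three motor bins; P1 (orosensory) and P2 (audio)
# have two bins each and depend only on M.
PRIOR_M = [1 / 3, 1 / 3, 1 / 3]
P1_GIVEN_M = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]  # p(P1 | M), rows = M
P2_GIVEN_M = [[0.8, 0.2], [0.2, 0.8], [0.2, 0.8]]  # p(P2 | M), rows = M


def normalise(weights):
    z = sum(weights)
    return [w / z for w in weights]


def motor_given_p1(p1):
    """p(M | P1) by Bayes' rule."""
    return normalise([PRIOR_M[m] * P1_GIVEN_M[m][p1] for m in range(len(PRIOR_M))])


def predict_p2_given_p1(p1):
    """Prediction: p(P2 | P1) = sum over M of p(M | P1) * p(P2 | M)."""
    post = motor_given_p1(p1)
    return [sum(post[m] * P2_GIVEN_M[m][p2] for m in range(len(post)))
            for p2 in range(2)]


def fuse(p1, p2):
    """Fusion: p(M | P1, P2) proportional to p(M) * p(P1 | M) * p(P2 | M)."""
    return normalise([PRIOR_M[m] * P1_GIVEN_M[m][p1] * P2_GIVEN_M[m][p2]
                      for m in range(len(PRIOR_M))])


print(predict_p2_given_p1(0))
print(fuse(1, 1))
```

The audio percept P2 = 1 alone cannot separate motor bins 1 and 2 (both give it probability 0.8); fusing it with the orosensory percept P1 = 1 favours bin 2, with `fuse(1, 1)` about [0.018, 0.351, 0.632].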

The route towards adult speech: learning control through exploration and imitation. 0 months: imitation of the three major speech gestures. 4 months: vocalisation, imitation. 7 months: jaw cycles (babbling). Later: control of the carried articulators (lips, tongue) for vowels and consonants.

First experiment: simulating exploration from 4 to 7 months. Phonetic data (sounds and formants) on the vocalisations of 4- and 7-month-old babies.

Acoustical framing. [Figure: F1-F2 plane showing the true infant data and the maximal acoustic space of the model.]

Results. [Figure: F1-F2 plane with vowel regions labelled high/low and front/back; black: android capacities, colour: infant productions.] Pre-babbling (4 months): central, mid-high productions. Babbling onset (7 months): central productions, from high to low.

Articulatory framing: various sub-models are compared. Which one is the best?

Method: select the best sub-model M through P(M | F1, F2), by comparing its theoretical distribution in the F1-F2 plane with the real distribution.

[Figure: candidate sub-models: too restricted, too wide, the best!]
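The slides do not say how the theoretical and real distributions are compared. As one plausible criterion (an assumption), this sketch scores each sub-model by the log-likelihood of the real vocalisations under a smoothed histogram of that sub-model's simulated productions: a too-restricted model leaves real data in near-empty cells, while a too-wide model spreads its probability thinly. The cell indices and sub-model data are hypothetical.

```python
import math
from collections import Counter


def smoothed_histogram(cells, n_cells, alpha=0.1):
    """Theoretical distribution over F1-F2 grid cells; add-alpha smoothing
    keeps a small probability on cells the sub-model never reaches."""
    counts = Counter(cells)
    denom = len(cells) + alpha * n_cells
    return [(counts[c] + alpha) / denom for c in range(n_cells)]


def log_likelihood(real_cells, model_cells, n_cells):
    hist = smoothed_histogram(model_cells, n_cells)
    return sum(math.log(hist[c]) for c in real_cells)


def select_best(real_cells, sub_models, n_cells):
    """Name of the sub-model whose distribution best explains the real data."""
    return max(sub_models,
               key=lambda name: log_likelihood(real_cells, sub_models[name], n_cells))


# Hypothetical vocalisations, already binned into 10 cells of the F1-F2 plane.
real = [3, 4, 4, 5, 5, 5, 6]
sub_models = {
    "too restricted": [4, 4, 5, 5],
    "the best": [3, 4, 4, 5, 5, 5, 6, 6, 3, 4],
    "too wide": list(range(10)),
}
print(select_best(real, sub_models, n_cells=10))  # the best
```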

Results. [Figure: F1-F2 plane.] Pre-babbling (4 months): lips and tongue. Babbling onset (7 months): lips and tongue + jaw (J).

Conclusion I
1. Acoustical framing: cross-validation of the data and the model.
2. Articulatory framing: articulatory abilities / exploration. 4 months: tongue dorsum / body + lips; 7 months: idem + jaw.
3. More on early sensori-motor maps.

Second experiment: simulating imitation at 4 months, from visuo-motor imitation at 0 months to audio-visuo-motor imitation at 4 months.

Early vocal imitation [Kuhl & Meltzoff, 1996]: …-month-old babies hearing/seeing adult speech ([a], [i], [u]) give about 60% « good responses ».

Questions
1. Is the imitation process visual, auditory or audio-visual?
2. How much exploration is necessary for imitation?
3. Is it possible to reproduce the experimental pattern of performance?

Testing visual imitation. [Diagram: visual targets (lip areas Al_i, Al_a, Al_u) → inversion through the 4-month model (lips, tongue) → produced formants f1, f2 → categorisation as i, a or u.]

Visual imitation: simulation results. [Table: productions (u, i, a, total) for the experimental data and for the simulation with lip-area (Al) targets.] The experimental data do not match the response profiles expected from visual imitation.

Testing audio imitation. [Diagram: vocal tract with articulatory inputs Lh, Tb, Td; intermediary control variables Xh, Yh, Al; auditory outputs F1, F2.] The three intermediary control variables correspond to crucial control parameters: they are connected to orosensory channels and simplify the control of the 7-parameter articulatory model.

Parametric forms: articulatory variables Lh, Tb, Td: Gaussian; control variables Xh, Yh, Al: Laplace; auditory variables F1, F2: Gaussian. Decomposition of the joint probability $P(L_h, T_b, T_d, X_h, Y_h, A_l, F_1, F_2)$:
$$P(X_h, Y_h, A_l) = P(X_h)\,P(Y_h)\,P(A_l)$$
$$P(L_h, T_b, T_d, F_1, F_2 \mid X_h, Y_h, A_l) = P(L_h, T_b, T_d \mid X_h, Y_h, A_l)\,P(F_1, F_2 \mid X_h, Y_h, A_l)$$
$$P(L_h \mid X_h, Y_h, A_l) = P(L_h \mid A_l)$$
$$P(T_b, T_d \mid X_h, Y_h, A_l) = P(T_b, T_d \mid X_h, Y_h)$$

Dependency structure (learning: description of the sensori-motor behaviour):
$$P(L_h, T_b, T_d, X_h, Y_h, A_l, F_1, F_2) = P(X_h)\,P(Y_h)\,P(A_l)\,P(L_h \mid A_l)\,P(T_b \mid X_h, Y_h)\,P(T_d \mid X_h, Y_h, T_b)\,P(F_1 \mid X_h, Y_h, A_l)\,P(F_2 \mid X_h, Y_h, A_l)$$
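The decomposition can be evaluated factor by factor. In this sketch the discrete control variables Xh, Yh, Al use Laplace's succession law (our reading of "Laplace" on the slide, an assumption), each Gaussian is indexed by the discrete values of its control parents, and P(Td | Xh, Yh, Tb) is assumed to be linear-Gaussian in Tb. All values in the hypothetical `PARAMS` table are invented.

```python
import math


def laplace_prob(counts, value):
    """Laplace succession law over a discrete variable: (n_v + 1) / (n + |V|)."""
    return (counts[value] + 1) / (sum(counts) + len(counts))


def gauss_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_joint(s, prm):
    """log P(Lh, Tb, Td, Xh, Yh, Al, F1, F2), one term per factor."""
    xh, yh, al = s["Xh"], s["Yh"], s["Al"]
    lp = sum(math.log(laplace_prob(prm[v], s[v])) for v in ("Xh", "Yh", "Al"))
    lp += gauss_logpdf(s["Lh"], *prm["Lh|Al"][al])
    lp += gauss_logpdf(s["Tb"], *prm["Tb|XhYh"][(xh, yh)])
    a, b, sd = prm["Td|XhYhTb"][(xh, yh)]  # assumed: mean of Td is linear in Tb
    lp += gauss_logpdf(s["Td"], a + b * s["Tb"], sd)
    lp += gauss_logpdf(s["F1"], *prm["F1|XhYhAl"][(xh, yh, al)])
    lp += gauss_logpdf(s["F2"], *prm["F2|XhYhAl"][(xh, yh, al)])
    return lp


BINS = (0, 1)  # each control variable discretised into two bins (hypothetical)
PARAMS = {
    "Xh": [3, 1], "Yh": [2, 2], "Al": [1, 3],  # hypothetical visit counts
    "Lh|Al": {al: (0.5 * al, 0.2) for al in BINS},  # (mean, std)
    "Tb|XhYh": {(x, y): (x - y, 0.3) for x in BINS for y in BINS},
    "Td|XhYhTb": {(x, y): (0.0, 0.5, 0.3) for x in BINS for y in BINS},  # (a, b, std)
    "F1|XhYhAl": {(x, y, a): (300 + 400 * y, 50)
                  for x in BINS for y in BINS for a in BINS},
    "F2|XhYhAl": {(x, y, a): (800 + 1200 * x - 200 * a, 80)
                  for x in BINS for y in BINS for a in BINS},
}

sample = {"Xh": 0, "Yh": 0, "Al": 1, "Lh": 0.5, "Tb": 0.0, "Td": 0.0,
          "F1": 300.0, "F2": 600.0}
print(log_joint(sample, PARAMS))
```

Because every factor conditions on at most a few variables, each table stays small enough to be learned from a limited number of self-vocalisations.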

The challenge: given the exploration defined by Experiment 1, how much data (self-vocalisations) is needed to learn enough to produce 60% correct responses in Experiment 2? The idea: if the amount of learning data is small, the discretisation of the control space should be coarse.
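A back-of-the-envelope way to see why few data call for a coarse control space: if n vocalisations fall uniformly and independently into C cells of the discretised control space (an idealising assumption, not the authors' analysis), the expected fraction of cells never visited is (1 - 1/C)^n.

```python
def expected_empty_fraction(n_samples: int, n_cells: int) -> float:
    """Expected fraction of cells never visited when n_samples points fall
    uniformly and independently into n_cells cells. Each cell is missed by
    one sample with probability 1 - 1/C, hence by all n with (1 - 1/C)**n."""
    return (1.0 - 1.0 / n_cells) ** n_samples


# Three control variables (Xh, Yh, Al), k bins each -> k**3 cells.
for k in (2, 4, 8):
    print(f"{k} bins/variable, {k ** 3} cells: "
          f"{expected_empty_fraction(30, k ** 3):.3f} expected empty")
```

With 30 vocalisations, 2 bins per variable (8 cells) leave about 2% of cells unvisited in expectation, whereas 8 bins per variable (512 cells) leave about 94% of them without any data.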

Inversion results. [Figure: RMS audio error (F1, F2) of the inversion process, in Bark, as a function of the size of the control space and the size of the learning space.]
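The inversion error is reported in Bark. As a hedged sketch, the code below converts formants with Traunmüller's (1990) Hz-to-Bark formula, without its low- and high-frequency corrections (the slides do not say which conversion was used), and pools the squared F1 and F2 errors before taking the root mean square; that pooling is also an assumption.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) Hz-to-Bark approximation, uncorrected."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def rms_bark_error(targets, produced):
    """RMS error in Bark, pooled over all F1 and F2 values.
    targets, produced: sequences of (F1, F2) pairs in Hz."""
    sq = [(hz_to_bark(t) - hz_to_bark(p)) ** 2
          for tgt, prd in zip(targets, produced)
          for t, p in zip(tgt, prd)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical target and produced formants (Hz).
print(rms_bark_error([(300, 2300), (750, 1300)], [(320, 2200), (700, 1250)]))
```

Working in Bark rather than Hz weights F1 and F2 errors on a roughly perceptual scale, so a 100 Hz error on F2 counts for less than the same error on F1.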

[Figure: optimal learning-space size vs. control-space size; axes: size of the control space, size of the learning space.]

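Simulating audio-motor imitation. [Diagram: audio targets [i], [a], [u], given by their formants F12_i, F12_a, F12_u in the F1-F2 plane → inversion through the 4-month model (lips, tongue) → produced formants f1, f2 → categorisation as i, a or u.]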
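The final step of this pipeline can be sketched as nearest-prototype categorisation in the Bark F1-F2 plane, scored as the fraction of good responses. The prototype formants in `PROTOTYPES` are illustrative placeholders rather than measured data, and nearest-prototype categorisation is an assumed stand-in for the categoriser used in the study.

```python
import math

# Illustrative (not measured) vowel prototypes in Hz: (F1, F2).
PROTOTYPES = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (300.0, 800.0)}


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) Hz-to-Bark approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def categorise(f1: float, f2: float) -> str:
    """Label a production with the nearest prototype in the Bark F1-F2 plane."""
    def dist(proto):
        return math.hypot(hz_to_bark(f1) - hz_to_bark(proto[0]),
                          hz_to_bark(f2) - hz_to_bark(proto[1]))
    return min(PROTOTYPES, key=lambda v: dist(PROTOTYPES[v]))


def good_response_rate(targets, productions) -> float:
    """Fraction of productions categorised as the vowel they imitate."""
    hits = sum(categorise(*prod) == tgt for tgt, prod in zip(targets, productions))
    return hits / len(targets)


targets = ["i", "a", "u", "i"]
productions = [(320, 2200), (700, 1250), (350, 900), (600, 1400)]  # hypothetical
print(good_response_rate(targets, productions))  # 0.75
```

Three of the four hypothetical productions are categorised as the intended vowel; the last attempt at [i] lands nearer the [a] prototype.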

Simulation results vs. infants. [Figure: simulated and infant response profiles.]

Results. [Figure: productions in reality, with audio-visual targets, compared with productions in simulation, with known auditory targets.]

Conclusion II
1. … to 30 vocalisations are enough for an infant to learn to produce 60% good vocalisations in the audio-imitation paradigm!
2. Three major factors determine the baby android's performance: learning size, control size, and the variance distribution in the learning set (not shown here).

Final conclusions and perspectives
1. Some of the exploration and imitation behaviour of human babies can be reproduced by their android cousins (feasibility / understanding).
2. The developmental path must be explored further, and the baby android must be questioned about what it has really learned and what it can do at the end of the learning process.