Un androïde doué de parole (A speech-gifted android)
Institut de la Communication Parlée (ICP) / Laplace
The goal of the project
Apply robotic tools (theories, algorithms, paradigms) to a human cognitive system (speech) instead of a human-made artefact (a robot). In other words: study speech as a robotic system (a speaking android).
Speech is not an information-processing system but a sensori-motor system plugged into language. This system deals with control, learning, inversion, adaptation, multisensoriality, communication … hence robotics!
“In studying human intelligence, three common conceptual errors often occur: reliance on monolithic internal models, on monolithic control, and on general purpose processing. A modern understanding of cognitive science and neuroscience refutes these assumptions.”
« Cog » at MIT (R. Brooks), http://www.ai.mit.edu/projects/cog/methodology.html
“Our alternative methodology is based on evidence from cognitive science and neuroscience which focus on four alternative attributes which we believe are critical attributes of human intelligence: embodiment and physical coupling, multimodal integration, developmental organization, and social interaction.”
Talking Cog, a speaking android ICP: Speech modelling, speech robotics Laplace: Bayesian Robotics Austin: Speech ontogenesis
« Talking Cog » articulatory model: seven parameters
Jaw height
Lip protrusion
Larynx height
Tongue tip
Tongue body
Tongue dorsum
Lip separation
[Figure: the model's productions for the extreme vowels [i], [a] and [u]]
« Talking Cog » sensors
Audition: formants F1, F2, F3, F4 [figure: spectrum, 0-5000 Hz]
Vision
Touch
« Talking Cog » growth
Learning: Bayesian inference
A sensori-motor agent learns sensori-motor relationships through active exploration, i.e. it learns the joint distribution p(M, P) over its motor (M) and perceptual (P) variables.
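The exploration step can be sketched as follows: a toy agent babbles random motor commands, observes the resulting percepts through a hypothetical plant (the `tanh` mapping and the noise level are illustrative assumptions, not values from the talk), and fits a joint Gaussian over the (M, P) pairs it experienced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D plant: maps a motor command m to a percept p
# through an unknown nonlinearity plus sensory noise (illustrative).
def plant(m):
    return np.tanh(2.0 * m) + rng.normal(0.0, 0.05, size=np.shape(m))

# Active exploration: babble random motor commands, record (M, P) pairs.
M = rng.uniform(-1.0, 1.0, size=500)
P = plant(M)

# Learn p(M, P) as a joint Gaussian: mean vector and covariance matrix.
data = np.stack([M, P])
mu = data.mean(axis=1)
cov = np.cov(data)
```

The learned `mu` and `cov` summarise everything the agent knows about its sensori-motor coupling, and all the later inference questions (inversion, prediction, fusion) can be answered by conditioning this joint model.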
Acquire controls from percepts: p(M | P). Given a perceptual input (a target), infer the motor command that reaches it.
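With a joint Gaussian model, acquiring a control from a percept is just conditioning. A minimal sketch of p(M | P) for a bivariate Gaussian; the covariance numbers are illustrative assumptions, not values from the project.

```python
import numpy as np

# Toy joint Gaussian over (M, P), as might be learned from babbling.
mu = np.array([0.0, 0.0])          # means of M and P
cov = np.array([[0.35, 0.30],
                [0.30, 0.40]])     # joint covariance (illustrative)

def invert(p_target):
    """p(M | P = p_target) for a bivariate Gaussian:
    mean = mu_M + cov_MP / var_P * (p - mu_P)
    var  = var_M - cov_MP**2 / var_P
    """
    mean = mu[0] + cov[0, 1] / cov[1, 1] * (p_target - mu[1])
    var = cov[0, 0] - cov[0, 1] ** 2 / cov[1, 1]
    return mean, var

m_mean, m_var = invert(0.5)   # motor command distribution for target P = 0.5
```

The posterior variance quantifies how ambiguous the inversion is, which matters because sensori-motor mappings are generally many-to-one.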
Regularise percepts from actions: p(P | M). Complete an incomplete perceptual input from knowledge of the motor command.
Predict one modality from another: p(P2 | P1), e.g. P1 orosensorial, P2 audio.
Coherently fuse two or more modalities: p(M | P1, P2), combining the sensors s1 and s2.
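Under Gaussian likelihoods and a flat prior, this fusion reduces to a precision-weighted average of the two unimodal estimates. A sketch with illustrative numbers (the means and variances are assumptions):

```python
# Coherent fusion of two modalities: each sensor contributes an estimate
# of M weighted by its precision (inverse variance).
def fuse(m1, var1, m2, var2):
    w1, w2 = 1.0 / var1, 1.0 / var2
    mean = (w1 * m1 + w2 * m2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mean, var

# Sensor 1 (e.g. orosensorial) is more precise than sensor 2 (e.g. audio).
mean, var = fuse(0.2, 0.04, 0.6, 0.16)
```

The fused estimate sits closer to the more reliable sensor, and the fused variance is smaller than either unimodal variance, which is the usual signature of coherent multisensory fusion.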
The route towards adult speech: learning control through exploration and imitation
0 months: imitation of the three major speech gestures
4 months: vocalisation, imitation
7 months: jaw cycles (babbling)
Later: control of the carried articulators (lips, tongue) for vowels and consonants
First experiment: simulating exploration from 4 to 7 months
Phonetic data (sounds and formants) on 4- and 7-month-old babies' vocalisations
Acoustical framing
[Figure: true data vs. the maximal acoustical space in the (F1, F2) plane]
Results
[Figure: (F1, F2) vowel spaces; black: android capacities, colour: infant productions]
Pre-babbling (4 months): central, mid-high
Babbling onset (7 months): central, high-low
Articulatory framing
Various sub-models: which one is the best?
Method
Select the best sub-model M via P(M | f1 f2), by comparing its theoretical (F1, F2) distribution with the real distribution.
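One way to implement this comparison is to score each candidate sub-model by the likelihood it assigns to the real formant data and keep the winner. The candidate parameters and the synthetic "real" data below are illustrative stand-ins, not the project's models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the real (infant) formant measurements.
real = rng.normal(1.0, 0.5, size=200)

# Hypothetical candidate sub-models, each predicting a formant
# distribution: too restricted, roughly matched, too wide.
candidates = {
    "restricted": (1.0, 0.1),
    "matched": (1.0, 0.5),
    "wide": (1.0, 2.0),
}

def log_likelihood(data, mean, std):
    # Gaussian log-likelihood of the data under one candidate model.
    z = (data - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi)))

best = max(candidates, key=lambda name: log_likelihood(real, *candidates[name]))
```

A too-restricted model is punished for missing data in its tails, a too-wide model for spreading its probability mass, so the matched model wins, which mirrors the "too restricted / too wide / the best" comparison on the next slide.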
[Figure: candidate sub-models: too restricted, too wide, and the best one]
Results
[Figure: (F1, F2) spaces]
Pre-babbling (4 months): lips and tongue
Babbling onset (7 months): lips and tongue + jaw (J)
Conclusion I
1. Acoustical framing: cross-validation of the data and model
2. Articulatory framing: articulatory abilities / exploration. 4 months: tongue dorsum/body + lips; 7 months: the same + the jaw
3. More on early sensori-motor maps
Second experiment: simulating imitation at 4 months
From visuo-motor imitation at 0 months to audio-visuo-motor imitation at 4 months
Early vocal imitation [Kuhl & Meltzoff, 1996]
3- to 5-month-old babies hearing/seeing adult speech [a i u] produce about 60% “good responses”.
Questions
1. Is the imitation process visual, auditory, or audio-visual?
2. How much exploration is necessary for imitation?
3. Is it possible to reproduce the experimental pattern of performance?
Testing visual imitation
[Figure: lip area Al (targets Al_i, Al_a, Al_u) → inversion through the 4-month model (lips, tongue) → (f1, f2) → categorisation as i / a / u]
Visual imitation: simulation results
[Table: simulated productions for the Al targets u, i, a vs. experimental data]
Experimental data do not match the visual-imitation response profiles.
Testing audio imitation
Articulatory inputs (Lh, Tb, Td) map through three intermediary control variables (Xh, Yh, Al) of the vocal tract to the auditory outputs (F1, F2). These three control variables correspond to crucial parameters for control: they are connected to the orosensorial channels and simplify the control of the 7-parameter articulatory model.
Parametrisation and decomposition
Articulatory variables Lh, Tb, Td: Gaussian. Control variables Xh, Yh, Al: Laplace. Auditory variables F1, F2: Gaussian.
Joint probability: P(Lh Tb Td Xh Yh Al F1 F2)
P(Xh Yh Al) = P(Xh) P(Yh) P(Al)
P(Lh Tb Td F1 F2 | Xh Yh Al) = P(Lh Tb Td | Xh Yh Al) P(F1 F2 | Xh Yh Al)
P(Lh | Xh Yh Al) = P(Lh | Al)
P(Tb Td | Xh Yh Al) = P(Tb Td | Xh Yh)
Dependence structure (learned description of the sensori-motor behaviour)
P(Lh Tb Td Xh Yh Al F1 F2) = P(Xh) P(Yh) P(Al) P(Lh | Al) P(Tb | Xh Yh) P(Td | Xh Yh Tb) P(F1 | Xh Yh Al) P(F2 | Xh Yh Al)
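The factorisation can be turned into an executable joint density by plugging a density into each factor. In the sketch below the Laplace priors and Gaussian conditionals follow the slide's parametrisation, but every mean and scale is a made-up placeholder, not a learned value.

```python
import numpy as np

def gauss(x, mu, sd):
    # Gaussian density
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def laplace(x, mu, b):
    # Laplace density
    return np.exp(-abs(x - mu) / b) / (2 * b)

def joint(Lh, Tb, Td, Xh, Yh, Al, F1, F2):
    # P(Xh) P(Yh) P(Al): Laplace priors on the control variables
    p = laplace(Xh, 0, 1) * laplace(Yh, 0, 1) * laplace(Al, 0, 1)
    # P(Lh | Al) P(Tb | Xh Yh) P(Td | Xh Yh Tb): articulatory terms
    p *= gauss(Lh, Al, 0.5) * gauss(Tb, Xh + Yh, 0.5) * gauss(Td, Xh + Yh + Tb, 0.5)
    # P(F1 | Xh Yh Al) P(F2 | Xh Yh Al): auditory terms
    p *= gauss(F1, Xh - Al, 0.5) * gauss(F2, Yh + Al, 0.5)
    return p

val = joint(0, 0, 0, 0, 0, 0, 0, 0)
```

Any inference question (inversion, prediction, fusion) then amounts to summing or maximising this joint density over the unobserved variables.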
The challenge
Starting from the exploration defined in Experiment 1, how many data (self-vocalisations) are needed to learn enough to produce 60% correct responses in Experiment 2?
The idea
If the amount of learning data is small, the discretisation of the control space should be coarse.
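This trade-off can be illustrated with a toy 1-D density estimate: score several grid resolutions by held-out log-likelihood and observe that, with a small learning set, a coarse discretisation generalises better than a fine one. All sizes and distributions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def heldout_ll(train, test, bins):
    # Histogram density estimate on the training set, scored on held-out data.
    hist, edges = np.histogram(train, bins=bins, range=(-3, 3), density=True)
    idx = np.clip(np.digitize(test, edges) - 1, 0, bins - 1)
    dens = np.maximum(hist[idx], 1e-9)  # floor for empty bins
    return float(np.log(dens).sum())

train = rng.normal(0.0, 1.0, size=16)    # small learning set
test = rng.normal(0.0, 1.0, size=1000)   # held-out data
scores = {b: heldout_ll(train, test, b) for b in (4, 32, 256)}
```

With only 16 training points, the 256-bin grid is mostly empty, so held-out points keep landing in zero-density cells; the coarse grid wastes resolution but assigns sensible probability everywhere.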
Inversion results
[Figure: RMS audio error (F1, F2) of the inversion process (in Bark) as a function of learning-set size, for control-space sizes 4, 32, 256 and 2048]
[Figure: optimal learning-space size vs. control-space size (4, 32, 256, random)]
Simulating audio-motor imitation
[Figure: audio targets [i a u] (F12_i, F12_a, F12_u) → inversion through the 4-month model (lips, tongue) → (f1, f2) → categorisation as i / a / u]
Simulation results
[Figure: imitation performance for control-space sizes 4, 32, 256 and 2048, compared with infants]
Results
[Table: reality (audio-visual targets) vs. simulated productions (known auditory targets)]
Conclusion II
1. 10 to 30 vocalisations are enough for an infant to learn to produce 60% good vocalisations in the audio-imitation paradigm!
2. Three major factors drive the baby android's performance: learning-set size, control-space size, and the variance distribution in the learning set (not shown here)
Final conclusions and perspectives
1. Some of the exploration and imitation behaviour of human babies was reproduced by their android cousins (feasibility / understanding)
2. The developmental path must be explored further, and the baby android must be questioned about what it really learned and what it can do at the end of the learning process