
Un androïde doué de parole (A speech-gifted android). Institut de la Communication Parlée / Laplace.



Presentation transcript:


2 Un androïde doué de parole (A speech-gifted android). Institut de la Communication Parlée / Laplace

3 The goal of the project: robotic tools (theories, algorithms, paradigms) applied to a human cognitive system (speech) instead of a human “artefact” (a “robot”). Or: study speech as a robotic system (a speaking android).

4 Speech: not an information-processing system, but a sensori-motor system plugged into language. This system deals with control, learning, inversion, adaptation, multisensoriality, communication … hence robotics!

5 "In studying human intelligence, three common conceptual errors often occur: reliance on monolithic internal models, on monolithic control, and on general purpose processing. A modern understanding of cognitive science and neuroscience refutes these assumptions." « Cog » at MIT (R. Brooks) http://www.ai.mit.edu/projects/cog/methodology.html

6 Our alternative methodology is based on evidence from cognitive science and neuroscience which focus on four alternative attributes which we believe are critical attributes of human intelligence: embodiment and physical coupling, multimodal integration, developmental organization, and social interaction.

7 Talking Cog, a speaking android. ICP: speech modelling, speech robotics. Laplace: Bayesian robotics. Austin: speech ontogenesis.

8 « Talking Cog » articulatory model: jaw height, lip protrusion, larynx height, tongue tip, tongue body, tongue dorsum, lip separation.

9 [u] [i] [a]

10 « Talking Cog » sensors: audition (formants F1, F2, F3, F4), vision, touch.

11 « Talking Cog » growth

12 Learning: Bayesian inference. A sensori-motor agent learning sensori-motor relationships through active exploration: the joint distribution p(M, P) over motor (M) and perceptual (P) variables.
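A minimal sketch of this learning step, assuming one-dimensional motor and perceptual variables and a toy linear "vocal tract" (neither of which comes from the project): the agent tries random motor commands and fits a joint Gaussian p(M, P) to what it observes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Active exploration: the agent tries random motor commands M
# and observes the resulting percepts P (toy linear "vocal tract",
# an illustrative assumption).
M = rng.uniform(-1.0, 1.0, size=500)
P = 2.0 * M + 0.3 + 0.1 * rng.standard_normal(500)

# Learn p(M, P) as a joint Gaussian: mean vector and covariance matrix.
data = np.stack([M, P])
mu = data.mean(axis=1)
cov = np.cov(data)

print(mu)   # close to [0.0, 0.3]
print(cov)  # strong positive M-P covariance
```

Once p(M, P) is in hand, all the uses on the following slides are questions asked of this one distribution.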

13 Acquire controls from percepts (inversion): p(M | P), inferring motor commands from a perceptual target.

14 Regularise percepts from actions: p(P | M), completing an incomplete perceptual input from the motor command.
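Both of these uses, inversion p(M | P) and prediction p(P | M), can be sketched as conditioning a single learned joint Gaussian. The toy linear vocal tract and 1-D variables are illustrative assumptions, and the formulas are the standard Gaussian conditioning ones, not the project's actual inference engine.

```python
import numpy as np

# Relearn the toy joint Gaussian p(M, P) from exploration data.
rng = np.random.default_rng(0)
M = rng.uniform(-1.0, 1.0, size=500)
P = 2.0 * M + 0.3 + 0.1 * rng.standard_normal(500)
mu = np.array([M.mean(), P.mean()])
cov = np.cov(np.stack([M, P]))

def condition(mu, cov, i, j, value):
    """p(x_i | x_j = value) for a 2-D Gaussian (standard conditioning)."""
    m = mu[i] + cov[i, j] / cov[j, j] * (value - mu[j])
    v = cov[i, i] - cov[i, j] ** 2 / cov[j, j]
    return m, v

# Inversion p(M | P): which motor command reaches the perceptual target?
m_mean, m_var = condition(mu, cov, 0, 1, value=1.0)

# Prediction p(P | M): which percept should this motor command produce?
p_mean, p_var = condition(mu, cov, 1, 0, value=0.5)

print(m_mean)  # ~0.35, since P = 2M + 0.3 gives M = (1.0 - 0.3) / 2
print(p_mean)  # ~1.3, since P = 2 * 0.5 + 0.3
```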

15 Predict one modality from another: p(P2 | P1), with P1 orosensorial and P2 audio.

16 Coherently fuse two or more modalities: p(M | P1, P2).
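One standard way to realise such fusion, sketched here under a flat prior and known Gaussian sensor noises (all numbers are illustrative, not the project's): each modality is weighted by its precision, so the fused estimate leans towards the more reliable sensor and is more certain than either alone.

```python
# Two sensors report the same underlying variable with different noise
# (a toy stand-in for fusing, e.g., an auditory and an orosensorial reading).
s1, s2 = 0.2, 0.1    # sensor standard deviations (assumed known)
p1, p2 = 1.10, 0.95  # the two observations

# With a flat prior on M, p(M | P1, P2) is Gaussian with a
# precision-weighted mean: each sensor counts by 1 / variance.
w1, w2 = 1 / s1**2, 1 / s2**2
m_mean = (w1 * p1 + w2 * p2) / (w1 + w2)
m_std = (w1 + w2) ** -0.5

print(round(m_mean, 3))     # 0.98: closer to the more reliable sensor
print(m_std < min(s1, s2))  # True: fusion is more certain than either sensor
```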

17 The route towards adult speech: learning control through exploration & imitation. 0 months: imitation of the three major speech gestures. 4 months: vocalisation, imitation. 7 months: jaw cycles (babbling). Later: control of carried articulators (lips, tongue) for vowels and consonants.

18 First experiment: simulating exploration from 4 to 7 months. Phonetic data (sounds and formants) on 4- and 7-month-old babies' vocalisations.

19 Acoustical framing: true data vs. the maximal acoustical space, in the (F1, F2) plane.

20 Results in the (F1, F2) plane (black: android capacities; colour: infant productions). Pre-babbling (4 months): central, mid-high. Babbling onset (7 months): central, high-low.

21 Articulatory framing. Various sub-models: which one is the best?

22 Method: selection of the best model M via P(M | f1, f2), by comparison of the theoretical (F1, F2) distribution with the real one.

23 Candidate models: too restricted, too wide, and the best one!
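The selection method of the last two slides can be sketched with candidate sub-models summarised as Gaussian formant distributions and scored by the likelihood they assign to the real data (every distribution below is invented for illustration): a too-restricted model misses most of the data, a too-wide one spreads its probability mass too thin, and the matching one wins.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Real" 1-D formant data (toy stand-in for infant F1 measurements, in Hz).
real = rng.normal(500.0, 100.0, size=200)

# Three candidate sub-models, each summarised by the Gaussian
# distribution of formants it can produce (illustrative values).
models = {
    "too restricted": (500.0, 20.0),
    "too wide": (500.0, 500.0),
    "the best": (500.0, 100.0),
}

def mean_loglik(data, mean, std):
    """Average Gaussian log-likelihood of the data under one model."""
    return np.mean(-0.5 * ((data - mean) / std) ** 2
                   - np.log(std) - 0.5 * np.log(2 * np.pi))

scores = {name: mean_loglik(real, m, s) for name, (m, s) in models.items()}
best = max(scores, key=scores.get)
print(best)  # the best
```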

24 Results in the (F1, F2) plane: pre-babbling (4 months) is best explained by lips and tongue; babbling onset (7 months) by lips and tongue + jaw (J).

25 Conclusion I. 1. Acoustical framing: cross-validation of the data and model. 2. Articulatory framing: articulatory abilities / exploration. 4 months: tongue dorsum/body + lips; 7 months: idem + jaw. 3. More on early sensori-motor maps.

26 Second experiment: simulating imitation at 4 months From visuo-motor imitation at 0 months to audiovisuo-motor imitation at 4 months

27 Early vocal imitation [Kuhl & Meltzoff, 1996]: 3- to 5-month-old babies hearing/seeing adult speech [a] [a i u]. About 60% « good responses ».

28 Questions. 1. Is the imitation process visual, auditory, or audio-visual? 2. How much exploration is necessary for imitation? 3. Is it possible to reproduce the experimental pattern of performance?

29 Testing visual imitation: lip area Al (Al_i, Al_a, Al_u) -> inversion through the 4-months model (lips, tongue) -> (f1, f2) -> categorisation into [i a u].

30 Visual imitation: simulation results (production totals for Al against [u i a], experimental vs. simulation). Experimental data do not agree with the visual-imitation response profiles.

31 Testing audio imitation. Vocal tract: articulatory inputs (Lh, Tb, Td) -> intermediary control variables (Xh, Yh, Al) -> auditory outputs (F1, F2). The three intermediary control variables correspond to crucial parameters for control, connected to orosensorial channels, and able to simplify control of the 7-parameter articulatory model.

32 Parametrisation and decomposition. Articulatory variables Lh, Tb & Td -> Gaussian; control variables Xh, Yh & Al -> Laplace; auditory variables F1 & F2 -> Gaussian. Joint probability P(Lh ∧ Tb ∧ Td ∧ Xh ∧ Yh ∧ Al ∧ F1 ∧ F2), with:
P(Xh ∧ Yh ∧ Al) = P(Xh) · P(Yh) · P(Al)
P(Lh ∧ Tb ∧ Td ∧ F1 ∧ F2 | Xh ∧ Yh ∧ Al) = P(Lh ∧ Tb ∧ Td | Xh ∧ Yh ∧ Al) · P(F1 ∧ F2 | Xh ∧ Yh ∧ Al)
P(Lh | Xh ∧ Yh ∧ Al) = P(Lh | Al)
P(Tb ∧ Td | Xh ∧ Yh ∧ Al) = P(Tb ∧ Td | Xh ∧ Yh)

33 Dependence structure (learned description of the sensori-motor behaviour):
P(Lh ∧ Tb ∧ Td ∧ Xh ∧ Yh ∧ Al ∧ F1 ∧ F2) = P(Xh) · P(Yh) · P(Al) · P(Lh | Al) · P(Tb | Xh ∧ Yh) · P(Td | Xh ∧ Yh ∧ Tb) · P(F1 | Xh ∧ Yh ∧ Al) · P(F2 | Xh ∧ Yh ∧ Al)
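This dependence structure can be read as a generative program: draw the control variables from their independent priors (Laplace laws, as on the previous slide), then the articulatory and auditory variables from their conditionals. The linear-Gaussian conditionals and all coefficients below are invented for illustration; only the factorisation itself comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample():
    # Control variables: independent Laplace priors.
    Xh = rng.laplace(0.0, 1.0)
    Yh = rng.laplace(0.0, 1.0)
    Al = rng.laplace(0.0, 1.0)
    # Articulatory variables: P(Lh|Al), P(Tb|Xh,Yh), P(Td|Xh,Yh,Tb)
    # as linear-Gaussian conditionals (coefficients are made up).
    Lh = rng.normal(0.5 * Al, 0.1)
    Tb = rng.normal(0.4 * Xh + 0.2 * Yh, 0.1)
    Td = rng.normal(0.3 * Xh + 0.1 * Yh + 0.5 * Tb, 0.1)
    # Auditory variables: P(F1|Xh,Yh,Al), P(F2|Xh,Yh,Al), in Hz.
    F1 = rng.normal(500 + 100 * Yh + 50 * Al, 30)
    F2 = rng.normal(1500 - 300 * Xh + 100 * Al, 60)
    return Lh, Tb, Td, Xh, Yh, Al, F1, F2

draws = np.array([sample() for _ in range(1000)])
print(draws.shape)  # (1000, 8)
```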

34 The challenge: from the exploration defined by Exp. 1, what amount of data (self-vocalisations) is necessary to learn enough to produce 60% correct responses in Exp. 2? The idea: if the amount of learning data is small, the discretisation of the control space should be coarse.
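The idea can be illustrated with a toy lookup-table inverse model: discretise the control space into bins, store the mean observed formant per bin, and compare grid resolutions. With few learning samples a fine grid leaves most bins empty, so a coarse grid wins. The linear vocal tract and all sizes below are illustrative assumptions, not the project's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def inversion_error(n_learn, n_bins, n_test=2000):
    """RMS error of a binned control -> formant table learned from n_learn samples."""
    edges = np.linspace(-1, 1, n_bins + 1)
    m = rng.uniform(-1, 1, n_learn)
    f = 2.0 * m + 0.1 * rng.standard_normal(n_learn)  # toy vocal tract
    # Fill the table with the mean formant observed in each bin.
    table = np.full(n_bins, np.nan)
    idx = np.clip(np.digitize(m, edges) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        if np.any(idx == b):
            table[b] = f[idx == b].mean()
    # Empty bins fall back to the global mean (a crude default).
    table[np.isnan(table)] = f.mean()
    # Evaluate on fresh test points against the noiseless tract.
    mt = rng.uniform(-1, 1, n_test)
    ft = 2.0 * mt
    pred = table[np.clip(np.digitize(mt, edges) - 1, 0, n_bins - 1)]
    return np.sqrt(np.mean((pred - ft) ** 2))

# With only 16 learning samples, a coarse grid beats a fine one.
coarse = inversion_error(n_learn=16, n_bins=4)
fine = inversion_error(n_learn=16, n_bins=256)
print(coarse < fine)  # True
```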

35 Inversion results: RMS audio error (F1, F2) of the inversion process (in Bark) as a function of learning-set size, for control-space sizes 4, 32, 256 and 2048.

36 Optimal learning-space size vs. control-space size (4, 32, 256, random).

37 Simulating audio-motor imitation: audio targets [i a u] (F12_i, F12_a, F12_u) in the (F1, F2) plane -> inversion through the 4-months model (lips, tongue) -> (f1, f2) -> categorisation into [i a u].

38 Simulation results for control-space sizes 4, 32, 256 and 2048, compared with infants.

39 Results (reality vs. simulations): productions for audio-visual targets and for known auditory targets.

40 Conclusion II. 1. 10 to 30 vocalisations are enough for an infant to learn to produce 60% good vocalisations in the audio-imitation paradigm! 2. Three major factors intervene in the baby android's performance: learning-set size, control-space size, and the variance distribution in the learning set (not shown here).

41 Final conclusions and perspectives. 1. Some of the exploration and imitation of human babies has been reproduced by their android cousins (feasibility / understanding). 2. The developmental path must be further explored, and the baby android must be questioned about what it really learned, and what it can do at the end of the learning process.




