
1 VoiceUNL: a proposal to represent speech control mechanisms within the Universal Networking Digital Language
Mutsuko Tomokiyo (GETA-CLIPS-IMAG) & Gérard Chollet (ENST), reviewed by Christian Boitet (GETA-CLIPS-IMAG)

2 Content Background of this work Proposal of extension of UNL - Speech to speech MT Emotion representation

3 Background
Normalangue (2002): normalization of linguistic resources
TECHNOVOC and RNIL (2002): normalization of technologies applied in the engineering of written and spoken language; SIEMENS, TELISMA, IDYLIC, DIALOCA, ELAN Speech, ST Microelectronics, LORIA, ENST Paris
Lingtour (2002): multilingual, multimedia MT; TsingHua University (China), Paris 8 University (France), INT (France), ENST-Paris and Bretagne (France), and CLIPS (France)

4 Extension of UNL (1)
Example: - May I smoke? - No! You may not, Victor.
[S:01]
{org:e1} - May I smoke? {/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)
{/unl}
[/S] [1]

5 Extension of UNL (2)
[S:02]
{org:e2} - No! you may not, Victor [arte] {/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@not, you)
mod(smoke(icl>do).@entry.@present.@may.@not, no)
mod(no, !(icl>symbol).@interjection)
mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)
{/unl}
[/S]
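
The {unl} block above is a list of binary relations of the form label(uw1, uw2). As a minimal sketch, assuming that relations are separated by whitespace and that a comma inside parentheses never separates the two arguments, the relations can be split into triples as shown here. The names Relation, split_top_level and parse_relations are hypothetical and do not belong to any UNL toolkit.

```python
import re
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    label: str   # relation label, e.g. "agt" or "mod"
    source: str  # first universal word, with its attributes
    target: str  # second universal word, with its attributes


# A relation starts with a lowercase label followed by "(".
LABEL = re.compile(r"\s*([a-z]+)\(")


def split_top_level(body: str) -> Tuple[str, str]:
    """Split 'a, b' on the first comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"no top-level comma in {body!r}")


def parse_relations(unl: str) -> List[Relation]:
    """Parse a whitespace-separated sequence of UNL relations."""
    relations = []
    pos = 0
    while True:
        match = LABEL.match(unl, pos)
        if match is None:
            break
        # Walk forward to the parenthesis that closes this relation.
        depth, end = 1, match.end()
        while depth and end < len(unl):
            if unl[end] == "(":
                depth += 1
            elif unl[end] == ")":
                depth -= 1
            end += 1
        if depth:
            raise ValueError("unbalanced parentheses")
        source, target = split_top_level(unl[match.end():end - 1])
        relations.append(Relation(match.group(1), source, target))
        pos = end
    if unl[pos:].strip():
        raise ValueError(f"unparsed text: {unl[pos:]!r}")
    return relations


EXAMPLE = (
    "agt(smoke(icl>do).@entry.@present.@may.@not, you) "
    "mod(smoke(icl>do).@entry.@present.@may.@not, no) "
    "mod(no, !(icl>symbol).@interjection) "
    "mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)"
)
relations = parse_relations(EXAMPLE)
```

Counting parenthesis depth instead of splitting on every comma keeps restrictions such as (icl>do) attached to their universal word.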

6 Speech to Speech Machine Translation (SSMT) 1. Speaker recognition 2. Gestures, facial movement and speech recognition 3. Transcription and text transfer (UNL) 4. Target language generation (Ariane-G5) 5. Voice, speech, gestures synthesis [Furui,03,Blanchon,02]

7 Emotion representation (1)
Classification of emotions: (1) happiness, (2) sadness, (3) disgust, (4) surprise, (5) fear, (6) anger, (7) irritation, (8) hesitation, (9) uncertainty, (10) neutral
[Morita, 89; Ekman, 79, 03; OOC, 90; ESPIRE, 00]

8 Emotion representation (2)
Emotion eliciting factors and task facets in SSMT:
lexicon (sad, happy, etc.)
phatics (ah, hein, etc.)
prosodies (fast, slow, strong, etc.)
voice (noisy, soft, young, etc.)
gestures (movements of hands, mouth, eyes, etc.)

9 Emotion representation (3) 1 2 3 4 5 6 7 8 9 10 lexicon * * * * * * * * * * phatics * * * * * * * * * * prosodies * * * * * * * * * voice * * * * * * * * hands * * mouth * eyes * * * * * * eyebrows * * * * head * shoulders * * *

10 Emotion representation (4)
Speaker recognition and voice synthesis: gender, age, variant (natural, artificial, etc.), voice name (high-pitched, husky, etc.)
[BMC, 02; W3C rec., 02]

11 Emotion representation (5)
Prosody:
Pitch: x-high, high, medium, low, x-low, default
Range
Rate: x-fast, fast, medium, slow, x-slow, default
Duration
Volume: silent, x-soft, soft, medium, loud, x-loud, default
Emphasis, Break
[BMC, 02; W3C rec., 02]
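
The pitch, rate and volume labels above coincide with the attribute values of the prosody element in the W3C speech synthesis markup cited on the slide. As a minimal sketch, assuming the labels are emitted as such a prosody element (the slide does not say how VoiceUNL serializes them), the following code checks the labels against the lists on the slide before building the markup. The function name prosody_element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Allowed labels, copied from the slide.
ALLOWED = {
    "pitch": {"x-high", "high", "medium", "low", "x-low", "default"},
    "rate": {"x-fast", "fast", "medium", "slow", "x-slow", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody_element(text: str, **settings: str) -> ET.Element:
    """Build a <prosody> element after validating each label."""
    for name, value in settings.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown prosody attribute: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name} label: {value}")
    # Sort attributes so the serialized output is deterministic.
    element = ET.Element("prosody", dict(sorted(settings.items())))
    element.text = text
    return element


markup = ET.tostring(
    prosody_element("No!", pitch="high", volume="loud"),
    encoding="unicode",
)
```

Range, duration, emphasis and break are left out of the sketch because the slide gives no value lists for them.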

12 Emotion representation (6)
Lexicon and speech acts: Inform, Offer, Offer-follow-up, Promise, Yn-question, Action-request, Confirmation-question, Do-you-understand-question, Permission-request, Wh-question, Yes, No, Acknowledge, Thanks, Thanks-response, Farewell, Good-wishes, Good-wishes-response, Greet, Apology, Apology-response, Alert, Instruct, Confirmation-question-to-self, Invite, Vocative, Topic, Expressive
[Tomokiyo, 00]

13 Emotion representation (7)
Facial movements (left, right, up, down): mouth, eyes, eyebrows
Body movements (left, right, up, down): hands, shoulders, head
[ACE, 02; BMC, 02; MPEG-4, 00]

14 Emotion representation (1)

15 --> May I smoke?
agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)
type="Yn-question" may I smoke ?

16 Emotion representation (2)

17 --> No!, you may not, Victor.
aoj(smoke(icl>do).@entry.@present.@may.@not, you)
mod(smoke(icl>do).@entry.@present.@may.@not, no)
mod(no, !(icl>symbol))
mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)
type="Expressive" No!
type="Inform" you may not,
type="Vocative" Victor
No! you may not Victor
type="surprise" lexicon="No!" eyebrows="left and right raised"
No! you may not
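
The annotation above attaches a speech act to each segment of the utterance and an emotion, with its eliciting cues, to the utterance as a whole. The markup element names did not survive on the slide, so the following is only a sketch of one possible in-memory representation. The class and field names (Segment, Emotion, AnnotatedUtterance, cues) are hypothetical; only the values come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The ten emotion classes listed on slide 7.
EMOTIONS = (
    "happiness", "sadness", "disgust", "surprise", "fear",
    "anger", "irritation", "hesitation", "uncertainty", "neutral",
)


@dataclass
class Segment:
    speech_act: str  # e.g. "Expressive", "Inform", "Vocative"
    text: str


@dataclass
class Emotion:
    type: str
    # Eliciting factor -> observed value, e.g. "eyebrows" -> "left and right raised".
    cues: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in EMOTIONS:
            raise ValueError(f"unknown emotion class: {self.type}")


@dataclass
class AnnotatedUtterance:
    segments: list[Segment]
    emotions: list[Emotion] = field(default_factory=list)

    def text(self) -> str:
        """Rebuild the utterance text from its segments."""
        return " ".join(segment.text for segment in self.segments)


utterance = AnnotatedUtterance(
    segments=[
        Segment("Expressive", "No!"),
        Segment("Inform", "you may not,"),
        Segment("Vocative", "Victor"),
    ],
    emotions=[
        Emotion("surprise", {"lexicon": "No!", "eyebrows": "left and right raised"}),
    ],
)
```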

18 Reflections and next steps
Extension of UNL: from written text processing to SSMT in multimodality and multilingualism, focussing on emotion representation
Visual corpus development
Development of a prototype with speech and image interface


