Nespole!’s Experiment on Multimodality (Summer 2001)
Erica Costantini (University of Trieste), Fabio Pianesi (ITC-irst, Trento), Susanne Burger (CMU)


2 Experiment Objectives
- Testing hypotheses concerning the added value of multimodality in a ‘true’ speech-to-speech translation environment
- Usability evaluation of the Nespole! system

References:
- E. Costantini, S. Burger, and F. Pianesi, “NESPOLE! Multilingual and Multimodal Corpus”, in Proceedings of LREC 2002, Las Palmas, Spain.
- E. Costantini, F. Pianesi, and S. Burger, “The Added Value of Multimodality in the NESPOLE! Speech-to-Speech Translation System: an Experimental Study”, to be published in Proceedings of ICMI 2002.
- S. Burger, E. Costantini, and F. Pianesi, “NESPOLE! Deliverable D5 - Study on Multimodality, part 1”, 2002. Available on the NESPOLE! project website.

3 Multimodality
Speech + pen-based gestures

4 Multimodality: Previous Research
- The advantages of multimodal input over spoken input include fewer errors, fewer spontaneous disfluencies, briefer and less complex language, and greater user satisfaction
- Multimodal interaction occurs more frequently with spatial location commands
- When combined with spoken input, pen-based input can disambiguate poorly understood sentences

References:
- S. L. Oviatt, “Multimodal interactive maps: Designing for human performance”, Human-Computer Interaction, special issue on “Multimodal interfaces”, 1997, pp. 93-129.
- S. L. Oviatt, A. De Angeli, and K. Kuhn, “Integration and synchronization of input modes during multimodal human-computer interaction”, in Proc. of CHI '97, ACM Press, New York, 1997, pp. 415-422.
- S. L. Oviatt, “Mutual disambiguation of recognition errors in a multimodal architecture”, in Proc. of CHI '99, ACM Press, New York, 1999, pp. 576-583.

5 Multimodality: Our Hypotheses
- Multimodality can increase the probability of successful interaction, even with prototypes of “real” multilingual systems, when spatial information is the focus of the communicative exchange
- Multimodality can support faster recovery from recognition and translation errors

6 The Experiment - Method: Experimental Design
Comparison between the performance of two versions of the system:
- SO (speech-only) version: multilingual, multimedia; spoken input only
- MM (multimodal) version: multilingual, multimedia; spoken and pen-based input

7 The Experiment - Method: Participants
- CUSTOMERS: 28
  - 14 English and 14 German speakers
  - similar level of computer literacy and web expertise
  - paid volunteers
- AGENTS: 7 Italian speakers
  - trained to act as Trentino tourist board agents
  - researchers not involved in the Nespole! project
- All participants:
  - same level of computer literacy and web expertise
  - sex balanced across conditions and languages

8 The Experiment - Method: Task and Instructions
Customers and agents received written instructions concerning the system and the task.
- Customer’s task: to ask for information in order to choose an appropriate location and a hotel within constraints specified a priori
- Agent’s task: to provide the necessary information on the basis of the available materials and following the instructions

9 The Experiment - Data: Recordings and Transcriptions
- 56 stereo audio files were recorded
- Dialogues were transcribed in accordance with the VERBMOBIL conventions, using the TransEdit annotation tool
- Besides orthographic words, the dialogue transcriptions contain:
  - annotations for spontaneous phenomena
  - annotations for gestures (as a three-line comment added at the end of the corresponding turn)

10 The Experiment - Data: Spoken Input

                     Italian agent   German customer   English customer
Turns per dialogue              37                39                 33
Tokens per dialogue            258               254                218
Types per dialogue             101               103                 82
Tokens per turn               6.98              6.50               6.60

- Average values and variance for turns, tokens, types, disfluencies and spontaneous phenomena are similar for agents and customers, and across languages and experimental conditions (SO and MM)
- Average dialogue duration: 35 minutes (range: 19–59 minutes)

11 The Experiment - Data: Annotation for Gestures
Gesture annotation includes:
- Gesture identification:
  - progressive number
  - user (agent or customer)
  - temporal relationship with the spoken turn (before, during, after)
- Gesture description (based on the White Board commands used)
- Gesture goal: selection, pointing, connection, words
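
The annotation scheme above can be modelled as a simple record type. The following is a minimal Python sketch, not the TransEdit or VERBMOBIL annotation format: the class and function names (`GestureAnnotation`, `share`) and the toy records are hypothetical, invented only to show how summary figures such as the share of gestures drawn by agents, or drawn before speech, could be computed from such records.

```python
from dataclasses import dataclass
from enum import Enum


class User(Enum):
    AGENT = "agent"
    CUSTOMER = "customer"


class Timing(Enum):
    # Temporal relationship between the gesture and its spoken turn
    BEFORE = "before"
    DURING = "during"
    AFTER = "after"


class Goal(Enum):
    SELECTION = "selection"
    POINTING = "pointing"
    CONNECTION = "connection"
    WORDS = "words"


@dataclass(frozen=True)
class GestureAnnotation:
    number: int       # progressive gesture number within the dialogue
    user: User        # who drew the gesture
    timing: Timing    # before, during, or after the spoken turn
    description: str  # hypothetical free-text label for the White Board command used
    goal: Goal


def share(gestures, predicate):
    """Return the percentage of gestures for which predicate is true (0.0 if none)."""
    if not gestures:
        return 0.0
    hits = sum(1 for g in gestures if predicate(g))
    return 100.0 * hits / len(gestures)


# Toy records invented for illustration; these are NOT NESPOLE! data.
toy = [
    GestureAnnotation(1, User.AGENT, Timing.BEFORE, "free-hand ellipse", Goal.SELECTION),
    GestureAnnotation(2, User.AGENT, Timing.BEFORE, "line", Goal.CONNECTION),
    GestureAnnotation(3, User.AGENT, Timing.AFTER, "free-hand mark", Goal.POINTING),
    GestureAnnotation(4, User.CUSTOMER, Timing.BEFORE, "free-hand ellipse", Goal.SELECTION),
]

print(share(toy, lambda g: g.user is User.AGENT))      # 75.0
print(share(toy, lambda g: g.timing is Timing.BEFORE))  # 75.0
```

Using enums for the closed categories (user, timing, goal) keeps misspelled labels from silently creating new classes when the annotations are tallied.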

12 The Experiment - Data: Pen-Based Input
- Few gestures were performed (one every 8 speech turns)
- Almost all gestures (98.1%) were performed by the agents
- Most drawing gestures (79%) were performed immediately before speech (none during speech)
  - Due to the push-to-talk procedure?
- Few or no deictics were used
  - Due to the time lag between speech and gestures?

13 The Experiment - Data: Pen-Based Input

Drawings       % of all   Mode          % within class
Selection      61%        free-hand     65%
                          elliptical    31%
                          rectangular   4%
Pointing       19%        free-hand     100%
Connection     12%        free-hand     47%
                          line          53%
Words          8%         free-hand     100%

14 The Experiment - Data: Transcription Alignment
The two halves of each dialogue transcription were aligned in order to:
- compare original and translated turns
- classify turns into:
  - successful turns (good translation)
  - partially successful turns (poor or bad translation, but the addressee could still react properly)
  - non-successful turns (incomprehensible output)
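
Once each turn has been labelled with one of the three classes, the percentages reported in the results are simple proportions. The sketch below shows that arithmetic in Python; the `Outcome` enum and the function names are hypothetical, and the toy labels are invented for illustration, not taken from the corpus.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    SUCCESSFUL = "successful"          # good translation
    PARTIAL = "partially successful"   # poor/bad translation, but the addressee could react properly
    NON_SUCCESSFUL = "non-successful"  # incomprehensible output


def outcome_percentages(labels):
    """Map every Outcome to its percentage among the labelled turns."""
    total = len(labels)
    counts = Counter(labels)
    if total == 0:
        return {o: 0.0 for o in Outcome}
    return {o: 100.0 * counts[o] / total for o in Outcome}


def non_successful_by_condition(labels_by_condition):
    """Percentage of non-successful turns for each condition (e.g. "SO", "MM")."""
    return {
        cond: outcome_percentages(labels)[Outcome.NON_SUCCESSFUL]
        for cond, labels in labels_by_condition.items()
    }


# Toy labels invented for illustration; these are NOT the experiment's data.
S, P, N = Outcome.SUCCESSFUL, Outcome.PARTIAL, Outcome.NON_SUCCESSFUL
toy = {"SO": [N, N, S, P], "MM": [N, S, S, P]}

print(outcome_percentages(toy["SO"]))
print(non_successful_by_condition(toy))  # {'SO': 50.0, 'MM': 25.0}
```

The same function can be applied to any subset of turns (for example, only spatial turns) by filtering the label list before calling it.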

15 The Experiment - Results: Turn Successfulness
- There is a tendency towards fewer non-successful turns among spatial turns

                             All turns   Legal turns   Spatial turns
Successful turns             29%         31%           29%
Partially successful turns   32%         35%           41%
Non-successful turns         39%         34%           29%

16 The Experiment - Results: Turn Successfulness
- There is a tendency towards fewer non-successful turns in the MM condition
- The tendency is more evident for spatial turns

Percentages of non-successful turns:
                 Eng. SO   Eng. MM   Ger. SO   Ger. MM
All turns        33%       27%       34%
Legal turns      33%       26%       29%       25%
Spatial turns    30%       19%       31%       18%

17 The Experiment - Results: Turn Repetitions
- On average, each repeated turn was repeated twice
- There is a tendency towards fewer repeated turns in the MM condition
- The tendency is more evident for spatial turns

Percentages of repeated turns:
                 Eng. SO   Eng. MM   Ger. SO   Ger. MM
All turns        16%                 20%       18%
Legal turns      15%                 20%       17%
Spatial turns    17%       11%       23%       18%
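
The slides do not spell out how repetitions were counted. One possible operationalization, assumed here purely for illustration, treats a turn as a repetition when the same speaker re-sends a turn carrying the same intended message; the hypothetical `message_id` labels stand in for that manual judgement. Under this assumption, a percentage of repeated turns and an average number of repetitions per repeated message can be computed as follows.

```python
from collections import Counter


def repetition_stats(turns):
    """Compute repetition figures for one dialogue.

    turns: list of (speaker, message_id) pairs in dialogue order, where
    message_id is a hypothetical label for the intended message.
    Assumption: every turn after the first with the same (speaker, message_id)
    counts as one repetition.

    Returns (percentage of turns that are repetitions,
             average number of repetitions per repeated message).
    """
    counts = Counter(turns)
    total = len(turns)
    repetitions = sum(c - 1 for c in counts.values())
    repeated_messages = sum(1 for c in counts.values() if c > 1)
    pct = 100.0 * repetitions / total if total else 0.0
    avg = repetitions / repeated_messages if repeated_messages else 0.0
    return pct, avg


# Toy dialogue invented for illustration; NOT NESPOLE! data.
toy = [
    ("customer", "ask-hotel-price"),
    ("customer", "ask-hotel-price"),  # repetition 1
    ("customer", "ask-hotel-price"),  # repetition 2
    ("agent", "show-map"),
    ("customer", "ask-ski-area"),
]

print(repetition_stats(toy))  # (40.0, 2.0)
```

A different definition (for example, counting the original turn as "repeated" too) would change the percentage, so any comparison with the figures above depends on matching the original counting rule.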

18 The Experiment - Results: Other Results
- Goals achieved: 86% (the customer chose a target hotel)
- Ambiguities: fewer dialogue segments containing ambiguities in the MM condition (3 vs. 7); in addition, misunderstandings in the MM condition were immediately resolved by resorting to MM resources
- Strong user preference for MM

19 The Experiment - Conclusions: Summary of the Results
- Multimodality does not seem to affect dialogue length, the number of turns and words, or the number of disfluencies and spontaneous phenomena
- Multimodality seems to enhance dialogue effectiveness:
  - when spatial information is conveyed, it is more effective at reducing the number of non-successful turns and repetitions
  - it helps resolve misunderstandings
  - it provides better dialogue fluency
  - it is preferred by users

20 The Experiment - Conclusions: Discussion
- We used a real system prototype instead of a Wizard of Oz procedure (which would have provided weaker evidence)
- The task was the best compromise between the system's capabilities and the need to elicit true pen-based gestures
- The Nespole! corpus is a relevant source of information on multilingual and multimodal interaction in realistic scenarios

