MUSCLE Multimodal e-team related activity Technical University of Crete Speech Processing and Dialog Systems Group Presenter: Prof. Alex Potamianos
Goals
  Develop domain-independent algorithms and tools for rapid development of state-of-the-art multi-modal dialogue systems by non-experts
  Investigate the optimal modality mix (optimal = maximizing UI efficiency and user satisfaction)
  Demonstrate the synergies between modalities and build a state-of-the-art MM-UI module
Multi-Modal User Interface
  Emphasis on synergies between modalities:
    Value(s) of attributes are displayed graphically
    Erroneous values can be easily corrected via the GUI
    Focus (aka context) of speech modality is highlighted
    Position and value ambiguity are shown (and typically resolved) via the GUI
    Voice prompts are significantly shorter
    GUI takes full advantage of intelligence of voice UI
  Three interaction modes implemented: click-to-talk, open-mike and modality selection
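To make the idea of value ambiguity concrete, here is a minimal sketch of a form attribute that can hold several candidate values until the user picks one in the GUI. The `Slot` class, its fields, and the example city names are hypothetical illustrations, not the data structures of the system presented here.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One form attribute, e.g. a departure city (hypothetical structure)."""
    name: str
    candidates: list = field(default_factory=list)

    @property
    def is_ambiguous(self):
        # More than one candidate means the GUI should show the alternatives.
        return len(self.candidates) > 1

    @property
    def value(self):
        # A value is committed only when exactly one candidate remains.
        return self.candidates[0] if len(self.candidates) == 1 else None

    def resolve(self, choice):
        # Called when the user selects one of the displayed candidates.
        if choice not in self.candidates:
            raise ValueError(f"{choice!r} is not a candidate for {self.name}")
        self.candidates = [choice]


# Made-up example: the recognizer could not decide between two cities.
departure = Slot("departure city", ["Boston", "Austin"])
print(departure.is_ambiguous)  # True
departure.resolve("Boston")
print(departure.value)  # Boston
```

Keeping the unresolved candidates in the slot, rather than committing the top hypothesis, is what would let a GUI display the ambiguity instead of spending an extra voice prompt on it.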
GUI examples (figure label: Button Disabled)
GUI Ambiguity Resolution
Click-to-Talk Examples (figure labels: Click to Talk; Speech Interface Enabled; GUI Disabled; Beginning of Next Turn; GUI Enabled)
Open-Mike Examples (figure labels: Waiting for input via Speech or GUI (mouse and keyboard); Speech has been detected; Beginning of Next Turn)
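The turn behavior suggested by the click-to-talk and open-mike examples above can be sketched as a small state object. This is one reading of the figure labels, not the system's actual implementation: the `Mode` and `TurnState` names are hypothetical, and locking the GUI once speech is detected in open-mike mode is an assumption the slides do not state.

```python
from enum import Enum


class Mode(Enum):
    CLICK_TO_TALK = "click-to-talk"
    OPEN_MIKE = "open-mike"


class TurnState:
    """Tracks which input channels are enabled within one dialogue turn."""

    def __init__(self, mode):
        self.mode = mode
        self.begin_turn()

    def begin_turn(self):
        # Beginning of a turn: the GUI is enabled in both modes; the
        # microphone is live from the start only in open-mike mode.
        self.gui_enabled = True
        self.speech_enabled = self.mode is Mode.OPEN_MIKE

    def click_to_talk(self):
        # Click-to-talk: the click enables speech and disables the GUI.
        if self.mode is Mode.CLICK_TO_TALK:
            self.speech_enabled = True
            self.gui_enabled = False

    def speech_detected(self):
        # Open-mike: assumed here to lock the GUI for the rest of the turn.
        if self.mode is Mode.OPEN_MIKE and self.speech_enabled:
            self.gui_enabled = False


turn = TurnState(Mode.CLICK_TO_TALK)
turn.click_to_talk()
print(turn.speech_enabled, turn.gui_enabled)  # True False
turn.begin_turn()
print(turn.speech_enabled, turn.gui_enabled)  # False True
```

Calling `begin_turn()` at the start of every turn restores the default channel state, which matches the "Beginning of Next Turn" label in both sets of examples.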
Modality Tracking Examples (figure labels: Click To Talk Mode; Open Mike Mode)
Experiments
  15 naïve, non-native users with varying levels of English language knowledge and accent
  Application: form-filling, travel reservation (flight, hotel, car)
  5 scenarios: one/two/three-leg flight, round-trip flight with car, round-trip with hotel
  5 systems: speech, GUI, click-to-talk, open-mike, modality selection
  5x5 = 25 runs per user
  Scenarios and systems tested in random order
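The experimental design above (5 scenarios crossed with 5 systems, in random order) could be generated along these lines. The slides do not say how the order was randomized, so this sketch assumes a full shuffle of the 25 pairs per user; the function name, the seeding scheme, and the identifiers are illustrative.

```python
import random

# Names follow the slide; the list and function names are illustrative.
SCENARIOS = [
    "one-leg flight",
    "two-leg flight",
    "three-leg flight",
    "round-trip flight with car",
    "round-trip with hotel",
]
SYSTEMS = ["speech", "GUI", "click-to-talk", "open-mike", "modality selection"]


def run_schedule(user_id, seed=0):
    """Return all 25 (scenario, system) pairs for one user in shuffled order."""
    runs = [(scenario, system) for scenario in SCENARIOS for system in SYSTEMS]
    # A per-user generator keeps each user's order reproducible (assumption).
    rng = random.Random(seed * 1000 + user_id)
    rng.shuffle(runs)
    return runs


if __name__ == "__main__":
    schedule = run_schedule(user_id=1)
    print(len(schedule))  # 25
    for scenario, system in schedule[:3]:
        print(system, "/", scenario)
```

With 15 users this yields 15 x 25 = 375 runs in total. A blocked or counterbalanced order (for example a Latin square over systems) would be an alternative design if order effects are a concern.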
Results: Objective Metrics
Results: Subjective Metrics
Conclusions
  UI efficiency (task completion, task duration) and subjective metrics:
    GUI-only is the most efficient mode
    Speech-only is the least efficient mode
    No differences in efficiency among the three multi-modal modes
  Repeating experiments on PDA
  Different ASR recognition rates
  Different ASR recognition speed