Speech & Multimodal Scott Klemmer · 16 November 2006.

1 Speech & Multimodal Scott Klemmer · 16 November 2006

2 Some HCI definitions
Multimodal generally refers to an interface that can accept input from two or more combined modes.
Multimedia generally refers to an interface that produces output in two or more modes.
The vast majority of multimodal systems have been speech + pointing (pen or mouse) input, with graphical (and sometimes voice) output.

3 Canonical App: Maps
Why are maps so well-suited?
A visual artifact for computation (Hutchins)

4 What is an interface?
Is it an interface if there’s no method for a user to tell if they’ve done something? What might an example be?
Is it an interface if there’s no method for explicit user input? Example: health monitoring apps

5 Sensor Fusion
multimodal = multiple human channels
sensor fusion = multiple sensor channels
Example app: tracking people (1 human channel) might use RFID + vision + keyboard activity + …
I disagree with the Oviatt paper: speech + lips is sensor fusion, not multimodality.
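
To make the distinction concrete, here is a minimal sketch of sensor fusion for the people-tracking example: several sensor channels are combined into one estimate about a single human channel (presence). It is not from the slides. The function name `fuse_presence`, the independence assumption between sensors, and every number below are hypothetical and chosen only for illustration.

```python
def fuse_presence(prior, likelihood_ratios):
    """Combine evidence from several sensors about ONE hypothesis
    ("the person is at their desk") using the odds form of Bayes' rule.

    Assumes the sensors are conditionally independent, which real
    deployments would need to check.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        # ratio = P(reading | present) / P(reading | absent)
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical likelihood ratios for the three channels named on the slide.
readings = {
    "rfid": 4.0,      # badge seen near the desk
    "vision": 2.0,    # camera sees a person-shaped blob
    "keyboard": 5.0,  # recent keystrokes
}

p_present = fuse_presence(prior=0.5, likelihood_ratios=readings.values())
print(f"P(present) = {p_present:.3f}")
```

However many sensors feed in, the output is still one estimate about one human channel, which is why the slide treats this as sensor fusion rather than multimodality.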

6 What constitutes a modality?
To some extent, it’s a matter of semantics.
Is pen a different modality than a mouse?
Are two mice different modalities if one is controlling a GUI, and the other controls a tablet-like UI?
Is a captured modality the same as an input modality? How does the audio notebook fit into this?

7 Input modalities
mouse
pen: recognized or unrecognized
speech
non-speech audio
tangible object manipulation
gaze, posture, body-tracking
Each of these experiences has different implementing technologies, e.g., gaze tracking could be laser-based or vision-based.

8 Output modalities
Visual displays: raster graphics, oscilloscope, paper printer, …
Haptics: force feedback
Audio
Smell
Taste

9 Dual Purpose Speech

10 Why multimodal?
Hands busy / eyes busy
Mutual disambiguation
Faster input
“More natural”
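
Mutual disambiguation can be illustrated with a small sketch for a speech + pointing map interface: each recognizer returns an n-best list, and the joint interpretation is the highest-scoring pair that is semantically compatible. This is an illustrative assumption, not the architecture of any particular system. The function `mutual_disambiguation`, the product scoring rule, the phrases, and the scores are all hypothetical.

```python
def mutual_disambiguation(speech_nbest, gesture_nbest):
    """Pick the best compatible (speech, gesture) pair.

    speech_nbest:  list of (text, score, required_gesture_kind)
    gesture_nbest: list of (gesture_kind, score)
    Returns (joint_score, text, gesture_kind), or None if no pair is compatible.
    """
    best = None
    for text, s_score, required_kind in speech_nbest:
        for kind, g_score in gesture_nbest:
            if kind != required_kind:
                continue  # semantically incompatible pair, discard
            joint = s_score * g_score
            if best is None or joint > best[0]:
                best = (joint, text, kind)
    return best


# Hypothetical recognizer outputs: the speech recognizer slightly prefers
# "road", but the pen recognizer is fairly sure the user tapped a point.
speech = [
    ("zoom to this road", 0.50, "line"),
    ("zoom to this node", 0.40, "point"),
]
gesture = [
    ("point", 0.80),
    ("line", 0.20),
]

best = mutual_disambiguation(speech, gesture)
print(best[1], "+", best[2])
```

With these numbers the gesture evidence overturns the speech recognizer's top choice: the second-ranked phrase wins because only it pairs with the confidently recognized pointing gesture.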

11 On Anthropomorphism
The multimodal community grew out of the AI and speech communities.
Should human communication with computers be as similar as possible to human-human communication?

12 Multimodal Software Architectures

13 Next Time… Vision-Based Interaction
Computer Vision for Interactive Computer Graphics, William T. Freeman, Yasunari Miyake, Ken-ichi Tanaka, David B. Anderson, Paul A. Beardsley, Chris N. Dodge, Michal Roth, Craig D. Weissman, William S. Yerazunis, Hiroshi Kage, Kazuo Kyuma
A Design Tool for Camera-based Interaction, Jerry Alan Fails and Dan R. Olsen
