
1  Speech & Multimodal
Stanford HCI Group / CS376 · http://cs376.stanford.edu
Scott Klemmer · 16 November 2006

2  Some HCI definitions
• Multimodal generally refers to an interface that can accept input from two or more combined modes
• Multimedia generally refers to an interface that produces output in two or more modes
• The vast majority of multimodal systems have been speech + pointing (pen or mouse) input, with graphical (and sometimes voice) output (see the sketch below)
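
The speech + pointing combination is the classic "put that there" style of interaction. As a rough illustration, here is a minimal Python sketch (hypothetical names and timings, not code from any system discussed in class) that binds each deictic word in a spoken command to the pointing event nearest in time:

from dataclasses import dataclass

@dataclass
class PointEvent:
    # A pen or mouse event: when it happened and where it landed
    time: float
    x: float
    y: float

DEICTICS = {"this", "that", "here", "there"}

def resolve_command(timed_words, point_events):
    # Bind each deictic word to the pointing event closest to it in time
    # (a deliberately naive alignment; real integrators weigh many cues)
    bindings = []
    for word, spoken_at in timed_words:
        if word in DEICTICS and point_events:
            nearest = min(point_events, key=lambda p: abs(p.time - spoken_at))
            bindings.append((word, (nearest.x, nearest.y)))
    return bindings

# "put that there" with two pen taps near the two deictic words
words = [("put", 0.0), ("that", 0.4), ("there", 1.1)]
taps = [PointEvent(0.5, 120, 80), PointEvent(1.2, 300, 220)]
print(resolve_command(words, taps))
# [('that', (120, 80)), ('there', (300, 220))]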

3  Canonical App: Maps
• Why are maps so well-suited?
• A visual artifact for computation (Hutchins)

4  What is an interface?
• Is it an interface if there's no way for the user to tell whether they've done something?
  • What might an example be?
• Is it an interface if there's no method for explicit user input?
  • Example: health-monitoring apps

5  Sensor Fusion
• Multimodal = multiple human channels
• Sensor fusion = multiple sensor channels (a minimal example is sketched below)
• Example app: tracking people (one human channel)
  • Might use RFID + vision + keyboard activity + …
• I disagree with the Oviatt paper: speech + lips is sensor fusion, not multimodality
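
To make the contrast concrete, here is a minimal sketch of sensor fusion: two sensor channels observing the same single person are fused into one location estimate. The channels, coordinates, and confidence weights are hypothetical, and the confidence-weighted average is just one simple fusion rule:

def fuse_location(readings):
    # readings: (x, y, confidence) triples from different sensor channels
    total = sum(conf for _, _, conf in readings)
    if total == 0:
        return None  # no channel produced a usable reading
    x = sum(rx * conf for rx, _, conf in readings) / total
    y = sum(ry * conf for _, ry, conf in readings) / total
    return (x, y)

# RFID gives a coarse, low-confidence fix; vision gives a tighter one
rfid_fix = (4.0, 2.0, 0.2)
vision_fix = (4.6, 2.4, 0.8)
print(fuse_location([rfid_fix, vision_fix]))  # approximately (4.48, 2.32)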

6  What constitutes a modality?
• To some extent, it's a matter of semantics
• Is a pen a different modality than a mouse?
• Are two mice different modalities if one controls a GUI and the other controls a tablet-like UI?
• Is a captured modality the same as an input modality?
• How does the Audio Notebook fit into this?

7  Input modalities
• Mouse
• Pen: recognized or unrecognized
• Speech
• Non-speech audio
• Tangible object manipulation
• Gaze, posture, body tracking
• Each of these can be implemented with different technologies
  • e.g., gaze tracking could be laser-based or vision-based

8  Output modalities
• Visual displays
  • Raster graphics, oscilloscope, paper printer, …
• Haptics: force feedback
• Audio
• Smell
• Taste

9  Dual-Purpose Speech

10  Why multimodal?
• Hands busy / eyes busy
• Mutual disambiguation (sketched below)
• Faster input
• "More natural"
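
Mutual disambiguation means the modes can correct each other's recognition errors: instead of taking each recognizer's top guess independently, the integrator picks the jointly most plausible pair of hypotheses. A minimal sketch, with made-up hypotheses, scores, and compatibility rule:

def mutually_disambiguate(speech_nbest, gesture_nbest, compatible):
    # Return the (speech, gesture) pair with the best combined score,
    # considering only semantically compatible pairs
    best, best_score = None, float("-inf")
    for s_hyp, s_score in speech_nbest:
        for g_hyp, g_score in gesture_nbest:
            if compatible(s_hyp, g_hyp):
                score = s_score * g_score  # naive joint likelihood
                if score > best_score:
                    best, best_score = (s_hyp, g_hyp), score
    return best

# Speech alone would pick "pan", but "pan" makes no sense with a point
# gesture, so the second-ranked "plan a route" wins after integration
speech = [("pan", 0.55), ("plan a route", 0.45)]
gesture = [("point", 0.9), ("drag", 0.1)]
ok = lambda s, g: not (s == "pan" and g == "point")
print(mutually_disambiguate(speech, gesture, ok))  # ('plan a route', 'point')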

11  On Anthropomorphism
• The multimodal community grew out of the AI and speech communities
• Should human communication with computers be as similar as possible to human-human communication?

12  Multimodal Software Architectures
• OAA, AAA, OOPS (the shared pattern is sketched below)
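
What these architectures broadly share is that recognizers and applications are decoupled through a central hub (OAA calls its hub a facilitator) rather than wired directly to each other. The sketch below shows that generic publish/subscribe shape in Python; it illustrates the pattern only and is not the actual OAA, AAA, or OOPS API:

from collections import defaultdict

class Facilitator:
    # Toy message hub: recognizers publish events on named channels;
    # integrators and applications subscribe to the channels they need
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, event):
        for handler in self.subscribers[channel]:
            handler(event)

hub = Facilitator()
hub.subscribe("speech", lambda e: print("integrator got speech:", e))
hub.subscribe("pen", lambda e: print("integrator got pen:", e))
hub.publish("speech", {"hyp": "zoom in", "conf": 0.8})
hub.publish("pen", {"x": 120, "y": 80})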

13  Next Time: Vision-Based Interaction
• Computer Vision for Interactive Computer Graphics, William T. Freeman, Yasunari Miyake, Ken-ichi Tanaka, David B. Anderson, Paul A. Beardsley, Chris N. Dodge, Michal Roth, Craig D. Weissman, William S. Yerazunis, Hiroshi Kage, Kazuo Kyuma
• A Design Tool for Camera-based Interaction, Jerry Alan Fails and Dan R. Olsen

14  CS547 Tomorrow
• Ben Shneiderman, University of Maryland – Science 2.0: The Design Science of Collaboration

