Multimodal Interaction in Speak4it
Patrick Ehlen, AT&T
© 2010 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
This talk will discuss:
– Multimodal interaction approaches: mode choice and mode integration
– Grounding (it's context!)
– Grounding in multimodal local search
What is multimodal interaction?
The most common implementation of "multimodal interaction" is mode choice: let people use more than one mode of input or output.
– Input: graphical UI or voice (ASR)
– Output: visual (graphics) or voice (TTS)
– Interact using one mode at a time
Another approach: mode integration
– Use more than one mode at the same time
– Provide simultaneous information using different channels
– Combine information from different modes into one interpretation
Example: "Italian restaurants near here"
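As a rough illustration of semantic-level mode integration (a sketch only, not Speak4it's implementation), the snippet below resolves the deictic "near here" in a spoken query against a simultaneous touch gesture; the Gesture class, the integrate function, and the simple string handling are all hypothetical.

```python
# Hypothetical sketch of semantic-level mode integration: a spoken query
# and a touch gesture arriving together are fused into one interpretation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gesture:
    kind: str          # "point", "line", or "area"
    lat: float
    lon: float

def integrate(speech: str, gesture: Optional[Gesture]):
    """Fuse speech and gesture into a single search interpretation."""
    interpretation = {"what": speech, "where": None}
    # A deictic phrase like "near here" is underspecified by speech alone;
    # the gesture supplies the missing location referent.
    if "here" in speech.lower() and gesture is not None:
        interpretation["what"] = speech.lower().replace("near here", "").strip()
        interpretation["where"] = (gesture.lat, gesture.lon)
    return interpretation

print(integrate("Italian restaurants near here", Gesture("point", 40.75, -73.99)))
# {'what': 'italian restaurants', 'where': (40.75, -73.99)}
```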
Advantages
– It's natural (underspecification is the norm)
– Adapt to the environment
– Speech can be shorter and simpler, and/or communicate more complex information
– Complete tasks more quickly
Example: "Italian restaurants near here"
Advantages (continued)
– Some content is better communicated by modes other than speech (e.g., gesturing to communicate spatial information)
– Information from different modes can complement one another and resolve ambiguities ("mutual compensation")
Example: "Italian restaurants near here"
History of research prototypes
– MATCH (Johnston et al., 2002)
– AdApt (Gustafson et al., 2000)
– SmartKom Mobile (Wahlster, 2006)
– Multimodal Interactive Maps (Oviatt, 1997)
The Next Big Thing?
New technologies (touch screens, GPS, accelerometer data, video-based recognition) will spur an evolution in multimodal interface design
– Beyond mode choice to mode integration
Speak4it℠
– The only commercially available product we know of that performs multimodal integration at the semantic level
– Available for free on iPhone, iPad, and iPod Touch
Multimodal interaction in Speak4it
Speak4it gesture inputs:
– point, line, area (drawn with a finger)
– when the user hits the 'Speak/Draw' button, the map display becomes a drawing canvas
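As an aside on how those three gesture types might be distinguished, here is a small sketch of an ink-trace classifier; the thresholds and the bounding-box/closure heuristic are illustrative assumptions, not the recognizer Speak4it actually uses.

```python
import math

def classify_trace(points):
    """Classify a finger-drawn ink trace as 'point', 'line', or 'area'.

    `points` is a list of (x, y) screen coordinates. The thresholds and
    closure test below are illustrative guesses, not Speak4it's recognizer.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))

    if diag < 10:                        # barely moved: a tap, i.e. a point
        return "point"

    start, end = points[0], points[-1]
    closure = math.hypot(end[0] - start[0], end[1] - start[1])
    if closure < 0.25 * diag:            # trace closes on itself: an area
        return "area"
    return "line"                        # otherwise an open stroke: a line

print(classify_trace([(100, 100), (101, 102)]))                      # point
print(classify_trace([(0, 0), (50, 10), (120, 30)]))                 # line
print(classify_trace([(0, 0), (80, 0), (80, 80), (0, 80), (5, 5)]))  # area
```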
Multimodal integration provides more headaches for designers
Problems:
– More 'dimensions' of context
– Demands more focus on "common ground" and aspects of knowledge that have already been grounded with users (Clark, 1996)
What is grounding?
Mutual knowledge: things that all parties in a conversation know, and know that the other parties in the conversation also know
– shared physically, linguistically, or via community
When people introduce references, either verbally or by other means, they are grounding those references.
In dialogue, grounding helps to determine what people say, and what they don't say
– What we do or don't say reveals a lot about the aspects of context we believe are already shared
Grounding in telephony queries
– Search queries are very basic dialogue: a single exchange of query and response
– Telcos have dealt with these queries for a long time
Example exchange: "What listing please?" … "Cable Car Pizza" … "Here's that number…"
Grounding in telephony queries
– 411 systems assumed an implicit grounded location because phones had a fixed location (tied to an area code)
– To refer to another location, you called a different area code
– The area code provided a source of mutual knowledge about the grounded location in a query
Example exchange: "What listing please?" … "Cable Car Pizza in San Francisco" … "Please call…"
Then phones lost their tethers (and their implicit grounding mechanisms)
– With mobile phones, there is not as much shared knowledge about location
– Location became "part of the conversation" again
– Spoken query dialogue systems: Google-411, Bing-411, 800-Yellowpages, phone apps, etc.
Example exchange: "What city and state?" … "San Francisco, California" … "What listing?" … "Cable Car Pizza"
Evidence of grounding problems found in Speak4it logs
Frequency of specific locations in queries: 18%
– "police department in jessup maryland"
– "office depot linden boulevard"
Most are unlocated: "gas station", "saigon restaurant"
Location grounding breaking down: "Serendipity" … followed shortly by "Serendipity Dallas Texas"
Corrections: "Starbucks Cape Girardeau" … followed six minutes later by "Lowes" … then right away "Lowes Cape Girardeau"
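To illustrate how such corrections might be surfaced from the logs, here is a small sketch that flags a query quickly followed by a longer query starting with the same text; the ten-minute window and the prefix-matching rule are assumptions for illustration, not the analysis actually run on the Speak4it logs.

```python
from datetime import datetime, timedelta

# Hypothetical log records: (timestamp, query).
log = [
    (datetime(2010, 5, 1, 12, 0), "serendipity"),
    (datetime(2010, 5, 1, 12, 1), "serendipity dallas texas"),
]

def find_location_corrections(log, window=timedelta(minutes=10)):
    """Flag consecutive queries where the second extends the first soon after."""
    corrections = []
    for (t1, q1), (t2, q2) in zip(log, log[1:]):
        if t2 - t1 <= window and q2.startswith(q1) and len(q2) > len(q1):
            corrections.append((q1, q2))
    return corrections

print(find_location_corrections(log))
# [('serendipity', 'serendipity dallas texas')]
```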
Location grounding sources in multimodal mobile search (e.g., for the query "italian restaurants"):
– PHYSICAL: the user's current location (GPS)
– GUI: the location shown on the map display
– GESTURE: where the user touched
– VERBAL: a place spoken in a prior query (e.g., "Sorry, I could not find french restaurants in madison")
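A toy sketch of how these four sources might be reconciled into one salient location per query; the fixed precedence (gesture, then verbal, then GUI, then physical) is an illustrative assumption standing in for the trained context model described on the later slides.

```python
# Hypothetical context: each grounding source may or may not supply a
# location for the current query. The fixed precedence below is only an
# illustrative stand-in for a learned context model.
def salient_location(context):
    for source in ("gesture", "verbal", "gui", "physical"):
        if context.get(source) is not None:
            return source, context[source]
    return None

context = {
    "physical": (38.63, -90.20),   # GPS fix
    "gui": (40.71, -74.01),        # map viewport center
    "gesture": None,               # no touch this turn
    "verbal": None,                # no place named in a prior query
}
print(salient_location(context))   # ('gui', (40.71, -74.01))
```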
Example interaction: "new york, new york" … "pizza restaurants" … <scroll>
Collecting grounding data in the wild
– Gathered ground truth from users while they were "in the wild"
– Presented users with a "grounded location disambiguation" screen to collect user-reported intentions
– Displayed for ~20% of unlocated queries
– Use these data to train a context model and to judge model comparisons
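One plausible way to use these user-reported labels to train a context model is an off-the-shelf classifier over per-query context features; the feature names, the labels, and the use of scikit-learn here are assumptions for illustration only.

```python
# Sketch: train a classifier to predict which grounding source a user
# intends, from context features observed at query time. Feature names,
# labels, and the library choice are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example: context features at query time -> user-reported grounding.
X = [
    {"gesture_this_turn": 1, "map_moved": 0, "secs_since_gesture": 2},
    {"gesture_this_turn": 0, "map_moved": 1, "secs_since_gesture": 600},
    {"gesture_this_turn": 0, "map_moved": 0, "secs_since_gesture": 900},
]
y = ["gesture", "gui", "physical"]

model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
model.fit(X, y)
print(model.predict([{"gesture_this_turn": 1, "map_moved": 0, "secs_since_gesture": 1}]))
```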
[Chart: Selected grounded locations (relative to presentation)]
Speak4it multimodal architecture
[Architecture diagram: a multimodal HTTP data stream (speech, text, ink) flows from the client to the platform; components include ASR (SLM), Gesture Recognition, NLU, Location Grounding, an Interaction Manager, a Geo-coder, and Multimodal Search; resources include a listings index, NL model, and geo index; data passed between components includes audio, ink traces, parsed strings, location strings, lat/lon, queries, and results.]
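To make the "multimodal HTTP data stream" concrete, here is a hedged sketch of what a client request carrying speech, text, and ink might look like; the field names, encodings, and payload layout are assumptions, not the actual Speak4it wire format.

```python
import base64
import json

# Hypothetical client-side payload for the multimodal HTTP data stream.
# Field names and encodings are illustrative assumptions only.
def build_multimodal_request(audio_bytes, text, ink_trace, gps):
    return json.dumps({
        "speech": base64.b64encode(audio_bytes).decode("ascii"),
        "text": text,                      # typed query, if any
        "ink": ink_trace,                  # list of (x, y, t) touch samples
        "gps": {"lat": gps[0], "lon": gps[1]},
    })

payload = build_multimodal_request(
    audio_bytes=b"\x00\x01",               # recorded utterance (truncated)
    text=None,
    ink_trace=[(120, 340, 0), (125, 342, 30)],
    gps=(40.7484, -73.9857),
)
print(payload)
```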
Conclusions
– Multimodal UIs will soon move from mode choice to mode integration
– We'll need richer context models to predict the grounding of locations and other references across modes, so that system actions align with user expectations
– Mobile voice searchers don't always consider their GPS location to be the grounded one; the location shown on the map is considered grounded 37% of the time
– User groundings from touch are highly salient
Acknowledgments
Thanks to Jay Lieske, Clarke Retzer, Brant Vasilieff, Diamantino Caseiro, Junlan Feng, Srinivas Bangalore, Claude Noshpitz, Barbara Hollister, Remi Zajac, Mazin Gilbert, and Linda Roberts for their contributions to Speak4it.