An Overview of EMMA: Extensible MultiModal Annotation
Michael Johnston, AT&T Labs Research
8/9/2006

Overview
Introduction to EMMA
– EMMA structural elements and attributes
– Examples of EMMA annotation
Support in EMMA for integrating inputs from multiple modes
– emma:hook

Introduction to EMMA
Spoken and multimodal interactive systems involve communication among different components:
– Recognition (speech, handwriting, gestures)
– Understanding
– Multimodal integration
– Dialog/interaction management

Spoken Input Processing
SPEECH RECOGNITION → "flight from denver to boston tomorrow" → UNDERSTANDING → semantic representation (denver, boston, tomorrow) → DIALOG MANAGER

Multimodal Input Processing
SPEECH RECOGNITION → "zoom in here" → UNDERSTANDING → semantic representation (zoom, area)
GESTURE RECOGNITION → area … → UNDERSTANDING → semantic representation (area)
Both streams feed MULTIMODAL INTEGRATION → combined representation (zoom, area) → DIALOG/INTERACTION MANAGER

Introduction to EMMA (cont.)
EMMA (Extensible MultiModal Annotation):
– A standardized XML markup language for communication among components of spoken and multimodal interactive systems
– Provides structural containers for possible interpretations of user inputs, and elements and attributes for annotating properties of user inputs (time, confidence, input source, etc.)
– Critically, does not standardize the representation of interpretations of user input

Spoken Input Processing
SPEECH RECOGNITION → EMMA → UNDERSTANDING → EMMA → DIALOG MANAGER

Multimodal Input Processing
SPEECH RECOGNITION → EMMA → UNDERSTANDING → EMMA
GESTURE RECOGNITION → EMMA → UNDERSTANDING → EMMA
Both streams feed MULTIMODAL INTEGRATION → DIALOG/INTERACTION MANAGER

Introduction to EMMA (cont.)
EMMA (Extensible MultiModal Annotation):
– Replacement for NLSML
– Aims to facilitate interoperation among modality components from different vendors
– Will also play a role in logging and corpus annotation

EMMA Standards Effort at W3C
EMMA is part of the W3C standards effort in the Multimodal Interaction (MMI) Working Group.
– EMMA requirements published on Jan. 11, 2003
– First EMMA working draft published on August 11, 2003
– Last call working draft published in September 2005

EMMA Structural Elements
Provide containers for application semantics and for multimodal annotation.
EMMA elements:
– emma:emma
– emma:interpretation
– emma:one-of
– emma:group
– emma:sequence
– emma:lattice
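As a rough illustration (mine, not from the slides), the sketch below parses a minimal EMMA document to show how these structural elements nest; the namespace URI is taken from the EMMA working drafts, and the application payload element (<destination>) is a made-up example.

import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"   # assumed from the EMMA working drafts

# Minimal EMMA document: emma:emma is the root container; emma:one-of,
# emma:group, emma:sequence and emma:lattice nest inside it in the same way
# emma:interpretation does here. <destination> stands in for whatever
# application-specific semantics the understanding component produces.
doc = """
<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(doc)
for elem in root.iter():
    # Rewrite Clark notation ({uri}local) back to the familiar emma: prefix.
    tag = elem.tag.replace("{%s}" % EMMA_NS, "emma:")
    print(tag, elem.get("id", ""))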

EMMA Annotations
Characteristics and processing of input, e.g.:
– emma:tokens – tokens of input
– emma:process – reference to processing
– emma:no-input – lack of input
– emma:uninterpreted – uninterpretable input
– emma:lang – human language of input
– emma:signal – reference to signal
– emma:media-type – media type
– emma:confidence – confidence scores
– emma:source – annotation of input source
– emma:start, emma:end – timestamps (absolute/relative)
– emma:medium, emma:mode, emma:function – medium, mode, and function of input
– emma:hook – hook for integrating content from another mode

Examples of EMMA Annotation

<emma:interpretation id="interp1"
    emma:start="…" emma:end="…"
    emma:tokens="flights from boston to denver"
    emma:confidence="0.6"
    emma:medium="acoustic" emma:mode="speech" emma:function="dialog"
    emma:source="…" emma:signal="…"
    emma:media-type="audio/dsr; rate:8000; maxptime:40">
  … application semantics (Boston, Denver) …
</emma:interpretation>

Examples of EMMA Annotation
Use of EMMA elements: emma:one-of, emma:interpretation

<emma:one-of>
  <emma:interpretation id="int1" emma:confidence="0.75"
      emma:tokens="flights from boston to denver">
    … (Boston, Denver) …
  </emma:interpretation>
  <emma:interpretation id="int2" emma:confidence="0.68"
      emma:tokens="flights from austin to denver">
    … (Austin, Denver) …
  </emma:interpretation>
</emma:one-of>
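As a hedged sketch (my illustration, not part of the slides) of how a consumer such as a dialog manager might use this N-best list, the fragment below selects the highest-confidence interpretation; the namespace URI is assumed from the EMMA working drafts and the application payload is simplified away.

import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"   # assumed from the EMMA working drafts

doc = """
<emma:one-of xmlns:emma="http://www.w3.org/2003/04/emma" id="nbest">
  <emma:interpretation id="int1" emma:confidence="0.75"
      emma:tokens="flights from boston to denver"/>
  <emma:interpretation id="int2" emma:confidence="0.68"
      emma:tokens="flights from austin to denver"/>
</emma:one-of>
"""

one_of = ET.fromstring(doc)
interps = one_of.findall("{%s}interpretation" % EMMA_NS)

def confidence(interp):
    # emma:confidence is a namespace-qualified attribute; default to 0 if absent.
    return float(interp.get("{%s}confidence" % EMMA_NS, "0"))

best = max(interps, key=confidence)
print(best.get("id"), confidence(best), best.get("{%s}tokens" % EMMA_NS))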

Multimodal Integration: emma:hook
One of the most powerful aspects of multimodal interfaces is the possibility of supporting user inputs that are optimally distributed over the available input modes, e.g.:
SPEECH: "zoom in here"
INK: (circle drawn on a map)

Multimodal Integration: emma:hook
In order to support composite multimodality, some kind of multimodal integration mechanism is needed:
– It checks the compatibility of content from different modes and combines them.
The integration mechanism itself is not standardized. EMMA provides an annotation, emma:hook, which can be used to mark locations in the semantic representation where content from another mode is needed. emma:hook can be used to drive different kinds of multimodal integration mechanisms.

Multimodal integration: emma:hook
The value of emma:hook indicates the mode of the content to be integrated:
– emma:hook="ink" indicates that this element needs to be combined with an element from the ink mode.
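As a rough sketch (not from the slides) of how an integration component might use this annotation, the fragment below scans a speech interpretation for elements carrying emma:hook and reports which mode each one is waiting for; the command/location/type element names follow the example on the next slides, and the namespace URI is assumed from the EMMA working drafts.

import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"   # assumed from the EMMA working drafts
HOOK = "{%s}hook" % EMMA_NS

speech = ET.fromstring("""
<emma:interpretation xmlns:emma="http://www.w3.org/2003/04/emma" id="speech1">
  <command>
    <action>zoom</action>
    <location emma:hook="ink">
      <type>area</type>
    </location>
  </command>
</emma:interpretation>
""")

# Every element still waiting for content from another mode, with the mode it needs.
pending = [(el.tag, el.get(HOOK)) for el in speech.iter() if HOOK in el.attrib]
print(pending)   # [('location', 'ink')]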

Multimodal integration: emma:hook
SPEECH: "ZOOM IN HERE"

<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>

Multimodal integration: emma:hook
SPEECH: "ZOOM IN HERE"

<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>

The speech interpretation can be generated using SRGS rules with semantic interpretation tags from the W3C SI specification. For example, for the tokens "zoom in here":

$.command = new Object();
$.command.action = "zoom";
$.command.location = new Object();
$.command.location._attributes = new Object();
$.command.location._attributes.hook = new Object();
$.command.location._attributes.hook._nsprefix = "emma";
$.command.location._attributes.hook._value = "ink";
$.command.location.type = "area";

Multimodal integration: emma:hook
SPEECH: "ZOOM IN HERE"

<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>

INK: (circle on map) → an interpretation containing an element of type "area" with the gesture's coordinates

Multimodal integration: emma:hook
SPEECH: "ZOOM IN HERE"

<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>

INK: (CIRCLE ON MAP) → an interpretation containing an element of type "area" with the gesture's coordinates

MULTIMODAL: "ZOOM IN HERE" + CIRCLE

<command>
  <action>zoom</action>
  <location>
    <type>area</type>
    … (coordinates from the ink gesture) …
  </location>
</command>
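The combination mechanism itself is left to the application; as a loose sketch of one emma:hook-driven approach (my illustration, not the slides' method), the fragment below splices the ink interpretation's content into the hooked location element when the types match. The <points> element, its coordinate values, and the matching logic are assumptions for illustration only.

import copy
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"   # assumed from the EMMA working drafts
HOOK = "{%s}hook" % EMMA_NS
ET.register_namespace("emma", EMMA_NS)

speech = ET.fromstring("""
<emma:interpretation xmlns:emma="http://www.w3.org/2003/04/emma" id="speech1">
  <command>
    <action>zoom</action>
    <location emma:hook="ink">
      <type>area</type>
    </location>
  </command>
</emma:interpretation>
""")

ink = ET.fromstring("""
<emma:interpretation xmlns:emma="http://www.w3.org/2003/04/emma" id="ink1">
  <location>
    <type>area</type>
    <points>42.36 -71.06 42.37 -71.05</points>
  </location>
</emma:interpretation>
""")

ink_content = ink.find("location")

for slot in speech.iter():
    # A hooked element that expects ink and agrees on <type> is compatible.
    if slot.get(HOOK) == "ink" and slot.findtext("type") == ink_content.findtext("type"):
        del slot.attrib[HOOK]
        for child in ink_content:
            if slot.find(child.tag) is None:   # copy only what speech lacks
                slot.append(copy.deepcopy(child))

print(ET.tostring(speech, encoding="unicode"))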

Conclusion
EMMA: Extensible MultiModal Annotation markup language.
Provides a standard representation for communication among the components that make up spoken and multimodal systems:
– Does not standardize the application semantic representation
– Provides containers for application-specific semantics and a standard set of attributes and elements for metadata
– Enables interoperability among components from different vendors
– Built-in 'hooks' for multimodal integration (emma:hook)