The Architecture Dream Team. Schloss Dagstuhl, Germany, October 2001.

Would you build your dream house without a blueprint?

What you hope to get

… what you might get

Today’s Conventional Architecture
Layers between the User(s) and the Information, Applications, People: Presentation, Dialog Control, Application Interface

CHAMELEON Platform (IntelliMedia Workbench), Paul McKevitt
Components: Speech synthesizer, Speech recognizer, Laser pointer, Black board, NL parser, Microphone array, Domain model, Gesture recognizer, Dialogue manager, Frame semantics, Topsy

Microsoft, Derek Jacoby
MIPAD Architecture
A Typical DrWho App

Harry Bunt
Components: Input Interpretation, Output Synthesis, Context Management, Dialogue Management, API, Application
Context: Pending Context; linguistic, semantic, physical, perceptual, cognitive, social

Art Exploration, Oliviero Stock
- Input: explicit input (e.g., pointing), implicit input (e.g., movement)
- Processing: input analyzer, composer engine, presentation
- Models and data: Physical space model, Hypermedia information, visitor models, interaction history
- Output: Audio message to headphone; links and image to UI

COLLAGEN Sidner et al.

IBM’s Responsive Information Architect (RIA), Michelle Zhou
- User input: speech, gesture
- Components: Multimodal Interpreter, Conversational Facilitator, Presentation Broker, Media Producer, Visual Designer, Language Designer
- Models of: Design, Domain, User, Conversation, Environment
- Back end: IRIS Info Server

Interact, Kristiina Jokinen
- Input Manager, Presentation Manager, Dialogue Manager
- Task Agents/Acts
- Dialogue Agents/Acts (e.g., Q, A, State)
- Information Storage, Database
- ASR, Language Understanding, Topic Recognition, TTS, Generator Agents

EMBASSI Conceptual Architecture
- Z-axis:
  - Underlying HW infrastructure
  - Software infrastructure (agent / distributed computing middleware)
  - Functional building blocks of the conceptual architecture (Multimodal Assistant Componentware, MAC)
  - Application-level assistants (not shown)
- XY-plane of MAC:
  - Dialogic assistance
  - Effectual assistance
  - Situational assistance
  - Explicit and implied generic (= application-independent) ontologies, defining component interfaces

EMBASSI Architecture
Example utterance: “Ich will das auf dem da aufnehmen!” (“I want to record that on that one there!”)
Diagram components (labels translated from German):
- I1 GUI input, I2 speech recognition, I3 gesture recognition, I4 gaze-direction recognition
- O1 audio output, O2 display, O3 avatar renderer
- F1 GUI analysis, F2 speech analysis, F2 device selection
- PMI (media fusion), PMO (presentation)
- R1 avatar controller, R2 GUI renderer, R3 text generation
- D dialogue manager
- An assistant
- X1 EMBASSI control, X2 tuner control, X3 EPG control, X4 VCR control, X5 display control, Xi …
- G1 VCR, G2 set-top box, G3 display, Gi …
- S1 biometrics, S2 environment sensor
- Context manager: environment/situation DB, user DB, application DB, resource DB
Level labels: “lexical” level, “syntactic” level, “semantic” level, strategy level, device-control level
Component groups: unimodal I/O devices, multimodal data preparation, dialogue management, assistance methods, execution components, device infrastructure

SMARTKOM Wolfgang Wahlster

DARPA Galaxy Communicator
Servers connected to the Hub: Speech Recognition, Frame Construction, Context Tracking, Dialogue Management, Application Backend, Language Generation, Text-to-Speech Conversion, Audio Server
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems.
Open source and documentation available at fofoca.mitre.org and sourceforge.net/projects/communicator
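The hub-and-spoke idea can be illustrated with a minimal in-process sketch. This is not the GCSI API: the `Hub` class, the operation names, and the three toy servers are hypothetical, and a real hub routes messages between separate server processes over network connections rather than calling Python functions.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]            # simplified stand-in for a Communicator frame
Handler = Callable[[Frame], Frame]


class Hub:
    """Toy hub: servers register operations and never talk to each other directly."""

    def __init__(self) -> None:
        self._operations: Dict[str, Handler] = {}
        self.log: List[str] = []     # order in which operations were dispatched

    def register(self, operation: str, handler: Handler) -> None:
        self._operations[operation] = handler

    def send(self, operation: str, frame: Frame) -> Frame:
        """Route a frame to the server providing the operation and return its reply."""
        self.log.append(operation)
        return self._operations[operation](frame)


# Hypothetical servers; each one only sees the frame the hub hands to it.
def recognize(frame: Frame) -> Frame:
    return {**frame, "text": "flights to boston"}      # pretend recognizer output


def build_frame(frame: Frame) -> Frame:
    words = str(frame["text"]).split()
    return {**frame, "destination": words[-1]}


def generate(frame: Frame) -> Frame:
    city = str(frame["destination"]).title()
    return {**frame, "reply": f"Looking for flights to {city}."}


def run_turn(hub: Hub, audio: bytes) -> Frame:
    """One dialogue turn: every hop goes through the hub."""
    frame: Frame = {"audio": audio}
    for operation in ("recognize", "build_frame", "generate"):
        frame = hub.send(operation, frame)
    return frame


hub = Hub()
hub.register("recognize", recognize)
hub.register("build_frame", build_frame)
hub.register("generate", generate)
result = run_turn(hub, b"\x00\x01")
```

Because servers only register operations with the hub, one of them (say, the recognizer) can be swapped without touching the others, which is the property the slide attributes to the hub-and-spoke design.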

An Example: Communicator-Compliant Emergency Management Interface
Servers connected to the Hub:
- MITRE I/O podium: displays input and output text
- MIT phone connectivity: connects audio to a telephone line
- Speech recognition: converts speech to text (MIT SUMMIT engine and wrapper)
- Frame construction: extracts information from input text (Colorado Phoenix engine, MITRE wrapper)
- MITRE dialogue management: tracks information, decides what to do, and formulates answers
- MITRE SQL generation: converts abstract requests to SQL
- Database (open source PostGres engine, MITRE wrapper)
- Text-to-speech: converts output text to audio (CMU Festival engine, Colorado wrapper)

Communicator Protocol
- All communication is in terms of objects, which bear a message type and object type
- Messages encoded in XDR (public domain data representation)
Object layout: message type, message size, object type, object data
- Message type: broker object, broker start, broker end, new message, message reply, error reply, destroy reply, postponement, disconnect, broker connection
- Object type: string, integer, float, frame, list, integer array (8, 16, 32, 64 bits), float array (32, 64 bits)
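A rough sketch of what an XDR-encoded object with this layout might look like on the wire. XDR itself fixes the primitive encodings used here (4-byte big-endian integers; strings as a length followed by bytes padded to a multiple of four). The numeric codes in `MESSAGE_TYPES` and `OBJECT_TYPES`, the field order, and the reading of "message size" as the payload length are assumptions made for illustration, not the actual Communicator wire format.

```python
import struct

# Hypothetical numeric codes; the slide names the types but not their values.
MESSAGE_TYPES = {"new message": 0, "message reply": 1}
OBJECT_TYPES = {"string": 0, "integer": 1}


def xdr_int(value: int) -> bytes:
    """XDR signed integer: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, then the bytes, zero-padded to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_object(message_type: str, object_type: str, value) -> bytes:
    """Assumed layout: message type, message size (payload bytes), object type, object data."""
    if object_type == "string":
        payload = xdr_string(value)
    elif object_type == "integer":
        payload = xdr_int(value)
    else:
        raise ValueError(f"unsupported object type in this sketch: {object_type}")
    header = (
        xdr_int(MESSAGE_TYPES[message_type])
        + xdr_int(len(payload))
        + xdr_int(OBJECT_TYPES[object_type])
    )
    return header + payload


encoded = encode_object("new message", "string", "travel")
```

The four-byte alignment is why the six-character string "travel" occupies twelve bytes of payload: four for the length, six for the characters, two of padding.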

Frames and Messages
- A frame is an attribute-value structure consisting of a name, a frame type (always a clause), and a collection of pairs of keys and associated typed values (string, integer, float, frame, list, etc.)
- Frames can be constructed using API calls or parsed from a string representation:
    {c main :utterance_id 0 :domain "travel" }
  Here c is the frame type, main is the name, :utterance_id and :domain are keys, 0 is an integer value, and "travel" is a string value.
- A message is a frame passed between the Hub and a server
  - A message can be new (initiating an action) or a reply
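The frame structure described above can be sketched as a small data class with a parser for the string form. This is an illustrative stand-in, not the Communicator library: the `Frame` class and `parse_frame` function are hypothetical names, and the parser handles only flat frames with string, integer, and float values (no nested frames or lists).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Union

Value = Union[str, int, float]

# A token is a quoted string, a brace, or a run of non-space, non-brace characters.
_TOKEN = re.compile(r'"[^"]*"|[{}]|[^\s{}]+')


@dataclass
class Frame:
    name: str
    frame_type: str = "c"                      # "c" marks a clause frame
    pairs: Dict[str, Value] = field(default_factory=dict)

    def to_string(self) -> str:
        """Render the frame in the string form shown on the slide."""
        parts = [f"{{{self.frame_type} {self.name}"]
        for key, value in self.pairs.items():
            rendered = f'"{value}"' if isinstance(value, str) else str(value)
            parts.append(f"{key} {rendered}")
        return " ".join(parts) + " }"


def _parse_value(token: str) -> Value:
    if token.startswith('"'):
        return token[1:-1]                     # string value without its quotes
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_frame(text: str) -> Frame:
    """Parse a flat frame such as {c main :utterance_id 0 :domain "travel" }."""
    tokens = _TOKEN.findall(text)
    if len(tokens) < 4 or tokens[0] != "{" or tokens[-1] != "}":
        raise ValueError("expected {type name :key value ... }")
    frame_type, name, rest = tokens[1], tokens[2], tokens[3:-1]
    if len(rest) % 2:
        raise ValueError("every key needs a value")
    pairs: Dict[str, Value] = {}
    for key, raw in zip(rest[0::2], rest[1::2]):
        if not key.startswith(":"):
            raise ValueError(f"keys start with ':', got {key!r}")
        pairs[key] = _parse_value(raw)
    return Frame(name=name, frame_type=frame_type, pairs=pairs)


frame = parse_frame('{c main :utterance_id 0 :domain "travel" }')
```

Parsing and then calling `to_string()` gives back the same text, which mirrors the slide's point that frames can be built through calls or read from their string representation.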

Definitions
- Abstract Architecture
  - Components, connections (protocols), and constraints (IEEE definition)
  - Data/knowledge structures, data flow and protocols, control flow
  - Consider use cases, e.g.,
    - In-car navigation system
    - Desktop, kiosk, mobile device interaction
    - Media conversion

Requirements
- Functional
  - Modality integration (input and output)
  - Situation (user, task, application) appropriate real-time sensing/response (e.g., supporting barge-in, perceptual sensing/feedback)
  - Representation of level of granularity (modules and data structures)
  - Manage feedback: local and global, when/where?
  - Support incremental processing
  - Support incremental development (and scalability)
- System/Technical
  - Support for processing/fusing multimodal input (e.g., parallel processing)
  - Modular, composable (possibly distributed processing)
  - Efficient implementation
  - Time scale, temporal and spatial resolution
  - Accessible (even partial) data structures
  - Open and extensible protocols

Components
- Media/mode Analysis
  - Multimodal fusion
  - Mutual disambiguation and reference resolution
- Media/mode Design
  - Content selection, media design, allocation, coordination, layout
- Discourse Management
  - Attention management
  - Selects dialogue act/interpretation
  - Error handling
- Context Management
  - Physical/spatial, temporal state
- User Modeling
  - Capabilities, beliefs, intentions
  - User ID
- Knowledge sources, states, histories available to all processes
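As one concrete reading of "multimodal fusion" and "reference resolution" in the list above, the sketch below resolves spoken deictic words against pointing gestures that occur close in time. The technique (nearest gesture within a time window) is a simple illustration chosen here, not a method attributed to any of the systems in this deck; the event classes, the word list, and the one-second default window are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechEvent:
    text: str
    time: float        # seconds since the start of the turn


@dataclass
class PointingEvent:
    target: str        # identifier of the object being pointed at
    time: float


DEICTIC_WORDS = {"this", "that", "here", "there"}


def resolve_references(
    speech: List[SpeechEvent],
    pointing: List[PointingEvent],
    window: float = 1.0,
) -> List[str]:
    """Replace each deictic word with the target of the nearest gesture in the window."""
    resolved: List[str] = []
    for word in speech:
        if word.text.lower() not in DEICTIC_WORDS:
            resolved.append(word.text)
            continue
        candidates = [p for p in pointing if abs(p.time - word.time) <= window]
        if candidates:
            nearest = min(candidates, key=lambda p: abs(p.time - word.time))
            resolved.append(nearest.target)
        else:
            resolved.append(word.text)     # unresolved: leave for dialogue management
    return resolved


speech = [
    SpeechEvent("record", 0.0),
    SpeechEvent("that", 0.4),
    SpeechEvent("on", 0.8),
    SpeechEvent("there", 1.2),
]
pointing = [PointingEvent("news_program", 0.5), PointingEvent("vcr", 1.3)]
fused = resolve_references(speech, pointing)
```

A deictic word with no gesture inside the window is passed through unchanged, so a later component (for example, discourse management with its error handling) can ask for clarification.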

Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
Between the User(s) and the Information, Applications, People:
- Media Input Processing, Media Output Rendering
- Media/Mode Analysis: Language, Graphics, Gesture, Biometrics
- Media/Mode Design: Language, Graphics, Gesture, Animated Presentation Agent
- Interaction Management: Media Fusion, Intention Recognition, Discourse Modeling, User Modeling, Presentation Design
- Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models
- Application Interface
Corresponding conventional layers: Presentation, Dialog Control, Application Interface

Architecture
Between the User(s) and the Information, Applications, People:
- Media Input Processing, Media Output Rendering
- Media/Mode Analysis: Language, Graphics, Gesture, Sound (modality labels: T A V G G)
- Media/Mode Design: Language, Graphics, Gesture, Sound, Animated Presentation Agent (modality labels: A A V G G)
- Multimodal Fusion, Multimodal Reference Resolution, Mode Coordination
- Interaction Management: Initiate, Request, Respond, Integrate, Terminate
- Discourse Management, Intention Recognition, Reference Resolution, Action Planning, User Modeling
- Presentation Design: Select Content, Design, Allocate, Coordinate, Layout
- Context Management, Lexicon Management, User ID, Biometrics
- Application Interface
- Representation and Inference, States and Histories: User Model, Discourse Model, Domain Model, Task Model, Media Models, Application Models, Context Model

Media Fusion
Media/Mode Analysis inputs to Media Fusion: Spoken Language, Lip Reading, Gesture (modality labels: S, V, V)

COLLAGEN, Sidner et al.
Diagram labels: USER, Speech, ViaVoice, Speech interpretation, Planning and discourse, Agent, Mel, Application, Window events, Student Model
