Chapter 8: Situated Dialogue Processing for Human-Robot Interaction
In: Cognitive Systems, Christensen et al.
Course: Robots Learning from Humans
Sabaleuski Matsvei
Interdisciplinary Program in Cognitive Science, Seoul National University
http://cogsci.snu.ac.kr
Contents
- Introduction
- Background
- Multi-level Integration in Language Processing
- Language Processing and Situational Experience
- Talking
  - Talking about What You Can See
  - Talking about Places You Can Visit
  - Talking about Things You Can Do
- Conclusions
Introduction
Dialogue connects to "the world" in two scenarios: the Playmate scenario (local visuo-spatial scenes) and the Explorer scenario (the spatial organization of an indoor environment).
Requirements for the solution (language ↔ perception)
- Gradual construction
- Referentiality
- Persistence
- Efficiency & Effectiveness
Winograd's SHRDLU
- Incremental "left-to-right" linguistic analyses connected to visuo-spatial representations of local scenes
- Could understand and execute human commands
- Had a basic memory to supply context
- Small virtual world
- Language consisting of around 50 words
Steels's Semiotic Networks (Sony AIBO robots)
- Open-ended, adaptive communication system
- Ability to learn
- Communicative success above 80%
- Lexicon of around 50 words
- Could not entertain alternative meanings at the same time
Bi-directionality hypothesis
- Gradual construction: use of Combinatory Categorial Grammar (CCG)
- Referentiality: use of structured discourse representation models with the ability to resolve linguistic references against the situated context
- Persistence: different referent resolutions can be combined, which is used in visual learning
- Efficiency & Effectiveness: the incremental comprehension model can sort out unlikely word and meaning hypotheses; performance of speech recognition and parsing is close to 90%
Multi-level Integration in Language Processing
- Modular model: a context-independent representation is constructed first, and only then is it interpreted against the preceding dialogue context
- Incremental model: every new word is related to representations of the preceding input
- Principle of parsimony: preference for the least 'presuppositionally' heavy interpretations, e.g. "The postman delivered the baby." or "Mary gave the child the dog bit a bandaid."
- The incremental model is supported by the results of psycholinguistic research (saccadic eye movement research)
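The incremental model can be pictured in toy form: partial hypotheses are extended word by word and unlikely ones are pruned as soon as possible, rather than waiting for a full context-independent analysis. The lexicon, readings, and scores below are invented purely for illustration.

```python
# Toy sketch of incremental comprehension: partial interpretation
# hypotheses are extended word by word, and unlikely ones are pruned
# immediately (beam search), rather than after the utterance is complete.

LEXICON = {
    "here": [("ADV:location", 0.8)],
    "is":   [("V:copula", 0.9)],
    "the":  [("DET", 1.0)],
    "ball": [("N:object", 0.9), ("N:event", 0.2)],  # 'ball' as a dance is rarer
}

def incremental_comprehend(words, beam=2):
    """Extend each partial hypothesis with every reading of the next
    word, then keep only the `beam` best (pruning unlikely hypotheses)."""
    hypotheses = [([], 1.0)]
    for word in words:
        extended = []
        for reading, p in LEXICON.get(word, [("UNK", 0.1)]):
            for seq, score in hypotheses:
                extended.append((seq + [f"{word}/{reading}"], score * p))
        # prune: sort by score, keep only the beam best
        hypotheses = sorted(extended, key=lambda h: -h[1])[:beam]
    return hypotheses

best = incremental_comprehend(["here", "is", "the", "ball"])
print(best[0][0])  # the most likely tagged reading of the utterance
```

Note how the rare reading of "ball" survives in the beam but is ranked below the likely one; a modular model would instead enumerate complete analyses first and filter afterwards.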
Language Processing and Situational Experience
Focus of psycholinguistic research: how information from situation awareness affects utterance comprehension. The research revealed:
- Anticipatory effect
- Disambiguation by scene understanding
- Temporal projection
The interaction between LANGUAGE and VISION is mediated by CATEGORIES.
Talking
- Listening / Comprehending
  - Representing an Utterance
  - Representing the Interpretation of an Utterance in Context
  - Comprehending an Utterance in Context
  - Picking Up the Right Interpretation
- Speaking
  - Producing an Utterance in Context
  - Producing Speech
Representing an Utterance
An utterance is represented as an ontologically richly sorted, relational structure: a logical form in a decidable fragment of modal logic.
Example: "I want you to put the red mug to the right of the ball"
Packing
Example: "Take the ball to the left of the box"
(Figure: a packed logical form, showing packing nodes, internal relations, and packing nominals.)
(Figure continued: packing edges and packing-node targets.)
Example of incremental parsing and packing of logical forms: "Here is the ball" (figure)
Representing the Interpretation of an Utterance in Context
- Co-reference relations: relations between mentions referring to the same objects or events, e.g. pronouns ('it') and anaphoric expressions ('the red mug')
- New referent identifier: [NEW : {antn}]
- Antecedent referent: [OLD : {anti}], or with a preference order, [OLD : anti < {antj, ..., antk} < NEW : {antn}]
- A reference structure can specify preference orders over sets of old and new referents
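A minimal sketch of this idea (the compatibility test and discourse data are invented): a mention is resolved against the old referents in preference order, and a NEW referent is introduced only when no antecedent is compatible.

```python
# Toy co-reference resolution: try antecedents in preference order
# (most preferred first); if none is compatible, introduce a NEW referent.

def resolve(mention_type, antecedents, next_id):
    """antecedents: list of (referent_id, type), most preferred first."""
    for ref_id, ref_type in antecedents:
        if ref_type == mention_type:          # crude compatibility test
            return ref_id, antecedents        # OLD referent re-used
    new_ref = f"ant{next_id}"                 # NEW referent introduced
    return new_ref, [(new_ref, mention_type)] + antecedents

# Discourse: 'the red mug ... the ball ... it' (referring to the mug)
ants = []
r1, ants = resolve("mug", ants, 1)    # NEW: ant1
r2, ants = resolve("ball", ants, 2)   # NEW: ant2
r3, ants = resolve("mug", ants, 3)    # OLD: resolves back to ant1
print(r1, r2, r3)
```

Newly introduced referents are placed at the front of the list, so recency doubles as the preference order over OLD referents.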
Decision Tree for Dialogue Moves
A dialogue move ('speech act') specifies how an utterance contributes to furthering the dialogue.
Dialogue Context Model
Example: "Put the red ball next to the cube" (figure)
Comprehending an Utterance in Context
- Cross-modal salience model: visual salience and linguistic salience
- Word recognition lattice
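One way to picture a cross-modal salience model (the weights and scores below are invented): each candidate referent carries a visual salience score and a linguistic one, and a weighted combination ranks the candidates.

```python
# Toy cross-modal salience: each candidate referent gets a visual
# salience score (e.g. prominence in the scene) and a linguistic one
# (e.g. recency of mention); a weighted sum ranks the candidates.

def cross_modal_salience(candidates, w_vis=0.6, w_ling=0.4):
    """candidates: {name: (visual_score, linguistic_score)}, scores in [0, 1].
    Returns candidate names sorted from most to least salient."""
    return sorted(
        candidates,
        key=lambda c: -(w_vis * candidates[c][0] + w_ling * candidates[c][1]),
    )

scene = {
    "red_mug":   (0.9, 0.2),  # visually prominent, barely mentioned
    "blue_ball": (0.3, 0.9),  # just mentioned in the dialogue
}
print(cross_modal_salience(scene))
```

With these (invented) weights, vision dominates; shifting weight toward the linguistic score would instead favour the recently mentioned ball.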
Example of an incremental analysis
Utterance interpretation at grammatical level
Picking Up the Right Interpretation
A parse selection system based on a statistical linear model explores a set of relevant acoustic, syntactic, semantic and contextual features of the parses, and computes a likelihood score for each of them.
Parse selection is a function F : X → Y, where X is the set of possible input utterances and Y is the set of parses. We also assume:
1. A function GEN(x) which enumerates all possible parses for an input x.
2. A d-dimensional feature vector f(x, y) ∈ ℝ^d, representing specific features of the pair (x, y).
3. A parameter vector w ∈ ℝ^d.
The inner product w^T · f(x, y) can be seen as a measure of the 'quality' of the parse.
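Under these definitions, parse selection reduces to an argmax over GEN(x). A minimal sketch (the parse names and feature values are invented; d = 3 here):

```python
# Parse selection as F(x) = argmax over GEN(x) of the linear score w·f(x, y).

def score(w, f):
    """Inner product w^T · f(x, y): the 'quality' of a parse."""
    return sum(wi * fi for wi, fi in zip(w, f))

def select_parse(x, GEN, features, w):
    """F(x) = argmax_{y in GEN(x)} w^T · f(x, y)."""
    return max(GEN(x), key=lambda y: score(w, features(x, y)))

# Hypothetical two-parse example with d = 3 features per (x, y) pair
w = [0.5, 1.0, -0.3]
parses = {"take the ball": ["parse_np_attach", "parse_vp_attach"]}
feats = {
    "parse_np_attach": [1.0, 0.2, 0.0],
    "parse_vp_attach": [0.8, 0.9, 0.5],
}
best = select_parse("take the ball",
                    lambda x: parses[x],
                    lambda x, y: feats[y],
                    w)
print(best)
```

In the real system the feature vector would mix acoustic, syntactic, semantic and contextual features, and w would be learned from data; the selection rule itself stays this simple.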
Producing an Utterance in Context
Production of an utterance is triggered by a communicative goal. The communicative goal specifies a dialogue move, and the content which is to be communicated. The utterance realizer uses the same grammar as the parser. The MARY speech synthesis engine (http://mary.dfki.de:59125/) then produces audio output.
Referring expressions are generated using the incremental algorithm of Dale and Reiter. The algorithm is initialized with the intended referent, a contrast set and a list of preferred attributes. It incrementally tries to rule out members of the contrast set for which a given property of the intended referent does not hold.
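A sketch of that incremental algorithm (the object descriptions are invented, and this simplified version omits Dale and Reiter's handling of basic-level values and the always-included type attribute):

```python
# Dale & Reiter's incremental algorithm (simplified): walk the preferred
# attributes in order; keep an attribute only if it rules out at least one
# remaining distractor; stop once the contrast set is empty.

def generate_reference(referent, contrast_set, preferred_attributes):
    description = {}
    distractors = list(contrast_set)
    for attr in preferred_attributes:
        value = referent.get(attr)
        if value is None:
            continue
        ruled_out = [d for d in distractors if d.get(attr) != value]
        if ruled_out:                       # attribute has discriminatory power
            description[attr] = value
            distractors = [d for d in distractors if d.get(attr) == value]
        if not distractors:                 # referent uniquely identified
            break
    return description

target = {"type": "mug", "colour": "red", "size": "small"}
others = [{"type": "mug", "colour": "blue", "size": "small"},
          {"type": "ball", "colour": "red", "size": "small"}]
print(generate_reference(target, others, ["colour", "size", "type"]))
```

Here 'size' is skipped because every object is small, so the generated description is "the red mug" rather than an over-specified "the small red mug".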
Thank you for your attention