Viewing Vision-Language Integration as a Double-Grounding case. Katerina Pastra, Department of Computer Science, Natural Language Processing Group, University of Sheffield, U.K.

Presentation transcript:

1 Viewing Vision-Language Integration as a Double-Grounding case
Katerina Pastra
Department of Computer Science, Natural Language Processing Group, University of Sheffield, U.K.
Language Technology Group, Institute for Language and Speech Processing (ILSP), Athens, Greece

2 Vision-Language Integration
• What is computational V-L integration? (definition)
• How is it achieved? (state of the art, trends, needs)
• Why is it needed in AI agents? (explanatory/theoretical ground)
• How far can we go? (implementation suggestions, the VLEMA prototype)
• Content integration vs. technical integration
→ Small part within cognitive architectures
→ Small part of the integration story
Still, an AI study of V-L integration is lacking

3 The notion of integration
State-of-the-art V-L integration prototypes:
a) Deal with blocksworlds or miniworlds (scalability issues)
b) Work with already abstracted/analysed visual input
c) Rely on integration resources for V-L association
Descriptive Definition
• Computational V-L integration is a process of associating visual and corresponding linguistic representations
• It may take the form of one of four integration processes, according to the integration purpose to be served

4 Classification of V-L integration prototypes
System type | Integration process
Performance Enhancement | Medium x analysis → Medium y analysis (NL → IU, or NL ← IU)
Medium Translation | Source medium analysis → Target medium gen. (image → language or image ← language)
Multimedia Generation | Abstracted data → Multimedia generation (tabular data or knowledge representation)
Situated Dialogue | Multimedia analysis → Medium/multimedia gen. (NL analysis and shared visual scene → action/MM)
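The classification above can be written down as a small lookup structure. The following is only a sketch in Python: the names `SystemType`, `IntegrationProcess`, `TAXONOMY` and `describe` are hypothetical and do not come from the presentation; only the four system types and their source/target labels are taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class SystemType(Enum):
    """The four system types named on the slide."""
    PERFORMANCE_ENHANCEMENT = "Performance Enhancement"
    MEDIUM_TRANSLATION = "Medium Translation"
    MULTIMEDIA_GENERATION = "Multimedia Generation"
    SITUATED_DIALOGUE = "Situated Dialogue"


@dataclass(frozen=True)
class IntegrationProcess:
    source: str  # what the system starts from
    target: str  # what the system produces


# Labels follow the slide; the dictionary layout is an illustrative assumption.
TAXONOMY = {
    SystemType.PERFORMANCE_ENHANCEMENT: IntegrationProcess(
        "medium x analysis", "medium y analysis"),
    SystemType.MEDIUM_TRANSLATION: IntegrationProcess(
        "source medium analysis", "target medium generation"),
    SystemType.MULTIMEDIA_GENERATION: IntegrationProcess(
        "abstracted data", "multimedia generation"),
    SystemType.SITUATED_DIALOGUE: IntegrationProcess(
        "multimedia analysis", "medium/multimedia generation"),
}


def describe(system_type: SystemType) -> str:
    """Return a one-line summary of the integration process for a system type."""
    process = TAXONOMY[system_type]
    return f"{system_type.value}: {process.source} -> {process.target}"
```

For example, `describe(SystemType.MEDIUM_TRANSLATION)` summarises the row that the VLEMA prototype later falls under.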

5 Why do agents need V-L integration?
• Inherent characteristics of integrated media: Does each one of them lack something that the other can compensate for?
• Gains for an agent in communication: Are agents with V-L integration abilities more intelligent?
Why do we need to know?
• to decide on the significance of such a mechanism for an artificial agent
• to get the theoretical ground needed for research that is currently done mostly ad hoc and in isolation within different AI sub-areas

6 Inherent Characteristics
• Images:
- reference object: physical or mental
- lack inherent means of indicating focus/salience (cf. indexical-deictic mechanisms in vision theories)
- lack inherent means of indicating type:token distinctions (i.e. level of abstraction)
• Language:
- reference object: mental
- has subtle mechanisms for indicating level of abstraction
- has mechanisms for controlling attendance to details, focus etc.
- lacks direct access to the physical world (cf. indexicals)

7 From Symbol Grounding…
From the Symbol Grounding debate we get the following:
- Language lacks direct access to the physical world
- Language needs such access to express intentionality
- Symbol grounding is a process of associating symbols/language with percepts (visual percepts)
- Symbol grounding provides language direct access to the physical world
- An agent must perform symbol grounding on its own to be intrinsically intentional (must go beyond instantiation of associations to inference)

8 [Diagram: Visual Perception Representations and Linguistic Representations, linked by the labels Association, Direct Access and Grounding]

9 Shifting the focus from symbols…
Relying on the inherent characteristics of images, one may argue that:
- Images lack controlled access to mental aspects of the world
- Images need such access to express intentionality
- Image grounding is a process of associating images with language
- Image grounding provides images controlled access to the mental world
- An agent must perform image grounding on its own to be intrinsically intentional

10 [Diagram: Visual Representations and Linguistic Representations, linked by the labels Association, Direct Access, Uncontrolled Access and Grounding]

11 From Symbol Grounding to Double Grounding
The Double-Grounding Theory:
- Double-grounding is a process of associating symbolic with iconic representations
- Double-grounding provides language with direct access to the physical world and, at the same time, provides vision with controlled access to mental aspects of the world
- Vision-language integration is a case of double-grounding
- V-L integration compensates for features that images and language inherently lack on their own. It is necessary for expressing and understanding intentionality in V-L MM situations
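The two directions of access described above can be illustrated with a toy two-way association table. This is a minimal sketch in Python under strong assumptions: the class `DoubleGrounding`, its method names and the identifiers such as `region_3` are hypothetical, and a stored lookup table is exactly the "instantiation of associations" that the slides say an agent must go beyond. It shows the shape of the association only, not how an agent would infer it.

```python
from collections import defaultdict


class DoubleGrounding:
    """Toy bidirectional association between symbols and visual percepts."""

    def __init__(self):
        self._percepts_for = defaultdict(set)  # symbol -> percept ids
        self._symbols_for = defaultdict(set)   # percept id -> symbols

    def associate(self, symbol: str, percept_id: str) -> None:
        """Record one association; it is usable in both directions."""
        self._percepts_for[symbol].add(percept_id)
        self._symbols_for[percept_id].add(symbol)

    def ground_symbol(self, symbol: str) -> set:
        """Language side: which percepts give this symbol access to the scene."""
        return set(self._percepts_for.get(symbol, ()))

    def ground_image(self, percept_id: str) -> set:
        """Vision side: which symbols fix how this percept is to be read."""
        return set(self._symbols_for.get(percept_id, ()))
```

Associating both "sofa" and "furniture" with the same percept shows how the language side can supply the level of abstraction that, according to the slides, an image cannot indicate on its own.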

12 Viewing integration as a double-grounding case
• V-L integration abilities are needed for an agent to be intentional in MM situations
• Exploring how V-L integration can be achieved computationally, one realises that this research issue involves not only the perceptual and linguistic modules of a cognitive architecture, but also the learning and reasoning ones.
• The corresponding AI communities therefore need to join forces to address the challenges in endowing agents with their own V-L integration abilities

13 The AI quest for V-L Integration
In relying on human-created data, state-of-the-art V-L integration systems avoid core integration challenges and therefore fail to perform real integration.
Can we do better? How far can we go?
Challenging current practices means that a prototype should:
• work with real visual scenes
• analyse its visual data automatically
• associate images and language automatically
Is it feasible to develop such a prototype?

14 An optimistic answer
VLEMA: A Vision-Language intEgration MechAnism
• Input: automatically reconstructed static scenes in 3D (VRML format) from RESOLV (robot-surveyor)
• Integration task: Medium Translation from images (3D sitting rooms) to text (what and where, in English)
• Domain: estates surveillance
• Horizontal prototype
• Implemented in shell programming and Prolog
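To make the "what and where" translation task concrete, here is a small sketch in Python. It is not VLEMA's method (the prototype is described as implemented in shell and Prolog, and its internals are not given here). Every name is hypothetical, VRML parsing is omitted, and the object labels are supplied by hand, which is precisely the step the slides say a real prototype must perform automatically. The coordinate convention (x measured in metres from the left wall) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    label: str  # object name; assumed to be already recognised
    x: float    # horizontal position in metres, 0 = left wall (assumed)


def where(obj: SceneObject, room_width: float) -> str:
    """Map a horizontal position to a coarse spatial phrase (thirds of the room)."""
    third = room_width / 3
    if obj.x < third:
        return "on the left of the room"
    if obj.x < 2 * third:
        return "in the middle of the room"
    return "on the right of the room"


def describe_scene(objects, room_width: float) -> list:
    """Produce one 'what and where' sentence per object."""
    return [f"There is a {o.label} {where(o, room_width)}." for o in objects]
```

Calling `describe_scene([SceneObject("sofa", 0.5)], 6.0)` yields a single sentence placing the sofa on the left of the room. A real system would need far richer spatial relations than this three-way split.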