Emergent Representations and Reasoning in Adaptive Agents

Joost Broekens, Doug DeGroot. LIACS, Leiden University, Leiden, The Netherlands. {broekens,

Overview
– Introduction
– Interactivism
– Hypothesis
– Computational Model based on Interactivist Concepts
– Experiments
– Results
– Conclusion
– Questions?

Introduction
Adaptive agents raise several challenges:
– Flexible models of the world (continuous online learning).
– Efficient memory retrieval.
– An efficient, relevant reasoning context: how to select relevant information from a large collection of beliefs?
– How to represent knowledge?
– What is reasoning?

Interactivism (1/3)
Interactivism (Bickhard) proposes:
– A coupling of (properties of) situations and the actions possible in those situations: the Interaction Potential (IP).
– The IP concept as the primitive for representations.
– Potential interactions are prepared by prior interactions, so an IP is conditional on prior interactions (example: brushing your teeth).
– IPs are organized in a hierarchical, web-like fashion.
– Parts of this web remain invariant under many other interactions (example: the brush).
– IPs stabilize and destabilize based on correct prediction/preparation.

Interactivism (2/3)
[Figure: a timeline of everyday interactions (shower, dry, work, got home) with the interactions brush/desk, brush and put-away recurring at different points along the time axis.]

Interactivism (3/3)
Interactivism and reasoning:
– Model-learning: (de)stabilization of IPs through continuous interaction with the world constructs representations of the world.
  – Representations have implicit content: certain properties of a situation a allow for interactions x and y, making a different from a situation b that lacks these properties.
  – Representations have a truth value: "I tried interaction x, but y happened, so it was not x."
– Task-learning: a preference between at least two interactions, based on a bias: the reinforcement signal.
So an IP has (at least) two properties: stability and expected return.

Hypothesis
Reasoning and decision making are emergent properties of interactivist representational systems. To test this:
– Create a computational model strictly based on interactivist assumptions.
– Create a task that requires a decision by the agent.
Minimal reasoning: "any observable behavior that reflects a beneficial decision between at least two possibilities that is neither explicable due to chance, nor without representations".

Computational Model (1/3)
Basis: a hierarchical directed graph. The agent's actions and stimuli from the world are assumed to be the same kind of information.
– Nodes represent interactions.
– Nodes can be active (used) or prepared (hypothesized).
– Primary nodes: a single stimulus (an action, or a stimulus from the world).
– Secondary nodes: interaction potentials.
– The hierarchy of secondary nodes forms the IP hierarchy (a data-structure sketch follows below).
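A minimal sketch of such a node as a data structure. The field names and types are our own reading of this slide and the previous one (an IP carries stability and expected return), not an API from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    INACTIVE = 0
    PREPARED = 1   # hypothesized for the current situation
    ACTIVE = 2     # actually used in the current interaction

@dataclass
class Node:
    name: str                     # e.g. "D" (primary) or "(D-1)-D" (secondary)
    is_primary: bool              # primary = action/stimulus; secondary = IP
    state: State = State.INACTIVE
    stability: float = 0.0        # IP property from the Interactivism slide
    expected_return: float = 0.0  # the other IP property from that slide
    children: list = field(default_factory=list)  # lower-level nodes it is built from
```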

Computational Model (2/3)
Example (1, 2 = locations in a maze, D = the action "down"):
– The model is empty at startup.
– a: the agent goes down and builds a node for "down" (D).
– b: the agent arrives at location 1 and builds the interaction D-1.
– c: the agent goes down again and builds the interactions 1-D and (D-1)-D.
– d: the agent arrives at location 2 and builds the interactions D-2, (1-D)-2 and ((D-1)-D)-2.
[Figure: four snapshots (a-d) of the growing graph; the final graph contains the primary nodes D, 1, 2 and the secondary nodes D-1, 1-D, (D-1)-D, D-2, (1-D)-2 and ((D-1)-D)-2.]
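One way to reproduce this example in code. The pairing rule is our assumption from reading the snapshots (each new action/stimulus is combined with every interaction that was active at the previous step); the paper may build the hierarchy differently:

```python
def build_ip_hierarchy(stream):
    """Grow the set of interaction nodes from a stream of actions/stimuli,
    yielding the node set after each step (a sketch, not the authors' code)."""
    nodes = set()   # all interactions built so far
    active = []     # interactions active at the previous step
    for s in stream:
        nodes.add(s)  # primary node for the action/stimulus itself
        new_secondary = []
        for x in active:
            # parenthesize compound interactions, as in the slide's notation
            ip = f"({x})-{s}" if "-" in x else f"{x}-{s}"
            nodes.add(ip)
            new_secondary.append(ip)
        active = [s] + new_secondary
        yield sorted(nodes)

# Reproduces snapshots a-d from the slide for the stream D, 1, D, 2:
for step, nodes in zip("abcd", build_ip_hierarchy(["D", "1", "D", "2"])):
    print(step, nodes)
```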

Computational Model (3/3)
Model-learning and task-learning: exposure (continuous interaction) and reinforcement.
Exposure (local):
– Build a conditional probabilistic model of the environment, but adapt only locally: count activations of IPs.
– If the usage of an IP drops below an (arbitrary) threshold, throw the node away.
Reinforcement (local):
– Update the active IPs with the current reinforcement signal.
– Propagate reinforcement through the IP hierarchy based on the local probabilities of the environment, using prepared IPs only.
Biased selection:
– Propose an action based on winner-take-all (WTA) selection over the proposed interactions.
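The three update rules as a sketch. The learning rate, pruning threshold and discount factor are our assumptions (the slide only calls the threshold "arbitrary"), and propagation via local environment probabilities is simplified here to a fixed discount:

```python
from dataclasses import dataclass

ALPHA = 0.1          # assumed learning rate; not given on the slide
USAGE_THRESHOLD = 2  # assumed pruning threshold; "arbitrary" per the slide
DECAY = 0.9          # assumed discount when propagating through the hierarchy

@dataclass
class IP:  # minimal stand-in for the Node sketch two slides back
    name: str
    count: int = 0               # activation count (exposure)
    expected_return: float = 0.0

def update_exposure(active, all_ips):
    """Exposure (local): count activations of the IPs just used, and prune
    IPs whose usage stays below the threshold."""
    for ip in active:
        ip.count += 1
    return [ip for ip in all_ips if ip.count >= USAGE_THRESHOLD or ip in active]

def update_reinforcement(active, prepared, reward):
    """Reinforcement (local): blend the current signal into the active IPs,
    then propagate a discounted version through the prepared IPs only."""
    for ip in active:
        ip.expected_return += ALPHA * (reward - ip.expected_return)
    for ip in prepared:
        ip.expected_return += ALPHA * (DECAY * reward - ip.expected_return)

def select_action(proposed):
    """Biased selection: winner-take-all over the proposed interactions."""
    return max(proposed, key=lambda ip: ip.expected_return)
```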

Experiments (1/3)
Model learning: does the agent learn an adaptive model of the environment?
– Test for reuse of old information in a new situation (mazes a, b, c, d, e).
– Test for quick adaptation to a new maze (mazes a, b, e).
[Figure: maze setups a-e. Black: agent; red: lava (Rf = -1); yellow: food (Rf = +1).]

Experiments (2/3)
Selection task (simple reasoning): is the agent able to make a beneficial, informed decision?
– Choose between two options; the choice can only be made in an informed way if there is knowledge (a representation) about the other option (mazes d, b, f).
– Test for convergence in a randomly changing situation (maze g).
[Figure: maze setups d, b, f and g. Black: agent; red: lava (Rf = -1); yellow: food (Rf = +1).]

Experiments (3/3)
We ran experiments for the different maze setups:
– 30 runs per setup.
– In every run the agent has 100 trials to find the food.
– A maximum of 1000 steps per trial.
We plotted the average learning curve of the trials over the 30 runs.
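The protocol above as a loop. This is a sketch only: the maze and agent interfaces (make_maze, make_agent, agent.act) are hypothetical placeholders, not the paper's code:

```python
N_RUNS, N_TRIALS, MAX_STEPS = 30, 100, 1000

def run_setup(make_maze, make_agent):
    """Run one maze setup and return its average learning curve."""
    curves = []                      # steps-to-food per trial, one list per run
    for _ in range(N_RUNS):
        maze, agent = make_maze(), make_agent()
        steps_per_trial = []
        for _ in range(N_TRIALS):
            maze.reset()
            n = MAX_STEPS            # counts as a failure if food is never found
            for step in range(1, MAX_STEPS + 1):
                if agent.act(maze):  # assumed to return True when food is found
                    n = step
                    break
            steps_per_trial.append(n)
        curves.append(steps_per_trial)
    # average learning curve: mean steps per trial over the 30 runs
    return [sum(trial) / N_RUNS for trial in zip(*curves)]
```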

Results (1/3)
The agent learns an adaptive model of the environment and reuses information.

Results (2/3)
The agent learns to make a beneficial decision at the crossing.

Results (3/3)
A representation of a potential food location is learned:
– The agent is able to try one location and, if the food is not there, try a second one.
– This means the agent has a stable representation of "food is not here".
– Representation: content (food), truth value (food is not here).
The ability to make an informed choice indeed emerges from an interactivism-based model:
– The agent learns what a crossing is and how to use it; the concept of a crossing is not built into the model.
– The agent chooses a different action the second time it arrives at the crossing only if food has not been found earlier (an informed choice).

Conclusion
Interactivist-based models are useful for the computational investigation of knowledge representation and reasoning in agents.
Representations and reasoning can indeed emerge from a computational model based on interactivist assumptions, when that model is used in an agent that continuously interacts with its environment.
Future work:
– A literature search into related machine-learning mechanisms.
– "Imagination".
– A neuronal implementation.

Questions?