TRIALS AND TRIBULATIONS: Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm

SUBJECT OF INVESTIGATION
- How humans integrate visual object properties into their action policy when learning a novel visuomotor task (BubblePop!).
- Problem: too many possible questions…
- Solution: motivate the behavioral research by looking at modeling difficulties, the nonobvious crossroads.

APPROACH
- Since the task provides only a scalar performance signal, the model must use reinforcement learning: temporal-difference learning with backpropagation (sketched after this slide).
- Start with an extremely simplified version of the task and add the complexity back once there is a successful model.
- Analyze the representational and architectural constraints necessary for each model.
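Not on the original slides: a minimal sketch of the temporal-difference-plus-backpropagation scheme the approach slide names, using a tiny value network updated by semi-gradient TD(0). The layer sizes follow the later "dummy world" slides; the learning rate, discount, and function names are assumptions for illustration only.

    import numpy as np

    # Minimal TD(0) learner with a tiny MLP value function trained by backprop.
    # Sizes follow the dummy-world slides: 25 grid inputs, 8 hidden units.
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (8, 25))   # input -> hidden weights
    W2 = rng.normal(0, 0.1, (1, 8))    # hidden -> expected-reward output
    ALPHA, GAMMA = 0.05, 0.9           # illustrative learning rate and discount

    def value(state):
        """Forward pass; state is a length-25 array encoding the grid."""
        h = np.tanh(W1 @ state)
        return (W2 @ h)[0], h

    def td_update(state, reward, next_state, terminal):
        """One TD(0) step: nudge V(state) toward reward + GAMMA * V(next_state)."""
        global W1, W2
        v, h = value(state)
        v_next, _ = value(next_state)
        target = reward if terminal else reward + GAMMA * v_next
        delta = target - v                          # TD error
        # Backprop the TD error through both layers (semi-gradient TD).
        grad_h = (W2[0] * delta) * (1 - h ** 2)
        W2 += ALPHA * delta * h[None, :]
        W1 += ALPHA * np.outer(grad_h, state)
        return delta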

FIRST STEPS: DUMMY WORLD
- 5x5 grid world (sketched in code after this slide)
- 4 possible actions: up, down, left, right
- 1 unmoving target
- Starting locations of the target and the agent are randomly assigned
- Fixed reward upon reaching the target, after which a new target is generated
- Epoch ends after a fixed number of steps
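Not on the original slides: a minimal environment sketch matching the bullets above, so the architecture choices on the next slide have something concrete to attach to. The class and method names (DummyWorld, reset, step) and the step limit are illustrative assumptions.

    import random

    class DummyWorld:
        """5x5 grid with one unmoving target; fixed reward on reaching it."""
        ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

        def __init__(self, size=5, max_steps=50):
            self.size, self.max_steps = size, max_steps
            self.reset()

        def reset(self):
            self.steps = 0
            self.agent = self._random_cell()
            self.target = self._random_cell()
            return self._observe()

        def _random_cell(self):
            return (random.randrange(self.size), random.randrange(self.size))

        def step(self, action):
            dx, dy = self.ACTIONS[action]
            x = min(max(self.agent[0] + dx, 0), self.size - 1)
            y = min(max(self.agent[1] + dy, 0), self.size - 1)
            self.agent = (x, y)
            self.steps += 1
            reward = 0.0
            if self.agent == self.target:        # fixed reward, then a new target appears
                reward = 1.0
                self.target = self._random_cell()
            done = self.steps >= self.max_steps  # epoch ends after a fixed number of steps
            return self._observe(), reward, done

        def _observe(self):
            # Raw positions; how these map onto the 25 input units (allocentric grid
            # vs. agent-centered grid) is exactly the choice the next slide explores.
            return self.agent, self.target

    env = DummyWorld()
    obs = env.reset()
    obs, r, done = env.step("up")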

DUMMY WORLD ARCHITECTURES
- Input: 25 units for the grid, encoding either the whole grid (allocentric) or an agent-centered view (egocentric)
- Hidden layer: 8 units
- Output: 4 actions, or 1 expected-reward unit (egocentric only)
(A sketch of the two output variants follows.)
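Not on the original slides: a minimal sketch of the two output variants the diagram above implies, a 25-8-4 action-value network and a 25-8-1 expected-reward network. The layer sizes come from the slide; the reading that the single-output network scores candidate actions one at a time (egocentric only) is an assumption, as are the function names.

    import numpy as np

    rng = np.random.default_rng(1)

    def mlp(n_in, n_hidden, n_out):
        """Build a tiny one-hidden-layer network as a pair of weight matrices."""
        return (rng.normal(0, 0.1, (n_hidden, n_in)),
                rng.normal(0, 0.1, (n_out, n_hidden)))

    def forward(net, x):
        W1, W2 = net
        return W2 @ np.tanh(W1 @ x)

    # Variant A: 25 grid units in, one value per action out.
    action_value_net = mlp(25, 8, 4)
    # Variant B (egocentric only, per the assumed reading): 25 grid units in,
    # a single expected-reward estimate out, evaluated once per candidate action.
    expected_reward_net = mlp(25, 8, 1)

    grid = np.zeros(25); grid[12] = 1.0       # example input: one cell marked
    print(forward(action_value_net, grid))    # 4 action values
    print(forward(expected_reward_net, grid)) # 1 expected reward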

BUILDING IN SYMMETRY
- Current architectures learn each action independently.
- 'Up' is like 'Down', but different: each action just shifts the world in a different direction.
- So use 1 action with 4 different inputs: "In which rotation of the world would you rather go 'up'?" (sketched after this slide)
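Not on the original slides: a minimal sketch of the rotation trick described above, scoring a single canonical 'up' evaluator on the four rotations of the egocentric grid and picking the best-scoring rotation as the action. The rotation-to-action mapping and the `up_value` callable are assumptions.

    import numpy as np

    # One action, four different inputs: rotate the egocentric grid and ask which
    # rotation makes 'up' look best. `up_value` stands in for any trained
    # 25-in / 1-out evaluator such as the networks sketched earlier.
    ACTIONS = ["up", "right", "down", "left"]   # assumed k -> action mapping;
                                                # depends on the grid convention

    def choose_action(ego_grid_5x5, up_value):
        scores = []
        for k, action in enumerate(ACTIONS):
            rotated = np.rot90(ego_grid_5x5, k=k)     # same weights, rotated world
            scores.append(up_value(rotated.flatten()))
        return ACTIONS[int(np.argmax(scores))]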

WORLD SCALING
- Scaled the grid size up to 10x10: not as unrealistic as one might think… (tile coding; see the sketch after this slide)
- Scaled the number of targets: there is a difference from 1 to 2, but not from 2 to many.
- Confirmed the 'winning-est' representation
- Added memory
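Not on the original slides: a minimal tile-coding sketch for the parenthetical above, showing how a 10x10 world can be covered by a few coarse, offset tilings so the input stays compact as the world scales. The number of tilings, tile width, and function name are assumptions for illustration.

    import numpy as np

    def tile_code(x, y, n_tilings=3, tiles_per_dim=5, world_size=10):
        """Return a sparse binary feature vector: one active tile per tiling."""
        tile_width = world_size / tiles_per_dim
        features = np.zeros(n_tilings * tiles_per_dim * tiles_per_dim)
        for t in range(n_tilings):
            offset = t * tile_width / n_tilings       # each tiling is shifted slightly
            col = int(min((x + offset) // tile_width, tiles_per_dim - 1))
            row = int(min((y + offset) // tile_width, tiles_per_dim - 1))
            features[t * tiles_per_dim ** 2 + row * tiles_per_dim + col] = 1.0
        return features

    print(tile_code(7, 3).nonzero())   # exactly one active tile per tiling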

NO LOW HANGING FRUIT: THE RIPENESS PROBLEM
- Added a 'ripeness' dimension to the target, and changed the reward function:
    If target.ripeness > .60, reward = 1; Else reward = ;
- How the problem occurs (see the sketch after this slide):
  1. At a high temperature you move randomly.
  2. The random pops net zero reward.
  3. The temperature lowers and you ignore the target entirely.
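Not on the original slides: a minimal sketch of the temperature-annealed softmax (Boltzmann) action selection that the failure story above assumes. With a high temperature the agent pops targets at random; if those pops net roughly zero reward, the values for approaching the target stay near zero, and once the temperature drops the agent has no reason to visit it at all. The schedule values and function name are assumptions.

    import numpy as np

    def softmax_policy(action_values, temperature):
        """Action probabilities under a Boltzmann policy at the given temperature."""
        prefs = np.asarray(action_values) / max(temperature, 1e-6)
        prefs -= prefs.max()               # numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

    temperature = 5.0                      # illustrative starting temperature
    for episode in range(1000):
        # ... run an episode, choosing actions with softmax_policy(q_values, temperature) ...
        temperature *= 0.995               # anneal toward greedy behavior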

ANNEALING AWAY THE CURSE OF PICKINESS

A PSYCHOLOGICALLY PLAUSIBLE SOLUTION
- There is no feedback for 'almost ripe', so how could we anneal the ripeness criterion itself?
- Instead, anneal how much you care about unripe pops (sketched after this slide).
- That is, differentiate the internal and external reward functions.
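Not on the original slides: a minimal sketch of the internal/external reward split described above. The external (task) reward stays fixed while the internal penalty for unripe pops is annealed in over training. The .60 threshold is from the earlier slide; the slide leaves the unripe reward value blank, so the -1 used here and the linear schedule are purely assumptions.

    RIPE_THRESHOLD = 0.60

    def external_reward(ripeness):
        # Task reward; the unripe value is an assumption (elided on the slide).
        return 1.0 if ripeness > RIPE_THRESHOLD else -1.0

    def internal_reward(ripeness, progress):
        """progress in [0, 1]: early in training, unripe pops still feel rewarding."""
        if ripeness > RIPE_THRESHOLD:
            return 1.0
        care_about_unripe = progress       # grows over training (annealed in)
        return (1.0 - care_about_unripe) * 1.0 + care_about_unripe * external_reward(ripeness)

    for progress in (0.0, 0.5, 1.0):
        print(progress, internal_reward(0.3, progress))   # 1.0, 0.0, -1.0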

FUTURE DIRECTIONS
- Investigate how the type of ripeness difficulty impacts computational demands: difficulty due to reward schedule vs. perceptual acuity vs. redundancy vs. conjunctiveness vs. ease of prediction.
- How to handle the 'feature binding problem' in this context: emergent binding through deep learning?
- Just keep increasing complexity and see what problems crop up. If the model gets to human-level performance without a hitch, that would be pretty good too.

SUMMARY & DISCUSSION
- Egocentric representations pay off in this domain, even with the added memory cost. Would they in any domain with a single agent?
- Symmetries in the action space can be exploited to greatly expedite learning. Could there be a general mechanism for detecting such symmetries?
- Difficult reward functions might be learnt by annealing internal reward signals. How could we have this annealing emerge from the model?

QUESTIONS?