Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011.

Presentation transcript:

Learning To Use Memory Nick Gorski & John Laird Soar Workshop 2011

Memory & Learning
[Diagram: an Agent containing a Memory, and an Environment. Arrows: action (agent to environment); observations and reward (environment to agent).]

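Actions, Internal & External
[Diagram: an Agent containing a Memory, and an Environment that returns observations and reward.]
External actions, sent to the environment: {go left, go right, eat food, bid 5 5s, pick a flower}
Internal actions, sent to memory: {store, retrieve, maintain}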

Internal Actions Over Memory
Internal actions are deliberate or automatic
Automatic actions are in the background
– Architectural and always happen
– Ex: storage to episodic memory
Deliberate actions are in the foreground
– Procedural knowledge and cognitive
– Ex: storage to working memory

Internal Reinforcement Learning
[Diagram: an Agent containing a Memory, and an Environment. Reinforcement learning drives action selection inside the agent. Arrows: action, observations, reward.]
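The idea on this slide, one learner selecting among both external and internal actions, can be sketched as follows. This is a minimal illustration only: the slides say the work used a custom framework but do not name the learning algorithm, so tabular Q-learning, the class name `QAgent`, and the parameter values are all assumptions. The point is that the learner's state is the pair (observation, memory contents) and that internal actions such as `gate` sit in the same action list as `left` and `right`.

```python
import random
from collections import defaultdict


class QAgent:
    """Tabular Q-learning over a mixed set of external and internal actions.

    Hypothetical sketch: the algorithm and all names here are assumptions,
    not the authors' framework.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = rng or random.Random()

    def select(self, state, actions):
        """Epsilon-greedy choice; ties between best actions break randomly."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        best = max(self.q[(state, a)] for a in actions)
        return self.rng.choice([a for a in actions if self.q[(state, a)] == best])

    def update(self, state, action, reward, next_state, next_actions, done):
        """One-step Q-learning backup."""
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_state, a)] for a in next_actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# The agent's state pairs the current observation with the memory contents.
agent = QAgent(alpha=0.5, gamma=0.9, epsilon=0.0, rng=random.Random(0))
state = ("DECIDE", "LEFT")
agent.update(state, "left", 1.0, None, [], done=True)
choice = agent.select(state, ["left", "right", "gate"])
```

Because internal and external actions share one value table, the learner gets no separate signal telling it when a memory action was useful; it has to infer that from the environment's reward alone.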

Assumptions
Custom framework, not using Soar
Simple memory models
Simple tasks

Learning to Use Memory
Research Question:
– When can agents learn to use memory?
Idea:
– Investigate dynamics of memory and environment independently
Need:
– Simple, parameterized task

An Interactive TMaze
Observation: LEFT. Available actions: {forward}

An Interactive TMaze
Observation: DECIDE. Available actions: {left, right}

An Interactive TMaze
Reward: +1
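The three steps of the interactive TMaze above can be sketched as a small environment. This is an illustrative reconstruction rather than the authors' code: the class name `TMaze`, the method names, and the reward of -1 for a wrong turn are assumptions (the slides only show +1 for a correct turn).

```python
import random


class TMaze:
    """Base TMaze sketch: a cue is observed first, then a turn is chosen.

    Hypothetical reconstruction; names and the wrong-turn reward are assumed.
    """

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.reset()

    def reset(self):
        """Start a new episode with a random cue and return the first observation."""
        self.cue = self.rng.choice(["LEFT", "RIGHT"])
        self.at_junction = False
        return self.observe()

    def observe(self):
        # The cue is visible only at the start; the junction shows DECIDE.
        return "DECIDE" if self.at_junction else self.cue

    def available_actions(self):
        return ["left", "right"] if self.at_junction else ["forward"]

    def step(self, action):
        """Apply an action; return (observation, reward, done)."""
        if action not in self.available_actions():
            raise ValueError(f"action {action!r} is not available here")
        if not self.at_junction:
            self.at_junction = True
            return self.observe(), 0, False
        reward = 1 if action == self.cue.lower() else -1  # -1 is an assumption
        return None, reward, True


env = TMaze(random.Random(0))
first_obs = env.reset()
second_obs, step_reward, finished = env.step("forward")
_, final_reward, episode_done = env.step(first_obs.lower())
```

At the junction the observation no longer carries the cue, so an agent without memory cannot do better than chance; that is what makes the task a test of memory use.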

TMaze
[Diagram: Base TMaze, with observations A/B and C.]
Question: how much knowledge is needed to perform this task?

Parameterized TMazes
[Diagrams: the Base TMaze and five parameterized variants.]
– Base TMaze
– Temporal Delay
– # Dependent Actions
– Concurrent Knowledge
– Amt. of Knowledge
– 2nd Order Knowledge

Two Working Memory Models
Bit memory (internal action: toggle)
– Internal action toggles between memory states
– Less expressive
– Ungrounded knowledge
Gated WM (internal action: gate)
– Internal action stores current observation
– More expressive
– Grounded knowledge
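The two memory models described above are small enough to write out. The sketch below is one plausible reading of the slide, with hypothetical class and method names; the initial contents (0 for the bit, empty for the gated memory) are assumptions.

```python
class BitMemory:
    """One bit of memory; the only internal action flips it.

    The stored value has no built-in meaning (ungrounded): the agent must
    learn what 0 and 1 stand for.
    """

    def __init__(self):
        self.bit = 0  # initial value is an assumption

    def toggle(self):
        self.bit = 1 - self.bit

    def read(self):
        return self.bit


class GatedWorkingMemory:
    """Single-slot memory; the only internal action stores the current observation.

    The stored value is an actual observation (grounded).
    """

    def __init__(self):
        self.contents = None  # starts empty (assumption)

    def gate(self, observation):
        self.contents = observation

    def read(self):
        return self.contents


bit_memory = BitMemory()
bit_memory.toggle()

gwm = GatedWorkingMemory()
gwm.gate("A")
```

Note the difference in what a repeated internal action does: toggling twice returns the bit to its earlier value, while gating again overwrites the slot with whatever is currently observed.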

TMaze
[Diagram: Base TMaze, with observations A/B and C.]

Bit Memory & TMaze
Methodology:
– Modify memory to attribute blame
Interfering behavior in choice location
Doesn't manifest with GWM

State Diagram: Bit Memory TMaze
[State diagram: eight states, each labeled (true, percept, mem): (L, L, 1), (L, L, 0), (R, R, 1), (R, R, 0), (L, C, 1), (L, C, 0), (R, C, 1), (R, C, 0). The states whose percept is L or R are marked as starting states. Transitions: toggle, up, left, right.]

State Diagram: GWM TMaze
[State diagram: ten states, each labeled (true, percept, mem), where mem may be empty: (L, L, L), (L, L, empty), (R, R, empty), (R, R, R), (L, C, L), (L, C, empty), (R, C, empty), (R, C, R), (L, C, C), (R, C, C). The states whose percept is L or R are marked as starting states. Transitions: gate, up, left, right.]
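The state counts in the two state diagrams can be checked by enumeration. The sketch below is an illustration with hypothetical function names; it assumes, as the diagrams suggest, that gated working memory can hold C only once C has been observed. It also lists what the agent itself can see, the (percept, mem) pair without the true cue, which is where state ambiguity comes from.

```python
def bit_memory_states():
    """All (true cue, percept, mem bit) combinations for the bit-memory TMaze."""
    return [(cue, percept, bit)
            for cue in ("L", "R")
            for percept in (cue, "C")
            for bit in (0, 1)]


def gwm_states():
    """All (true cue, percept, mem) combinations for the gated-WM TMaze.

    None stands for empty memory. Memory can hold only observations seen so
    far, so it cannot contain C while the cue is still being perceived.
    """
    states = []
    for cue in ("L", "R"):
        for mem in (None, cue):        # at the start: empty or the cue
            states.append((cue, cue, mem))
        for mem in (None, cue, "C"):   # at the junction: empty, cue, or C
            states.append((cue, "C", mem))
    return states


def agent_view(states):
    """Project each full state onto what the agent can observe: (percept, mem)."""
    return {(percept, mem) for _, percept, mem in states}


bit_states = bit_memory_states()
gated_states = gwm_states()
bit_view = agent_view(bit_states)
gated_view = agent_view(gated_states)
```

In the bit-memory case the junction views (C, 0) and (C, 1) each cover both true cues, so the bit is informative only if the agent's earlier toggling made it so. In the gated case (C, L) and (C, R) identify the cue directly.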

Number of Dependent Actions
[Diagram: the "# Dependent Actions" TMaze variant, with observations C and D.]

What We've Learned
Our machine learning intuition is often wrong (and yours probably is, too!)
Chicken & Egg Problem
State ambiguity is very problematic to learning

Chicken & Egg Problem
Prospective uses of memory are hard
Case study: bit memory & TMazes
Endemic across memory models
[Diagrams: 1/0 bit memory; Base TMaze with observations A/B and C.]
Chicken & Egg Problem:
– Must learn an association between 1 & 0 and A & B
– Must learn an association between 1 & 0 and left & right
– To be effective, can't self-interfere with memory in C!
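The three requirements above can be made concrete with a tiny deterministic check. This is a hypothetical sketch, not the authors' experiment: it assumes the bit starts at 0 every episode and that A calls for a left turn and B for a right turn, neither of which the slides state.

```python
# Assumption: cue A means "turn left", cue B means "turn right".
CORRECT_TURN = {"A": "left", "B": "right"}


def episode_succeeds(cue, toggle_on_cue, turn_for_bit, toggle_at_c=False):
    """Return True if a fixed bit-memory policy picks the correct turn.

    toggle_on_cue: storing policy, whether to toggle when seeing A or B.
    turn_for_bit:  using policy, which turn to take for each bit value.
    toggle_at_c:   whether the agent also toggles at C (self-interference).
    """
    bit = 0  # assumption: the bit is reset at the start of each episode
    if toggle_on_cue[cue]:
        bit = 1 - bit
    if toggle_at_c:
        bit = 1 - bit
    return turn_for_bit[bit] == CORRECT_TURN[cue]


store = {"A": False, "B": True}          # encode B as 1, A as 0
use = {0: "left", 1: "right"}            # decode 0 as left, 1 as right
use_flipped = {0: "right", 1: "left"}    # decoding that mismatches `store`

matched = [episode_succeeds(c, store, use) for c in "AB"]
mismatched = [episode_succeeds(c, store, use_flipped) for c in "AB"]
interfered = [episode_succeeds(c, store, use, toggle_at_c=True) for c in "AB"]
```

Neither association is right or wrong by itself: `use` pays off only given `store`, and the opposite encoding would work equally well with `use_flipped`. The learner must settle on a storing policy and a using policy together, and a single extra toggle at C undoes both.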

Implications for Soar
Soar natively supports learning internal actions
Next step: learning to use Soar's memories
Learning alongside hand-coded procedural knowledge is a potentially strong approach
Soar got the WM model right
RL will never be a magic bullet

Nuggets & Coal
Nearly finished!
– Better understanding of RL + memory, and thus Soar 9
– Parameterized, empirical evaluations of RL gaining traction
– Optimality not only metric of performance
Not quite finished!
– Qualitative results, but no closed form results yet
– No recent results for long term memories
– Not immediately applicable to Soar