Soar One-hour Tutorial John E. Laird University of Michigan March 2009 Supported in part by DARPA and ONR.


Presentation transcript:

Soar One-hour Tutorial. John E. Laird, University of Michigan, March 2009. Supported in part by DARPA and ONR.

Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
   - Internal decision cycle
   - Interaction with external environments
   - Subgoals and meta-reasoning
   - Chunking
5. Recent Extensions to Soar
   - Reinforcement Learning
   - Semantic Memory
   - Episodic Memory
   - Visual Imagery

How can we build a human-level AI?
[Figure: the biological levels of description, from Neurons, Neural Circuits, and Brain Structure up to Tasks such as Calculus, History, Reading, Sudoku, Shopping, Driving, and Talking on a cell phone.]

How can we build a human-level AI?
[Figure: the computational levels of description are added alongside the biological ones: Electrical Circuits, Logic Circuits, Computer Architecture, and Programs, leading up to the same Tasks.]

How can we build a human-level AI?
[Figure: Cognitive Architecture is added as the level between Computer Architecture and Programs, on the path from hardware up to Tasks.]

Cognitive Architecture
Fixed mechanisms underlying cognition:
- Memories, processing elements, control, interfaces
- Representations of knowledge
- Separation of fixed processes and variable knowledge
- Complex behavior arises from composition of simple primitives
Purpose:
- Bring knowledge to bear to select actions to achieve goals
Not just a framework:
- BDI, NN, logic & probability, rule-based systems
Important constraints:
- Continual performance
- Real-time performance
- Incremental, on-line learning
[Figure: the architecture, together with knowledge and goals, interacts with the task environment through a body.]

Common Structures of Many Cognitive Architectures
[Figure: short-term memory connected to procedural long-term memory (via procedure learning) and declarative long-term memory (via declarative learning), together with perception, action selection, and goals.]

Different Goals of Cognitive Architecture
- Biological plausibility: Does the architecture correspond to what we know about the brain?
- Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?
- Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  - Building human-level AI

Short History of Soar
[Timeline: Pre-Soar roots in Problem Spaces, Production Systems, and Heuristic Search; early work spanning functionality and modeling, with multi-method, multi-task problem solving, subgoaling, chunking, and a Unified Theory of Cognition (UTC); later work on natural language, HCI, integration with external environments, large bodies of knowledge, teamwork, real applications, and virtual agents; learning from experience, observation, and instruction; and new capabilities.]

Distinctive Features of Soar
- Emphasis on functionality
  - Takes engineering and scaling issues seriously
  - Interfaces to real-world systems
  - Very large Soar systems can be built and exist for a long time
- Integration with perception and action
  - Mental imagery and spatial reasoning
- Integrates reaction, deliberation, and meta-reasoning
  - Dynamically switches between them
- Integrated learning
  - Chunking, reinforcement learning, episodic & semantic learning
- Useful in cognitive modeling
  - Expanding this is the emphasis of many current projects
- Easy to integrate with other systems & environments
  - SML efficiently supports many languages and inter-process communication

System Architecture (Soar 9.0)
[Figure: layered interface stack. The Soar kernel (C) is wrapped by gSKI, a higher-level C++ interface; KernelSML and ClientSML encode and decode function calls and responses in XML (SML, the Soar Markup Language, C++); a SWIG language layer provides wrappers for Java/Tcl (not needed if the application is in C++); the application itself can be in any language.]

Soar Basics
- Operators: deliberate changes to internal/external state
- Activity is a series of operators controlled by knowledge:
  1. Input from environment
  2. Elaborate current situation: parallel rules
  3. Propose and evaluate operators via preferences: parallel rules
  4. Select operator
  5. Apply operator: modify internal data structures: parallel rules
  6. Output to motor system
[Figure: an agent in a real or virtual world applies an operator and arrives in a new state.]
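As a rough illustration, the six numbered steps can be sketched for an Eaters-like grid agent. This is a toy Python sketch, not the Soar API: all names, the dictionary-based working memory, and the food-scoring preferences are hypothetical; real Soar encodes each step as productions.

```python
# Toy sketch of one Soar decision cycle for an Eaters-like grid agent.
# All names and structures are illustrative, not Soar's actual API.

def decision_cycle(percepts):
    # 1. Input: copy sensed cells into working memory.
    wm = {"cell": dict(percepts)}

    # 2. Elaborate: derive facts about the situation (parallel rules).
    wm["open"] = [d for d, c in wm["cell"].items() if c != "wall"]

    # 3. Propose and evaluate operators via preferences.
    proposals = [("move", d) for d in wm["open"]]
    def score(op):  # bonus food preferred over normal food over empty
        return {"bonus": 2, "food": 1, "empty": 0}[wm["cell"][op[1]]]

    # 4. Select: the best-preferred operator wins.
    op = max(proposals, key=score)

    # 5. Apply / 6. Output: emit the motor command.
    return {"move-direction": op[1]}

# A bonus-food cell to the north beats normal food and empty cells.
cmd = decision_cycle({"north": "bonus", "south": "food",
                      "east": "empty", "west": "wall"})
```

The wall to the west is never proposed, mirroring the "if cell in direction is not a wall, propose move" rule sketched later in the Eaters example.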

Basic Soar Architecture
[Figure: symbolic long-term procedural memory and symbolic short-term memory, connected by the decision procedure and chunking; perception and action connect through a body. The processing cycle: Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator (Decide), Elaborate Operator, Apply Operator (Apply), Output.]

Soar 101: Eaters
[Figure: production memory and working memory during one decision cycle of the Eaters game (Input, Propose Operator, Select Operator, Apply Operator, Output). Example rules:
- If the cell in a direction is not a wall, propose the operator to move in that direction.
- If one operator will move to a bonus food and another will move to a normal food, prefer the bonus-food move (operator >).
- If an operator will move to an empty cell, disprefer it (operator <).
- If an operator is selected to move, create the output move-direction.
Resulting preferences: North > East, South > East, North = South; the output is move-direction North.]

Example Working Memory
(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)
Working memory is a graph. All working memory elements must be "linked" directly or indirectly to a state.
[Figure: block A on block B, and the corresponding graph rooted at state S1.]
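The blocks-world working memory above can be modeled as a set of (identifier, attribute, value) triples. The sketch below is illustrative, not Soar's internal representation; the helper `linked_to_state` is a hypothetical name, used here to check the "linked to a state" invariant by walking the graph backwards.

```python
# Working memory as (identifier, attribute, value) triples, mirroring
# the blocks-world example above (a sketch, not Soar's internal format).

wmes = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("t1", "color", "gray"), ("t1", "type", "table"),
]

def linked_to_state(ident, wmes, state="s1", seen=None):
    """True if ident is reachable (backwards) from the state identifier."""
    seen = seen or set()
    if ident == state or ident in seen:
        return ident == state
    seen.add(ident)
    parents = [i for (i, _, v) in wmes if v == ident]
    return any(linked_to_state(p, wmes, state, seen) for p in parents)
```

For example, `t1` is linked via `s1 ^table t1` (and via `b2 ^ontop t1`), while an identifier that appears in no triple is not linked to the state.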

Soar Processing Cycle
[Figure: the decision cycle (Input, Elaborate State, Propose Operators, Evaluate Operators, Decide/Select Operator, Elaborate Operator, Apply Operator, Output) driven by rules; when knowledge is insufficient, an impasse arises and a subgoal runs the same cycle recursively.]

TankSoar
[Figure: the TankSoar environment, showing the red tank's shield, borders (stone), walls (trees), a health charger, a missile pack, the blue tank ("Ouch!"), an energy charger, and the green tank's radar.]

Soar 103: Subgoals
[Figure: the decision cycle with an operator hierarchy: if the enemy is not sensed, then wander; wander decomposes into operators such as move and turn.]

Soar 103: Subgoals (continued)
[Figure: if the enemy is sensed, then attack; attack decomposes into operators such as shoot.]

TacAir-Soar [1997]
- Controls simulated aircraft in real-time training exercises (>3,000 entities)
- Flies all U.S. air missions
- Dynamically changes missions as appropriate
- Communicates and coordinates with computer- and human-controlled planes
- Large knowledge base (8,000 rules)
- No learning

TacAir-Soar Task Decomposition
[Figure: goal hierarchy from Execute Mission down through Intercept, Employ Weapons, and Launch Missile, with sibling goals such as Fly-Route, Ground Attack, Fly-Wing, Achieve Proximity, Search, Execute Tactic, Scram, Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Lock Radar, Lock IR, Fire-Missile, and Wait-for Missile-Clear.]
Example rules:
- Intercept: if instructed to intercept an enemy, then propose intercept.
- Employ Weapons: if intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons.
- Launch Missile: if employing weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.
- Lock IR: if launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.
>250 goals, >600 operators, >8,000 rules

Impasse/Substate
Implications: a substate is really a meta-state that allows the system to reflect.
- Substate = goal to resolve the impasse
  - Generate operator
  - Select operator (deliberate control)
  - Apply operator (task decomposition)
- All basic problem-solving functions are open to reflection
  - Operator creation, selection, application; state elaboration
- The substate is where knowledge to resolve the impasse can be found
- Hierarchies of substates/subgoals arise through recursive impasses

Tie Subgoals and Chunking
[Figure: with candidate operators North, South, and East tied, a tie impasse arises; in the subgoal, evaluate-operator is applied to each candidate (North = 10, South = 10, East = 5), producing the preferences North > East, South > East, North = South.]
Chunking creates a rule that applies evaluate-operator, and rules that create preferences based on what was tested.
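The tie-impasse look-ahead can be sketched as follows. This is a hypothetical Python sketch, not Soar's mechanism: the chunk cache is a plain dict, `evaluate` stands in for the subgoal's look-ahead simulation, and the fixed scores (North = 10, South = 10, East = 5) mirror the example above.

```python
# Sketch: a tie impasse triggers evaluate-operator in a subgoal, and
# "chunking" caches the results so the impasse does not recur.
# Illustrative only; not Soar's actual machinery.

chunks = {}  # learned knowledge: situation -> {operator: evaluation}

def evaluate(situation, op):
    # Subgoal work: simulate the operator and score the resulting state.
    scores = {"north": 10, "south": 10, "east": 5}
    return scores[op]

def select(situation, ops):
    if situation not in chunks:  # no knowledge -> tie impasse
        # Subgoal: run evaluate-operator for each tied candidate.
        chunks[situation] = {op: evaluate(situation, op) for op in ops}
    evals = chunks[situation]    # afterwards, the chunked rule fires directly
    return max(ops, key=evals.get)

first = select("cell-3-4", ["north", "south", "east"])
again = select("cell-3-4", ["north", "south", "east"])  # no impasse now
```

The second call answers from the cached evaluations without re-entering the subgoal, which is the sense in which chunking converts deliberation into reaction.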

Chunking Analysis
- Converts deliberate reasoning/planning to reaction
- Generality of learning is based on generality of reasoning
  - Leads to many different types of learning
  - If reasoning is inductive, so is learning
- Soar only learns what it thinks about
- Chunking is impasse-driven
  - Learning arises from a lack of knowledge

Extending Soar
- Learn from internal rewards
  - Reinforcement learning
- Learn facts (what you know)
  - Semantic memory
- Learn events (what you remember)
  - Episodic memory
- Basic drives and …
  - Emotions, feelings, mood
- Non-symbolic reasoning
  - Mental imagery
- Learn from regularities
  - Spatial and temporal clusters
[Figure: the extended architecture adds reinforcement learning, semantic learning, episodic learning, visual imagery, an appraisal detector, and clustering around the symbolic long-term memories, symbolic short-term memory, decision procedure, chunking, perception, and action.]


Reinforcement Learning
Shelly Nason

RL in Soar
1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust the values of numeric preferences with experience.
[Figure: perception and reward feed the internal state; the value function drives action selection, and experience updates the value function.]

The Q-function in Soar
The value function is stored in rules that test the state and operator and create numeric preferences:

sp {rl-rule
   (state <s> ^operator <o> +)
   …
   -->
   (<s> ^operator <o> = 0.34)}

An operator's Q-value is the sum of all its numeric preferences. Selection is epsilon-greedy or Boltzmann:
O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05
Epsilon-greedy: with probability ε the agent selects an action at random; otherwise the agent takes the action with the highest expected value. [Balances exploration and exploitation.]
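A minimal sketch of this selection scheme, using the numeric preferences for O1, O2, and O3 listed above. The structures and function names are illustrative, not Soar's implementation.

```python
import random

# Sketch: operator Q-values as sums of rule-generated numeric
# preferences, with epsilon-greedy selection. Illustrative names only.

prefs = {"O1": [0.34, 0.45, 0.02],
         "O2": [0.25, 0.11, 0.12],
         "O3": [-0.04, 0.14, -0.05]}

def q(op):
    # Q-value = sum of all numeric preferences for the operator.
    return sum(prefs[op])

def epsilon_greedy(ops, epsilon=0.1, rng=random):
    if rng.random() < epsilon:      # explore: random action
        return rng.choice(ops)
    return max(ops, key=q)          # exploit: highest expected value
```

With ε = 0 the choice is purely greedy, so O1 (Q = .81) is always selected; a nonzero ε occasionally picks one of the other operators at random.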

Updating Operator Values
Sarsa update: Q(s,O1) ← Q(s,O1) + α[r + λQ(s′,O2) − Q(s,O1)]
Q(s,O1) is the sum of the numeric preferences contributing to O1 (R1 = .20, R2 = .15, R3 = −.02, so Q(s,O1) = .33). The reward is r = .2, and Q(s′,O2) = .11 is the sum of the numeric preferences of the next selected operator, O2. The total update .2 + .9(.11) − .33 ≈ −.03 is split evenly among the rules contributing to O1 (about −.01 each), giving R1 = .19, R2 = .14, R3 = −.03.
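This split update can be sketched in a few lines. Note the assumptions: α = 1.0, λ (gamma) = 0.9, and the pre-update value R3 = −.02 are values chosen here so the numbers reproduce the before/after preferences in the example; the function name is hypothetical.

```python
# Sketch of the Sarsa update split evenly across the rules whose
# numeric preferences summed to Q(s,O1). Parameter values (alpha=1.0,
# gamma=0.9, R3=-0.02) are assumptions matching the worked example.

def sarsa_split(rule_prefs, r, q_next, alpha=1.0, gamma=0.9):
    q = sum(rule_prefs.values())              # Q(s,O1)
    delta = alpha * (r + gamma * q_next - q)  # TD error
    share = delta / len(rule_prefs)           # even split across rules
    return {name: p + share for name, p in rule_prefs.items()}

before = {"R1": 0.20, "R2": 0.15, "R3": -0.02}
after = sarsa_split(before, r=0.2, q_next=0.11)
```

Rounded to two decimals, the updated preferences are R1 = .19, R2 = .14, R3 = −.03, matching the example.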

Results with Eaters

RL TankSoar Agent

Semantic Memory
Yongjia Wang

Memory Systems
[Figure: taxonomy of memory. Memory divides into long-term and short-term memory; long-term memory divides into declarative (semantic memory, episodic memory) and procedural (perceptual representation system, procedural memory); short-term memory corresponds to working memory.]

Declarative Memory Alternatives
- Working memory
  - Keep everything in working memory
- Retrieve dynamically with rules
  - Rules provide asymmetric access
  - Data chunking to learn (complex)
- Separate declarative memories
  - Semantic memory (facts)
  - Episodic memory (events)

Basic Semantic Memory Functionalities
- Encoding
  - What to save?
  - When to add a new declarative chunk?
  - How to update knowledge?
- Retrieval
  - How is the cue placed and matched?
  - What are the different types of retrieval?
- Storage
  - What are the storage structures?
  - How are they maintained?
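One way to picture cue-based retrieval: the cue is a partial set of features, and the memory returns a stored chunk whose features all match the cue. This is a hypothetical sketch of the idea, not Soar's semantic memory implementation; the store contents echo the earlier blocks-world example.

```python
# Sketch of cue-based retrieval from a semantic store: a cue is a
# partial feature set; the first chunk matching every cue feature is
# returned. Illustrative structures only, not Soar's implementation.

store = [
    {"name": "b1", "color": "blue", "type": "block"},
    {"name": "b2", "color": "yellow", "type": "block"},
    {"name": "t1", "color": "gray", "type": "table"},
]

def retrieve(cue):
    for chunk in store:
        if all(chunk.get(k) == v for k, v in cue.items()):
            return chunk
    return None  # retrieval failure

hit = retrieve({"type": "block", "color": "yellow"})
```

A cue with no matching chunk yields a retrieval failure (`None`), which the agent can detect and respond to.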

Semantic Memory Functionalities
[Figure: examples of the interface between working memory and semantic memory: saving structures (including automatic commit), expanding a retrieved chunk, feature-match retrieval from a cue, updating with complex structure, and remove-no-change behavior.]

Episodic Memory
Andrew Nuxoll

Memory Systems
[Figure: the memory taxonomy again: long-term memory divides into declarative (semantic memory, episodic memory) and procedural (perceptual representation system, procedural memory); short-term memory corresponds to working memory.]

Episodic vs. Semantic Memory
- Semantic memory
  - Knowledge of what we "know"
  - Example: what state the Grand Canyon is in
- Episodic memory
  - History of specific events
  - Example: a family vacation to the Grand Canyon

Characteristics of Episodic Memory (Tulving)
- Architectural: does not compete with reasoning; task independent
- Automatic: memories are created without a deliberate decision
- Autonoetic: a retrieved memory is distinguished from sensing
- Autobiographical: an episode is remembered from one's own perspective
- Variable duration: the time period spanned by a memory is not fixed
- Temporally indexed: the rememberer has a sense of when the episode occurred

Episodic Memory: Current Implementation
[Figure: episodic memory and episodic learning are added alongside long-term procedural memory (production rules) and working memory, which has input, output, a cue slot, and a retrieved slot.]
- Encoding initiation: an episode is recorded when the agent takes an action.
- Encoding content: the entire working memory is stored in the episode.
- Storage (episode structure): episodes are stored in a separate memory.
- Retrieval initiation/cue: the cue is placed in an architecture-specific buffer.
- Retrieval: the closest partial match is retrieved.
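The store-everything / closest-partial-match pair can be sketched directly. This is an illustrative Python sketch, not the actual episodic memory code: episodes are whole snapshots of a dictionary working memory, the match score simply counts cue features satisfied, and all names are hypothetical.

```python
# Sketch of episodic memory: each episode snapshots all of working
# memory; retrieval returns the episode with the most cue features
# matching ("closest partial match"). Illustrative only.

episodes = []

def record(working_memory):
    episodes.append(dict(working_memory))  # store the entire WM

def retrieve(cue):
    def match_score(ep):
        return sum(1 for k, v in cue.items() if ep.get(k) == v)
    return max(episodes, key=match_score)

record({"x": 1, "y": 2, "sees-charger": False})
record({"x": 5, "y": 2, "sees-charger": True})
best = retrieve({"sees-charger": True})
```

Retrieving with the cue `sees-charger: True` brings back the full snapshot of the moment the charger was visible, including where the agent was standing, which is exactly what the virtual-sensing capability below exploits.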

Cognitive Capability: Virtual Sensing
Retrieve prior perceptions that are relevant to the current task. The tank recursively searches memory:
- Have I seen a charger from here?
- Have I seen a place where I can see a charger?

Virtual Sensors Results

Cognitive Capability: Action Modeling
[Figure: the agent attempts to choose a direction (North, South, East); its knowledge is insufficient, so an impasse arises. In the subgoal it evaluates moving in each available direction: create a memory cue, retrieve the best matching episode, retrieve the next episode, and use the change in score to evaluate the proposed action (e.g., Move North = 10 points).]

Episodic Memory: Multi-Step Action Projection [Andrew Nuxoll]
Learn tactics from prior success and failure:
- Fight/flight
- Back away from enemy (and fire)
- Dodging

Episodic Memory Enables Cognitive Capabilities
- Sensing
  - Detect changes
  - Detect repetition
  - Virtual sensing
- Reasoning
  - Model actions
  - Use previous successes/failures
  - Model the environment
  - Manage long-term goals
  - Explain behavior
- Learning
  - Retroactive learning
  - Allows reanalysis given new knowledge
  - "Boosts" other learning mechanisms

Mental Imagery and Spatial Reasoning
Scott Lathrop, Sam Wintermute
(See AGI talks.)

What Is Visual Imagery?
Visual-spatial:
- Location, orientation
- Sentential, quantitative representations
- Linear algebra and computational geometry algorithms
Visual-depictive:
- Shape, color, topology, spatial properties
- Depictive, pixel-based representations
- Image algebra algorithms
Sentential/algebraic algorithms vs. depictive/ordinal algorithms.

Where can you put A next to I?

Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]
[Figure: Soar exchanges information with a spatial scene built from the environment. Quantitative descriptions of environmental objects come in; Soar works with qualitative descriptions of object relationships and of new imagined objects in relation to existing ones. Example predicates: (on A I), (imagine_left_of A I), (intersect A′ O), (no_intersect A′), (imagine_right_of A I), (move_right_of A I).]

Upcoming Challenges
- Continued refinement and integration
- Integrate with complex perception and motor systems
- Add/learn lots of world knowledge
  - Language, spatial and temporal reasoning, …
- Scale up to large bodies of knowledge
  - Build up from instruction, experience, exploration, …

Soar Community
- Soar website
- Soar Workshop every June in Ann Arbor (June 22-26, 2009)
- Soar-group mailing list (low traffic)

Thanks
- Funding agencies: NSF, DARPA, ONR
- Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu
- Research programmers: Karen Coulter, Jonathan Voigt
- Continued inspiration: Allen Newell

Challenges in Cognitive Architecture Research
- Dynamic taskability
  - Pursue novel tasks
- Learning
  - Always learning, learning in unexpected and unplanned ways ("wild" learning)
  - Transition from programming to learning by imitation, instruction, experience, reflection, …
- Natural language
  - An active area, but much is left to do
- Social behavior
  - Interaction with humans and other entities
- Connect to the real world
  - Cognitive robotics with long-term existence
- Applications
  - Expand domains and problems
  - Put cognitive architectures to work
- Connect to unfolding research on the brain, psychology, and the rest of AI

ACT-R
[Figure: ACT-R's "chunk" declarative data structure; buffers (goal, perception, visualization) each hold a single chunk, backed by long-term declarative memory.]

Relevant Soar / ACT-R Differences
- Soar
  - Single generic working memory
  - WME structures represent individual attributes
  - Activations associated with individual attributes
  - Complex WM structures; parallel/serial rule firing
- ACT-R
  - Specialized buffers
  - The chunk is the atomic retrieval unit
  - Activations associated with chunks
  - Each buffer holds a single chunk; serial rule firing