Integrating Background Knowledge and Reinforcement Learning for Action Selection
John E. Laird, Nate Derbinsky, Miller Tinkerhess

Goal
Provide architectural support so that:
– The agent uses (possibly heuristic) background knowledge to make initial action selections. There may be non-determinism in the environment that is hard to capture in a good theory.
– Learning improves action selections based on experience-based reward, capturing regularities that are hard to encode by hand.
Approach: use chunking to learn RL rules, then use reinforcement learning to tune behavior.

Deliberate Background Knowledge for Action Selection
Examples:
– Compute the likelihood of success of different actions using explicit probabilities.
– Perform look-ahead internal search, using an action model to predict the future result of an action.
– Retrieve similar situations from episodic memory.
– Request advice from an instructor.
Characteristics:
– Multiple steps of internal actions.
– Not easy to incorporate accumulated experience-based reward.
Soar approach:
– Perform the calculations in a substate of a tie impasse.

The dice game:
– Each player starts with five dice. Players hide their dice under cups and shake.
– Players take turns making bids as to the number of dice of a given face under the cups (e.g., "Bid 4 2's", then "Bid 6 6's").
– Each player must either increase the bid or challenge.

– When bidding, players can "push" out a subset of their dice (making them visible to all players) and reroll the rest (e.g., "Bid 7 2's").

– A player can challenge the previous bid ("I bid 7 2's" ... "Challenge!"). All dice are then revealed. In the example, the challenge fails and the challenging player loses a die (see the sketch below).
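
A minimal sketch of how a challenge could be resolved under the rules described above; the game variant details (e.g., whether any faces are wild) are not specified in the slides, so this assumes a plain count of matching faces, and the function name is illustrative.

```python
def resolve_challenge(bid_count, bid_face, all_dice):
    """Return True if the challenge succeeds (the bid was not met).

    all_dice: every die in play (pushed dice plus the dice under all
    cups), which are revealed when a challenge is made.
    """
    matching = sum(1 for die in all_dice if die == bid_face)
    return matching < bid_count

# The slide's example: "I bid 7 2's" is challenged, the dice are revealed,
# at least seven 2's are showing, so the challenge fails and the
# challenging player loses a die.
revealed = [2, 2, 2, 5, 2, 2, 1, 2, 6, 2, 3, 4, 2, 2]
assert not resolve_challenge(7, 2, revealed)
```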

Evaluation with Probability Calculation
[Diagram: a tie impasse among the candidate operators in the selection problem space leads to an evaluation substate; there, compute-bid-probability evaluates each candidate (Bid: 6[4] = .3, Challenge = .8, Bid: 6[6] = .1), and each evaluation becomes a numeric preference for the corresponding operator. A sketch of the probability calculation follows.]
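
The compute-bid-probability operator itself is not shown on the slides; a plausible sketch of that kind of calculation is the binomial probability that enough of the unknown dice show the bid face. The per-die probability p is an assumption (1/6 for a plain count; 1/3 if a wild face is counted), as is the function name.

```python
from math import comb

def bid_success_probability(needed, unknown, p=1/6):
    """Probability that at least `needed` of `unknown` hidden dice show
    the bid face, assuming each die matches independently with prob. p."""
    if needed <= 0:
        return 1.0
    return sum(comb(unknown, k) * p**k * (1 - p)**(unknown - k)
               for k in range(needed, unknown + 1))

# e.g. needing 2 more 4's among 9 unknown dice, as on the later
# "Learning RL-rules" slide; the slides' evaluations (.3, .8, .1) would
# come from a calculation along these lines, though the exact model the
# agent uses is not shown.
print(bid_success_probability(2, 9))
```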

Using Reinforcement Learning for Operator Selection
Reinforcement learning:
– Choose the best action based on expected value.
– Update the expected value based on the received reward and the expected future reward.
Characteristics:
– Direct mapping from situation-action pairs to expected values (the value function).
– Does not use any background knowledge.
– No theory of the origin of the initial values (usually 0) or of the value function itself.
Soar approach:
– Operator selection rules with numeric preferences.
– Reward received based on task performance.
– Numeric preferences updated based on experience (a generic update is sketched below).
Issue: where do RL rules come from?
– Their conditions determine the structure of the value function.
– Their actions initialize the value function.
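
In Soar-RL the numeric preferences act as the expected values that get updated from reward. The slides do not give the exact update rule or its parameters, so the sketch below is a generic temporal-difference (SARSA-style) update over a tabular value function; `alpha` and `gamma` are assumed parameters.

```python
def td_update(values, state_action, reward, next_state_action,
              alpha=0.1, gamma=0.9):
    """One temporal-difference update of the expected value (numeric
    preference) associated with a situation-action pair.

    values: dict mapping (situation, action) -> expected value.
    """
    q = values.get(state_action, 0.0)
    q_next = values.get(next_state_action, 0.0)  # 0.0 if terminal/unseen
    values[state_action] = q + alpha * (reward + gamma * q_next - q)
    return values[state_action]
```

With RL rules, the dictionary keys correspond to rule conditions rather than complete states, which is why the structure of the value function depends on what chunking puts into those conditions.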

Value Function in Soar
Rules map from situation-action pairs to expected values.
– The conditions determine the generality/specificity of the mapping.

Approach: Using Chunking over the Substate
For each preference created in the substate, chunking (EBL) creates a new RL rule:
– Its action is a numeric preference.
– Its conditions are based on the working memory elements tested in the substate.
Reinforcement learning then tunes the rules based on experience (sketched below).
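
A rough sketch of the idea (not Soar's actual rule syntax): chunking backtraces over the working-memory elements the substate evaluation tested and builds a rule whose conditions are those elements and whose action is a numeric preference initialized to the computed evaluation; RL then adjusts that value. The feature-tuple representation and all names below are illustrative only.

```python
def chunk_rl_rule(value_function, tested_wmes, operator, evaluation):
    """Create a value-function entry for the operator, keyed by the
    working-memory elements the substate actually tested, initialized
    to the deliberately computed evaluation."""
    key = (frozenset(tested_wmes), operator)
    # If the rule already exists, keep its (RL-tuned) value.
    value_function.setdefault(key, evaluation)
    return key

value_function = {}
chunk_rl_rule(value_function,
              tested_wmes={"unknown-dice 9", "pushed 4s 2", "my-cup 4s 2"},
              operator="bid 6 4's",
              evaluation=0.3)
```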

Two-Stage Learning
Stage 1: Deliberate processing in substates.
– Deliberately evaluate alternatives using probabilities, heuristics, and the model; selecting an action requires multiple operator selections.
– Always available for new situations; can expand to any number of players.
Stage 2: Reactive selection with the learned RL rules, which are updated based on the agent's experience. (A simplified control sketch follows.)
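
The control idea on this slide can be caricatured as follows. This is a much-simplified, non-Soar sketch (Soar's impasse and preference mechanisms are richer than a dictionary lookup), and all names are hypothetical.

```python
def select_action(situation, candidates, value_function, deliberate):
    """Two-stage selection, much simplified: use learned expected values
    when any exist for this situation; otherwise deliberate (as in a
    substate) and store the results as new value-function entries."""
    scored = {a: value_function.get((situation, a)) for a in candidates}
    if all(v is None for v in scored.values()):
        # No learned knowledge distinguishes the candidates (roughly, a
        # tie impasse): fall back to deliberate evaluation.
        for a in candidates:
            value_function[(situation, a)] = deliberate(situation, a)
        scored = {a: value_function[(situation, a)] for a in candidates}
    return max(candidates, key=lambda a: scored[a] or 0.0)
```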

Evaluation with Probability Calculation
[Same diagram as before: the tie impasse, the evaluation substate, and the compute-bid-probability evaluations (Bid: 6[4] = .3, Challenge = .8, Bid: 6[6] = .1), now annotated: chunking creates a rule for each result, adding an entry to the value function.]

Learning RL-Rules: Using Only the Probability Calculation
Situation: considering the bid 6 4's; the last bid was 5 4's; the last player has 4 unknown dice. Shared info: 9 unknown dice, plus the pushed dice and the dice in my cup. With 2 4's pushed and 2 4's in my cup (4 4's known), 2 more 4's are needed from the 9 unknown dice, giving an evaluation of .3.
Learned rule: if there are 9 unknown dice, 2 4's pushed, and 2 4's under my cup, and I am considering bidding 6 4's, the expected value is .3.

Learning RL-Rules: Using the Probability Calculation and the Opponent Model
Same situation, but the opponent model also reasons from the previous player's perspective (7 dice unknown to him, 2 4's shared, 2 4's inferred under his cup), so only 1 more 4 is needed from 5 unknown dice, giving an evaluation of .60.
Learned rule: if there are 9 unknown dice, 2 4's pushed, and 2 4's under my cup, and the previous player had 4 unknown dice and bid 5 4's, and I am considering bidding 6 4's, the expected value is .60. (Both learned rules are restated in the sketch below.)
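
Restating those two slides side by side in the illustrative representation used earlier: the probability-only chunk tests fewer working-memory elements, so it is more general; the chunk built with the opponent model also tests the previous player's bid and dice count, so it is more specific and carries a different initial value. Both entries are then tuned independently by RL. The feature names are still illustrative.

```python
# Two learned RL rules for the same action, differing in generality
# and in the initial value their deliberate evaluations provided.
value_function = {
    # Probability calculation only (evaluation .3): general conditions.
    (frozenset({"unknown-dice 9", "pushed 4s 2", "my-cup 4s 2"}),
     "bid 6 4's"): 0.30,
    # Probability plus opponent model (evaluation .60): more specific.
    (frozenset({"unknown-dice 9", "pushed 4s 2", "my-cup 4s 2",
                "prev-bid 5 4's", "prev-unknown-dice 4"}),
     "bid 6 4's"): 0.60,
}
```

More specific conditions mean more distinct entries to learn, which is consistent with the later observation that the opponent model slows learning even though it improves initial performance.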

Research Questions
– Does RL help? Do agents improve with experience? Can learning lead to better performance than the best hand-coded agent?
– Does initialization of RL rules improve performance?
– How does background knowledge affect the rules learned by chunking, and how do those rules affect learning?

Evaluation of Learning
3-player games:
– Against the best non-learning agent (heuristics and opponent model).
– Alternate 1000-game blocks of testing and training.
Metrics: speed of learning and asymptotic performance.
Agent variants:
– B: baseline
– H: with heuristics
– M: with opponent model
– MH: with opponent model and heuristics

Learning Agent Comparison
– The best learning agents do significantly better than the hand-coded agent.
– H and M give better initial performance than B.
– The probability calculation alone speeds learning (smaller state space); M slows learning (much larger state space).

Learning Agents with Initial Values = 0

Number of Rules Learned

Nuggets and Coal
Nuggets:
– First combination of chunking/EBL with RL: a transition from deliberate to reactive to learned behavior, and a potential story for the origin of value functions in RL.
– An intriguing idea for creating evaluation functions for game-playing agents: complex deliberation for novel and rare situations, reactive RL-tuned selection for common situations.
Coal:
– Sometimes background knowledge slows learning…
– It appears we need more than RL!
– How do we recover if we learn very general RL rules?