Lisa Torrey University of Wisconsin – Madison CS 540

Transfer in human learning:
- Education: hierarchical curriculum; learning tasks share common stimulus-response elements
- Abstract problem-solving: learning tasks share general underlying principles
- Multilingualism: knowing one language affects learning in another
Transfer can be both positive and negative.

Given: source task S. Learn: target task T.

[Plot: performance vs. training time. Transfer can yield a higher start, higher slope, and higher asymptote.]

[Diagram: within the space of all hypotheses, transfer narrows search to a smaller set of allowed hypotheses.]

[Diagram: hypothesis-space search, as above.] Thrun and Mitchell 1995: transfer slopes for gradient descent.

Bayesian methods. Bayesian learning: prior distribution + data = posterior distribution. Bayesian transfer: the prior is taken from a source task rather than chosen by hand. Raina et al. 2006: transfer a Gaussian prior.
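As a rough illustration of the prior-transfer idea (a minimal sketch, not the actual method of Raina et al.; the data here are hypothetical), one can fit a Bayesian linear model on the source task and reuse its posterior as the target task's prior:

```python
import numpy as np

def posterior(X, y, prior_mean, prior_cov, noise_var=1.0):
    """Bayesian linear regression: combine a Gaussian prior
    N(prior_mean, prior_cov) with data (X, y) to get the posterior."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(100, 3)), rng.normal(size=100)
X_tgt, y_tgt = rng.normal(size=(10, 3)), rng.normal(size=10)

# Fit the source task starting from a vague prior...
src_mean, src_cov = posterior(X_src, y_src, np.zeros(3), np.eye(3))
# ...then transfer: the source posterior becomes the target task's prior.
tgt_mean, tgt_cov = posterior(X_tgt, y_tgt, src_mean, src_cov)
```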

Hierarchical methods. [Diagram: a concept hierarchy in which low-level concepts such as line and curve support circle and surface, which in turn support pipe.] Stracuzzi 2006: learn Boolean concepts that can depend on each other.

Dealing with missing data or labels. [Diagram: Task S transfers to Task T.] Shi et al. 2008: transfer via active learning.

[Diagram: the reinforcement learning loop between agent and environment]
- The agent begins in state s₁ with Q(s₁, a) = 0 and takes action a₁ = π(s₁).
- The environment returns state s₂ = δ(s₁, a₁) and reward r₂ = r(s₁, a₁).
- The agent updates Q(s₁, a₁) ← Q(s₁, a₁) + Δ and takes action a₂ = π(s₂).
- The environment returns state s₃ = δ(s₂, a₂) and reward r₃ = r(s₂, a₂).
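For concreteness, here is a minimal tabular Q-learning sketch of that loop; the `env` interface (`reset`, `step`, `actions`) is an assumed Gym-style stand-in, not something from the slides:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning over the loop in the slide: observe s, choose a,
    receive (s', r) from the environment, and nudge Q(s, a) toward
    r + gamma * max_a' Q(s', a')."""
    Q = defaultdict(float)  # Q(s, a) = 0 initially, as on the slide
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy version of the policy pi(s)
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)  # plays the roles of delta(s, a) and r(s, a)
            best_next = 0.0 if done else max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # the Delta update
            s = s2
    return Q
```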

Transfer methods in reinforcement learning:
- Starting-point methods
- Hierarchical methods
- Alteration methods
- Imitation methods
- New RL algorithms

Starting-point methods: begin target-task training from an initial Q-table transferred from the source task. [Plot: target-task training curves with transfer vs. no transfer.] Taylor et al. 2005: value-function transfer.
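A minimal sketch of the starting-point idea (the inter-task mappings `map_state` and `map_action` are hypothetical hand-coded functions; Taylor et al.'s actual method involves more than this):

```python
def init_target_q(source_Q, map_state, map_action, target_states, target_actions):
    """Seed the target task's Q-table with mapped source-task values
    instead of zeros; ordinary RL then continues from this starting point."""
    Q = {}
    for s in target_states:
        for a in target_actions:
            # Look up the corresponding source-task entry; default to 0.0
            Q[(s, a)] = source_Q.get((map_state(s), map_action(a)), 0.0)
    return Q
```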

Hierarchical methods. [Diagram: Run and Kick are components of Pass and Shoot, which compose into Soccer.] Mehta et al. 2008: transfer a learned hierarchy.

Alteration methods: modify the target task's state, action, or reward space. [Diagram: Task S with original states, original actions, and original rewards becomes a task with new states, new actions, and new rewards.] Walsh et al. 2006: transfer aggregate states.

New RL algorithms. Torrey et al. 2006: transfer advice about skills. [Diagram: the agent-environment RL loop, as above.]

Imitation methods: the source-task policy is used during target-task training. Torrey et al. 2007: demonstrate a strategy.

Within this taxonomy (starting-point, imitation, hierarchical, and new-algorithm methods), the two approaches described next are Skill Transfer (a new RL algorithm) and Macro Transfer (an imitation method).

RoboCup soccer tasks: 3-on-2 BreakAway, 3-on-2 KeepAway, 3-on-2 MoveDownfield, 2-on-1 BreakAway.

IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…

Batch reinforcement learning via support vector regression (RL-SVR). Find Q-functions that minimize: ModelSize + C × DataMisfit. [Diagram: the agent collects experience in batches (Batch 1, Batch 2, …); after each batch, Q-functions are recomputed.]
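One plausible way to write that objective for a linear Q-model (my notation; the slides do not give the exact formulation), with state feature vectors x_i and batch Q-value estimates y_i:

```latex
\min_{w,\,b}\;
\underbrace{\lVert w \rVert_1}_{\text{ModelSize}}
\;+\; C \underbrace{\sum_i \bigl| (w \cdot x_i + b) - y_i \bigr|}_{\text{DataMisfit}}
```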

Batch reinforcement learning with advice (KBKR). Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit. [Diagram: as above, with advice entering the Q-function computation.]
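Under the same assumed notation, advice can enter as soft constraints with slack variables z penalized by µ; again a sketch, not necessarily the exact KBKR formulation:

```latex
\min_{w,\,b,\,z \ge 0}\;
\lVert w \rVert_1
\;+\; C \sum_i \bigl| (w \cdot x_i + b) - y_i \bigr|
\;+\; \mu \underbrace{\lVert z \rVert_1}_{\text{AdviceMisfit}}
\quad \text{s.t. each advice constraint holds up to its slack } z_j
```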

Skill-transfer pipeline: ILP extracts skill rules from the source task; a mapping translates them into target-task terms; the agent takes them (optionally alongside human advice) as advice. Example rule: IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate).
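A toy sketch of the mapping step (the object names on the right-hand side are hypothetical target-task vocabulary, purely for illustration):

```python
# Illustrative source-to-target vocabulary map; real mappings are
# written per task pair (e.g., KeepAway objects onto BreakAway objects).
PREDICATE_MAP = {
    "Teammate": "Attacker",   # hypothetical renaming
    "Opponent": "Defender",   # hypothetical renaming
}

def map_rule(conditions):
    """Rewrite each source-task condition in the target task's terms."""
    mapped = []
    for cond in conditions:
        for src, tgt in PREDICATE_MAP.items():
            cond = cond.replace(src, tgt)
        mapped.append(cond)
    return mapped

advice = map_rule(["distance(Teammate) <= 5",
                   "angle(Teammate, Opponent) >= 30"])
# -> ["distance(Attacker) <= 5", "angle(Attacker, Defender) >= 30"]
```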

[Plot: skill transfer to 3-on-2 BreakAway from several source tasks.]

[Diagram: a relational macro is a finite-state machine over the actions pass(Teammate), move(Direction), shoot(goalRight), and shoot(goalLeft), with a rule attached to each node and arc:]
IF [ … ] THEN pass(Teammate)
IF [ … ] THEN move(ahead)
IF [ … ] THEN shoot(goalRight)
IF [ … ] THEN shoot(goalLeft)
IF [ … ] THEN pass(Teammate)
IF [ … ] THEN move(left)
IF [ … ] THEN shoot(goalRight)
IF [ … ] THEN shoot(goalRight)

An imitation method: [diagram: the source-task policy is used during target-task training].
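A minimal sketch of the demonstration phase (the `env`, `source_policy`, and `learner` interfaces are assumptions for illustration, not the RL-SVR machinery from the earlier slides):

```python
def imitation_bootstrap(env, source_policy, learner, demo_episodes=100):
    """Imitation transfer: for an initial block of target-task episodes,
    actions come from the frozen source-task policy; the target learner
    trains on that experience, then continues learning on its own."""
    for _ in range(demo_episodes):
        s, done = env.reset(), False
        while not done:
            a = source_policy(s)                # source policy picks the action
            s2, r, done = env.step(a)
            learner.observe(s, a, r, s2, done)  # learner absorbs the demonstration
            s = s2
    learner.train_independently(env)            # standard RL takes over afterwards
```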

Macro-transfer pipeline: ILP learns a macro from the source task; the macro is then demonstrated in the target task.

Learning macro structures:
- Positive examples: BreakAway games that score
- Negative examples: BreakAway games that do not score
ILP produces structures such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

Learning rules for arcs:
- Positive examples: states in good games that took the arc
- Negative examples: states in good games that could have taken the arc but did not
[Diagram: an arc from pass(Teammate) to shoot(goalRight), with ILP-learned rules:]
IF [ … ] THEN enter(State)
IF [ … ] THEN loop(State, Teammate)

[Plot: macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.]

Machine learning is usually designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the inter-task mapping.