Transfer in Reinforcement Learning via Markov Logic Networks
Lisa Torrey, Jude Shavlik, Sriraam Natarajan, Pavan Kuppili, Trevor Walker
University of Wisconsin-Madison, USA
Possible Benefits of Transfer in RL
- Learning curves in the target task
[Figure: target-task performance vs. training, with transfer and without transfer]
The RoboCup Domain
- 2-on-1 BreakAway
- 3-on-2 BreakAway
Reinforcement Learning
- An agent interacts with an environment: it observes a state, takes an action, and receives a reward.
- States are described by features:
    distance(me, teammate1) = 15
    distance(me, opponent1) = 5
    angle(opponent1, me, teammate1) = 30
    ...
- Actions are: Move, Pass, Shoot
- Rewards are: +1 for scoring, 0 otherwise
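As an aside, here is a minimal sketch of the kind of learning loop this state/action/reward interface implies, using tabular Q-learning with epsilon-greedy exploration. The action names match the slide, but the table representation, hyperparameters, and helper functions are illustrative assumptions, not the learner used in this work.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch over a BreakAway-style interface.
# Discretized state keys, hyperparameters, and helpers are illustrative only.
ACTIONS = ["move", "pass", "shoot"]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

q_table = defaultdict(float)            # (state, action) -> Q-value estimate

def choose_action(state):
    """Epsilon-greedy selection over the three BreakAway actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
```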
Our Previous Methods
- Skill transfer
    - Learn a rule for when to take each action
    - Use rules as advice
- Macro transfer
    - Learn a relational multi-step action plan
    - Use macro to demonstrate
Transfer via Markov Logic Networks
[Diagram: source-task learner → learn → source-task Q-function and data → analyze → MLN Q-function → demonstrate → target-task learner]
Markov Logic Networks
- A Markov network models a joint distribution
- A Markov Logic Network (MLN) combines probability with logic
- Template: a set of first-order formulas with weights
- Each grounded predicate in a formula becomes a node
- Predicates in a grounded formula are connected by arcs
- Probability of a world: P(world) = (1/Z) exp(Σ_i W_i N_i), where W_i is the weight of formula i and N_i is its number of true groundings
[Diagram: example Markov network over nodes X, Y, Z, A, B]
(Richardson and Domingos, Machine Learning 2006)
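To make the world-probability formula concrete, a toy sketch: a two-atom MLN with hypothetical formulas and weights, where all worlds are enumerated to compute the partition function Z and then P(world) = (1/Z) exp(Σ_i W_i N_i).

```python
import itertools
import math

# Toy MLN over two ground atoms A and B with two hypothetical formulas:
#   f1: A          with weight 1.5
#   f2: A -> B     with weight 0.8
WEIGHTS = [1.5, 0.8]

def formula_counts(a, b):
    """Return [N_1, N_2]: how many groundings of each formula are true in world (a, b)."""
    return [int(a), int(a <= b)]   # for booleans, a <= b is the implication a -> b

def unnormalized(world):
    return math.exp(sum(w * n for w, n in zip(WEIGHTS, formula_counts(*world))))

# The partition function Z sums the unnormalized score over all 2^2 worlds.
Z = sum(unnormalized(world) for world in itertools.product([False, True], repeat=2))

for world in itertools.product([False, True], repeat=2):
    print(world, unnormalized(world) / Z)   # P(world) = (1/Z) exp(sum_i W_i N_i)
```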
MLN Q-function
- Formula 1: IF distance(me, Teammate) < 15 AND angle(me, goalie, Teammate) > 45 THEN Q ∈ (0.8, 1.0)
    W_1 = 0.75, N_1 = 1 (one teammate)
- Formula 2: IF distance(me, GoalPart) < 10 AND angle(me, goalie, GoalPart) > 45 THEN Q ∈ (0.8, 1.0)
    W_2 = 1.33, N_2 = 3 (three goal parts)
- Probability that Q ∈ (0.8, 1.0):
    exp(W_1 N_1 + W_2 N_2) / (1 + exp(W_1 N_1 + W_2 N_2))
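Plugging the slide's example numbers into that expression, as a quick check:

```python
import math

# Example values from the slide: W1 = 0.75, N1 = 1; W2 = 1.33, N2 = 3.
w = [0.75, 1.33]
n = [1, 3]

activation = sum(wi * ni for wi, ni in zip(w, n))          # 0.75*1 + 1.33*3 = 4.74
p_bin = math.exp(activation) / (1 + math.exp(activation))  # logistic of the activation
print(round(p_bin, 3))  # ~0.991: the MLN is very confident Q falls in (0.8, 1.0)
```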
Grounded Markov Network
[Diagram: the node Q ∈ (0.8, 1.0) connected to the grounded predicates distance(me, teammate1) < 15, angle(me, goalie, teammate1) > 45, distance(me, goalRight) < 10, angle(me, goalie, goalRight) > 45, distance(me, goalLeft) < 10, angle(me, goalie, goalLeft) > 45]
Learning an MLN
- Find good Q-value bins using hierarchical clustering
- Learn rules that classify examples into bins using inductive logic programming (ILP)
- Learn weights for these formulas to produce the final MLN
Binning via Hierarchical Clustering
[Figure: histograms of example frequency vs. Q-value, illustrating how hierarchical clustering groups Q-values into bins]
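A minimal sketch of this binning step using off-the-shelf agglomerative clustering from SciPy; the Q-values, linkage method, and number of bins are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical Q-values from source-task training examples.
q_values = np.array([0.05, 0.10, 0.12, 0.45, 0.50, 0.55, 0.82, 0.90, 0.95])

# Agglomerative clustering on the 1-D Q-values (complete linkage, 3 clusters).
Z = linkage(q_values.reshape(-1, 1), method="complete")
labels = fcluster(Z, t=3, criterion="maxclust")

# Each cluster's min/max Q-value defines one bin of the MLN Q-function.
for c in sorted(set(labels)):
    members = q_values[labels == c]
    print(f"bin {c}: Q in ({members.min():.2f}, {members.max():.2f})")
```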
Classifying Into Bins via ILP
- Given examples
    - Positive: inside this Q-value bin
    - Negative: outside this Q-value bin
- The Aleph* ILP learning system finds rules that separate positive from negative
    - Builds rules one predicate at a time
    - Top-down search through the feature space
* Srinivasan, 2001
Learning Formula Weights
- Given formulas and examples
    - Same examples as for ILP
    - ILP rules as network structure
- Alchemy* finds weights that make the probability estimates accurate
    - Scaled conjugate-gradient algorithm
* Kok, Singla, Richardson, Domingos, Sumner, Poon, and Lowd
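For intuition only, a simplified sketch of weight learning for the per-bin logistic model shown earlier: each formula's true-grounding count N_i acts as a feature, and the weights are fit by gradient ascent on the conditional log-likelihood. Alchemy's actual optimizer is scaled conjugate gradient; plain gradient ascent and the toy data here are assumptions.

```python
import math

# Toy training data for one Q-value bin: each example is the vector of
# true-grounding counts [N_1, N_2] for the two formulas, plus a label saying
# whether the example's Q-value actually falls in this bin. (Invented data;
# real counts come from grounding the ILP rules in each state.)
examples = [([1, 3], 1), ([0, 1], 0), ([1, 0], 1), ([0, 0], 0)]
weights = [0.0, 0.0]
LEARNING_RATE = 0.1

def prob_in_bin(counts, weights):
    """P(Q in bin | counts) = exp(sum_i W_i N_i) / (1 + exp(sum_i W_i N_i))."""
    z = sum(w * n for w, n in zip(weights, counts))
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(500):   # gradient ascent on the conditional log-likelihood
    gradient = [0.0, 0.0]
    for counts, label in examples:
        error = label - prob_in_bin(counts, weights)
        for i, n in enumerate(counts):
            gradient[i] += error * n
    weights = [w + LEARNING_RATE * g for w, g in zip(weights, gradient)]

print([round(w, 2) for w in weights])
```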
Using an MLN Q-function
- The MLN gives a probability for each Q-value bin, e.g.:
    Q ∈ (0.8, 1.0): P_1 = 0.75
    Q ∈ (0.5, 0.8): P_2 = 0.15
    Q ∈ (0, 0.5):   P_3 = 0.10
- Q = P_1 · E[Q | bin 1] + P_2 · E[Q | bin 2] + P_3 · E[Q | bin 3]
- E[Q | bin] is the Q-value of the most similar training example in that bin
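A quick numeric sketch of this estimate, using the bin probabilities above and hypothetical values for E[Q | bin]:

```python
# Bin probabilities from the slide; the E[Q | bin] values are hypothetical
# (in the method they come from the most similar training example per bin).
bin_probs = [0.75, 0.15, 0.10]
expected_q_per_bin = [0.90, 0.65, 0.25]

q_estimate = sum(p * eq for p, eq in zip(bin_probs, expected_q_per_bin))
print(q_estimate)   # 0.75*0.90 + 0.15*0.65 + 0.10*0.25 = 0.7975
```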
Example Similarity
- E[Q | bin] = Q-value of the most similar training example in the bin
- Similarity = dot product of example vectors
- An example's vector shows which bin rules (Rule 1, Rule 2, Rule 3, ...) the example satisfies
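A small sketch of the similarity computation; the rule-satisfaction vectors and Q-values below are made up for illustration.

```python
import numpy as np

# Binary vectors: entry j is 1 if the example satisfies bin rule j.
new_example = np.array([1, 1, 0, 1])

# Hypothetical training examples in this bin: (rule-satisfaction vector, Q-value).
training = [
    (np.array([1, 0, 0, 1]), 0.82),
    (np.array([1, 1, 0, 1]), 0.91),
    (np.array([0, 1, 1, 0]), 0.85),
]

# Similarity is the dot product; E[Q | bin] is the Q of the most similar example.
similarities = [int(np.dot(new_example, vec)) for vec, _ in training]
best = max(range(len(training)), key=lambda i: similarities[i])
expected_q = training[best][1]
print(similarities, expected_q)   # [2, 3, 1] -> E[Q | bin] = 0.91
```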
Experiments
- Source task: 2-on-1 BreakAway
    - 3000 existing games from the learning curve
    - Learn MLNs from 5 separate runs
- Target task: 3-on-2 BreakAway
    - Demonstration period of 100 games
    - Continue training up to 3000 games
    - Perform 5 target runs for each source run
Discoveries
- Results can vary widely with the source-task chunk from which we transfer
- Most methods use the "final" Q-function from the last chunk
- MLN transfer performs better from chunks halfway through the learning curve
Results in 3-on-2 BreakAway
[Figure: target-task learning curves in 3-on-2 BreakAway]
Conclusions
- MLN transfer can significantly improve initial target-task performance
- Like macro transfer, it is an aggressive approach for tasks with similar strategies
- It "lifts" transferred information to first-order logic, making it more general for transfer
- Theory refinement in the target task may be viable through MLN revision
Potential Future Work
- Model screening for transfer learning
- Theory refinement in the target task
- Fully relational RL in RoboCup using MLNs as Q-function approximators
Acknowledgements
- DARPA Grant HR C-0060
- DARPA Grant FA C-7606
Thank You