Relating Reinforcement Learning Performance to Classification Performance Presenter: Hui Li Sept. 11, 2006

Presentation transcript:

Relating Reinforcement Learning Performance to Classification Performance Presenter: Hui Li Sept. 11, 2006

Outline: Motivation; Reduction from reinforcement learning to classifier learning; Results; Conclusion

Motivation A simple relationship: the goal of reinforcement learning and the goal of (binary) classifier learning share the same structure, as sketched below.
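The two objectives were shown on the slide as formulas (images in the original). A standard way to write them, as a reconstruction rather than the slide's exact notation, is:

    % Reinforcement learning: maximize the expected sum of rewards over the horizon T
    \max_{\pi} \; \mathbb{E}\Big[\textstyle\sum_{t=1}^{T} r_t \;\Big|\; \pi\Big]

    % (Binary) classification: minimize the expected misclassification rate of hypothesis h
    \min_{h} \; \mathbb{E}_{(x,y)}\big[\mathbf{1}\{h(x) \neq y\}\big]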

Motivation Question: The problem of classification has been intensively investigated, while the problem of reinforcement learning is still under active investigation. Is it possible to reduce reinforcement learning to classifier learning?

Reduction Definition 1. What is a reinforcement learning problem? A reinforcement learning problem D is defined as a conditional probability table D(o', r | (o,a,r)*, o, a) on a set of observations O and rewards r ∈ [0, ∞), given any history of past observations, actions (from an action set A), and rewards (o,a,r)*.

Reduction 2. What is the reinforcement learning goal? Given some horizon T, find a policy π maximizing the expected sum of rewards (a reconstruction of the formula is given below):
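The formula on the slide was an image; a standard way to state this objective, reconstructed from the surrounding text, is:

    \eta_D(\pi) \;=\; \mathbb{E}\Big[\textstyle\sum_{t=1}^{T} r_t \;\Big|\; \pi, D\Big],
    \qquad \pi^{*} \;=\; \arg\max_{\pi}\; \eta_D(\pi)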

Reduction How to reduce a reinforcement learning problem to a cost-sensitive classification problem: how to obtain training examples, how to obtain training labels, and how to define the cost of misclassification.

Reduction An illustration of a trajectory tree for an MDP M = {S, A, D, P_{s,a}}, with two actions {0, 1} and a non-stationary policy.

Reduction The value of a single-step policy is estimated from sampled trajectories, which can be written explicitly in terms of the i-th realization of the reward for each action (see the sketch below); the goal is to maximize this estimate.
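The estimate itself appeared on the slide as an equation image. A plausible reconstruction, with r^i_a denoting the i-th realization of the reward after taking action a from the sampled initial state s_0^i, is:

    \hat{\eta}(\pi) \;=\; \frac{1}{n} \sum_{i=1}^{n} r^{\,i}_{\pi(s_0^i)},
    \qquad \pi^{*} \;=\; \arg\max_{\pi}\; \hat{\eta}(\pi)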

Reduction Value of the policy of a single step. [Figure: trajectory tree with sampled initial states s_0^1, ..., s_0^n, each branching over actions a = 0, 1, ..., L-1 into successor states s_{1|0}, s_{1|1}, ..., s_{1|L-1}; one branch corresponds to the i-th realization.]

Reduction One-step reduction: a one-step reinforcement learning problem is reduced to a cost-sensitive classifier learning problem.

Reduction where s_0^i is the i-th sample (or data point), the action serves as the label, and w_i gives the costs of classifying example i into each of the possible labels.

Reduction Properties of the cost: the cost for misclassification is always positive; the cost for correct classification is zero; and the larger the difference between the possible actions in terms of future reward, the larger the cost (or weight). A small sketch of this construction follows.
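A minimal Python sketch of the one-step construction, assuming we can sample an initial state and observe the reward of every action from it. The helpers sample_initial_state and sample_reward are hypothetical names, not from the slides:

    import numpy as np

    def make_cost_sensitive_examples(sample_initial_state, sample_reward,
                                     actions, n):
        """Turn one-step RL samples into cost-sensitive classification examples.

        For each sampled initial state s, the label is the best observed action,
        and the cost vector charges each action by how much reward it gives up
        relative to that best action (zero cost for the correct label).
        """
        examples = []
        for _ in range(n):
            s = sample_initial_state()
            rewards = np.array([sample_reward(s, a) for a in actions])
            best = int(np.argmax(rewards))       # label = reward-maximizing action
            costs = rewards.max() - rewards      # positive for mistakes, zero if correct
            examples.append((s, best, costs))
        return examples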

Reduction T-step MDP reduction: how to find good policies for a T-step MDP by solving a sequence of weighted classification problems. The T-step policy is π = (π_0, π_1, ..., π_{T-1}). When updating π_t, hold the rest constant; the trajectory trees are pruned from the root to stage t by keeping only the branches that agree with the controls π_0, π_1, ..., π_{t-1}. A sketch of this stage-wise loop follows.
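A minimal sketch of that stage-wise update, assuming a cost-sensitive learner train_cost_sensitive and a helper build_stage_examples that prunes the trajectory trees and computes the stage-t weights (both hypothetical names, not from the slides):

    def fit_T_step_policy(trees, initial_policy, train_cost_sensitive,
                          build_stage_examples, n_sweeps=1):
        """Fit pi = (pi_0, ..., pi_{T-1}) by solving one weighted classification
        problem per stage while holding the other stages fixed."""
        policy = list(initial_policy)      # e.g. T arbitrary/random classifiers
        T = len(policy)
        for _ in range(n_sweeps):
            for t in range(T):
                # Prune each trajectory tree from the root to stage t, keeping only
                # the branches that agree with pi_0, ..., pi_{t-1}, and attach the
                # stage-t cost vector (immediate reward plus rewards accumulated by
                # following pi_{t+1}, ..., pi_{T-1}) to each surviving node.
                examples = build_stage_examples(trees, policy, t)
                policy[t] = train_cost_sensitive(examples)
        return policy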

Reduction The stage-t return combines the realization of the immediate reward that follows the action taken at stage t with the rewards accumulated along the branch that agrees with the controls π_{t+1}, π_{t+2}, ..., π_{T-1} (sketched below).
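Written out, a plausible reconstruction of that decomposition (the exact notation on the slide was an image), with r_t^i(a) the realized reward for taking action a at stage t on trajectory i, is:

    Q_t^i(a) \;=\; r_t^i(a) \;+\; \sum_{t'=t+1}^{T-1} r_{t'}^i(\pi_{t'}),
    \qquad w_i(a) \;=\; \max_{a'} Q_t^i(a') \;-\; Q_t^i(a)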

Illustrative Example Two-step MDP problem: continuous state space S = [0, 1], binary action space A = {0, 1}, and a uniform distribution over the initial state.

Illustrative Example Value function [figure]

Illustrative Example Path taken by the algorithm [figure]