Planning, Acting, and Learning Chapter 10

2 Contents
The Sense/Plan/Act Cycle
Approximate Search
Learning Heuristic Functions
Rewards Instead of Goals

3 Learning Heuristic Functions
Learning from experience: continuous feedback from the environment is one way to reduce uncertainty and to compensate for an agent's lack of knowledge about the effects of its actions.
Useful information can be extracted from the experience of interacting with the environment.
Two settings: explicit graphs and implicit graphs.

4 Learning Heuristic Functions
Explicit graphs: the agent has a good model of the effects of its actions and knows the costs of moving from any node to its successor nodes.
c(n_i, n_j): the cost of moving from n_i to n_j.
δ(n, a): the description of the state reached from node n after taking action a.
DYNA [Sutton 1990]: a combination of "learning in the world" with "learning and planning in the model".
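The DYNA idea can be illustrated with a short sketch. The loop below is a minimal DYNA-Q-style routine in Python, assuming a hypothetical environment object env with reset() and step(state, action) methods and a fixed action set; it shows how real experience and planning updates on a learned model are interleaved, and is not the chapter's exact algorithm.

```python
import random
from collections import defaultdict

def dyna_q(env, actions, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal DYNA-Q sketch: learn from real steps, then replay a learned model."""
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(state, action)] -> (reward, next_state)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()                                   # hypothetical env interface
        done = False
        while not done:
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            r, s_next, done = env.step(s, a)              # "learning in the world"
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
            model[(s, a)] = (r, s_next)                   # record the transition in the model
            for _ in range(planning_steps):               # "learning and planning in the model"
                (ps, pa), (pr, pn) = random.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(pn, b)] for b in actions) - Q[(ps, pa)])
            s = s_next
    return Q
```

Each real step both updates the value estimates directly and records a transition in the model, which the inner loop then replays as simulated experience.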

5 Learning Heuristic Functions
Implicit graphs: it is impractical to build an explicit graph or table of all the nodes and their transitions.
Instead, the heuristic function is learned while the search process is being performed.
Example: the Eight-puzzle.
W(n): the number of tiles in the wrong place.
P(n): the sum of the distances that each tile is from its "home" position.
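The two Eight-puzzle features can be written down directly. The sketch below assumes states are flat 9-tuples with 0 for the blank and one common goal layout chosen for illustration; only the features W(n) and P(n) are shown, not the surrounding search.

```python
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)   # one common Eight-puzzle goal layout (0 = blank); an assumption for illustration

def W(state, goal=GOAL):
    """Number of (non-blank) tiles out of place."""
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)

def P(state, goal=GOAL):
    """Sum of Manhattan distances of each tile from its home square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        home = goal.index(tile)
        total += abs(idx // 3 - home // 3) + abs(idx % 3 - home % 3)
    return total
```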

6 Learning Heuristic Functions
Learning the weights: minimize the sum of the squared errors between the training samples and the h' values given by the weighted combination of features.
Training samples are obtained from node expansions during the search.
Temporal difference learning [Sutton 1988]: the weight adjustment depends only on two temporally adjacent values of a function.
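A minimal sketch of such a weight update for a linear heuristic h'(n) = w1*W(n) + w2*P(n), using the W and P features above. The target value 1 + min over successors of h'(n_j) and the step size are illustrative assumptions; the point is that the adjustment uses only two temporally adjacent estimates, in the spirit of temporal difference learning.

```python
def td_update_weights(weights, features_n, features_succ_list, alpha=0.01):
    """One TD-style adjustment of the weights of h'(n) = w . f(n).

    features_n: feature vector (W(n), P(n)) of the expanded node n.
    features_succ_list: feature vectors of n's successors.
    """
    h = lambda f: sum(w * x for w, x in zip(weights, f))
    # Target uses two temporally adjacent values: best successor estimate plus one step cost.
    target = 1 + min(h(f) for f in features_succ_list)
    error = target - h(features_n)
    # Gradient step on the squared error for a linear function approximator.
    return [w + alpha * error * x for w, x in zip(weights, features_n)]
```

For example, after expanding a state s with successor list succs (however the search produces it), one call would be td_update_weights([1.0, 1.0], (W(s), P(s)), [(W(t), P(t)) for t in succs]).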

7 Rewards Instead of Goals
State-space search rests on rather idealized conditions: it is assumed that the agent has a single, short-term task that can be described by a goal condition.
In practical problems the task often cannot be stated so simply.
Instead, the user expresses his or her satisfaction and dissatisfaction with task performance by giving the agent positive and negative rewards.
The agent's task can then be formalized as maximizing the amount of reward it receives.

8 Rewards Instead of Goals
We seek an action policy that maximizes reward, and improve the policy by iteration.
π: a policy function on nodes whose value π(n) is the action prescribed by that policy at node n.
r(n_i, a): the reward received by the agent when it takes action a at n_i.
ρ(n_j): the value of any special reward given for reaching node n_j.
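These definitions support a simple policy-iteration sketch. The version below assumes a deterministic model with a hypothetical successor function delta(n, a) and reward function r(n, a) supplied by the caller, plus a discount factor gamma; it illustrates alternating (approximate) policy evaluation and greedy policy improvement, not the chapter's exact formulation.

```python
def policy_iteration(nodes, actions, delta, r, gamma=0.9, sweeps=100):
    """Alternate policy evaluation and improvement on a deterministic graph MDP.

    delta(n, a) -> next node; r(n, a) -> immediate reward (both assumed given by the model).
    """
    policy = {n: actions[0] for n in nodes}        # arbitrary initial policy
    V = {n: 0.0 for n in nodes}
    while True:
        for _ in range(sweeps):                    # approximate evaluation of the current policy
            for n in nodes:
                a = policy[n]
                V[n] = r(n, a) + gamma * V[delta(n, a)]
        improved = {n: max(actions, key=lambda a: r(n, a) + gamma * V[delta(n, a)])
                    for n in nodes}                # greedy policy improvement
        if improved == policy:                     # stop when the policy is stable
            return policy, V
        policy = improved
```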

9 Value Iteration [Barto, Bradtke, and Singh, 1995]
Delayed-reinforcement learning: learning action policies in settings in which rewards depend on a sequence of earlier actions.
Temporal credit assignment: credit those state-action pairs most responsible for the reward.
Structural credit assignment: when the state space is too large for us to store the entire graph, we must aggregate states with similar V' values. [Kaelbling, Littman, and Moore, 1996]
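For comparison with the policy-iteration sketch, here is a minimal value-iteration sketch over the same assumed deterministic model (the hypothetical delta and r functions above); it repeatedly backs up V(n) = max_a [ r(n, a) + gamma * V(delta(n, a)) ] and then reads off a greedy policy.

```python
def value_iteration(nodes, actions, delta, r, gamma=0.9, tol=1e-6):
    """Back up V(n) = max_a [ r(n, a) + gamma * V(delta(n, a)) ] until the values converge."""
    V = {n: 0.0 for n in nodes}
    while True:
        max_change = 0.0
        for n in nodes:
            best = max(r(n, a) + gamma * V[delta(n, a)] for a in actions)
            max_change = max(max_change, abs(best - V[n]))
            V[n] = best
        if max_change < tol:
            break
    # The policy that acts greedily with respect to V maximizes discounted reward in this model.
    policy = {n: max(actions, key=lambda a: r(n, a) + gamma * V[delta(n, a)]) for n in nodes}
    return policy, V
```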