Topics: Introduction to Robotics CS 491/691(X) Lecture 12 Instructor: Monica Nicolescu

Review
Emergent behavior
Deliberative systems
–Planning
–Drawbacks of SPA architectures
Hybrid systems
–Biological evidence
–Components
–Universal plans

Hybrid Control
Idea: get the best of both worlds
Combine the speed of reactive control and the brains of deliberative control
Fundamentally different controllers must be made to work together
–Time scales: short (reactive), long (deliberative)
–Representations: none (reactive), elaborate world models (deliberative)
This combination is what makes these systems hybrid
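To make the time-scale split concrete, here is a minimal sketch (in Python, with purely hypothetical class and method names) of how a hybrid controller might run a fast reactive loop while a slower deliberative planner periodically updates the plan that biases it:

    import time

    class HybridController:
        """Toy hybrid controller: a slow planner advises a fast reactive loop."""
        def __init__(self, planner, reactive_layer, replan_period=5.0):
            self.planner = planner              # deliberative side: builds plans from a world model
            self.reactive = reactive_layer      # reactive side: maps current sensors to an action
            self.replan_period = replan_period  # deliberation runs on a long time scale (seconds)
            self.last_plan_time = float("-inf")
            self.current_plan = None

        def step(self, sensors):
            now = time.monotonic()
            if now - self.last_plan_time > self.replan_period:
                self.current_plan = self.planner.plan(sensors)   # slow: re-plan occasionally
                self.last_plan_time = now
            return self.reactive.act(sensors, advice=self.current_plan)  # fast: react every cycle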

Reaction – Deliberation Coordination
Selection: Planning is viewed as configuration
Advising: Planning is viewed as advice giving
Adaptation: Planning is viewed as adaptation
Postponing: Planning is viewed as a least commitment process

Selection Example: AuRA
Autonomous Robot Architecture (R. Arkin, ’86)
–A deliberative hierarchical planner and a reactive controller based on schema theory
Diagram labels: rule-based system, A* planner, interface to human, plan sequencer, spatial reasoner, mission planner

Advising Example: Atlantis
E. Gat, Jet Propulsion Laboratory (1991)
Three layers:
–Deliberator: planning and world modeling
–Sequencer: initiation and termination of low-level activities
–Controller: collection of primitive activities
Asynchronous, heterogeneous architecture
Controller implemented in ALFA (A Language for Action)
Introduces the notion of cognizant failure
Planning results viewed as advice, not decree
Tested on NASA rovers

Atlantis Schematic

Adaptation Example: Planner-Reactor
D. Lyons (1992)
The planner continuously modifies the reactive control system
Planning is a form of reactor adaptation
–Monitors execution, adapts the control system based on environment changes and changes in the robot’s goals
Adaptation is on-line rather than off-line deliberation
Planning is used to remove performance errors when they occur and to improve plan quality
Tested in assembly and grasp planning

Postponing Example: PRS
Procedural Reasoning System, M. Georgeff and A. Lansky (1987)
Reactivity refers to postponement of planning until it is necessary
Information necessary to make a decision is assumed to become available later in the process
Plans are determined in reaction to the current situation
Previous plans can be interrupted and abandoned at any time
Tested on SRI’s Flakey

Flakey the Robot

Postponing Example: SSS
Servo, Subsumption, Symbolic; J. Connell (1992)
3 layers: servo, subsumption, symbolic
World models are viewed as a convenience, not a necessity
The symbolic layer selectively turns behaviors on/off and handles strategic decisions (where-to-go-next)
The subsumption layer handles tactical decisions (where-to-go-now)
The servo layer deals with making the robot go (continuous time)
Tested on TJ

SSS Implementation: TJ

Other Examples
Multi-valued logic
–Saffiotti, Konolige, Ruspini (SRI)
–Variable planner-controller interface, strongly dependent on the context
SOMASS hybrid assembly system
–C. Malcolm and T. Smithers (Edinburgh U.)
–Cognitive/subcognitive components
–Cognitive component designed to be as ignorant as possible
–Planning as configuration

Other Examples
Agent architecture
–B. Hayes-Roth (Stanford)
–2 levels: physical and cognitive
–Claim: reactive and deliberative behaviors can exist at each level → blurry functional boundary
–Differences consist in: time scale, symbolic/metric representation, level of abstraction
Theo-Agent
–T. Mitchell (CMU, 1990)
–Reacts when it can, plans when it must
–Emphasis on learning: how to become more reactive?

More Examples
Generic Robot Architecture
–Noreils and Chatila (1995, France)
–3 levels: planning, control system, functional
–Formal method for designing and interfacing modules (task description language)
Dynamical Systems Approach
–Schoner and Dose (1992)
–Influenced by biological systems
–Planning is selecting and parameterizing behavioral fields
–Behaviors use vector summation

More Examples
Supervenience architecture
–L. Spector (1992, U. of Maryland)
–Integration based on “distance from the world”
–Multiple levels of abstraction: perceptual, spatial, temporal, causal
Teleo-reactive agent architecture
–Benson and N. Nilsson (1995, Stanford)
–Plans are built as sets of teleo-reactive (TR) operators
–Arbitrator selects an operator for execution
–Unifying representation for reasoning and reaction

More Examples
Reactive Deliberation
–M. Sahota (1993, U. of British Columbia)
–Reactive executor: consists of action schemas
–Deliberator: enables one schema at a time and provides parameter values → action selection
–RoboSoccer
Integrated path planning and dynamic steering control
–Krogh and C. Thorpe (1986, CMU)
–Relaxation over a grid-based model with a potential fields controller
–Planner generated waypoints for the controller
Many others (including several for UUVs)

BBS vs. Hybrid Control
Both BBS and hybrid control have the same expressive and computational capabilities
–Both can store representations and look ahead
BBS and hybrid control have different niches in the set of application domains
–BBS: multi-robot domains; hybrid systems: single-robot domains
Hybrid systems:
–Environments and tasks where internal models and planning can be employed, and real-time demands are few
Behavior-based systems:
–Environments with significant dynamic changes, where looking ahead would be required

Adaptive Behavior
Learning produces changes within an agent that over time enable it to perform more effectively within its environment
Adaptation refers to an agent’s learning by making adjustments in order to be more attuned to its environment
–Phenotypic (within an individual agent) or genotypic (evolutionary)
–Acclimatization (slow) or homeostasis (rapid)

Types of Adaptation
Behavioral adaptation
–Behaviors are adjusted relative to each other
Evolutionary adaptation
–Descendants change over long time scales based on ancestors’ performance
Sensory adaptation
–Perceptual system becomes more attuned to the environment
Learning as adaptation
–Anything else that results in a more ecologically fit agent

Adaptive Control
Åström (1995)
–Feedback is used to adjust the controller’s internal parameters
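As a toy illustration of that idea (not Åström’s formulation; the gain, update rule, and names are all simplified assumptions), a controller whose internal gain is itself adjusted by the feedback error might look like:

    class AdaptiveGainController:
        """Toy adaptive controller: the feedback gain is tuned online from the tracking error."""
        def __init__(self, gain=1.0, adapt_rate=0.01):
            self.gain = gain              # internal parameter adjusted by adaptation
            self.adapt_rate = adapt_rate  # how quickly the gain is allowed to change

        def command(self, reference, measurement):
            error = reference - measurement
            self.gain += self.adapt_rate * abs(error)   # persistent error slowly raises the gain
            return self.gain * error                    # control action uses the adapted gain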

Learning
Learning can improve performance in additional ways:
Introduce new knowledge (facts, behaviors, rules)
Generalize concepts
Specialize concepts for specific situations
Reorganize information
Create or discover new concepts
Create explanations
Reuse past experiences

At What Level Can Learning Occur?
Within a behavior
–Suitable stimulus for a particular response
–Suitable response for a given stimulus
–Suitable behavioral mapping
–Magnitude of response
–Whole new behaviors
Within a behavior assemblage
–Component behavior set
–Relative strengths
–Suitable coordination function

What Can BBS Learn?
Entire new behaviors
More effective responses
New combinations of behaviors
Coordination strategies between behaviors
Structure of a robot’s body

Challenges of Learning Systems
Credit assignment
–How is credit/blame assigned to the components for the success or failure of the task?
Saliency problem
–What features are relevant to the learning task?
New term problem
–When to create a new concept/representation?
Indexing problem
–How can memory be efficiently organized?
Utility problem
–When/what to forget?

Classification of Learning Methods (Tan, 1991)
Numeric vs. symbolic
–Numeric: manipulate numeric quantities (e.g., neural networks)
–Symbolic: manipulate symbolic representations
Inductive vs. deductive
–Inductive: generalize from examples
–Deductive: produce a result from initial knowledge
Continuous vs. batch
–Continuous: during the robot’s performance in the world
–Batch: from a large body of accumulated experience

Learning Methods
Reinforcement learning
Neural network (connectionist) learning
Evolutionary learning
Learning from experience
–Memory-based
–Case-based
Learning from demonstration
Inductive learning
Explanation-based learning
Multistrategy learning

Reinforcement Learning (RL)
Motivated by psychology (the Law of Effect, Thorndike 1911):
Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response will decrease the probability
One of the most widely used methods for adaptation in robotics

Reinforcement Learning
Combinations of stimuli (i.e., sensory readings and/or state) and responses (i.e., actions/behaviors) are given positive/negative reward in order to increase/decrease their probability of future use
Desirable outcomes are strengthened and undesirable outcomes are weakened
Critic: evaluates the system’s response and applies reinforcement
–External: the user provides the reinforcement
–Internal: the system itself provides the reinforcement (reward function)
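A toy example of an internal critic: a hand-coded reward function through which the system scores its own outcomes (the state fields and reward values are illustrative assumptions):

    def internal_critic(next_state):
        """Toy internal reward function: the system evaluates its own outcome."""
        reward = -0.01                      # small step cost discourages aimless wandering
        if next_state.get("at_goal"):
            reward += 10.0                  # desirable outcome: strengthened
        if next_state.get("collision"):
            reward -= 5.0                   # undesirable outcome: weakened
        return reward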

Decision Policy
The robot can observe the state of the environment
The robot has a set of actions it can perform
–Policy: state/action mapping that determines which actions to take
Reinforcement is applied based on the results of the actions taken
–Utility: the function that gives a utility value to each state
Goal: learn an optimal policy that chooses the best action for every set of possible inputs
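A minimal illustration of these two mappings, using made-up states and actions: the policy maps each observed state to an action, and the utility assigns a value to each state:

    # Hypothetical states and actions, for illustration only
    actions = ["forward", "turn_left", "turn_right"]
    policy = {                      # policy: state -> action
        "start": "forward",
        "corridor": "forward",
        "doorway": "turn_left",
    }
    utility = {                     # utility: state -> value of being in that state
        "start": 0.2,
        "corridor": 0.5,
        "doorway": 0.8,
    }

    def choose_action(state):
        """Act according to the current policy."""
        return policy[state]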

Unsupervised Learning
RL is an unsupervised learning method:
–No target goal state
Feedback only provides information on the quality of the system’s response
–Simple: binary fail/pass
–Complex: numerical evaluation
Through RL a robot learns on its own, using its own experiences and the feedback received
The robot is never told what to do

Challenges of RL
Credit assignment problem:
–When something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?
Learning from delayed rewards:
–It may take a long sequence of actions that receive insignificant reinforcement to finally arrive at a state with high reinforcement
–How can the robot learn from reward received at some time in the future?
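A standard way to cope with delayed reward (general RL practice, not specific to these slides) is to credit earlier steps with a discounted sum of the rewards that follow; a small sketch:

    def discounted_return(rewards, gamma=0.9):
        """Discounted sum of future rewards: later reward is propagated back to earlier steps."""
        g = 0.0
        for r in reversed(rewards):   # work backwards: each step adds its reward
            g = r + gamma * g         # to the discounted value of everything after it
        return g

    # A long, mostly unrewarded sequence that ends in a large payoff
    print(discounted_return([0, 0, 0, 0, 10]))   # ~6.56 is credited back to the first step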

Challenges of RL
Exploration vs. exploitation:
–Explore unknown states/actions or exploit states/actions already known to yield high rewards
Partially observable states:
–In practice, sensors provide only partial information about the state
–Choose actions that improve the observability of the environment
Life-long learning:
–In many situations it may be required that robots learn several tasks within the same environment
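The usual textbook compromise between exploration and exploitation is ε-greedy action selection; a minimal sketch (the Q-value table and action names are assumptions):

    import random

    def epsilon_greedy(q_values, state, actions, epsilon=0.1):
        """With probability epsilon try a random action (explore), otherwise take the best known one (exploit)."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q_values.get((state, a), 0.0))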

Types of RL Algorithms
Adaptive Heuristic Critic (AHC)
Learning the policy is separate from learning the utility function the critic uses for evaluation
Idea: try different actions in different states and observe the outcomes over time
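A rough actor-critic style sketch of that separation (a simplification, not the exact AHC formulation; names, update rules, and constants are assumptions): the critic learns state utilities from the temporal-difference error, while the actor keeps separate action preferences updated from the same error.

    from collections import defaultdict
    import math, random

    values = defaultdict(float)    # critic: estimated utility per state
    prefs = defaultdict(float)     # actor: preference per (state, action)

    def ahc_step(state, action, reward, next_state, alpha=0.1, beta=0.1, gamma=0.9):
        """One update: the critic evaluates, the actor adjusts the policy separately."""
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error           # learn the utility function
        prefs[(state, action)] += beta * td_error   # learn the policy

    def select_action(state, actions):
        """Softmax over preferences: better-regarded actions are tried more often."""
        weights = [math.exp(prefs[(state, a)]) for a in actions]
        return random.choices(actions, weights=weights)[0]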

Q-Learning
Watkins (1989)
A single utility Q-function is learned to evaluate both actions and states
Q values are stored in a table
Updated at each step, using the following rule:
Q(x,a) ← Q(x,a) + α(r + γE(y) − Q(x,a))
x: state; a: action; α: learning rate; r: reward; γ: discount factor in (0,1); E(y) is the utility of the next state y: E(y) = max over actions a of Q(y,a)
Guaranteed to converge to the optimal solution, given infinite trials
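A minimal tabular sketch of exactly that update rule (the environment interface and constants are illustrative); action selection would typically use an exploration rule such as the ε-greedy sketch above:

    from collections import defaultdict

    Q = defaultdict(float)   # Q values stored in a table, keyed by (state, action)

    def q_update(x, a, r, y, actions, alpha=0.1, gamma=0.9):
        """Q(x,a) <- Q(x,a) + alpha * (r + gamma * E(y) - Q(x,a))."""
        e_y = max(Q[(y, b)] for b in actions)      # E(y) = max over actions b of Q(y,b)
        Q[(x, a)] += alpha * (r + gamma * e_y - Q[(x, a)])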

Learning to Walk
Maes & Brooks (1990)
Genghis: hexapod robot
Learned stable tripod stance and tripod gait
Rule-based subsumption controller
Two sensor modalities for feedback:
–Two touch sensors to detect hitting the floor: negative feedback
–Trailing wheel to measure progress: positive feedback

Learning to Walk
Nate Kohl & Peter Stone (2004)

Learning to Push
Mahadevan & Connell (1991)
Obelix: 8 ultrasonic sensors, 1 IR sensor, motor current
Learned how to push a box (Q-learning)
Motor outputs grouped into 5 choices: move forward, turn left or right (22 degrees), sharp turn left/right (45 degrees)
250,000 states

CS 491/691(X) - Lecture 1239 Readings Lecture notes