Autonomous Mobile Robots CPE 470/670 Lecture 12 Instructor: Monica Nicolescu.


Review
• Behavior coordination
  – Arbitration
  – Fusion
• Emergent behavior
• Deliberative systems
  – Planning
  – Drawbacks of SPA architectures

Hybrid Control
• Idea: get the best of both worlds
• Combine the speed of reactive control and the brains of deliberative control
• Fundamentally different controllers must be made to work together
  – Time scales: short (reactive), long (deliberative)
  – Representations: none (reactive), elaborate world models (deliberative)
• This combination is what makes these systems hybrid

An Example
• A robot that has to deliver medication to a patient in a hospital
• Requirements:
  – Reactive: avoid unexpected obstacles, people, objects
  – Deliberative: use a map and plan short paths to destination
• What happens if:
  – The robot needs to deliver medication to a patient, but does not have a plan to the patient's room?
  – The shortest path to its destination becomes blocked?
  – The patient was moved to another room?
  – The robot always goes to the same room?

Bottom-up Communication
• Dynamic re-planning
• If the reactive layer cannot do its job → it can inform the deliberative layer
• The information about the world is updated
• The deliberative layer will generate a new plan
• The deliberative layer cannot continuously generate new plans and update world information → the input from the reactive layer is a good indication of when to perform such an update
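The re-planning loop described on this slide can be sketched in a few lines. This is only an illustrative sketch, not code from the lecture: the grid world, the breadth-first planner, and the names plan_path and run are all assumptions made for the example. The point it shows is that the planner runs again only when the reactive layer reports that the next step is blocked.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Deliberative layer (hypothetical): breadth-first search on a grid.
    grid[r][c] == 1 marks a known obstacle. Returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def run(world, believed_map, start, goal):
    """Follow the plan; re-plan only when the reactive layer reports a blockage."""
    position, replans = start, 0
    plan = plan_path(believed_map, position, goal)
    while plan is not None and position != goal:
        next_cell = plan[1]
        if world[next_cell[0]][next_cell[1]] == 1:
            # Reactive layer cannot proceed: inform the deliberative layer.
            believed_map[next_cell[0]][next_cell[1]] = 1    # update world information
            plan = plan_path(believed_map, position, goal)  # generate a new plan
            replans += 1
        else:
            position = next_cell
            plan = plan[1:]
    return position, replans
```

Here world is the true environment and believed_map is the robot's possibly outdated map; an obstacle present only in world forces exactly one extra call to the planner when the robot runs into it.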

Top-Down Communication
• The deliberative layer provides information to the reactive layer
  – Path to the goal
  – Directions to follow, turns to take
• The deliberative layer may interrupt the reactive layer if better plans have been discovered
• Partial plans can also be used when there is no time to wait for the complete solution
  – Go roughly in the correct direction, plan for the details when getting close to destination

Reusing Plans
• Frequently planned decisions could be reused to avoid re-planning
• These can be stored in an intermediate layer and can be looked up when needed
• Useful when fast reaction is needed
• These mini-plans can be stored as contingency tables
  – intermediate-level actions
  – macro operators: plans compiled into more general operators for future use
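One way to picture the intermediate layer is as a lookup table in front of the planner. The sketch below is an assumption-laden illustration, not the lecture's design: PlanLibrary and straight_line_planner are hypothetical names, and the "planner" is a trivial stand-in for a 1-D corridor.

```python
class PlanLibrary:
    """Intermediate layer (hypothetical): a table of previously computed plans."""

    def __init__(self, planner):
        self.planner = planner      # slow deliberative planner (any callable)
        self.table = {}             # (start, goal) -> stored plan
        self.planner_calls = 0

    def get_plan(self, start, goal):
        key = (start, goal)
        if key not in self.table:   # unseen situation: deliberate once
            self.planner_calls += 1
            self.table[key] = self.planner(start, goal)
        return self.table[key]      # seen before: fast lookup


def straight_line_planner(start, goal):
    """Stand-in planner for a 1-D corridor: list the cells from start to goal."""
    step = 1 if goal >= start else -1
    return list(range(start, goal + step, step))
```

Asking the library twice for the same start and goal calls the planner only once; the second request is answered from the table, which is what makes the stored plan usable when fast reaction is needed.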

Universal Plans
• Assume that we could plan in advance for all possible situations that might come up
• Thus, we could generate and store all possible plans ahead of time
• For each situation a robot will have a pre-existing optimal plan, and will react optimally
• The robot has a universal plan:
  – A set of all possible plans for all initial states and all goals within the robot’s state space
• The system is a reactive controller!

Applicability of Universal Plans
• Examples have been developed as situated automata
• Universal plans are not useful for the majority of real-world domains because:
  – The state space is too large for most realistic problems
  – The world must not change
  – The goals must not change
• Disadvantages of pre-compiled systems
  – They are not flexible in the presence of changing environments, tasks or goals
  – The state space of a real robot is prohibitively large to enumerate, and thus pre-compiling generally does not scale up to complex systems
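A small worked example may help make the scaling argument concrete. Everything here is hypothetical: the corridor world, the function names, and the robot dimensions in the usage note are assumptions chosen only to show how a universal plan reduces to a lookup table and how quickly that table grows.

```python
def build_universal_plan(n_cells):
    """Universal plan for a toy 1-D corridor: one stored action per (state, goal)."""
    plan = {}
    for state in range(n_cells):
        for goal in range(n_cells):
            if state < goal:
                plan[(state, goal)] = "right"
            elif state > goal:
                plan[(state, goal)] = "left"
            else:
                plan[(state, goal)] = "stop"
    return plan


def count_states(grid_cells, headings, binary_sensors):
    """States of a hypothetical robot: position x heading x every sensor pattern."""
    return grid_cells * headings * 2 ** binary_sensors


def universal_plan_entries(num_states, num_goals):
    """A universal plan needs one entry per (state, goal) pair."""
    return num_states * num_goals
```

At run time the corridor robot only performs a table lookup, which is why the slide calls the result a reactive controller. For an assumed robot on a 100 x 100 grid with 8 headings and 16 binary sensors, count_states(10000, 8, 16) gives 5,242,880,000 states, and with one goal per grid cell the table would need 52,428,800,000,000 entries.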

Reaction – Deliberation Coordination
• Selection: Planning is viewed as configuration
• Advising: Planning is viewed as advice giving
• Adaptation: Planning is viewed as adaptation
• Postponing: Planning is viewed as a least commitment process

Selection Example: AuRA
• Autonomous Robot Architecture (R. Arkin, ’86)
  – A deliberative hierarchical planner and a reactive controller based on schema theory
• Diagram labels: Rule-based system, A* planner, Interface to human, Plan sequencer, Spatial reasoner, Mission planner

Advising Example: Atlantis
• E. Gat, Jet Propulsion Laboratory (1991)
• Three layers:
  – Deliberator: planning and world modeling
  – Sequencer: initiation and termination of low-level activities
  – Controller: collection of primitive activities
• Asynchronous, heterogeneous architecture
• Controller implemented in ALFA (A Language for Action)
• Introduces the notion of cognizant failure
• Planning results are viewed as advice, not decree
• Tested on NASA rovers

Adaptation Example: Planner-Reactor
• D. Lyons (1992)
• The planner continuously modifies the reactive control system
• Planning is a form of reactor adaptation
  – Monitors execution, adapts the control system based on environment changes and changes of the robot’s goals
• Adaptation is on-line rather than off-line deliberation
• Planning is used to remove performance errors when they occur and improve plan quality
• Tested in assembly and grasp planning

Postponing Example: PRS
• Procedural Reasoning System, M. Georgeff and A. Lansky (1987)
• Reactivity refers to postponement of planning until it is necessary
• Information necessary to make a decision is assumed to become available later in the process
• Plans are determined in reaction to the current situation
• Previous plans can be interrupted and abandoned at any time
• Tested on SRI Flakey

Flakey the Robot

BBS vs. Hybrid Control
• Both behavior-based systems (BBS) and hybrid control have the same expressive and computational capabilities
  – Both can store representations and look ahead
• BBS and hybrid control have different niches in the set of application domains
  – BBS: multi-robot domains; hybrid systems: single-robot domains
• Hybrid systems:
  – Environments and tasks where internal models and planning can be employed, and real-time demands are few
• Behavior-based systems:
  – Environments with significant dynamic changes, where looking ahead would be required

Learning & Adaptive Behavior
• Learning produces changes within an agent that over time enable it to perform more effectively within its environment
• Adaptation refers to an agent’s learning by making adjustments in order to be more attuned to its environment
  – Phenotypic (within an individual agent) or genotypic (evolutionary)
  – Acclimatization (slow) or homeostasis (rapid)

Types of Adaptation
• Behavioral adaptation
  – Behaviors are adjusted relative to each other
• Evolutionary adaptation
  – Descendants change over long time scales based on ancestors’ performance
• Sensory adaptation
  – Perceptual system becomes more attuned to the environment
• Learning as adaptation
  – Anything else that results in a more ecologically fit agent

Learning
Learning can improve performance in additional ways:
• Introduce new knowledge (facts, behaviors, rules)
• Generalize concepts
• Specialize concepts for specific situations
• Reorganize information
• Create or discover new concepts
• Create explanations
• Reuse past experiences

At What Level Can Learning Occur?
• Within a behavior
  – Suitable stimulus for a particular response
  – Suitable response for a given stimulus
  – Suitable behavioral mapping between stimulus and responses
  – Magnitude of response
  – Whole new behaviors
• Within a behavior assemblage
  – Component behavior set
  – Relative strengths
  – Suitable coordination function

Challenges of Learning Systems
• Credit assignment
  – How is credit/blame assigned to the components for the success or failure of the task?
• Saliency problem
  – What features are relevant to the learning task?
• New term problem
  – When to create a new concept/representation?
• Indexing problem
  – How can memory be efficiently organized?
• Utility problem
  – When/what to forget?

Learning Methods
• Reinforcement learning
• Neural network (connectionist) learning
• Evolutionary learning
• Learning from experience
  – Memory-based
  – Case-based
• Learning from demonstration
• Inductive learning
• Explanation-based learning
• Multistrategy learning

Reinforcement Learning (RL)
• Motivated by psychology (the Law of Effect, Thorndike 1911):
  – Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response will decrease the probability
• One of the most widely used methods for adaptation in robotics

Reinforcement Learning
• Goal: learn an optimal policy that chooses the best action for every set of possible inputs
• Policy: state/action mapping that determines which actions to take
• Desirable outcomes are strengthened and undesirable outcomes are weakened
• Critic: evaluates the system’s response and applies reinforcement
  – external: the user provides the reinforcement
  – internal: the system itself provides the reinforcement (reward function)

Challenges of RL
• Credit assignment problem:
  – When something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?
• Learning from delayed rewards:
  – It may take a long sequence of actions that receive insignificant reinforcement to finally arrive at a state with high reinforcement
• Exploration vs. exploitation:
  – Explore unknown states/actions or exploit states/actions already known to yield high rewards
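A common way to trade off exploration against exploitation is epsilon-greedy action selection. The slides do not name a specific strategy, so this is an assumed illustration; the function name, the epsilon parameter, and the list-of-values representation are all choices made for the sketch.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random action); otherwise exploit
    the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With epsilon = 0 the robot always exploits what it already knows; with epsilon = 1 it acts purely at random. Values in between, often decreased over time, let it keep trying unknown actions while mostly using those known to yield high reward.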

Learning to Walk
• Maes, Brooks (1990)
• Genghis: hexapod robot
• Learned stable tripod stance and tripod gait
• Rule-based subsumption controller
• Two sensor modalities for feedback:
  – Two touch sensors to detect hitting the floor: negative feedback
  – Trailing wheel to measure progress: positive feedback

Learning to Walk
• Nate Kohl & Peter Stone (2004)

Learning to Push
• Mahadevan & Connell 1991
• Obelix: 8 ultrasonic sensors, 1 IR, motor current
• Learned how to push a box (Q-learning)
• Motor outputs grouped into 5 choices: move forward, turn left or right (22 degrees), sharp turn left/right (45 degrees)
• NEAR, FAR, STUCK, BUMP → 250,000 states
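The slide names Q-learning as the method. As a rough illustration of the update rule only, here is a tabular version on a tiny corridor. The corridor environment, the reward of 1 at the right end, and the parameter values (alpha, gamma, number of episodes) are hypothetical; this is not the Obelix setup.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4, reward at the right end
LEFT, RIGHT = 0, 1

def step(state, action):
    """Toy deterministic environment: move one cell, reward 1 on reaching GOAL."""
    nxt = max(0, state - 1) if action == LEFT else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a purely random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice((LEFT, RIGHT))
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q
```

After training, the greedy policy (pick the action with the larger Q value in each cell) moves right everywhere, even though reward is only received at the final step. The discount factor gamma is what passes that delayed reward back to earlier states.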

Supervised Learning
• Supervised learning requires the user to give the exact solution to the robot in the form of the error direction and magnitude
• The user must know the exact desired behavior for each situation
• Supervised learning involves training, which can be very slow; the user must supervise the system with numerous examples

Neural Networks
• One of the most used supervised learning methods
• Used for approximating real-valued and vector-valued target functions
• Inspired by biology: learning systems are built from complex networks of interconnecting neurons
• The goal is to minimize the error between the network output and the desired output
  – This is achieved by adjusting the weights on the network connections
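The idea of adjusting weights to reduce the output error can be shown with the smallest possible network: a single linear neuron trained with the delta rule (gradient descent on squared error). The function name, learning rate, and training data below are assumptions for illustration; real networks have many units and nonlinear activations.

```python
def train_linear_neuron(samples, lr=0.1, epochs=200):
    """Fit output = w * x + b to (x, target) pairs using the delta rule."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - (w * x + b)   # desired output minus network output
            w += lr * error * x            # move the weight to reduce the error
            b += lr * error                # same for the bias term
    return w, b
```

Trained on samples taken from the line y = 2x + 1, for example [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)], the weight and bias approach 2 and 1. The "exact solution" that supervised learning requires from the user is the target value supplied with each sample.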

ALVINN
• ALVINN (Autonomous Land Vehicle in a Neural Network)
• Dean Pomerleau (1991)
• Pittsburgh to San Diego: 98.2% autonomous

Classical Conditioning
• Pavlov 1927
• Assumes that unconditioned stimuli (e.g., food) automatically generate an unconditioned response (e.g., salivation)
• Conditioned stimulus (e.g., ringing a bell) can, over time, become associated with the unconditioned response

Darwin’s Perceptual Categorization
• Two types of stimulus blocks
  – 6 cm metallic cubes
  – Blobs: low conductivity (“bad taste”)
  – Stripes: high conductivity (“good taste”)
• Instead of hard-wiring stimulus-response rules, develop these associations over time
• Figure panels: Early training; After the 10th stimulus

Genetic Algorithms
• Inspired by evolutionary biology
• Individuals in a population have a particular fitness with respect to a task
• Individuals with the highest fitness are kept as survivors
• Individuals with poor performance are discarded: the process of natural selection
• Evolutionary process: search through the space of solutions to find the one with the highest fitness

Genetic Operators
• Knowledge is encoded as bit strings: chromosome
  – Each bit represents a “gene”
• Biologically inspired operators are applied to yield better generations
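The slides do not list the operators, so the sketch below assumes the two most common ones, one-point crossover and bit-flip mutation, applied to bit-string chromosomes. The fitness function (count of 1 genes), the population size, and the other parameters are hypothetical choices made only so the example is self-contained.

```python
import random

def fitness(bits):
    """Toy task: fitness is the number of 1 genes in the chromosome."""
    return sum(bits)

def crossover(parent_a, parent_b, point):
    """One-point crossover: head of one parent, tail of the other."""
    return parent_a[:point] + parent_b[point:]

def mutate(bits, rate, rng):
    """Flip each gene independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in bits]

def evolve(pop_size=20, length=16, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []                                   # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]    # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, rng.randrange(1, length))
            children.append(mutate(child, rate, rng))
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history
```

Because the fittest half survives unchanged into the next generation, the best fitness recorded in history never decreases. Crossover recombines genes from two survivors, and mutation introduces variation that neither parent carried.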

Evolving Structure and Control
• Karl Sims 1994
• Evolved morphology and control for virtual creatures performing swimming, walking, jumping, and following
• Genotypes encoded as directed graphs are used to produce 3D kinematic structures
• Genotypes encode points of attachment
• Sensors used: contact, joint angle and photosensors

Evolving Structure and Control
• Jordan Pollack
  – Real structures

Readings
• M. Matarić: Chapters 17, 18
• Lecture notes