Autonomous Mobile Robots CPE 470/670 Lecture 12 Instructor: Monica Nicolescu

CPE 470/670 - Lecture 12: Learning & Adaptive Behavior
Learning produces changes within an agent that over time enable it to perform more effectively within its environment
Adaptation refers to an agent's learning by making adjustments in order to be more attuned to its environment
–Phenotypic (within an individual agent) or genotypic (evolutionary)
–Acclimatization (slow) or homeostasis (rapid)

CPE 470/670 - Lecture 12: Learning
Learning can improve performance in additional ways:
Introduce new knowledge (facts, behaviors, rules)
Generalize concepts
Specialize concepts for specific situations
Reorganize information
Create or discover new concepts
Create explanations
Reuse past experiences

CPE 470/670 - Lecture 12: Learning Methods
Reinforcement learning
Neural network (connectionist) learning
Evolutionary learning
Learning from experience
–Memory-based
–Case-based
Learning from demonstration
Inductive learning
Explanation-based learning
Multistrategy learning

CPE 470/670 - Lecture 12: Reinforcement Learning (RL)
Motivated by psychology (the Law of Effect, Thorndike 1911):
Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response will decrease the probability
One of the most widely used methods for adaptation in robotics

CPE 470/670 - Lecture 12: Reinforcement Learning
Goal: learn an optimal policy that chooses the best action for every set of possible inputs
Policy: state/action mapping that determines which actions to take
Desirable outcomes are strengthened and undesirable outcomes are weakened
Critic: evaluates the system's response and applies reinforcement
–external: the user provides the reinforcement
–internal: the system itself provides the reinforcement (reward function)
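One concrete instantiation of this strengthen/weaken loop (standard RL background, not from the lecture itself) is tabular Q-learning, where the critic's reinforcement updates a table of state/action values. A minimal Python sketch, assuming a hypothetical environment object with reset(), step(), and a discrete action list:

# Minimal tabular Q-learning sketch; "env" and all parameters are
# illustrative assumptions, not part of the lecture material.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # maps (state, action) -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Exploration vs. exploitation (see the challenges below)
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Strengthen or weaken the value using the critic's reinforcement
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q  # greedy policy: in each state, pick the action maximizing Q

The learned policy is exactly the state/action mapping the slide describes: for every state, take the action with the highest learned value.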

CPE 470/670 - Lecture 12: Unsupervised Learning
RL is an unsupervised learning method:
–No target goal state
Feedback only provides information on the quality of the system's response
–Simple: binary fail/pass
–Complex: numerical evaluation
Through RL a robot learns on its own, using its own experiences and the feedback received
The robot is never told what to do

CPE 470/670 - Lecture 12: Challenges of RL
Credit assignment problem:
–When something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?
Learning from delayed rewards:
–It may take a long sequence of actions that receive insignificant reinforcement to finally arrive at a state with high reinforcement
–How can the robot learn from reward received at some time in the future?
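Most RL formulations answer the delayed-reward question by crediting earlier actions with a discounted sum of future rewards (standard RL background, not stated on the slide):

G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1

With γ close to 0 the robot is myopic and values only immediate reinforcement; with γ close to 1 it weighs reward far in the future almost as heavily as immediate reward.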

CPE 470/670 - Lecture 12: Challenges of RL
Exploration vs. exploitation:
–Explore unknown states/actions or exploit states/actions already known to yield high rewards
Partially observable states:
–In practice, sensors provide only partial information about the state
–Choose actions that improve observability of the environment
Life-long learning:
–In many situations it may be required that robots learn several tasks within the same environment

CPE 470/670 - Lecture 12: Learning to Walk
Maes & Brooks (1990)
Genghis: hexapod robot
Learned stable tripod stance and tripod gait
Rule-based subsumption controller
Two sensor modalities for feedback:
–Two touch sensors to detect hitting the floor: negative feedback
–Trailing wheel to measure progress: positive feedback

CPE 470/670 - Lecture 12: Learning to Walk
Nate Kohl & Peter Stone (2004)

CPE 470/670 - Lecture 12: Supervised Learning
Supervised learning requires the user to give the exact solution to the robot in the form of the error direction and magnitude
The user must know the exact desired behavior for each situation
Supervised learning involves training, which can be very slow; the user must supervise the system with numerous examples

CPE 470/670 - Lecture 12: Neural Networks
One of the most widely used supervised learning methods
Used for approximating real-valued and vector-valued target functions
Inspired by biology: learning systems are built from complex networks of interconnected neurons
The goal is to minimize the error between the network output and the desired output
–This is achieved by adjusting the weights on the network connections
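In the simplest case, the weight adjustment described on this slide is the delta rule for a single linear neuron: gradient descent on the squared error between output and target. A minimal sketch, where the training data, learning rate, and epoch count are illustrative assumptions:

# Delta-rule sketch for one linear neuron (illustrative; not ALVINN's network)
import numpy as np

def train_neuron(X, y, lr=0.01, epochs=100):
    w = np.zeros(X.shape[1])          # one weight per input connection
    for _ in range(epochs):
        for x_i, target in zip(X, y):
            output = np.dot(w, x_i)   # network output for this example
            error = target - output   # desired output minus actual output
            w += lr * error * x_i     # adjust weights to reduce the error
    return w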

CPE 470/670 - Lecture 12: ALVINN
ALVINN (Autonomous Land Vehicle In a Neural Network)
Dean Pomerleau (1991)
Pittsburgh to San Diego: 98.2% autonomous

CPE 470/670 - Lecture 12: Learning from Demonstration & RL
S. Schaal ('97)
Pole balancing, pendulum swing-up

CPE 470/670 - Lecture 12: Classical Conditioning
Pavlov (1927)
Assumes that unconditioned stimuli (e.g., food) automatically generate an unconditioned response (e.g., salivation)
A conditioned stimulus (e.g., ringing a bell) can, over time, become associated with the unconditioned response
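Classical conditioning is commonly formalized with the Rescorla-Wagner model (standard background, not on the slide): the associative strength V of the conditioned stimulus is nudged toward the maximum strength λ that the unconditioned stimulus supports,

\Delta V = \alpha \beta \, (\lambda - V)

where α and β are salience/learning-rate parameters. V rises quickly at first and levels off as the bell comes to fully predict the food.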

CPE 470/670 - Lecture 12: Darwin's Perceptual Categorization
Two types of stimulus blocks (6 cm metallic cubes):
–Blobs: low conductivity ("bad taste")
–Stripes: high conductivity ("good taste")
Instead of hard-wiring stimulus-response rules, develop these associations over time
[Figures: early training vs. after the 10th stimulus]

CPE 470/670 - Lecture 12: Genetic Algorithms
Inspired by evolutionary biology
Individuals in a population have a particular fitness with respect to a task
Individuals with the highest fitness are kept as survivors
Individuals with poor performance are discarded: the process of natural selection
Evolutionary process: search through the space of solutions to find the one with the highest fitness

CPE 470/670 - Lecture 12: Genetic Operators
Knowledge is encoded as bit strings: the chromosome
–Each bit represents a "gene"
Biologically inspired operators are applied to yield better generations
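The sketch below illustrates these operators on bit-string chromosomes: selection keeps the fitter half of the population, crossover splices two parent chromosomes at a random point, and mutation flips individual genes. The fitness function and all parameters are placeholders, not from the lecture.

# Minimal genetic-algorithm sketch over bit strings (illustrative)
import random

def evolve(fitness, n_bits=16, pop_size=50, generations=100, p_mut=0.01):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Natural selection: the fittest individuals survive
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # Crossover and mutation produce the next generation
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_bits)
            child = a[:cut] + b[cut:]                 # one-point crossover
            child = [g ^ 1 if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Example: evolve a string of all ones ("one-max" toy fitness)
best = evolve(fitness=sum)
print(best)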

CPE 470/670 - Lecture 12: Evolving Structure and Control
Karl Sims (1994)
Evolved morphology and control for virtual creatures performing swimming, walking, jumping, and following
Genotypes encoded as directed graphs are used to produce 3D kinematic structures
Genotypes encode points of attachment
Sensors used: contact, joint angle, and photosensors
Video:

CPE 470/670 - Lecture 12: Evolving Structure and Control
Jordan Pollack
–Real structures

CPE 470/670 - Lecture 12: Learning from Demonstration
Inspiration: human-like teaching by demonstration
Multiple means for interaction and learning: concurrent use of demonstration, verbal instruction, attentional cues, gestures, etc.
Solution: instructive demonstrations, generalization and practice
[Figures: demonstration vs. robot performance]

CPE 470/670 - Lecture 12: Robot Learning from Other Robot Teachers
Transfer of task knowledge from humans to robots and between heterogeneous robots
[Figures: human demonstration vs. robot performance]

CPE 470/670 - Lecture 12: Multirobot Systems
Motivation:
–the task complexity is too high for a single robot
–the task is inherently distributed
–building several resource-bounded robots is much easier than having a single powerful robot
–multiple robots can solve problems faster
–the introduction of multiple robots increases robustness through redundancy

CPE 470/670 - Lecture 12: Multirobot Systems – Control Approaches
Collective swarms:
–robots execute their own tasks with only minimal need for knowledge about other robot team members
–homogeneous teams
–little explicit communication among robots
Intentionally cooperative systems:
–have knowledge of the presence of other robots in the environment and act together to accomplish the same goal
–strongly cooperative solutions: robots act in concert to achieve the goal, executing tasks that are not trivially serializable (these require some type of communication and synchronization among the robots)
–weakly cooperative solutions: robots have periods of operational independence
–heterogeneous teams

CPE 470/670 - Lecture 12: Architectures for Robot Teams
How is group behavior generated from the control architectures of the individual robots in the team?
Several approaches:
–centralized: coordinate the entire team from a single point of control
–hierarchical: each robot oversees the actions of a relatively small group of other robots
–decentralized: robots take actions based only on knowledge local to their situation
–hybrid: combine local control with higher-level control approaches

CPE 470/670 - Lecture 12: Communication in Multirobot Systems
Global solutions should be achieved through the interaction of robots lacking global information
Implicit communication through the world (stigmergy):
–robots sense the effects of teammates' actions through their effects on the world
Passive action recognition:
–robots use sensors to directly observe the actions of their teammates
Explicit (intentional) communication:
–robots directly and intentionally communicate relevant information through some active means, such as radio

CPE 470/670 - Lecture 12: Task Allocation
Each task can be worked on by different robots; each robot can work on a variety of different tasks
Taxonomy (Gerkey & Matarić 2004):
–Single-robot tasks (SR): require only one robot at a time
–Multirobot tasks (MR): require more than one robot working on the same task at the same time
–Single-task robots (ST): work on only one task at a time
–Multitask robots (MT): work on multiple tasks at a time
–Instantaneous allocation (IA): optimize the instantaneous allocation
–Time-extended allocation (TA): optimize the assignments into the future

CPE 470/670 - Lecture 12: Task Allocation
ST-SR-IA: single-robot tasks are assigned once to single-task robots
ST-SR-IA is the easiest variant: it can be solved in polynomial time as an instance of the optimal assignment problem
The ST-MR-IA variant is an instance of the set partitioning problem, which is NP-hard
ST-MR-TA, MT-SR-IA, and MT-SR-TA are also NP-hard
Most approaches to task allocation in multirobot teams generate approximate solutions
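The polynomial-time case (ST-SR-IA) is exactly the classic optimal assignment problem, solvable with the Hungarian algorithm. A sketch using SciPy's solver; the cost-matrix entries are made-up numbers for illustration:

# ST-SR-IA allocation as an optimal assignment problem (illustrative costs)
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j]: estimated cost of robot i performing task j
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.5, 5.0],
                 [3.0, 2.0, 2.0]])

robots, tasks = linear_sum_assignment(cost)   # minimizes total team cost
for r, t in zip(robots, tasks):
    print(f"robot {r} -> task {t}, cost {cost[r, t]}")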

CPE 470/670 - Lecture 12: Readings
M. Matarić: Chapters 17, 18