
Learning Momentum: Integration and Experimentation
Brian Lee and Ronald C. Arkin
Mobile Robot Laboratory, Georgia Tech, Atlanta, GA

Motivation
- It is hard to manually derive controller parameters, and the parameter space grows exponentially with the number of parameters.
- A priori knowledge of the environment is not always available. Without prior knowledge, a user cannot confidently derive appropriate parameter values, so the robot must adapt on its own to what it finds.
- Obstacle densities and layouts in the environment may be heterogeneous; parameters that work well for one type of environment may not work well for another.

Adaptation and Learning Methods – DARPA MARS
- Investigate robot shaping at five distinct levels in a hybrid robot software architecture.
- Implement the algorithms within the MissionLab mission specification system.
- Conduct experiments to evaluate the performance of each technique.
- Combine techniques where possible.
- Integrate on a platform more suitable for realistic missions and continue development.

Overview of Techniques
- CBR Wizardry: guide the operator.
- Probabilistic Planning: manage complexity for the operator.
- RL for Behavioral Assemblage Selection: learn what works for the robot.
- CBR for Behavior Transitions: adapt to situations the robot can recognize.
- Learning Momentum: vary robot parameters in real time.
The learning continuum runs from deliberative (premission), through behavioral switching, to reactive (online adaptation).

Basic Concepts of LM
- Provides adaptability to behavior-based systems.
- A crude form of reinforcement learning: if the robot is doing well, keep doing what it is doing; otherwise, try something different.
- Behavior parameters are changed in response to progress and obstacles.
- The system is still fully reactive: although the robot changes its behavior, there is no deliberation.

Currently Used Behaviors
- Move to Goal: always returns a vector pointing toward the goal position.
- Avoid Obstacles: returns a sum of weighted vectors pointing away from obstacles.
- Wander: returns vectors pointing in random directions.
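As a minimal sketch, these three behaviors can be written as vector-producing functions. The function names, the obstacle weighting, and the use of NumPy are illustrative assumptions, not the MissionLab implementations:

```python
import numpy as np

def move_to_goal(robot_pos, goal_pos, goal_gain):
    """Unit vector toward the goal, scaled by the goal gain."""
    direction = np.asarray(goal_pos, dtype=float) - np.asarray(robot_pos, dtype=float)
    distance = np.linalg.norm(direction)
    return goal_gain * direction / distance if distance > 0 else np.zeros(2)

def avoid_obstacles(robot_pos, obstacle_positions, obstacle_gain, sphere_of_influence):
    """Sum of weighted vectors pointing away from obstacles inside the sphere of influence."""
    total = np.zeros(2)
    for obs in obstacle_positions:
        away = np.asarray(robot_pos, dtype=float) - np.asarray(obs, dtype=float)
        dist = np.linalg.norm(away)
        if 0.0 < dist < sphere_of_influence:
            # Nearer obstacles push harder (one plausible weighting, not necessarily the original's).
            weight = (sphere_of_influence - dist) / sphere_of_influence
            total += obstacle_gain * weight * away / dist
    return total

class Wander:
    """Random unit vector held for `persistence` consecutive steps, scaled by the wander gain."""
    def __init__(self, wander_gain, persistence, seed=0):
        self.gain, self.persistence = wander_gain, persistence
        self.rng = np.random.default_rng(seed)
        self.steps_left, self.direction = 0, np.zeros(2)

    def __call__(self):
        if self.steps_left == 0:
            angle = self.rng.uniform(0.0, 2.0 * np.pi)
            self.direction = np.array([np.cos(angle), np.sin(angle)])
            self.steps_left = self.persistence
        self.steps_left -= 1
        return self.gain * self.direction
```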

Adjustable Parameters
- Move-to-goal vector gain
- Avoid-obstacle vector gain
- Avoid-obstacle sphere of influence: the radius around the robot inside of which obstacles are perceived
- Wander vector gain
- Wander persistence: the number of consecutive steps the wander vector points in the same direction
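These five quantities are exactly what the LM module adjusts at runtime, so in a sketch they can be gathered into one record. The field names and default values below are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class BehaviorParams:
    goal_gain: float = 1.0             # move-to-goal vector gain
    obstacle_gain: float = 1.0         # avoid-obstacle vector gain
    sphere_of_influence: float = 2.0   # radius (m) inside which obstacles are perceived
    wander_gain: float = 0.5           # wander vector gain
    wander_persistence: int = 5        # consecutive steps the wander vector keeps its direction
```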

Four Predefined Situations
- No movement: M < T_movement
- Progress toward the goal: M > T_movement and P > T_progress
- No progress, with obstacles: M > T_movement, P < T_progress, and O_count > T_obstacles
- No progress, without obstacles: M > T_movement, P < T_progress, and O_count < T_obstacles
where M = average movement, M_goal = average movement toward the goal, P = M_goal / M, O_count = number of obstacles encountered, and T_movement, T_progress, and T_obstacles are the movement, progress, and obstacle thresholds.
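One possible encoding of this classification, keeping the slide's symbols as argument names; the default threshold values and the idea of averaging over a recent window of steps are assumptions for illustration:

```python
def classify_situation(m, m_goal, o_count,
                       t_movement=0.1, t_progress=0.5, t_obstacles=3):
    """Map the averaged movement statistics and obstacle count to one of the four situations."""
    if m < t_movement:
        return "no_movement"
    p = m_goal / m                      # P = M_goal / M
    if p > t_progress:
        return "progress_toward_goal"
    if o_count > t_obstacles:
        return "no_progress_with_obstacles"
    return "no_progress_without_obstacles"
```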

Parameter Adjustments
(Table: sample adjustment parameters for ballooning.)

Two Possible Strategies
- Ballooning: the sphere of influence is increased when obstacles impede progress; the robot moves around large objects.
- Squeezing: the sphere of influence is decreased when obstacles impede progress; the robot moves between closely spaced objects.
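The sample adjustment table noted earlier did not survive extraction, so the delta values below are placeholders. What the slide text does support is the sign of the sphere-of-influence change when obstacles impede progress: positive for ballooning, negative for squeezing. A sketch of the two strategies as per-situation delta tables, assuming the situation names from the previous slide:

```python
# Per-situation parameter deltas. Magnitudes are invented for illustration; only the
# sign of sphere_of_influence in the "no progress with obstacles" row reflects the
# ballooning-vs-squeezing distinction described on this slide.
BALLOONING = {
    "no_movement":                   {"goal_gain": +0.1, "obstacle_gain": -0.1, "sphere_of_influence": +0.5, "wander_gain": +0.1},
    "progress_toward_goal":          {"goal_gain": +0.1, "obstacle_gain": -0.1, "sphere_of_influence":  0.0, "wander_gain": -0.1},
    "no_progress_with_obstacles":    {"goal_gain": -0.1, "obstacle_gain": +0.1, "sphere_of_influence": +0.5, "wander_gain": +0.1},
    "no_progress_without_obstacles": {"goal_gain": +0.1, "obstacle_gain": -0.1, "sphere_of_influence":  0.0, "wander_gain": +0.1},
}

# Squeezing shares the same structure but shrinks the sphere of influence when
# obstacles block progress, letting the robot slip between closely spaced objects.
SQUEEZING = {situation: dict(deltas) for situation, deltas in BALLOONING.items()}
SQUEEZING["no_progress_with_obstacles"]["sphere_of_influence"] = -0.5
```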

Integration: Base System
(Diagram: the sensors supply position/goal information to Move to Goal(G_m) and obstacle information to Avoid Obstacles(G_o, S); together with Wander(G_w, P), the behavior vectors are summed (Σ) by the controller to produce the output direction.)
G_m = goal gain, G_o = obstacle gain, S = obstacle sphere of influence, G_w = wander gain, P = wander persistence.

Integration: Integrated System
(Diagram: the base system augmented with an LM module in the controller, which feeds new G_m, G_o, S, G_w, and P parameters back to the behaviors.)
G_m = goal gain, G_o = obstacle gain, S = obstacle sphere of influence, G_w = wander gain, P = wander persistence.
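Putting the pieces together, a compact closed-loop sketch of the integrated system: each cycle the controller sums the behavior vectors, and the LM module then classifies the situation and nudges the parameters. Everything here, including the function name, thresholds, and the statistics dictionary, is an illustrative reconstruction rather than the MissionLab code:

```python
import numpy as np

def lm_control_step(pos, goal, obstacles, params, deltas_table, stats):
    """One control cycle: Σ of behavior vectors, followed by an LM parameter update."""
    # --- reactive behaviors (compact versions of the earlier sketches) ---
    to_goal = goal - pos
    goal_vec = params["goal_gain"] * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    avoid_vec, visible = np.zeros(2), 0
    for obs in obstacles:
        away = pos - obs
        dist = np.linalg.norm(away)
        if dist < params["sphere_of_influence"]:
            visible += 1
            avoid_vec += params["obstacle_gain"] * away / (dist + 1e-9)
    wander_vec = params["wander_gain"] * stats["wander_dir"]
    output = goal_vec + avoid_vec + wander_vec

    # --- LM module: classify the situation and adjust parameters in place ---
    if stats["avg_movement"] < 0.1:
        situation = "no_movement"
    elif stats["avg_goal_movement"] / stats["avg_movement"] > 0.5:
        situation = "progress_toward_goal"
    elif visible > 3:
        situation = "no_progress_with_obstacles"
    else:
        situation = "no_progress_without_obstacles"
    for name, delta in deltas_table[situation].items():
        params[name] = max(0.0, params[name] + delta)   # keep gains and radii non-negative
    return output
```

A caller would maintain the stats dictionary over a short window of recent steps (average movement, average movement toward the goal, and the current wander direction) and normalize the output vector before commanding the robot.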

Experiments in Simulation
- 150m x 150m area; the robot moves from (10m, 10m) to (140m, 90m).
- Obstacle densities of 15% and 20% were used.
- Obstacle radii varied between 0.38m and 1.43m.
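The area, start and goal positions, densities, and radii above fully describe the test worlds. A sketch of how such an obstacle field might be generated follows; interpreting density as the fraction of area covered (ignoring overlaps) and the rejection-free scattering are assumptions:

```python
import math
import random

def make_obstacle_field(width=150.0, height=150.0, density=0.15,
                        r_min=0.38, r_max=1.43, seed=0):
    """Scatter circular obstacles until roughly `density` of the area is covered."""
    rng = random.Random(seed)
    obstacles, covered = [], 0.0
    while covered < density * width * height:
        radius = rng.uniform(r_min, r_max)
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        obstacles.append((x, y, radius))
        covered += math.pi * radius ** 2     # ignores overlaps; adequate for a rough field
    return obstacles

# The 20% density variant from the slide:
dense_field = make_obstacle_field(density=0.20)
```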

Ballooning

Observations on Ballooning
- Covers a lot of area.
- Not as easily trapped in box-canyon situations.
- May settle in locally clear areas.
- May require a high wander gain to carry the robot through closely spaced obstacles.

Squeezing

Observations on Squeezing
- Results in a straighter path.
- Moves easily through closely spaced obstacles.
- May get trapped in small box-canyon situations for long periods of time.

Simulations of the Real World
(Figure: simulated setup of the 24m x 10m real-world environment, showing the start and end places.)

Completion Rates for Simulation
(Charts: completion rates for uniform obstacle size (1m radii) and for varying obstacle sizes (0.38m - 1.43m radii).)

Average Steps to Completion
(Charts: average steps to completion for uniform obstacle size (1m radii) and for varying obstacle sizes (0.38m - 1.43m radii).)

Results From Simulated Real Environment
(Charts: percent complete and steps to completion.)
As before, there is an increase in completion rates with an accompanying increase in steps to completion.

Simulation Results
- Completion rates can be drastically improved.
- Completion-rate improvements come at a cost of time.
- The ballooning and squeezing strategies are geared toward different situations.

Physical Robot Experiments
- Nomad 150 robot with a sonar ring for obstacle avoidance.
- Traverses the length of a 24m x 10m room while negotiating obstacles.

Outdoor Run (adaptive)

Outdoor Run (non-adaptive)

Physical Experiment Results
- Non-learning robots became stuck.
- Learning robots successfully negotiated the obstacles.
- Squeezing was faster than ballooning in this case.
(Chart: average steps to goal.)

Conclusions
- Improved success comes at a price in time.
- Each strategy performs very poorly in situations better suited to the other strategy.
- The ballooning strategy is generally faster: ballooning robots can move through closely spaced objects faster than squeezing robots can move out of box-canyon situations.

Conclusions (cont’d)
- If some general knowledge of the terrain is known a priori, an appropriate strategy can be chosen.
- If the terrain is totally unknown, ballooning is probably the better choice.
- A way to dynamically switch strategies should improve performance.