Against the Gods: Strategies for Robust Autonomous Behaviour

Presentation transcript:

Against the Gods: Strategies for Robust Autonomous Behaviour
Subramanian Ramamoorthy, School of Informatics, The University of Edinburgh
3 December 2008

Autonomous Robots are Coming Here: Physically and Virtually

Some Observations about Robotics

- Autonomy is not useful unless it is also "robust" – a many-hued concept.
- My focus is on strategy: systems issues vs. the task itself.
- Consider this origami robot [Balkcom & Mason at CMU]: can we do such things autonomously, using a Nao or PR2 (the robotic equivalents of the MITS Altair or Apple I of the mid-1970s), in a semi-structured home environment?
- Autonomous robots must act in an adversarial world.

What Problem is the Robot Solving?

[Diagram: the robot's perception and action connect it to an environment containing adversarial actions and other agents, under high-level goals.]

Problem: how to generate actions, to achieve high-level goals, using limited perception and incomplete knowledge of the environment and of adversarial actions?

Robust Control and Decision Problem

- Robust control: play a differential game against nature, or against other self-interested/cooperative agents.
- Here w denotes the adversarial actions (e.g., large deviations).
- The constrained, high-dimensional, partially observed problem is hard!
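The formulation on this slide appears only as an image in the original deck; purely as an illustrative assumption (not the talk's exact statement), a generic min-max form of such a robust-control game is:

```latex
% Assumed generic robust-control game; not the slide's exact formulation.
\min_{u(\cdot)} \; \max_{w(\cdot)} \; J(x, u, w)
\quad \text{subject to} \quad
\dot{x} = f(x, u, w), \qquad x \in X, \; u \in U, \; w \in W,
```

where the disturbance w ranges over the adversary's admissible actions.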

Lattice of Control and Decision Problems

- (X, U, W): robust control – game, adversary, strategy
- (X, U): feedback control and optimality
- (X, W): verification
- (X): motion synthesis and planning

Approach: use this structure to devise abstractions and to shape learning.
Adversary: constraints impact both the immediate moves (e.g., a subset of the state space is rendered infeasible) and the longer term (sequential decision making).
Model incompleteness: many constraints (e.g., c-space limits) play out at a slower time scale.
Can we combine such problem factorization with machine learning methods to learn solutions?

A Worked Example: Global Control of the Cart-Pole System

Introducing the Cart-Pole System

- The system consists of two subsystems: a pendulum and a cart on a finite track.
- There is only one actuator, on the cart.
- We want global asymptotic stability of the 4-dimensional system.
- The Game: the experimenter hits the pole with arbitrary velocity at any time; the system picks the controls.
- What are the weak sufficient conditions defining this task?

[Figure: phase space of the pendulum]
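For concreteness, here is a minimal simulation sketch of the standard cart-pole dynamics; the equations are the textbook (Barto/Sutton-style) form and all parameter values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Illustrative cart-pole parameters: cart mass, pole mass, pole half-length, gravity.
M, m, l, g = 1.0, 0.1, 0.5, 9.81

def cartpole_derivatives(state, force):
    """State is (x, x_dot, theta, theta_dot); theta = 0 is the upright pole."""
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    temp = (force + m * l * theta_dot**2 * sin_t) / (M + m)
    theta_acc = (g * sin_t - cos_t * temp) / (l * (4.0 / 3.0 - m * cos_t**2 / (M + m)))
    x_acc = temp - m * l * theta_acc * cos_t / (M + m)
    return np.array([x_dot, x_acc, theta_dot, theta_acc])

def step(state, force, dt=0.01):
    """One explicit-Euler step; the adversary's 'hit' can be modelled by adding
    an arbitrary increment to theta_dot between steps."""
    return state + dt * cartpole_derivatives(state, force)
```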

Dealing with the Adversary: Global Structure

- The adversary could push the system anywhere in the phase space.
- The global strategy can be described as a qualitative transition graph.
- Larger disturbances could truly change the quantitative details, e.g., any number of rotations around the origin.
- The uncontrolled system converges to the hanging-down equilibrium; we want to reach, and stay at, the upright equilibrium.

Describing Local Behaviour: Templates

Lemma (Spring-Mass, Positive Damping): Let a system be described by [equation and conditions shown on the slide, not reproduced in the transcript]. Then it is asymptotically stable at (0, 0).

Lemma (Spring-Mass, Negative Damping): [Under the corresponding negative-damping conditions,] the system has an unstable fixed point at (0, 0) and no limit cycle.
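The lemma statements themselves are images in the original slides and are not recoverable here; purely as an assumed illustration, a classical linear instance of the two templates reads:

```latex
% Assumed illustrative form of the spring-mass templates; not the slides' exact statement.
\ddot{x} + b\,\dot{x} + k\,x = 0, \qquad k > 0.
% Positive damping (b > 0): the origin (0,0) is asymptotically stable.
% Negative damping (b < 0): the origin is an unstable fixed point and there is no limit cycle.
```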

Global Controller for the Pendulum

The control law: if [the balance condition holds] then Balance; else if [the pump condition holds] then Pump; else Spin.

Constraints: [not reproduced in the transcript]

The Global Control Law

The switching strategy: if [the first switching condition holds] then Balance; else if [the second holds] then Pump; else Spin. (The conditions appear only as images in the original slides; see the sketch below.) [Ramamoorthy & Kuipers, HSCC 02 & 03]
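As a hedged sketch only: a typical energy-based Balance/Pump/Spin switching controller for a torque-actuated pendulum might look like the following. The thresholds, gains, and the specific conditions are assumptions, not the conditions used in the talk.

```python
import numpy as np

# Illustrative pendulum parameters and gains; not taken from the talk.
m, l, g = 0.1, 0.5, 9.81
K_BALANCE = np.array([20.0, 3.0])   # local state-feedback gains near upright
K_PUMP = 1.0                        # energy-pumping torque magnitude

def switching_controller(theta, theta_dot):
    """Torque from an assumed Balance / Pump / Spin switching structure;
    theta = 0 is the upright equilibrium."""
    # Energy relative to the upright equilibrium (zero on the separatrix).
    energy = 0.5 * m * l**2 * theta_dot**2 + m * g * l * (np.cos(theta) - 1.0)

    if abs(theta) < 0.3 and abs(theta_dot) < 1.0:
        # Balance: local state feedback around the upright fixed point.
        return float(-K_BALANCE @ np.array([theta, theta_dot]))
    elif energy < 0.0:
        # Pump: add energy; dE/dt = u * theta_dot, so push with the velocity.
        return float(K_PUMP * np.sign(theta_dot)) if theta_dot else 0.1
    else:
        # Spin: excess energy while rotating; brake against the velocity.
        return float(-K_PUMP * np.sign(theta_dot))
```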

Demonstration on a Physical Set-up

Result: a best-response computation for this game.*

* A few more technical steps are needed to 'lift' the pendulum strategy to the 4-dimensional cart-pole system.

More Complex Examples

Bipedal Walking on Irregular Terrain

- Many constraints: dynamic stability, intermittent footholds.
- Incomplete models: no high-dimensional models, only data from randomized exploration.
- The Game: Nature picks the footholds (on-line); the robot picks the trajectory.

Structure of the Solution

- Define a qualitative strategy in low dimensions (finite-horizon optimal control), working within the lattice (X,U,W), (X,U), (X,W), (X).
- Lift the resulting strategy to the more complex configuration space (presently unknown!).

Data-driven Approximation of Strategy

The Result – Humanoid Robot Simulation [Ramamoorthy & Kuipers RSS 06, ICRA 08]

Another Problem: (Un)tying Knots

- Task encoding: a knot-energy shape descriptor for an n-edge polygonal knot (the formula is not reproduced in the transcript; see the assumed illustration below).
- Manipulation planning: (offline) learn the multi-scale structure in the energy functional; (online) navigate a hierarchical graph.
- The Game: Nature/the adversary picks ways to deform or disturb the object; the robot picks the manipulation actions.
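Purely as an assumed illustration of such a descriptor (the talk's actual functional is not recoverable from the transcript), one standard energy for an n-edge polygonal knot is Simon's minimum-distance energy:

```latex
% Assumed illustration: Simon's minimum-distance energy for an n-edge polygonal knot K.
E_{\mathrm{MD}}(K) \;=\; \sum_{\substack{1 \le i < j \le n \\ e_i,\, e_j \ \text{non-adjacent}}}
\frac{\ell(e_i)\,\ell(e_j)}{\operatorname{dist}(e_i, e_j)^2},
```

where \ell(e_i) is the length of edge e_i and \operatorname{dist}(e_i, e_j) is the minimum distance between the two edges.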

Simulation of Knot Untying

- Action synthesis: shaped reinforcement learning (SARSA).
- Optimality of the MDP solution is not compromised: the knot energy is a valid shaping potential.
- 10x faster than uninformed RL; for large problems, uninformed RL simply doesn't converge within acceptable time, whereas the shaped learner does.
- [Also see the poster by Sandhya Prabhakaran]
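The following is a hedged sketch of the general technique only (potential-based reward shaping inside tabular SARSA, in the sense of Ng et al.), not of the system in the talk; the environment interface, the knot_energy placeholder, and all hyperparameters are assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative hyperparameters
Q = defaultdict(float)                   # tabular action-value function

def knot_energy(state):
    """Placeholder for the knot-energy shape descriptor of a state."""
    return 0.0

def potential(state):
    """Shaping potential; here assumed to be the negative knot energy."""
    return -knot_energy(state)

def epsilon_greedy(state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_episode(env):
    """One episode of SARSA with potential-based shaping; env is an assumed
    interface with reset(), actions(state), and step(action) -> (s', r, done)."""
    state = env.reset()
    action = epsilon_greedy(state, env.actions(state))
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        # Potential-based shaping term; it leaves the optimal policy unchanged.
        shaped = reward + GAMMA * potential(next_state) - potential(state)
        next_action = epsilon_greedy(next_state, env.actions(next_state))
        target = shaped + (0.0 if done else GAMMA * Q[(next_state, next_action)])
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, action = next_state, next_action
```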

Current Work: Learning Abstractions and Strategies

Learning Abstractions

- It is not hard to acquire low-dimensional models from data: simple tools like PCA/SVD have been around for a long time, and there has been a recent explosion of non- and semi-parametric methods.
- It is hard to summarize this information for use in the larger planning and control framework, in order to reason about adversaries and actions.
- My approach: define notions of system equivalence (many geometric ideas) and sampling-based algorithms to induce abstractions (a map from the full space Q to an abstract space A), with dim(A) << dim(Q).
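As a minimal illustration of the first point only (low-dimensional models from data via PCA/SVD), and not of the abstraction-induction algorithms the slide refers to:

```python
import numpy as np

def pca_abstraction(Q_samples, dim_A):
    """PCA via SVD: map samples from a high-dimensional space Q (rows of Q_samples)
    to a dim_A-dimensional abstract space A, with dim_A << dim(Q)."""
    mean = Q_samples.mean(axis=0)
    centered = Q_samples - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:dim_A]                         # top principal directions
    project = lambda q: (q - mean) @ components.T   # Q -> A
    lift = lambda a: a @ components + mean          # approximate A -> Q
    return project, lift

# Example usage with random data standing in for sampled configurations.
project, lift = pca_abstraction(np.random.randn(200, 50), dim_A=3)
```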

Learning Global Strategies

- Shaping (PO)MDPs and related models: how to combine this with the abstraction concepts and algorithms of the previous slide? Multi-scale formulations of the learning algorithms.
- Risk-sensitive control, beyond simple best response: control learning is often driven by metrics related to predictive accuracy, but for robust control we may be interested in quite different issues, e.g., large reachable sets from all c-space points.
- This is particularly relevant in electronic markets and competitive scenarios, i.e., among agents with conflicting interests.

Acknowledgements

- The pendulum and bipedal walking problems are from my PhD thesis: work with Benjamin Kuipers (U. Texas at Austin).
- The knots work was done by Sandhya Prabhakaran (MSc thesis).
- Collaborators in my current and future work: Ioannis Havoutis and Thomas Larkworthy (PhD students); Sethu Vijayakumar, Taku Komura and Michael Herrmann (IPAB); Rahul Savani (Warwick): algorithms for automated trading; Ram Rajagopal (Berkeley): sampling and non-parametric learning.

* The title of this talk is taken from a wonderful book by Peter Bernstein.