Subramanian Ramamoorthy
School of Informatics, The University of Edinburgh
29 October 2008

2008 & beyond: Autonomous Robotics, Autonomous Trading Agents, Computational Biology
[Komura, Larkworthy – PhD] [Prabhakaran – MSc] [Warwick, Elder – MSc] [Havoutis – PhD, Clossick – MSc]

Supervised Learning
- Regression: mapping between states and actions (everything from neural networks to Gaussian processes), forward/inverse models
- Dynamic time-series modelling: HMM, SLDS, etc.
Unsupervised Learning
- 'Feature' extraction, e.g., animators need a few key variables/knobs
Reinforcement Learning
- Planning and control is the raison d'être
Open issues w.r.t. the goal of robust autonomy:
- It is hard to acquire dynamically dexterous behaviours over large domains
- Even when an ML sub-component seems rigorous, there may be little understanding of, or leverage over, global task-level dynamical properties
What does Asimo need to know, so he can compensate for a hard push?
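One concrete instance of the "regression as a mapping between states and actions" item is a Gaussian-process policy fit to demonstration data. The sketch below is purely illustrative: the 1-D system and the noisy linear expert policy are invented for the example, not taken from the talk.

    # Minimal sketch: learn a state -> action mapping with a Gaussian process.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)

    # Pretend we observed (state, action) pairs from an expert controller.
    states = rng.uniform(-1.0, 1.0, size=(50, 1))
    actions = -2.0 * states[:, 0] + 0.1 * rng.standard_normal(50)  # noisy linear policy

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(states, actions)

    # Predict an action (with uncertainty) for a new state.
    mean, std = gp.predict(np.array([[0.3]]), return_std=True)
    print(f"action = {mean[0]:.3f} +/- {std[0]:.3f}")

The predictive standard deviation is the point of using a GP here: it quantifies where the learned policy should not be trusted, which matters for the robustness questions raised above.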

A lattice of problem settings:
(X,U,W) – Robust control: game, adversary, strategy
(X,U) – Feedback control & optimality
(X,W) – Verification
(X) – Motion synthesis & planning
ML counterparts: time-series models, PCA/NLDR, regression, reinforcement learning
Real-world applications need robust control (arbitrary disturbances, high dimensionality, constraints) – but it isn't well studied in the ML context.
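Read as system classes, the lattice corresponds to which signals the model exposes. In standard notation (mine, not the slide's):

    \[
    \begin{aligned}
    (X):\quad     & \dot{x} = f(x)       && \text{autonomous dynamics: motion synthesis \& planning} \\
    (X,U):\quad   & \dot{x} = f(x,u)     && \text{control input } u \text{: feedback control \& optimality} \\
    (X,W):\quad   & \dot{x} = f(x,w)     && \text{disturbance } w \text{: verification} \\
    (X,U,W):\quad & \dot{x} = f(x,u,w)   && \text{control against disturbance: robust control, games}
    \end{aligned}
    \]

Most ML methods listed above operate at the bottom of this lattice; the talk's argument is about moving them toward the top.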

Robust control: compute the Best Response in a differential game against nature or other agents (especially when the application requires interaction and autonomy).
The disturbances w are structured large deviations (beyond sensor noise…).
The constrained, high-dimensional, partially observed problem is hard! How can we extend ML methods to address this?
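The "best response against nature" can be written as a minimax problem over trajectories. This is the standard textbook form, not a formula from the slides:

    \[
    u^{*} = \arg\min_{u(\cdot)} \, \max_{w(\cdot)} \; J(x_0, u, w),
    \qquad
    J = \int_0^T \ell\bigl(x(t), u(t)\bigr)\,dt,
    \qquad
    \dot{x} = f(x, u, w),
    \]

where w(·) ranges over a set of structured large deviations rather than i.i.d. sensor noise, which is what distinguishes this from ordinary stochastic optimal control.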

Multi-scale Problem Formulation:
1. Sufficient abstraction – dynamical primitives that suffice to answer qualitative questions, e.g., reachability
2. Use the abstract plan to constrain the on-line solution
Data-driven solution: learn from on-line exploration and/or demonstration.
Example [Prabhakaran, MSc '08]: to unknot a rope, first decide the moves using topology (level sets of knot energy); this constrains the space for motion planning and reinforcement learning, giving 10× faster computation.
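The recipe is: solve a cheap abstract problem first, then hand its answer to the low-level learner as a constraint. The toy 1-D sketch below illustrates only that control flow; the task and all names are invented, and this is not the cited rope-unknotting code.

    # Hypothetical sketch of the multi-scale recipe on a toy 1-D task:
    # an "abstract plan" (coarse waypoints) constrains a low-level search.
    import numpy as np

    def abstract_plan(start, goal, n_waypoints=4):
        # Level 1: a cheap qualitative answer -- which regions to pass through.
        return np.linspace(start, goal, n_waypoints + 1)[1:]

    def low_level_step(x, target, step=0.1):
        # Level 2: local refinement, constrained to the current abstract step.
        return x + np.clip(target - x, -step, step)

    x, goal = 0.0, 1.0
    for target in abstract_plan(x, goal):
        while abs(x - target) > 1e-6:
            x = low_level_step(x, target)
    print(f"reached x = {x:.3f}")

The speed-up claimed on the slide comes from the same structure: each abstract step leaves the low-level planner or RL agent a much smaller space to search.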

Nonlinear dimensionality reduction & latent variable methods focus on metric properties at the bottom of the lattice – (X). How can reductions preserve other dynamical properties?
- Behavioural (I/O dynamics) equivalence
- Similar reachable sets given disturbances
Related prior work: Pappas & Tabuada – bisimulation; Amari & Ohara – differential geometry of systems; recent work on the use of symmetries & homomorphisms in RL.
[Figure: the lattice (X,U,W), (X,U), (X,W), (X)]
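For the "behavioural equivalence" bullet, the standard notion in the Pappas & Tabuada line of work is bisimulation. A textbook statement for transition systems (notation mine): a relation R ⊆ X₁ × X₂ is a bisimulation between T₁ and T₂ if, whenever (x₁, x₂) ∈ R,

    \[
    x_1 \xrightarrow{\;u\;} x_1' \;\Rightarrow\; \exists\, x_2' \;\text{such that}\;
    x_2 \xrightarrow{\;u\;} x_2' \;\text{and}\; (x_1', x_2') \in R,
    \]

and symmetrically for transitions of T₂. A reduction that is a bisimulation preserves exactly the reachability-style properties asked about here, not merely metric structure.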

Switched dynamical systems are obvious candidates. However, we also need autonomous transitions & concurrence, and the architecture should support meaningful inferences about global dynamical behaviour.
Can we do active learning to refine such models? Splitting, merging, and simplification using abstractions.
Related prior work: extensive literature on (flavours of) HMMs, SLDS, etc.; also, the cPHA work by Tomlin, Williams, et al.
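A minimal executable picture of a switched dynamical system with an autonomous (state-triggered, rather than externally commanded) transition. The two modes and the switching rule are invented for illustration:

    # Toy switched linear system: mode 0 spirals the state outward until
    # ||x|| > 1, at which point an autonomous transition to mode 1 contracts it.
    import numpy as np

    A = [np.array([[1.05, -0.20], [0.20, 1.05]]),   # mode 0: unstable rotation
         np.array([[0.90,  0.00], [0.00, 0.90]])]   # mode 1: stable contraction

    x, mode = np.array([0.1, 0.0]), 0
    for t in range(100):
        x = A[mode] @ x
        if mode == 0 and np.linalg.norm(x) > 1.0:   # autonomous transition
            mode = 1
    print(f"final mode = {mode}, ||x|| = {np.linalg.norm(x):.3f}")

Even in this two-mode toy, the global behaviour (grow, then decay) is a property of the switching structure rather than of either mode alone, which is why the slide asks for architectures supporting inference about global dynamics.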

Related to shaping in Reinforcement Learning – but we want more leverage over dynamical behaviour.
Approaches in the tradition of optimal control theory:
- Data-driven multi-scale solution of the HJI equation
- Alternate game-theoretic solution concepts
Related prior work: adversarial RL via stochastic games – Littman, Uther & Veloso, Filar & Vrieze, etc.
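For reference, the Hamilton–Jacobi–Isaacs (HJI) equation mentioned here, in its standard form (notation mine):

    \[
    \frac{\partial V}{\partial t}
    + \min_{u} \max_{w} \Bigl[ \nabla_x V(x,t) \cdot f(x,u,w) + \ell(x,u) \Bigr] = 0,
    \qquad V(x,T) = \Phi(x),
    \]

whose (viscosity) solution V is the value of the differential game. The proposal on this slide is to approximate V in a data-driven, multi-scale fashion rather than by grid-based PDE solvers, which scale poorly with dimension.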

"In the spirit of recent work on robust control, the exercises in our earlier paper analyzed the performance of policy rules in worst-case scenarios, rather than on average. However, the more conventional approach to policy evaluation is to assess the expected loss for alternative policy rules with respect to the entire probability distribution of economic shocks, not just the most unfavorable outcomes."
- How do we get the "entire probability distribution of shocks"?
- Why model (prediction vs. qualitative behaviour)?
- How might a learning algorithm incorporate policy concerns?
[B. Bernanke and M. Gertler, "Should central banks respond to movements in asset prices?", American Economic Review, to appear]
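The quoted contrast between worst-case and expected loss is easy to make concrete. In the toy comparison below (all numbers invented for illustration), the two policies rank differently under the two criteria, which is exactly why the choice of criterion matters:

    # Toy illustration: two policies, four shock scenarios with probabilities.
    # Policy A is better on average; policy B is better in the worst case.
    import numpy as np

    probs  = np.array([0.4, 0.3, 0.2, 0.1])   # distribution over shocks
    loss_A = np.array([1.0, 2.0, 3.0, 10.0])  # losses of policy A per shock
    loss_B = np.array([2.5, 2.5, 3.0,  4.0])  # losses of policy B per shock

    for name, loss in [("A", loss_A), ("B", loss_B)]:
        print(f"policy {name}: expected loss = {probs @ loss:.2f}, "
              f"worst case = {loss.max():.1f}")

Here policy A has expected loss 2.60 but worst case 10.0, while policy B has expected loss 2.75 but worst case 4.0: the expected-loss criterion picks A, the robust (minimax) criterion picks B.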