
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot
Viktor Zhumatiy (a), Faustino Gomez (a), Marcus Hutter (a) and Jürgen Schmidhuber (a, b)
(a) IDSIA, Switzerland; (b) TU Munich, Germany
(Speaker note: say why learning is needed at all)

Learning environment: real robot, vision sensors
Learning algorithm specifically targeted at real-world mobile robots

Challenges of learning on vision-capable real robots
Piecewise-continuous (PWC) control
Partial observability (POMDP)
High-dimensional sensors
Costly exploration

Reinforcement learning
RL: policy learning by autonomous exploration of the environment, driven by a reward signal
Q-learning: estimation of the discounted reward for each state-action pair
Assumes that the states s_t are fully observable at each moment
In practice, only incomplete observations are available
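For reference, the textbook one-step Q-learning update that the statement above refers to (a standard formula, not specific to this work):

```latex
% Standard one-step Q-learning update for a transition (s_t, a_t, r_{t+1}, s_{t+1})
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \bigr]
```

The assumption that s_t is fully observable is exactly what fails on a vision-capable robot; the rest of the talk replaces states with observation histories.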

Discrete and continuous versus PWC
Transitions and reinforcements on actual robots differ from the well-studied continuous and discrete cases
(Figure: example profiles for the continuous, discrete, and PWC cases)

Continuous and discrete versus PWC
PWC is characterized by continuous and differentiable structure broken by jumps that appear when, for example, an object is hidden from view
(Figure: example profiles for the PWC, discrete, and continuous cases)
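To make the PWC idea concrete, here is a toy sketch (all numbers hypothetical) of an observation that is smooth while the target is in view and jumps the moment it leaves the field of view:

```python
import math

def observed_target_x(robot_heading_deg: float) -> tuple[float, bool]:
    """Toy piecewise-continuous observation: image x-position of a target.

    Smooth and differentiable in the heading while the target is inside a
    60-degree field of view, with a discontinuous jump to (0.0, False) the
    moment it leaves. Numbers are illustrative only.
    """
    half_fov = 30.0
    if abs(robot_heading_deg) <= half_fov:
        # Continuous part: position varies smoothly with the heading.
        return math.tan(math.radians(robot_heading_deg)), True
    # Discontinuity: the target is hidden from view.
    return 0.0, False
```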

Candidate methods for PWC
Discretizing the state space with a fixed or adaptive grid: artificial discontinuities
Neural networks: do not model discontinuities
CMAC and RBFs: knowledge of the local scale required
Instance-based memory: OK, but previously used with a fixed scale

POMDP
What if the goal is not seen?
Solution: use a chain of observations for control
(Figure: observations at t = T-2, T-1, T, with the goal hidden in some frames)

PC-NSM
Nearest Sequence Memory (NSM) by McCallum: does everything we need, but assumes a discrete space and converges slowly
Solution: modify it to work in PWC settings and speed it up to use data more effectively = Piecewise-Continuous NSM (PC-NSM)

NSM description
q(t): Q-value for each moment in the history
To compute Q(a*, T): average q(t) over the k nearest history moments t at which action a_t = a* was taken
Take an exploratory action with probability epsilon
(Figure: match length of the k nearest neighbours against the observation history up to time T)
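A rough Python sketch of the NSM-style estimate and action choice described above. The history entries (assumed to carry an `action` field), the per-step values q, and the history distance `dist` are taken as given, and the variable names are mine rather than McCallum's:

```python
import random

def q_estimate(action, T, history, q, dist, k=3):
    """Average q(t) over the k history moments nearest to time T
    (under the metric `dist`) at which `action` was taken."""
    candidates = [t for t in range(T) if history[t].action == action]
    nearest = sorted(candidates, key=lambda t: dist(t, T))[:k]
    if not nearest:
        return 0.0
    return sum(q[t] for t in nearest) / len(nearest)

def select_action(actions, T, history, q, dist, epsilon=0.3, k=3):
    """Epsilon-greedy choice over the NSM value estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_estimate(a, T, history, q, dist, k))
```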

Change 1: a metric for PWC
Pseudometric in the original McCallum NSM: 1 / (1 + number of matching observations)
Our metric: a distance over recent continuous observation-action histories (equation given on the slide)
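A hedged sketch contrasting the original match-count pseudometric with a continuous alternative of the kind PC-NSM needs. The actual PC-NSM metric is the equation on the slide, so the second function is only illustrative, and `depth` and `decay` are made-up parameters:

```python
import math

def mccallum_pseudometric(t1, t2, history):
    """Original NSM pseudometric: 1 / (1 + number of matching trailing
    (observation, action) pairs before t1 and t2). More matches -> closer."""
    matches = 0
    while (t1 - matches - 1 >= 0 and t2 - matches - 1 >= 0
           and history[t1 - matches - 1] == history[t2 - matches - 1]):
        matches += 1
    return 1.0 / (1.0 + matches)

def continuous_history_distance(t1, t2, history, depth=3, decay=0.5):
    """Illustrative continuous alternative: discounted Euclidean distance
    between the last `depth` observation vectors before t1 and t2."""
    total = 0.0
    for i in range(depth):
        if t1 - i - 1 < 0 or t2 - i - 1 < 0:
            break
        o1, o2 = history[t1 - i - 1].obs, history[t2 - i - 1].obs
        total += (decay ** i) * math.dist(o1, o2)
    return total
```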

Change 2: endogenous updates
McCallum: update only for t = T-1 (with traces)
PC-NSM: updates through the entire history are needed, since the neighbourhoods change as new experience arrives
(Update equation given on the slide)
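A sketch of what "updates through all history" could look like, reusing q_estimate from the earlier sketch. The exact PC-NSM update rule is the one on the slide, so treat this as an assumption-laden illustration (gamma, the `reward` field, and the sweep order are mine):

```python
def endogenous_update(history, q, actions, dist, gamma=0.9, sweeps=1):
    """Re-estimate q(t) for every step of the history, not only t = T-1,
    because new experience changes which neighbours each step has."""
    for _ in range(sweeps):
        for t in range(len(history) - 1):
            best_next = max(
                q_estimate(a, t + 1, history, q, dist) for a in actions
            )
            q[t] = history[t].reward + gamma * best_next
    return q
```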

Change 3: directed exploration
When exploring, greedily take the least-explored action
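One reading of "greedily take the least-explored action": when an exploratory step is due, choose the action with the fewest nearby occurrences in the history instead of a uniformly random one. A sketch under that assumption (`radius` is a made-up parameter):

```python
def least_explored_action(actions, T, history, dist, radius=1.0):
    """Count how often each action occurs among history steps close to the
    current situation (under `dist`) and pick the rarest one."""
    def local_count(action):
        return sum(
            1 for t in range(T)
            if history[t].action == action and dist(t, T) <= radius
        )
    return min(actions, key=local_count)
```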

Demonstration on a mobile robot

Robot setup
Arena: approximately 4 m × 3 m
Sensory input vector: (x, y, isVisible, f, b): the target's x-y position in the camera image, a target visibility flag, and the sonar-measured distances f and b
Sensors: sonar sensor and an AXIS web camera
Goal: avoid wall collisions, go to the blue teapot
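A minimal container for the sensory input vector above; reading f and b as the two sonar-measured distances (front and back) is my interpretation of the slide:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    x: float           # target x position from the AXIS web camera
    y: float           # target y position from the AXIS web camera
    is_visible: bool   # whether the blue teapot is currently in view
    f: float           # sonar-measured distance (assumed: front)
    b: float           # sonar-measured distance (assumed: back)
```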

Actions
Move forward / backward, approx. 5 cm / 15 cm
Turn left / right, 22.5° / 45°
Exact values are unimportant
Stand-still action
Wait until the robot stops before issuing the next action
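The action set written out as an enumeration; how the two magnitudes pair with forward/backward and left/right is not recoverable from the transcript, so the specific values below are an assumption (the slide itself notes that the exact values are unimportant):

```python
from enum import Enum

class RobotAction(Enum):
    MOVE_FORWARD = "move forward, approx. 5 cm"     # magnitude assumed
    MOVE_BACKWARD = "move backward, approx. 15 cm"  # magnitude assumed
    TURN_LEFT = "turn left, 22.5 degrees"           # magnitude assumed
    TURN_RIGHT = "turn right, 45 degrees"           # magnitude assumed
    STAND_STILL = "stand still"
```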

Complete learning system

PC-NSM parameters
Epsilon-greedy policy with epsilon set to 0.3 (30% of the time the robot selects an exploratory action)
The appropriate number of nearest neighbours k used to select actions depends on the noisiness of the environment
For the amount of noise in our sensors, we found that learning was fastest for k = 3
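Tying these parameters to the earlier select_action sketch, the per-step call would look like:

```python
# Parameters from the slide: epsilon = 0.3, k = 3 nearest neighbours.
action = select_action(actions, T, history, q, dist, epsilon=0.3, k=3)
```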

Reinforcement structure
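The slide's content is a reward diagram that did not survive the transcript; only the goal stated earlier (reach the blue teapot, avoid wall collisions) is recoverable. A purely hypothetical encoding of such a reinforcement structure, not the one actually used:

```python
def reward(reached_teapot: bool, hit_wall: bool) -> float:
    """Hypothetical reward: positive at the target, negative for collisions,
    zero otherwise. The real values are on the original slide."""
    if reached_teapot:
        return 1.0
    if hit_wall:
        return -1.0
    return 0.0
```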

Results: learned policy
The learned policy is shown after dimensionality reduction in the sensor space: x and y are variable, while f and b are omitted because the walls are always far
(Figure legend: < turn left, > turn right, ^ move forward, v move backward, o stand still)

Results: example trajectories
A trajectory after learning
White boxes mark the controller's confusion caused by sound-reflecting wall joints

Contributions and limitations
An algorithm capable of learning on real vision-controlled robots is developed
Thanks to its reliance on a metric, the algorithm can use modern vision preprocessing algorithms
Limitations:
A single metric may be too strict a limitation
The exploration scheme is greedy

Future work
Improved exploration
Multimetric learning

Conclusion
Requirements for real-world mobile robot learning are defined
An algorithm satisfying these requirements is proposed
A feasibility study on an actual robot is carried out