By: Stephen Robertson
Supervisor: Phil Sterne

Using Hierarchical Reinforcement Learning to Solve a Problem with Multiple Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne

Presentation Outline
- Project Motivation
- Project Aim
- Progress so far
- The Gridworld Problem
- Flat Reinforcement Learning Implementation
- Results
- Still to do

Project Motivation
- Reinforcement Learning is an attractive form of machine learning, but the curse of dimensionality makes it inefficient on complex problems.
- Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality.

Project Aim
- Apply various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem.
- Compare these algorithms to each other and to flat Reinforcement Learning.

Progress
- Gridworld implemented in Java.
- Flat Reinforcement Learning implemented on a 6x6 gridworld in Java.
- Feudal Reinforcement Learning currently being implemented.

Rules of the gridworld
- Possible actions: Left, Right, Up, Down and Rest.
- Collecting food and drink increases nourishment and hydration.
- Landing on the tree gives the explorer wood, with which it can repair its shelter.

Rules of the gridworld (continued)
- Resting in a repaired shelter increases health.
- Landing on the lion decreases health.
- Over time, nourishment, hydration, health and shelter condition all gradually decrease.
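The rules above can be sketched as a small state object. This is a minimal illustration only: the class and field names are invented, not the project's actual code, and the four discrete levels per quantity follow the simplification described for the flat learner.

```java
// Hypothetical sketch of the explorer's state, assuming each survival
// quantity is tracked as a discrete level in 0..3 (all names invented).
class ExplorerState {
    int x, y;                     // position on the grid
    int nourishment = 3;          // raised by collecting food
    int hydration = 3;            // raised by collecting drink
    int health = 3;               // raised by resting in a repaired shelter
    int shelterCondition = 3;     // raised by repairing with wood
    boolean carryingWood = false; // set when landing on the tree

    // With time, every quantity gradually decreases (clamped at zero).
    void decay() {
        nourishment      = Math.max(0, nourishment - 1);
        hydration        = Math.max(0, hydration - 1);
        health           = Math.max(0, health - 1);
        shelterCondition = Math.max(0, shelterCondition - 1);
    }
}
```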

Flat Reinforcement Learning
- SARSA with eligibility traces was used.
- To get flat Reinforcement Learning working at all, the task was simplified slightly:
- 6x6 gridworld
- Nourishment, hydration, health and shelter condition reduced to 4 discrete levels each
- Total states: 6 x 6 x 4 x 4 x 4 x 4 x 2 = 18,432, which is manageable.
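As a rough illustration of the update rule, here is a minimal tabular SARSA(lambda) sketch with accumulating eligibility traces. The state and action counts echo the slide's figures (18,432 states; five actions), but the hyperparameter values and all names are placeholders, not the project's actual settings.

```java
import java.util.Random;

// Minimal tabular SARSA(lambda) sketch with accumulating eligibility
// traces. Sizes match the slide's counts; hyperparameters are placeholders.
class SarsaLambda {
    static final int STATES = 6 * 6 * 4 * 4 * 4 * 4 * 2; // = 18,432
    static final int ACTIONS = 5; // Left, Right, Up, Down, Rest
    static final double ALPHA = 0.1, GAMMA = 0.9, LAMBDA = 0.8, EPSILON = 0.1;

    final double[][] q = new double[STATES][ACTIONS]; // action-value table
    final double[][] e = new double[STATES][ACTIONS]; // eligibility traces
    final Random rng = new Random(0);

    // Epsilon-greedy action selection over the current Q estimates.
    int selectAction(int s) {
        if (rng.nextDouble() < EPSILON) return rng.nextInt(ACTIONS);
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) if (q[s][a] > q[s][best]) best = a;
        return best;
    }

    // One SARSA(lambda) backup: the TD error for (s, a) is spread over
    // every state-action pair in proportion to its eligibility trace.
    void update(int s, int a, double reward, int nextS, int nextA) {
        double delta = reward + GAMMA * q[nextS][nextA] - q[s][a];
        e[s][a] += 1.0; // accumulating trace
        for (int i = 0; i < STATES; i++) {
            for (int j = 0; j < ACTIONS; j++) {
                q[i][j] += ALPHA * delta * e[i][j];
                e[i][j] *= GAMMA * LAMBDA; // decay all traces
            }
        }
    }
}
```

The trace decay `GAMMA * LAMBDA` is what lets a single reward update every recently visited state-action pair at once, which matters in a gridworld where rewards (food, water, the lion) are sparse.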

Results

Still to do
- Finish implementing Feudal Reinforcement Learning.
- Implement Phil’s interpretation of Feudal Reinforcement Learning.
- Implement MAXQ hierarchical Reinforcement Learning, and perhaps others.
- Compare them.

Questions?