Download presentation
Presentation is loading. Please wait.
Published byThomasina Melton Modified over 9 years ago
1
Option and Constraint Generation using Work Domain Analysis Presenter: Guliz Tokadli Dr. Karen Feigh
2
Introduction Interactive Machine Learning Reinforcement Learning Work Domain Analysis Micro-world of Pac-Man
3
Introduction Goal: To develop a method to exploit experienced humans’ knowledge to improve algorithms for machine learning using work domain analysis techniques from cognitive engineering Current practice: Using programmer derived or auto- derived primitives, options and constraints, or inferring these from human coaching or demonstration How is our approach new: Provides a systematic way to mine a human’s knowledge about a domain and to translate it to a hierarchical goal structure How will we know we’ve succeeded: We will be able to generate better machine learning algorithms than current practice in a shorter period of time with less expert intervention 3
4
Interactive Machine Learning Our Objective: Robots and other machines that can learn from people unfamiliar with machine learning algorithms. 4 From:To: Reinforcement Learning is a method to generate policies for agent tasked with making decisions. Learning from action policy: state action It maximizes the reward
5
Reinforcement Learning – Option & Constraint (5) A. J. Irani, J. A. Rosalia, K. Subramanian, C. L. I. Jr., and A. L. Thomaz, “Using both positive and negative instructions from humans to decompose reinforcement learning problems,” Georgia Institute of Technology, Tech. Rep., 2014. (6) R.S.Sutton and A.G.Barto, Reinforcement Learning: An Introduction. The MIT Press, 2012. 5 GOAL OPTIONSCONSTRAINTS PRIMITIVES GOAL is the ultimate purpose to gain. CONSTRAINTS are defined as the set of all state action pairs that the agent should not do. -Don’t do X. -Don’t move onto unscared ghost. PRIMITIVES are the set of fundamental actions the agent can effect such as Up, Right, Left, Down OPTIONS are used for generalization of primitive actions to include temporally extended courses of action. An option O:.
6
5-level functional decomposition used for modeling complex sociotechnical systems. System is described at different levels of abstraction with “Why How” re lationship. (1)N. Naikar, R. Hopcroft, and A. Moylan, “Work domain analysis : Theoretical concepts and methodology,” pp. 5–35, 2005. (2)N. Naikar, Work Domain Analysis: Concepts, Guidelines, and Cases. CRC Press, 2013. (3)J. Rasmussen, Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. Elsevier Science, 1986. (4) N.A. Stanton, P.M. Salmon, G.H. Walker, C. Baber and D.P. Jenkins, Human Factors Methods, Ch. 4 Cognitive Task Analysis Methods, 2005 Work Domain Analysis Method: Abstraction Hierarchy Levels of AHMeaning of Levels for Pac-ManExamples Functional Purposes (FP) The goals of Pac-Man.Staying Alive Abstract Functions (AF) What criteria is required to judge whether Pac-Man is achieving the purposes. Avoid Ghosts Generalized Functions (GF) What functions are required to accomplish Pac-Man’s abstract functions. Maneuver Pac-Man Physical Functions (PF) The limitation and capabilities of the system, Distance to Nearest Ghost Physical Objects (PO) The objects and characters on the maze. Ghosts, Pac-Man, Walls 6 How What How Why What How Why What Why What How Why What 6
7
Micro-World of Pac-Man Classic arcade game, Is invented in 1980 Why we chose Pac-Man: Helps to understand the research problem Helps to build AH Map linking the goals Provide insight into RL policies Used for many ML and RL studies Allow the comparison of the results Find what is needed quickly and conveniently. 7
8
Results of Pac-Man Study Familiarization & Interview Phase Modeling Phase Option & Constraint Set Generation Phase
9
Pac-Man Study - Method Familiarization Phase: 16 participants played Pac- Man for 10 minutes Interview Phase: researchers interviewed the participants using a structured interview script designed to generate abstraction hierarchies Modeling Phase: researchers created abstraction hierarchies for each player and composite abstraction hierarchies for high and low performing players Option & Constraint Set Generation Phase: researchers translated composite abstraction hierarchies into sets of options and constraints 9
10
Participant Performance High Performers Low Performers 10 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
11
Modeling Phase - Procedure 1.Creation of individual AH per player 2.Harmonization of statements in AHs 3.Combination of AHs of high and low performers Performance-based AH 11 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
12
WDA: Performance-Based AH Aggregated AH: Low and High Performance AHs are represented as single AH. 12 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
13
WDA: Performance-Based AH High PerformersLow Performers Player perspective is as ‘Competitor’ – Maximum score and minimum time Player perspective is as ‘Exploratory spirit’ – Learning Tricks - Having both defensive and offensive actions Having dual level of protection for Pac-Man as proactive and reactive strategy Having only defensive actions ‘Clearing current quadrant’ as action is using as tactic ‘Clearing current quadrant’ is using as strategy Quick observation to act in minimum time interval Observation on quadrant is being used for ‘Learning Tricks’ 13 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
14
Generation of Option and Constraint Sets 14 High Performer- Defined OC Set Low Performer- Defined OC Set Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
15
High Performer-Defined OC Set Creation 15 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
16
Low Performer-Defined OC Set Creation 16 Familiarization and Interview PhaseModelling PhaseOption & Constraint Set Generation Phase
17
Conclusion
18
18 Low Performers High Performers Our Goal is to develop a method to improve algorithms for machine learning based on human experience and knowledge using Work Domain Analysis. Now we are able to generate option and constraint sets for different levels of performance. Implemented option and constraint sets separately on RL algorithms and evaluation of the performers – able to show the differences between performers Will work to auto-generate option and constraint sets using state-of- the-art methods
19
Thank you! Questions and Comments? 19
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.