
Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
Stanford University


1. Motivating Application

Planning footsteps for a quadruped robot over challenging, irregular, previously unseen terrain.
Good footsteps need to properly trade off several features: slope, proximity to drop-offs, stability of the robot’s pose, etc.
It is highly non-trivial to hand-specify the reward function for a planner, because this requires manually determining relative weights for all features.

2. Apprenticeship Learning Background

Key idea of Apprenticeship Learning: it is often easier to demonstrate good behavior than to specify a reward that induces this behavior.
Two factors make Apprenticeship Learning hard to apply to large, complex problems such as quadruped planning:
1. It is very difficult, even for a domain expert, to specify a good complete path (e.g., a full set of footsteps across terrain).
2. Even given a reward function, planning (e.g., finding a complete set of footsteps) is a hard, high-dimensional task.

3. Hierarchical Apprenticeship Learning: Main Idea

1) Decompose the planning task into multiple levels of abstraction.
Step 1: High level. Plan a path for the center of the robot body.
Step 2: Low level. Plan footsteps along the body path.
2) Demonstrate good behavior at each level separately.
It is easier to specify a path in the reduced, abstract state space than in the full state space.
It is easier to demonstrate greedy actions than long-term optimal actions.
[Figure labels: Initial Position, Goal, Footstep specified by teacher, Current foot positions]

4. Convex Formulation

Two assumptions on the reward function:
1. Reward is linear in state features.
2. High-level rewards are averages of low-level rewards.
High-level demonstrations imply constraints on the value function.
Low-level demonstrations imply constraints on the reward function.
The high-level and low-level constraints can be combined (with added slack variables) to form a single, unified, convex optimization problem.

5. Experimental Results

Multi-room Grid World
10x10 rooms connected by doors, where each room is a 10x10 grid world.
The high-level demonstration shows only the room-to-room path (using the true reward function).
The low-level demonstration shows only the local greedy action at the grid level.
[Figure: grid world with start S and goal G]

Quadruped Robot
Evaluated the algorithm on easier terrain for training and harder terrain for testing.
On the training terrain, demonstrated a single high-level body path and 20 greedy low-level foot placements (~10 minutes to gather all data).
High-level demonstration: demonstrate a body path across the terrain.
Low-level demonstration: greedy local footsteps at a few key locations.
The system achieves state-of-the-art performance on this task.
[Figure: planned footsteps on training terrain and testing terrain. Panels: No Planning; Hierarchical Apprenticeship Learning; High Level (Body Path) Constraints Only; Low Level (Footstep) Constraints Only]

6. Related Work

Apprenticeship Learning: Abbeel and Ng (2004), Ratliff et al. (2006, 2007), Neu and Szepesvari (2007), Syed and Schapire (2007)
Hierarchical Reinforcement Learning: Parr and Russell (1998), Sutton et al. (1999), Dietterich (2000), Barto and Mahadevan (2003)

7. Conclusion

Presented a novel algorithm for applying apprenticeship learning to large, complex domains via hierarchical decomposition.
Demonstrated the algorithm on a multi-room grid world and a challenging quadruped task, where it achieves state-of-the-art performance.
More generally, the algorithm is applicable whenever the reward function can be hierarchically decomposed as described above.
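
The convex formulation in Section 4 can be illustrated with a small numerical sketch. The poster does not state the optimization problem itself, so everything below is an assumption made for illustration: a linear reward w · phi(s), low-level demonstrations expressed as margin constraints on the reward of a single footstep, high-level demonstrations expressed as margin constraints on averaged path features, slack variables written as hinge penalties, and plain subgradient descent standing in for a proper convex solver. The names `path_features`, `learn_weights`, `lam`, and the two-feature toy data (slope, drop-off proximity) are hypothetical.

```python
import numpy as np

def path_features(step_features):
    """Assumed high-level features: the average of low-level features along a path.

    With a linear reward, w @ path_features(...) is then the average of the
    low-level rewards, matching assumption 2 in Section 4.
    """
    return np.mean(step_features, axis=0)

def hinge_subgradient(w, good, bad, margin=1.0):
    """Subgradient of max(0, margin - w @ (good - bad)) with respect to w."""
    if margin - w @ (good - bad) > 0:
        return -(good - bad)
    return np.zeros_like(w)

def learn_weights(low_pairs, high_pairs, dim, lam=0.01, lr=0.05, iters=2000):
    """Minimize lam/2 * ||w||^2 plus hinge slack over both constraint sets.

    low_pairs:  (features of demonstrated footstep, features of an alternative)
    high_pairs: (averaged features of demonstrated path, of an alternative path)
    """
    w = np.zeros(dim)
    for _ in range(iters):
        grad = lam * w
        for good, bad in list(low_pairs) + list(high_pairs):
            grad += hinge_subgradient(w, good, bad)
        w -= lr * grad
    return w

# Toy data (hypothetical). Feature order: [slope, drop-off proximity].
low_pairs = [
    (np.array([0.1, 0.5]), np.array([0.9, 0.5])),  # teacher picked the flatter step
    (np.array([0.2, 0.3]), np.array([0.7, 0.4])),
]
demo_path = np.array([[0.2, 0.1], [0.3, 0.2]])     # stays away from drop-offs
alt_path = np.array([[0.2, 0.8], [0.3, 0.9]])      # hugs a drop-off
high_pairs = [(path_features(demo_path), path_features(alt_path))]

w = learn_weights(low_pairs, high_pairs, dim=2)
print("learned weights:", w)
```

In this toy setup the low-level pairs mostly constrain the slope weight and the high-level pair constrains the drop-off weight, which is the sense in which the two kinds of demonstration complement each other. A full system would presumably generate the alternative paths and footsteps with a planner under the current weights instead of using a fixed list; that loop is omitted here.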

