Published by Gary Floyd. Modified over 9 years ago.
1
Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
Presented by: Subarna Sadhukhan
2
Reinforcement learning
Vision-based reinforcement learning, by which a robot learns to shoot a ball into a goal. The aim is a method that acquires such a strategy automatically.
The robot and its environment are modeled as two synchronized finite-state automata interacting in a discrete-time cyclical process:
Robot: senses the current state and selects an action.
Environment: makes a transition to a new state and generates a reward back to the robot.
Through this purposive behavior, the robot learns to achieve the given goal.
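The interaction cycle described above can be sketched as a loop in which the robot picks an action and the environment answers with a next state and a reward. This is a minimal illustration, not the authors' system: `ChainEnvironment` is a hypothetical one-dimensional toy world (states 0..n-1, goal at n-1) and `RandomRobot` is a hypothetical robot that chooses actions uniformly at random.

```python
import random


class ChainEnvironment:
    """Hypothetical toy environment: states 0..n-1 with the goal at n-1.
    Action 1 moves one step toward the goal; action 0 leaves the state unchanged."""

    def __init__(self, n_states=4):
        self.goal = n_states - 1

    def step(self, state, action):
        # Environment side: decide the transition and generate the reward.
        next_state = min(state + action, self.goal)
        reward = 1.0 if next_state == self.goal else 0.0
        return next_state, reward


class RandomRobot:
    """Robot side: senses the current state and selects an action (uniformly at random here)."""

    def __init__(self, actions, seed=0):
        self.actions = list(actions)
        self.rng = random.Random(seed)

    def select_action(self, state):
        return self.rng.choice(self.actions)


def run_episode(env, robot, start=0, max_steps=1000):
    """Run synchronized discrete-time cycles until the goal is reached (or max_steps)."""
    state, trajectory = start, []
    for _ in range(max_steps):
        action = robot.select_action(state)           # robot: sense and act
        next_state, reward = env.step(state, action)  # environment: transition and reward
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if state == env.goal:
            break
    return trajectory


trajectory = run_episode(ChainEnvironment(4), RandomRobot([0, 1], seed=0))
```

Each tuple in `trajectory` records one cycle: the state the robot sensed, the action it chose, the reward it got back, and the state the environment moved to.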
3
Environment: ball and goal.
Robot: mobile, with a camera.
Nothing about the system is known in advance.
We assume the robot can discriminate the set S of states and can take the set A of actions on the world.
4
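Q-learning
Let Q*(s,a) be the expected return for taking action a in situation s:
$$Q^*(s,a) = r(s,a) + \gamma \sum_{s' \in S} T(s,a,s') \max_{a' \in A} Q^*(s',a')$$
where T(s,a,s') is the probability of a transition from s to s', r(s,a) is the reward for the state-action pair (s,a), and γ is the discounting factor (0 ≤ γ < 1).
Since T and r are not known, we can instead update Q from experience:
$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \left( r + \gamma \max_{a' \in A} Q(s',a') \right)$$
where r is the actual reward received for taking a, s' is the next state, and α is the learning rate.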
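The experience-based update above can be written as a single function over a tabular Q. This is a minimal sketch: the table is a Python dict keyed by (state, action), unseen entries default to 0, and the values α = 0.25 and γ = 0.9 are arbitrary choices for illustration, not values from the slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.25, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


Q = defaultdict(float)   # unseen (state, action) pairs start at 0.0
actions = [0, 1]
Q[("s1", 1)] = 0.5       # pretend s1 already has some value
value = q_update(Q, "s0", 1, 0.0, "s1", actions)   # 0.25 * 0.9 * 0.5, about 0.1125
```

Only T and r are avoided here: the update uses the single observed next state s' in place of the sum over T(s,a,s').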
5
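State set
9 × 27 + 27 + 9 = 279 states: (3 × 3 ball sub-states) × (3 × 3 × 3 goal sub-states), plus 27 states in which only the goal is seen (no ball) and 9 states in which only the ball is seen (no goal).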
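The count can be checked by enumerating the states directly. This sketch assumes the ball sub-state is (position, size) with three values each, and models the goal as three 3-valued features with placeholder labels 0, 1, 2; the label names are illustrative only.

```python
from itertools import product

# Ball: 3 positions x 3 sizes = 9 sub-states (labels are illustrative).
BALL = list(product(("left", "center", "right"), ("small", "medium", "large")))
# Goal: three 3-valued features = 27 sub-states (placeholder labels 0, 1, 2).
GOAL = list(product(range(3), repeat=3))

STATES = (
    [("both", b, g) for b in BALL for g in GOAL]   # ball and goal seen: 9 * 27 = 243
    + [("goal only", None, g) for g in GOAL]       # no ball: 27
    + [("ball only", b, None) for b in BALL]       # no goal: 9
)
STATE_INDEX = {s: i for i, s in enumerate(STATES)}   # row index into a Q-table

print(len(STATES))  # 279
```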
6
Action set
Two motors, each with three commands (forward, stop, back): 3 × 3 = 9 actions in all.
State-action deviation problem: a small change near the observer results in a large change in the image, while a large change far from the observer results in only a small change in the image.
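The nine actions are the Cartesian product of the two motors' command sets. A minimal sketch; the tuple order (first motor, second motor) and the index mapping are illustrative choices.

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")

# One command per motor: 3 * 3 = 9 joint actions.
ACTIONS = list(product(MOTOR_COMMANDS, repeat=2))
ACTION_INDEX = {action: i for i, action in enumerate(ACTIONS)}   # column index into a Q-table

print(len(ACTIONS))   # 9
print(ACTIONS[0])     # ('forward', 'forward')
```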
7
Learning from Easy Missions
Delayed reinforcement problem: there is no explicit teacher signal, and the reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state (0 otherwise).
Construct the learning schedule so that the robot learns in easy situations at early stages and in more difficult situations later on: Learning from Easy Missions (LEM).
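The delayed reinforcement problem can be seen on a toy example. This sketch assumes a hypothetical four-state chain (state 3 is the goal) and replays the same path of "forward" moves with one-step Q-learning; α = 0.5 and γ = 0.9 are arbitrary. Because r = 1 only on reaching the goal, each episode pushes nonzero value back by just one state.

```python
from collections import defaultdict


def reward(next_state, goal):
    # r = 1 only when the goal state is reached, 0 everywhere else
    return 1.0 if next_state == goal else 0.0


def replay_episode(Q, path, actions, goal, alpha=0.5, gamma=0.9):
    """Apply one-step Q-learning updates along a fixed path of 'forward' moves, in time order."""
    for s, s_next in zip(path, path[1:]):
        r = reward(s_next, goal)
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, "forward")] += alpha * (r + gamma * best_next - Q[(s, "forward")])


ACTIONS = ["forward", "stay"]
Q = defaultdict(float)
PATH = [0, 1, 2, 3]   # hypothetical chain; state 3 is the goal

replay_episode(Q, PATH, ACTIONS, goal=3)
after_one = sorted(k for k, v in Q.items() if v != 0.0)   # only the state next to the goal
replay_episode(Q, PATH, ACTIONS, goal=3)
after_two = sorted(k for k, v in Q.items() if v != 0.0)   # one state further back
```

Starting the robot close to the goal, as LEM does, means the early episodes are short and the reward is reached quickly.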
8
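Complexity analysis
k states, m possible actions.
Q-learning: reaching the goal for the first time takes on the order of $m^k$ steps, the second time on the order of $m^{k-1}$, and so on.
LEM: about $m \cdot k$ steps, since the robot gets a reward at each step.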
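The gap between the two schedules can be made concrete with a little arithmetic. This sketch assumes a worst-case model (an assumption for illustration): without LEM, the successive successes cost about $m^k, m^{k-1}, \ldots, m$ random steps, while with LEM each of the k states costs about m steps because the robot starts next to an already-learned region.

```python
def steps_plain_q(m, k):
    """Assumed worst case without LEM: m**k + m**(k-1) + ... + m random steps."""
    return sum(m ** i for i in range(1, k + 1))


def steps_lem(m, k):
    """Assumed cost with LEM: about m steps per state, k states."""
    return m * k


print(steps_plain_q(9, 5), steps_lem(9, 5))   # 66429 45
```

With m = 9 (the robot's action count) and only k = 5 states, the assumed exponential cost is already about three orders of magnitude larger than the linear one.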
9
Implementing LEM
Rough ordering of easy situations: small -> medium -> large (the apparent size of the ball roughly indicates how close the robot is to reaching the goal).
The state space is categorized into sub-states such as ball size, position, and so on.
Let n be the size of the state space and m the number of ordered sets. Applying LEM with m ordered sets takes less learning time than applying Q-learning to the whole state space at once.
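Splitting the state space into ordered sets can be sketched as a grouping by one sub-state feature. This minimal example only groups the ball's (position, size) sub-states by size, in the order listed above; the position labels and the function name `ordered_sets` are hypothetical.

```python
from itertools import product

SIZE_ORDER = ("small", "medium", "large")   # order as listed on the slide
POSITIONS = ("left", "center", "right")     # illustrative labels

BALL_STATES = list(product(POSITIONS, SIZE_ORDER))


def ordered_sets(states, order=SIZE_ORDER):
    """Group (position, size) sub-states into m = len(order) ordered sets S1..Sm by ball size."""
    return [[s for s in states if s[1] == size] for size in order]


S = ordered_sets(BALL_STATES)
m = len(S)   # 3 ordered sets, each holding 3 sub-states
```

In a full implementation each set would also carry the goal sub-states; learning then starts in the first set and moves to the next one only when a shifting condition is met.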
10
When to shift
S1 is the state set nearest to the goal, S2 the next nearest, and so on. The shift from S(k-1) to S(k) is triggered by a condition evaluated over a time interval Δt, i.e., a number of steps.
We suppose that the current state set S(k-1) can transit only to its neighboring sets.
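One plausible way to implement such a shift test is to compare two snapshots of Q taken Δt steps apart. The criterion below is an assumption for illustration, not necessarily the one used in the paper: shift once the summed change of $\max_a Q(s,a)$ over the current set falls below a threshold ε (here 1e-3, an arbitrary value).

```python
def max_q(Q, s, actions):
    """max_a Q(s, a), treating missing entries as 0.0."""
    return max(Q.get((s, a), 0.0) for a in actions)


def should_shift(Q_now, Q_before, current_set, actions, eps=1e-3):
    """Hypothetical shift test: move on from the current set once the summed
    change of max_a Q over its states, between a snapshot taken delta_t steps
    ago (Q_before) and now (Q_now), is below eps."""
    change = sum(
        abs(max_q(Q_now, s, actions) - max_q(Q_before, s, actions))
        for s in current_set
    )
    return change < eps


ACTIONS = ["forward", "stay"]
Q_before = {("s1", "forward"): 0.5}
settled = should_shift({("s1", "forward"): 0.5004}, Q_before, ["s1"], ACTIONS)   # True
moving = should_shift({("s1", "forward"): 0.6}, Q_before, ["s1"], ACTIONS)       # False
```

Comparing snapshots rather than single updates smooths out the noise of individual steps; the larger Δt, the more conservative the shift.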
11
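From the Q-learning update above, if Q converges, the correction term vanishes on average, so Q satisfies
$$Q(s,a) = r(s,a) + \gamma \sum_{s' \in S} T(s,a,s') \max_{a' \in A} Q(s',a')$$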
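What the converged values look like can be checked on a toy case. This sketch assumes a hypothetical deterministic chain (states 0..n-1, goal at n-1, actions "forward" and "stay") with r = 1 only on entering the goal, and iterates the fixed-point equation above until it settles. Under these assumptions a state d steps from the goal converges to $\max_a Q(s,a) = \gamma^{d-1}$.

```python
def converged_q_chain(n_states, gamma=0.9, sweeps=100):
    """Iterate Q(s,a) = r + gamma * max_a' Q(s',a') on a deterministic chain.
    Returns max_a Q(s, a) for each non-goal state 0..n-2."""
    goal = n_states - 1
    actions = ("forward", "stay")
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(sweeps):
        for s in range(goal):  # the goal is terminal: its Q stays 0
            for a, s_next in (("forward", s + 1), ("stay", s)):
                r = 1.0 if s_next == goal else 0.0
                best_next = 0.0 if s_next == goal else max(Q[(s_next, b)] for b in actions)
                Q[(s, a)] = r + gamma * best_next
    return [max(Q[(s, a)] for a in actions) for s in range(goal)]


values = converged_q_chain(4)   # roughly [0.81, 0.9, 1.0] for gamma = 0.9
```

The geometric decay means the converged Q-value itself encodes roughly how many steps a state is from the goal.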
12
LEM
13
Experiments