1
Persistent Autonomous Flight
Reinforcement Learning for Soaring
Nicholas Lawrance
CDMRG – 24 May 2010
2
What I want to do
We have a good understanding of the dynamics involved in aerodynamic soaring in known conditions, but:
1. Dynamic soaring requires energy-loss actions within net energy-gain cycles, which can be difficult to produce with traditional control or path-generation methods.
2. Wind is difficult to predict; guidance and navigation must be done on-line while simultaneously maintaining reasonable energy levels and satisfying safety requirements.
3. It is the classic exploration-exploitation problem, with the added catch that exploration requires energy gained through exploitation.
4
Why reinforcement learning
Previous work focused on understanding soaring and examining alternatives for generating energy-gain paths.
The issue of balancing exploration and exploitation kept arising, and my code ended up as long sequences of heuristic rules.
Reinforcement learning could provide the link from known good paths towards optimal paths.
5
Monte Carlo, TD, Sarsa & Q-learning
Monte Carlo: learn the average return of actions taken over a series of complete episodes.
Temporal Difference (TD): estimate the value function during the episode by bootstrapping, updating each value estimate from the observed reward and the estimated value of the next state.
Sarsa: TD for on-policy control.
Q-learning: TD for off-policy control.
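For reference, the standard one-step tabular update rules (step size α, discount γ), stated in the usual textbook form:

```latex
% TD(0) prediction
V(s) \leftarrow V(s) + \alpha\left[r + \gamma V(s') - V(s)\right]

% Sarsa (on-policy control): a' is the action actually taken in s'
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma Q(s',a') - Q(s,a)\right]

% Q-learning (off-policy control): bootstrap on the greedy action in s'
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
```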
6
Figure 6.13: The cliff-walking task (Sutton & Barto, Reinforcement Learning: An Introduction). Off-policy Q-learning learns the optimal policy, along the edge of the cliff, but then keeps falling off because of the ε-greedy action selection. On-policy Sarsa learns a safer policy that takes the action-selection method into account. These data are from a single run, but smoothed.
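A minimal ε-greedy selector (a Python sketch, not code from the slides) makes the failure mode concrete: even after Q-learning has converged to the cliff-edge policy, every step still has probability ε of being random, and a random step taken next to the cliff can be the one that falls off.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1, rng=None):
    """Return a random action with probability epsilon, else the greedy action.

    Q is an (n_states, n_actions) array of action-value estimates.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # exploratory step (can walk off the cliff)
    return int(np.argmax(Q[state]))            # greedy step along the learned policy
```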
7
Eligibility Traces
TD(0) is effectively a one-step backup of V^π (the reward is only credited to the immediately preceding action).
Eligibility traces extend this to credit the whole sequence of actions that led to the current reward.
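In the standard accumulating-trace formulation, every visited state keeps a decaying eligibility e_t(s), and the one-step TD error δ_t updates all eligible states at once:

```latex
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

e_t(s) =
\begin{cases}
\gamma \lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t \\
\gamma \lambda\, e_{t-1}(s)     & \text{otherwise}
\end{cases}

V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for all } s
```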
8
Sarsa(λ)
Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
Repeat (for each episode):
    Initialize s, a
    Repeat (for each step of episode):
        Take action a, observe r, s'
        Choose a' from s' using policy derived from Q (ε-greedy)
        δ ← r + γ Q(s',a') − Q(s,a)
        e(s,a) ← e(s,a) + 1
        For all s, a:
            Q(s,a) ← Q(s,a) + α δ e(s,a)
            e(s,a) ← γ λ e(s,a)
        s ← s'; a ← a'
    until s is terminal
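A runnable Python sketch of the same algorithm; the env.reset()/env.step() interface and the hyperparameter values are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating eligibility traces.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def policy(s):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)              # traces reset at the start of each episode
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            e[s, a] += 1.0                # accumulating trace for the visited pair
            Q += alpha * delta * e        # update every (s, a) in proportion to its trace
            e *= gamma * lam              # decay all traces
            s, a = s2, a2
    return Q
```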
9
Sarsa(λ)
10
Simplest soaring attempt
Square grid, simple motion, energy sinks and sources
Movement cost, turn cost, edge cost
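One way such a reward could be encoded, as a rough sketch: the cost values and the energy_field array are placeholders I have assumed, not the numbers used in the slides.

```python
# Illustrative only: the actual grid size, cost values and energy field from
# the slides are not given, so everything here is a placeholder.
MOVE_COST, TURN_COST, EDGE_COST = 1.0, 0.5, 5.0

def step_reward(energy_field, cell, heading, new_cell, new_heading, grid_shape):
    """Net energy change for one move on a square grid.

    energy_field maps a (row, col) cell to the energy gained (source) or
    lost (sink) on entering it.
    """
    r = energy_field[new_cell] - MOVE_COST        # source/sink minus movement cost
    if new_heading != heading:
        r -= TURN_COST                            # turning costs extra energy
    rows, cols = grid_shape
    if new_cell[0] in (0, rows - 1) or new_cell[1] in (0, cols - 1):
        r -= EDGE_COST                            # discourage hugging the boundary
    return r
```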
11
Simulation - Static
13
Hex grid, dynamic soaring
Energy-based simulation
Drag movement cost, turn cost
Constant speed
No wind-induced motion (due to the limited number of states)
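A sketch of how a hex-grid state with a heading might be stepped, using standard axial coordinates; the six-direction encoding and the turn set {-1, 0, +1} are my assumptions for illustration, not the exact representation in the slides.

```python
# Axial-coordinate hex grid: six headings, each a (dq, dr) offset.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_step(q, r, heading, turn):
    """Move one cell at constant speed; `turn` in {-1, 0, +1} changes heading."""
    heading = (heading + turn) % 6
    dq, dr = HEX_DIRS[heading]
    return q + dq, r + dr, heading
```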
14
Hex grid, dynamic soaring
19
Next
Reinforcement learning has advantages to offer our group, but our contribution should probably be focused on well-defined areas.
For most of our problems the state spaces are very large and usually continuous, so we need value-estimation (function approximation) methods; a minimal sketch follows below.
We usually have a good understanding of at least some aspects of the problem; how can and should we use that knowledge to produce better solutions?
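As one concrete direction, action values over a continuous state space can be represented with a linear approximation over features. A minimal semi-gradient Sarsa update might look like the sketch below; the feature function phi (e.g. a tile coding of the continuous state) is an assumption, not something specified in the slides.

```python
import numpy as np

def semi_gradient_sarsa_update(w, phi, s, a, r, s_next, a_next,
                               alpha=0.01, gamma=0.95, done=False):
    """One semi-gradient Sarsa update for Q(s, a) ~= w . phi(s, a).

    phi(s, a) must return a feature vector the same length as w.
    """
    q = w @ phi(s, a)
    q_next = 0.0 if done else w @ phi(s_next, a_next)
    delta = r + gamma * q_next - q            # TD error under the behaviour policy
    return w + alpha * delta * phi(s, a)      # gradient step for the linear approximation
```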