1
Persistent Autonomous Flight
Reinforcement Learning for Soaring
Nicholas Lawrance
CDMRG – 24 May 2010
2
What I want to do
We have a good understanding of the dynamics involved in aerodynamic soaring in known conditions, but:
1. Dynamic soaring requires energy-loss actions within net energy-gain cycles, which can be difficult to produce with traditional control or path-generation methods.
2. Wind is difficult to predict; guidance and navigation must be done on-line while simultaneously maintaining reasonable energy levels and satisfying safety requirements.
3. It is the classic exploration-exploitation problem, with the added catch that exploration requires energy gained through exploitation.
4
Why reinforcement learning
Previous work focused on understanding soaring and examining alternatives for generating energy-gain paths.
The issue of balancing exploration and exploitation kept arising, and my code ended up as long sequences of heuristic rules.
Reinforcement learning could provide the link from known good paths towards optimal paths.
5
Monte Carlo, TD, Sarsa & Q-learning
Monte Carlo: learn the average return of actions taken over a series of complete episodes.
Temporal Difference (TD): estimate the value function during the episode by bootstrapping, updating each value estimate from the observed reward and the estimated value of the next state.
Sarsa: TD for on-policy control.
Q-learning: TD for off-policy control.
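For reference, the standard one-step tabular update rules (step size α, discount γ), stated in the usual textbook form:

```latex
% TD(0) prediction
V(s) \leftarrow V(s) + \alpha\left[r + \gamma V(s') - V(s)\right]

% Sarsa (on-policy control): a' is the action actually taken in s'
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma Q(s',a') - Q(s,a)\right]

% Q-learning (off-policy control): bootstrap on the greedy action in s'
Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]
```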
6
Figure 6.13: The cliff-walking task (Sutton & Barto, Reinforcement Learning: An Introduction). Off-policy Q-learning learns the optimal policy, along the edge of the cliff, but then keeps falling off because of the ε-greedy action selection. On-policy Sarsa learns a safer policy that takes the action-selection method into account. These data are from a single run, but smoothed.
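A minimal ε-greedy selector (a Python sketch, not code from the slides) makes the failure mode concrete: even after Q-learning has converged to the cliff-edge policy, every step still has probability ε of being random, and a random step taken next to the cliff can be the one that falls off.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1, rng=None):
    """Return a random action with probability epsilon, else the greedy action.

    Q is an (n_states, n_actions) array of action-value estimates.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # exploratory step (can walk off the cliff)
    return int(np.argmax(Q[state]))            # greedy step along the learned policy
```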
7
Eligibility Traces
TD(0) is effectively a one-step backup of V^π (the reward is only credited to the immediately preceding action).
Eligibility traces extend this to credit the whole sequence of actions that led to the current reward.
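In the standard accumulating-trace formulation, every visited state keeps a decaying eligibility e_t(s), and the one-step TD error δ_t updates all eligible states at once:

```latex
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

e_t(s) =
\begin{cases}
\gamma \lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t \\
\gamma \lambda\, e_{t-1}(s)     & \text{otherwise}
\end{cases}

V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for all } s
```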
8
Sarsa(λ)
Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
Repeat (for each episode):
    Initialize s, a
    Repeat (for each step of episode):
        Take action a, observe r, s'
        Choose a' from s' using policy derived from Q (ε-greedy)
        δ ← r + γ Q(s',a') − Q(s,a)
        e(s,a) ← e(s,a) + 1
        For all s, a:
            Q(s,a) ← Q(s,a) + α δ e(s,a)
            e(s,a) ← γ λ e(s,a)
        s ← s'; a ← a'
    until s is terminal
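A runnable Python sketch of the same algorithm; the env.reset()/env.step() interface and the hyperparameter values are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating eligibility traces.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def policy(s):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)              # traces reset at the start of each episode
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            e[s, a] += 1.0                # accumulating trace for the visited pair
            Q += alpha * delta * e        # update every (s, a) in proportion to its trace
            e *= gamma * lam              # decay all traces
            s, a = s2, a2
    return Q
```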
9
Sarsa(λ)
10
Simplest soaring attempt
Square grid, simple motion, energy sinks and sources
Movement cost, turn cost, edge cost
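One way such a reward could be encoded, as a rough sketch: the cost values and the energy_field array are placeholders I have assumed, not the numbers used in the slides.

```python
# Illustrative only: the actual grid size, cost values and energy field from
# the slides are not given, so everything here is a placeholder.
MOVE_COST, TURN_COST, EDGE_COST = 1.0, 0.5, 5.0

def step_reward(energy_field, cell, heading, new_cell, new_heading, grid_shape):
    """Net energy change for one move on a square grid.

    energy_field maps a (row, col) cell to the energy gained (source) or
    lost (sink) on entering it.
    """
    r = energy_field[new_cell] - MOVE_COST        # source/sink minus movement cost
    if new_heading != heading:
        r -= TURN_COST                            # turning costs extra energy
    rows, cols = grid_shape
    if new_cell[0] in (0, rows - 1) or new_cell[1] in (0, cols - 1):
        r -= EDGE_COST                            # discourage hugging the boundary
    return r
```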
11
Simulation - Static
13
Hex grid, dynamic soaring
Energy-based simulation
Drag movement cost, turn cost
Constant speed
No wind-induced motion (due to the limited number of states)
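A sketch of how a hex-grid state with a heading might be stepped, using standard axial coordinates; the six-direction encoding and the turn set {-1, 0, +1} are my assumptions for illustration, not the exact representation in the slides.

```python
# Axial-coordinate hex grid: six headings, each a (dq, dr) offset.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_step(q, r, heading, turn):
    """Move one cell at constant speed; `turn` in {-1, 0, +1} changes heading."""
    heading = (heading + turn) % 6
    dq, dr = HEX_DIRS[heading]
    return q + dq, r + dr, heading
```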
14
Hex grid, dynamic soaring
19
Next
Reinforcement learning has advantages to offer our group, but our contribution should probably be focused on well-defined areas.
For most of our problems the state spaces are very large and usually continuous, so we need value-estimation (function approximation) methods; a minimal sketch follows below.
We usually have a good understanding of at least some aspects of the problem; how can and should we use that knowledge to produce better solutions?
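As one concrete direction, action values over a continuous state space can be represented with a linear approximation over features. A minimal semi-gradient Sarsa update might look like the sketch below; the feature function phi (e.g. a tile coding of the continuous state) is an assumption, not something specified in the slides.

```python
import numpy as np

def semi_gradient_sarsa_update(w, phi, s, a, r, s_next, a_next,
                               alpha=0.01, gamma=0.95, done=False):
    """One semi-gradient Sarsa update for Q(s, a) ~= w . phi(s, a).

    phi(s, a) must return a feature vector the same length as w.
    """
    q = w @ phi(s, a)
    q_next = 0.0 if done else w @ phi(s_next, a_next)
    delta = r + gamma * q_next - q            # TD error under the behaviour policy
    return w + alpha * delta * phi(s, a)      # gradient step for the linear approximation
```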