Towards Autonomous Data Ferry Route Design in Delay Tolerant Networks
Daniel Henkel, Timothy X Brown
University of Colorado at Boulder
WoWMoM/AOC '08, June 23, 2008
Full paper title: Route Design for UAV-based Data Ferries in Delay Tolerant Wireless Networks
Familiar: Dial-A-Ride
Dial-A-Ride: a curb-to-curb, shared-ride transportation service.
The bus receives calls, picks up and drops off passengers, and must transport people quickly!
[Figure: requests 1-5 at locations such as a hospital, a school, and the depot.]
The optimal route is not trivial!
Outline: the path planning problem and motivation, problems with TSP solutions, a queueing-theoretic approach, and formulation as an MDP with solution methods.
In context: Dial-A-UAV
Complication: sensors generate data indefinitely, and traffic is potentially two-way. This is delay tolerant traffic!
[Figure: Sensor-1 through Sensor-6 ferried to a monitoring station.]
Sparsely distributed sensors with limited radios.
A TSP solution is not optimal. Our approach: queueing theory and MDP theory.
TSP's Problem
Traveling Salesman solution: one cycle visits every node; the UAV has no option but to alternately visit A and B.
Problem: far-away nodes with little data to send should be visited less often.
New idea: define the route by per-node visit frequencies p_i (p_A, p_B) from a hub, given flow rates f_A, f_B and distances d_A, d_B.
Queueing Approach
Goal: minimize the average delay.
Idea: express the delay in terms of the visit probabilities p_i, then minimize over the set {p_i}, treating {p_i} as a probability distribution.
Inter-service time of node i: exponentially distributed with mean T/p_i, where T is the mean duration of one ferry trip.
Since the inter-service time is exponentially distributed and ALL waiting packets are picked up when a node is visited, the average delay for a node's packets is the mean of that exponential distribution.
Expected (traffic-weighted) delay of any packet, with f_i/F the fraction of total traffic generated by node i:
$$ \bar{D} \;=\; \sum_i \frac{f_i}{F}\,\frac{T}{p_i} $$
[Figure: UAV serving nodes A, B, C, D from a hub, with flow rates f_i, distances d_i, and visit probabilities p_i.]
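A sketch of the resulting minimization, under the assumption (suggested by the hub-based figure) that each trip is a hub-to-node round trip with duration proportional to the node distance d_i; the notation and derivation are ours, not quoted from the paper:

```latex
% Mean trip time, assuming each trip is a hub-to-node-j round trip of duration tau_j ~ d_j
T = \sum_{j} p_j \, \tau_j , \qquad \tau_j \propto d_j

% Weighted delay to minimize over the probability distribution {p_i}
\bar{D}(\{p_i\}) = \Big( \sum_{j} p_j \tau_j \Big) \sum_{i} \frac{f_i}{F}\,\frac{1}{p_i},
\qquad \sum_i p_i = 1

% By Cauchy-Schwarz the product is minimized when p_i \tau_i \propto (f_i/F)/p_i, giving
p_i^{*} = \frac{\sqrt{f_i / d_i}}{\sum_{j} \sqrt{f_j / d_j}}
```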
Solution and Algorithm
Probability of choosing node i for the next visit: p_i proportional to sqrt(f_i / d_i), normalized to sum to 1; nodes with more traffic are visited more often, far-away nodes less often.
Implementation: deterministic algorithm (see the sketch below).
1. Set c_i = 0 for all i.
2. While max_i{c_i} < 1: c_i = c_i + p_i for all i.
3. k = argmax_i {c_i}
4. Visit node k; c_k = c_k - 1.
5. Go to 2.
An improvement over TSP! Still a fairly simplistic view of the world: selecting nodes by fixed probabilities ignores many parameters.
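A minimal Python sketch of this credit-based schedule, using the sqrt(f_i/d_i) visit probabilities derived above; the node flows and distances are illustrative numbers, not the paper's scenario:

```python
import math

def visit_probabilities(flows, dists):
    """Visit probabilities p_i proportional to sqrt(f_i / d_i), normalized to sum to 1
    (the form sketched above; an assumption about the exact optimum)."""
    raw = [math.sqrt(f / d) for f, d in zip(flows, dists)]
    total = sum(raw)
    return [r / total for r in raw]

def ferry_schedule(p, num_visits):
    """Deterministic credit-based scheduler from the slide: accumulate p_i as credit
    each round, visit the node with the largest credit, then subtract 1 from it."""
    credits = [0.0] * len(p)
    schedule = []
    for _ in range(num_visits):
        while max(credits) < 1.0:                          # step 2: accumulate credits
            credits = [c + pi for c, pi in zip(credits, p)]
        k = max(range(len(p)), key=lambda i: credits[i])   # step 3: pick largest credit
        schedule.append(k)                                 # step 4: visit node k ...
        credits[k] -= 1.0                                  # ... and decrement its credit
    return schedule

# Illustrative example: four nodes with flow rates f_i and hub distances d_i
flows = [5.0, 1.0, 3.0, 0.5]
dists = [1.0, 4.0, 2.0, 6.0]
p = visit_probabilities(flows, dists)
print(p)
print(ferry_schedule(p, 12))   # high-flow, nearby nodes appear most often
```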
There's More to It!
New perspective (back to Dial-A-Ride):
States: number of people waiting at each location, time-varying call rate (time of day), current bus location.
Actions: drive to a location.
Goal: short passenger wait times.
The environment is generally unknown.
Promising Technique: Reinforcement Learning (an AI technique)
Learning what to do without prior training: given a high-level goal, NOT how to reach it; improving actions on the go.
Features: interaction with the environment, the concept of rewards and punishments, trial-and-error search.
Example: learning to ride a bike.
The Framework
The agent performs actions; the environment gives rewards and puts the agent in situations called states.
Goal: learn what to do in a given state (a policy).
The beauty: the agent learns a model of the environment and retains it.
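A minimal sketch of this agent-environment loop; the env and agent interfaces (reset, step, act, learn) are illustrative placeholders, not the authors' implementation:

```python
def run_episode(env, agent, max_steps=100):
    """One pass of the agent-environment loop: the agent acts in a state,
    the environment returns a reward and the next state."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                        # policy: state -> action
        next_state, reward, done = env.step(action)      # environment reacts
        agent.learn(state, action, reward, next_state)   # improve behavior on the go
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```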
Markov Decision Process
Series of states, actions, and rewards:
$$ s_t, \; a_t, \; r_{t+1}, \; s_{t+1}, \; a_{t+1}, \; r_{t+2}, \; s_{t+2}, \; a_{t+2}, \; r_{t+3}, \; \ldots $$
Markov property: the reward and next state depend only on the current state and action, and not on the history of states or actions.
MDP Terms
Policy: a mapping from the set of states to the set of actions.
Return: the sum of rewards from this time onwards.
Value function of a state: the expected return when starting in s and following policy π; for an MDP it satisfies a Bellman equation (see below).
Solution methods: dynamic programming, Monte Carlo simulation, temporal difference learning.
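The standard textbook forms of these quantities, a sketch of what the slide's equations express; the notation (Sutton and Barto style) is an assumption:

```latex
% Discounted return from time t onwards (0 <= gamma <= 1)
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}

% State-value function under policy pi
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right]

% For an MDP, V^pi satisfies the Bellman equation
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'}
             \left[\, R^{a}_{ss'} + \gamma\, V^{\pi}(s') \,\right]
```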
UAV Path Planning
[Figure: hub H, ferry F, and nodes A, B, C, D with packet arrival rates λ_A, λ_B, λ_C, λ_D.]
State: the current ferry location, the flow rates, the accumulated traffic at each node (a tuple with one entry per node), and the ferry buffer level.
Actions: a round trip through a subset of nodes, e.g., A, B, C, D, AB, AC, ..., DCBA.
Reward: number of packets delivered, keeping the buffer level low, and packet queueing delay.
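An illustrative Python encoding of such a state and action space; the discretization, field names, and four-node set are assumptions for illustration, not the authors' implementation:

```python
from dataclasses import dataclass
from itertools import combinations, permutations
from typing import Tuple

NODES = ("A", "B", "C", "D")

@dataclass(frozen=True)
class FerryState:
    """State for the ferry MDP: location, per-node backlog, ferry buffer level."""
    location: str              # current ferry location ("H" for hub, or a node name)
    backlog: Tuple[int, ...]   # accumulated packets waiting at each node (discretized)
    buffer_level: int          # packets currently carried by the ferry (discretized)

def action_space():
    """Actions: a round trip through any ordered, non-empty subset of nodes."""
    actions = []
    for r in range(1, len(NODES) + 1):
        for subset in combinations(NODES, r):
            for order in permutations(subset):
                actions.append(order)   # e.g. ("A",), ("A","B"), ..., ("D","C","B","A")
    return actions

print(len(action_space()))   # 64 ordered round trips for 4 nodes
```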
Reward Criterion
Reward: based on the number of packets delivered, penalized for high ferry buffer levels and packet queueing delay.
Temporal Difference Learning
Recursive state value approximation:
$$ V(s_t) \;\leftarrow\; V(s_t) + \alpha \left[ r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \right] $$
Convergence to the "true value" as the step size α decays and every state is visited infinitely often.
Extract the policy from the value function.
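A minimal tabular TD(0) sketch implementing this update rule; the environment interface (reset, step, actions, expected_value) and the parameter values are illustrative assumptions:

```python
from collections import defaultdict

def td0(env, policy, episodes=1000, alpha=0.1, gamma=0.95):
    """Tabular TD(0): after each step, move V(s) toward the target r + gamma * V(s')."""
    V = defaultdict(float)                       # state-value estimates, default 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])      # the recursive update above
            state = next_state
    return V

def greedy_policy(V, env):
    """Extract a policy from the value function: in each state pick the action with the
    highest expected successor value (assumes env offers a one-step lookahead model)."""
    def policy(state):
        return max(env.actions(state),
                   key=lambda a: env.expected_value(state, a, V))
    return policy
```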
Paths
Simulation Results
Legend: RR = Round Robin (naive), TSP = Traveling Salesman solution, STO = Stochastic Modeling, RL = Reinforcement Learning.
Conclusion / Extensions
We have shown two algorithms for routing UAV data ferries; RL is a viable approach.
Extensions:
Structured state space.
Action space (options theory).
Hierarchical structure / peer-to-peer flows.
Interrupting the current action and starting over.
Adapting and optimizing the learning method.
Questions?
Research and Engineering Center for Unmanned Vehicles (RECUV)
University of Colorado at Boulder
http://recuv.colorado.edu
RECUV is a university, government, and industry partnership dedicated to advancing knowledge and capabilities in using unmanned vehicles for scientific experiments, collecting geospatial data, mitigating natural and man-made disasters, and defending against terrorist and hostile military activities.