Towards Autonomous Data Ferry Route Design in Delay Tolerant Networks


1 Towards Autonomous Data Ferry Route Design in Delay Tolerant Networks
Daniel Henkel, Timothy X Brown, University of Colorado at Boulder. WoWMoM/AOC '08, June 23, 2008. Actual paper title: Route Design for UAV-based Data Ferries in Delay Tolerant Wireless Networks.

2 Familiar: Dial-A-Ride
Dial-A-Ride is a curb-to-curb, shared-ride transportation service: the bus receives calls, picks up and drops off passengers, and should transport people quickly. [Figure: requests 1-5 scattered among a depot, a hospital, and a school; the optimal route is not trivial.]
Outline: motivation for the path planning problem, problems with TSP solutions, a queueing-theoretic approach, formulation as an MDP and solution methods.

3 In context: Dial-A-UAV
Complication: infinite data at sensors; potentially two-way traffic Delay tolerant traffic! Sensor-1 Sensor-3 Sensor-5 Monitoring Station Sensor-2 Sensor-6 Sensor-4 Have sensors all on left side Highlight ferrying to the right side SMS locations Sparsely distributed sensors, limited radios TSP solution not optimal Our approach: Queueing and MDP theory

4 TSP's Problem
Traveling Salesman solution: one cycle visits every node, so the TSP has no option but to alternately visit A and B. Problem: far-away nodes with little data to send should be visited less often. New idea: a cycle defined by visit frequencies p_i. [Figure: UAV, hub, and nodes A and B with distances d_A, d_B, flow rates f_A, f_B, and visit probabilities p_A, p_B.]

5 Queueing Approach: Minimize average delay
Goal: minimize the average delay. Idea: express the delay in terms of the p_i, then minimize over the set {p_i}, treating the p_i as a probability distribution. The inter-service time of node i is exponentially distributed with mean T/p_i, which is also the expected delay of a packet waiting at that node. Weighted delay: each node's mean delay T/p_i weighted by its share of the traffic, i.e., sum_i (f_i/F)(T/p_i). [Figure: UAV, hub, and nodes A-D with flow rates f_A-f_D, distances d_A-d_D, and visit probabilities p_A-p_D.]
Speaker notes: Since the inter-service time is exponentially distributed and we pick up ALL waiting packets when visiting a node, the average delay for a node is the mean of that exponential distribution. f_i/F weights node i's delay by its share of the total traffic.
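As a rough illustration of the minimization idea on this slide (not the paper's analytical solution), the sketch below numerically searches for visit probabilities p_i that minimize the weighted delay. The hub-and-spoke travel-time model (a visit to node i costs a round trip of 2*d_i/v) and all numeric values are assumptions.

import numpy as np
from scipy.optimize import minimize

f = np.array([5.0, 1.0, 3.0, 0.5])   # flow rates f_i, hypothetical values
d = np.array([1.0, 4.0, 2.0, 6.0])   # hub-to-node distances d_i, hypothetical values
v = 20.0                              # ferry speed, hypothetical value
F = f.sum()

def weighted_delay(p):
    T = np.sum(p * 2.0 * d / v)       # mean time per visit under the assumed travel model
    return np.sum((f / F) * (T / p))  # sum_i (f_i/F) * (T/p_i) from the slide

n = len(f)
res = minimize(weighted_delay, x0=np.full(n, 1.0 / n),
               bounds=[(1e-6, 1.0)] * n,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
               method="SLSQP")
print("visit probabilities:", np.round(res.x, 3))
print("optimized delay:", weighted_delay(res.x))
print("uniform-visit delay:", weighted_delay(np.full(n, 1.0 / n)))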

6 Solution and Algorithm
Probability of choosing node i for the next visit: p_i (formula given on the slide). Implementation as a deterministic algorithm (sketched in code below):
1. Set c_i = 0 for all i.
2. While max{c_i} < 1: c_i = c_i + p_i.
3. k = argmax{c_i}.
4. Visit node k; c_k = c_k - 1.
5. Go to 2.
An improvement over TSP! Still a pretty simplistic view of the world: random selection ignores many parameters.
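A minimal Python sketch of the deterministic visit-schedule algorithm above, assuming the visit probabilities p_i have already been computed; the node indices and example probabilities are hypothetical.

def ferry_schedule(p, num_visits):
    """Yield node indices in the order the ferry should visit them."""
    c = [0.0] * len(p)                              # step 1: set c_i = 0
    for _ in range(num_visits):
        while max(c) < 1.0:                         # step 2: accumulate credit
            c = [ci + pi for ci, pi in zip(c, p)]
        k = max(range(len(c)), key=lambda i: c[i])  # step 3: k = argmax{c_i}
        c[k] -= 1.0                                 # step 4: visit node k; c_k = c_k - 1
        yield k                                     # step 5: go to step 2

p = [0.5, 0.3, 0.2]                                 # hypothetical visit probabilities
print(list(ferry_schedule(p, 10)))                  # node 0 appears in about 5 of 10 visits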

7 There's More to It! New perspective: States
States: the number of people waiting at each location, the varying number of calls (time of day), the current bus location. Actions: drive to a location. Goal: short passenger wait time. [Figure: requests 1-5 and the depot.] The environment is generally unknown.

8 Promising Technique: Reinforcement Learning (an AI technique)
Learning what to do without prior training: the agent is given a high-level goal, NOT how to reach it, and improves its actions on the go. Features: interaction with the environment, the concept of rewards and punishments, trial-and-error search. Example: learning to ride a bike.

9 The Framework
The agent performs actions; the environment gives rewards and puts the agent in situations called states. Goal: learn what to do in a given state (a policy). The beauty: the agent learns a model of the environment and retains it. A sketch of this interaction loop follows below.
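A minimal sketch of the agent-environment loop described on this slide; the Agent and Environment interfaces (reset, step, act, learn) are assumptions for illustration, not a specific library's API.

def run_episode(agent, env, max_steps=100):
    state = env.reset()                                 # environment puts the agent in a state
    for _ in range(max_steps):
        action = agent.act(state)                       # agent performs an action
        next_state, reward, done = env.step(action)     # environment gives a reward and a new state
        agent.learn(state, action, reward, next_state)  # agent improves its policy from the feedback
        state = next_state
        if done:
            break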

10 Markov Decision Process
A series of states, actions, and rewards: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, ... Markov property: the reward and next state depend only on the current state and action, and not on the history of states or actions.
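In symbols (the standard MDP notation, matching the verbal statement above):
Pr{ s_{t+1} = s', r_{t+1} = r | s_t, a_t, r_t, s_{t-1}, a_{t-1}, ..., s_0, a_0 } = Pr{ s_{t+1} = s', r_{t+1} = r | s_t, a_t }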

11 MDP Terms
Policy: a mapping from the set of states to the set of actions. Return: the sum of rewards from this time onwards. Value function (of a state): the expected return when starting in s and following policy π; for an MDP it takes the form shown on the slide. Solution methods: dynamic programming, Monte Carlo simulation, temporal difference learning.
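The standard definitions matching the verbal descriptions above (textbook RL notation; the discount factor γ belongs to that notation, not to the slide text):
R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1}
V^π(s) = E_π[ R_t | s_t = s ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s ]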

12 UAV Path Planning
State: a tuple of the accumulated traffic at each node (more fully: current ferry location, flow rates, accumulated traffic at each node, ferry buffer level). Actions: a round trip through a subset of nodes, e.g., A, B, C, D, AB, AC, ..., DCBA. Reward: number of packets delivered, keeping the ferry buffer level low, packet queueing delay. [Figure: ferry F, hub H, and nodes A-D with flow rates λ_A-λ_D.]
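A minimal sketch of how this state and action space could be encoded, assuming four nodes A-D; the field names and the encoding are illustrative assumptions, not the paper's exact representation.

from dataclasses import dataclass
from itertools import permutations

NODES = ("A", "B", "C", "D")

@dataclass(frozen=True)
class FerryState:
    location: str         # current ferry location (a node or the hub)
    accumulated: tuple    # packets accumulated at each node, indexed like NODES
    buffer_level: int     # packets currently carried in the ferry's buffer

def all_actions():
    """Every round trip through a non-empty ordered subset of nodes: A, B, ..., AB, AC, ..., DCBA."""
    actions = []
    for r in range(1, len(NODES) + 1):
        actions.extend(permutations(NODES, r))
    return actions

print(len(all_actions()))  # 4 + 12 + 24 + 24 = 64 candidate round trips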

13 Reward Criterion
Reward: (formula shown on the slide).
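Purely as an illustrative placeholder, the sketch below combines the ingredients named on the previous slide (packets delivered, ferry buffer level, queueing delay) into a scalar reward; the linear form and the weights are hypothetical, not the paper's criterion.

def reward(packets_delivered, buffer_level, total_queueing_delay,
           w_deliver=1.0, w_buffer=0.1, w_delay=0.01):
    # Hypothetical weighting: reward deliveries, penalize a full buffer and long queueing delays.
    return w_deliver * packets_delivered - w_buffer * buffer_level - w_delay * total_queueing_delay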

14 Temporal Difference Learning
Recursive approximation of the state values (update rule shown on the slide). Convergence to the "true value" as the number of updates grows (condition shown on the slide). Extract the policy from the value function.
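A minimal tabular TD(0) sketch of the recursive state-value update described above; the environment interface (env.reset/env.step), the policy callable, and the parameter values are assumptions for illustration.

from collections import defaultdict

def td0_state_values(env, policy, episodes=500, alpha=0.1, gamma=0.95):
    """Estimate V(s) for the given policy with the tabular TD(0) update."""
    V = defaultdict(float)                      # state-value estimates, default 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])     # V(s) <- V(s) + alpha * [r + gamma*V(s') - V(s)]
            s = s_next
    return V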

15 Paths

16 Simulation Results
Legend: RL = Reinforcement Learning, STO = Stochastic Modeling, TSP = Traveling Salesman solution, RR = Round Robin (naive). [Figure: simulation results comparing the four schemes.]

17 Conclusion/Extensions
We have shown two algorithms for routing UAV data ferries; RL is a viable approach. Extensions: structured state space; action space (options theory); hierarchical structure / peer-to-peer flows; interrupting the current action and starting over; adapting and optimizing the learning method.

18 Soccer – Euro Cup 2008
Wednesday, 11:45am (PST): Germany – Turkey [ 4 : 1 ]

19 Research & Engineering Center for Unmanned Vehicles (RECUV)
Questions? The Research and Engineering Center for Unmanned Vehicles at the University of Colorado at Boulder is a university, government, and industry partnership dedicated to advancing knowledge and capabilities in using unmanned vehicles for scientific experiments, geospatial data collection, mitigation of natural and man-made disasters, and defense against terrorist and hostile military activities.

