Using node mobility control to enhance network performance
Research Goal
[Figure: UAV1, UAV2, UAV3 and ground stations GS1, GS2, linked in three modes: direct, relay, and ferrying]
A combination of all three modes can optimize performance.
Direct Communication
Shannon capacity law, relating:
- Signal strength
- Thermal noise
- (Normalized) data rate
Main Question
- What is the best use of relays?
- Trivial for "disc" communication
- However: disc ≠ reality
- What is the best use of relays … in a realistic wireless environment?
[Figure: nodes A and B]
Problem Characteristics
- Focused on a single link (distance d)
- Task vs. helper nodes
- Packet based (length L)
Applies to:
- Ad hoc networks (how many intermediate hops/relays)
- Sensor networks, Smart Dust (potentially many relays)
- Underwater and underground networks (high pathloss)
- Outdoor, space-based, ship-based networks (long ranges)
[Figure: source S and destination D separated by distance d]
Realistic Radio Environment (Part I)
Shannon capacity ≈ reality
- Relates distance and data rate.
- W = channel bandwidth
- a = radio parameters
- ε = pathloss exponent
[Figure: throughput vs. distance, comparing Shannon capacity, 802.11g throughput, and the disc model]
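As a rough sketch of how a Shannon-type rate falls off with distance, the snippet below assumes the form R(d) = W·log2(1 + a/(P_N·d^ε)). That form is consistent with the symbols W, a, and the pathloss exponent listed on this slide, but the slide does not spell it out, so treat it as an assumption. The function name and parameter names are illustrative. The defaults reuse ε = 5 and P_N/a = 10^-15 from the performance-analysis slide, with d taken in metres (also an assumption).

```python
import math

def shannon_rate(d, W=1.0, noise_over_a=1e-15, eps=5.0):
    """Assumed rate model: R(d) = W * log2(1 + a / (P_N * d**eps)).

    d            -- link distance (metres assumed)
    W            -- channel bandwidth; W=1 gives a normalized rate
    noise_over_a -- the ratio P_N / a (noise power over radio parameter a)
    eps          -- pathloss exponent
    """
    snr = 1.0 / (noise_over_a * d ** eps)
    return W * math.log2(1.0 + snr)

if __name__ == "__main__":
    # Print the normalized rate at a few example distances.
    for d_km in (2, 4, 8, 16):
        print(f"{d_km:>2} km: {shannon_rate(d_km * 1000.0):.6g}")
```

Unlike the disc model, the rate here never drops abruptly to zero; it decays smoothly as the distance grows.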
Estimating SNR and R(d)
Rappaport exponential/bi-linear model for realistic scenario estimation
Multiple Relays
[Figure: source S and destination D at distance d, with relay spacing d_k]
- Direct transmission (zero relays)
- End-to-end data rate: R_R
- Packet delay: τ = L/R_R
- Relay transmission
"Single Tx" Relay Model
a.k.a. the noise-limited case
[Figure: chain from S to D with hop length d_k; a single transmission active at t=0]
"Parallel Tx" Relay Model
a.k.a. the interference-limited case
Optimal distance between transmissions?
[Figure: chain from S to D with several transmissions active at t=0, separated by ρ]
Performance Analysis
[Figure: curves labeled 2 km, 4 km, 8 km, 16 km; ε = 5, P_N/a = 10^-15 W]
Optimal Number of Relays: Single Tx
[Figure placeholder: throughput R vs. number of relays]
- Initially: rate increase higher than "relaying cost"
- Then: additional relay decreases R
- Optimal # of relays: one relay every d_opt with
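The trade-off between shorter hops and the "relaying cost" can be illustrated numerically. The sketch below is built on two assumptions that are not stated in the slides: each hop follows a normalized rate log2(1 + a/(P_N·d^ε)), and with k equally spaced relays only one of the k+1 hops transmits at a time, so the end-to-end rate is the hop rate divided by k+1. All function names are hypothetical.

```python
import math

def hop_rate(d, noise_over_a=1e-15, eps=5.0):
    """Assumed normalized per-hop rate: log2(1 + a / (P_N * d**eps))."""
    return math.log2(1.0 + 1.0 / (noise_over_a * d ** eps))

def single_tx_rate(d, k, **radio):
    """End-to-end rate over distance d with k equally spaced relays.

    Assumption: the k+1 hops take turns, so each is active 1/(k+1) of the time.
    """
    hops = k + 1
    return hop_rate(d / hops, **radio) / hops

def best_relay_count(d, k_max=200, **radio):
    """Brute-force search for the relay count with the highest end-to-end rate."""
    return max(range(k_max + 1), key=lambda k: single_tx_rate(d, k, **radio))

if __name__ == "__main__":
    d = 8000.0  # metres (assumed unit)
    k = best_relay_count(d)
    print("best k:", k, "hop length:", d / (k + 1))
```

Under this assumed model the rate first rises with k, because the per-hop rate grows faster than the time-sharing penalty, and then falls once additional hops cost more than they gain.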
Optimal Number of Relays: Parallel Tx
[Figure: two cases, one with k_opt = ∞ and one with k_opt ~ 9]
Optimal Reuse Factor ρ
d = 10 km, P_N/a = 4.14·10^-15 (based on 802.11)
[Figure: ρ_opt vs. k+1 (axis from 2 to 256, then ∞) for several values of ε]
ρ_opt ≈ min{k+1, 5}
Rate-Distance Phase Plot
Ferrying Analogy: Dial-A-Ride
Dial-A-Ride: curb-to-curb, shared-ride transportation service
The bus:
- Receives calls
- Picks up and drops off passengers
- Transports people quickly!
Optimal route not trivial!
[Figure: bus serving requests 1 to 5, with a hospital, a school, and a depot]
Path Planning Problem
- Motivation
- Problems with TSP solutions
- Queueing-theoretical approach
- Formulation as MDP and solution methods
TSP's Problem
Traveling Salesman solution:
- One cycle visits every node.
- Problem: far-away nodes with little data to send. Visit them less often.
- TSP has no option but to visit A and B alternately.
New: cycle defined by visit frequencies p_i
[Figure: UAV serving nodes A and B from a hub; labels d_A, d_B, f_A, f_B, p_A, p_B]
Stochastic Modeling
Goal: minimize average delay
Idea: express delay in terms of p_i, then minimize over the set {p_i}
- p_i as probability distribution
- Expected service time of any packet
- Inter-service time: exponential distribution with mean T/p_i
- Weighted delay:
* Since the inter-service time is exponentially distributed and ALL waiting packets are picked up when a node is visited, the average delay for a node is the mean of the exponential distribution.
* f_i/F is fractional visit probability for node i.
[Figure: UAV serving nodes A, B, C, D from a hub; labels f_A to f_D, p_A to p_D, d_A to d_D]
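To make the minimization concrete, the sketch below works through one possible formalization; it is not taken from the slides. It assumes the weighted delay is D = Σ_i (f_i/F)·(T/p_i) with F = Σ_i f_i, and that the mean time per visit is T = Σ_j p_j·τ_j, where τ_j is a hypothetical round-trip time to node j. Under those assumptions the Cauchy-Schwarz inequality gives (Σ_j p_j τ_j)(Σ_i f_i/p_i) ≥ (Σ_i sqrt(f_i τ_i))², with equality when p_i is proportional to sqrt(f_i/τ_i).

```python
import math

def weighted_delay(p, f, tau):
    """D = sum_i (f_i / F) * (T / p_i), with T = sum_j p_j * tau_j (assumed)."""
    T = sum(pj * tj for pj, tj in zip(p, tau))
    F = sum(f)
    return sum((fi / F) * (T / pi) for fi, pi in zip(f, p))

def sqrt_rule(f, tau):
    """Visit probabilities proportional to sqrt(f_i / tau_i), normalized to 1."""
    w = [math.sqrt(fi / ti) for fi, ti in zip(f, tau)]
    total = sum(w)
    return [wi / total for wi in w]

if __name__ == "__main__":
    f = [1.0, 4.0, 1.0, 2.0]        # example flow rates (made up)
    tau = [10.0, 20.0, 40.0, 5.0]   # example round-trip times (made up)
    p = sqrt_rule(f, tau)
    uniform = [1.0 / len(f)] * len(f)
    print("sqrt rule:", weighted_delay(p, f, tau))
    print("uniform:  ", weighted_delay(uniform, f, tau))
```

In this formalization, nodes that are far away (large τ_i) and carry little traffic (small f_i) receive a low visit probability, which is the behavior the TSP cycle cannot express.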
Solution and Algorithm
Probability of choosing node i for next visit:
Implementation: deterministic algorithm
1. Set c_i = 0
2. c_i = c_i + p_i while max{c_i} < 1
3. k = argmax{c_i}
4. Visit node k; c_k = c_k - 1
5. Go to 2.
Improvement over TSP!
Pretty simplistic view of the world! Random selection ignores many parameters.
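The five steps of the deterministic algorithm can be sketched as follows. The function name, the `n_visits` stopping parameter, and the tie-breaking rule (lowest index wins) are assumptions added for illustration; the slide's loop runs indefinitely and does not say how ties are broken.

```python
from collections import Counter

def visit_sequence(p, n_visits):
    """Generate a deterministic visit order whose frequencies follow p.

    p        -- visit probabilities per node (at least one must be > 0)
    n_visits -- how many visits to generate (added so the loop terminates)
    """
    c = [0.0] * len(p)                      # step 1: credits start at zero
    visits = []
    while len(visits) < n_visits:
        while max(c) < 1.0:                 # step 2: add p_i until a credit reaches 1
            c = [ci + pi for ci, pi in zip(c, p)]
        k = max(range(len(c)), key=c.__getitem__)   # step 3: argmax (first on ties)
        visits.append(k)                    # step 4: visit node k ...
        c[k] -= 1.0                         # ... and charge one credit
    return visits                           # step 5 is the outer loop

if __name__ == "__main__":
    seq = visit_sequence([0.5, 0.25, 0.25], 8)
    print(seq)
    print(Counter(seq))
```

Each node accumulates credit at rate p_i and spends one unit per visit, so over a long run node i is visited in roughly a fraction p_i of all visits, without any random selection.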
Reinforcement Learning
Reinforcement Learning (AI technique):
- Learning what to do without prior training
- Given: high-level goal; NOT: how to reach it
- Improving actions on the go
Features:
- Interaction with environment
- Concept of rewards & punishments
- Trial & error search
Example: riding a bike
Reinforcement Learning Approach
[Diagram: inputs ("states") go into the RL system, which produces outputs ("actions"); training info = evaluations ("rewards" / "penalties")]
Objective: maximize reward over time
The Framework
Agent:
- Performs actions
Environment:
- Gives rewards
- Puts the agent in situations called states
Goal: learn what to do in a given state (policy)
The beauty: learns a model of the environment and retains it.
Markov Decision Process
Series of states/actions: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, ...
Markov property: reward and next state depend only on the current state and action, and not on the history of states or actions.
Measure of Goodness
- Policy: mapping from the set of states to the set of actions
- Sum of rewards (:= return): from this time onwards
- Value function (of a state): expected return when starting in s and following policy π. For an MDP:
Solution methods:
- Dynamic programming (small, easy problems)
- Monte Carlo simulation
- Temporal difference learning (powerful, but slow)
MDP
If a reinforcement learning task has the Markov property, it is basically a Markov Decision Process (MDP). If the state and action sets are finite, it is a finite MDP.
To define a finite MDP, you need to give:
- state and action sets
- one-step "dynamics" defined by transition probabilities:
- reward expectation:
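For reference, the one-step dynamics of a finite MDP are commonly written as below. The notation is the standard textbook one and is assumed here; the slide's own formulas may use different symbols.

```latex
\begin{align}
P^{a}_{ss'} &= \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}, \\
R^{a}_{ss'} &= E\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}.
\end{align}
```

Here P^a_{ss'} is the probability of moving from state s to s' under action a, and R^a_{ss'} is the expected reward received on that transition.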
RL Approach to Solving MDPs
- Policy: mapping from the set of states to the set of actions, π : S → A
- Sum of rewards (:= return): from this time onwards
- Value function (of a state): expected return when starting in s and following policy π. For an MDP,
Bellman Equation for Policy π
Evaluating E{.}, assuming a deterministic policy π:
Solution:
Action-value function: value of taking action a in state s. For an MDP,
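A worked form of these quantities in standard textbook notation is sketched below. It rests on assumptions not confirmed by the slide: a discount factor γ in [0, 1] (γ = 1 gives the plain sum of rewards), a deterministic policy π(s), transition probabilities P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a}, and expected one-step rewards R^a_{ss'}.

```latex
\begin{align}
R_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \\
V^{\pi}(s) &= E_{\pi}\{\, R_t \mid s_t = s \,\}
            = \sum_{s'} P^{\pi(s)}_{ss'} \left[ R^{\pi(s)}_{ss'} + \gamma\, V^{\pi}(s') \right], \\
Q^{\pi}(s,a) &= E_{\pi}\{\, R_t \mid s_t = s,\ a_t = a \,\}
            = \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma\, V^{\pi}(s') \right].
\end{align}
```

The second line is the Bellman equation for V^π: the value of a state equals the expected immediate reward plus the discounted value of the successor state. For a finite MDP it is a linear system with one equation per state.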
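Optimality
Value functions are real-valued, so they induce a partial ordering on policies: π ≥ π' if V^π(s) ≥ V^π'(s) for every state s.
Concept of V*: the value function of a policy that is best under this ordering.
We approximate the optimal action.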
Temporal Difference Learning
- The TD-V method tries to approximate the value function V.
- TD methods update the value function (V) at each iteration, eventually reaching its optimal form (V*).
- A TD agent learns by reducing discrepancies between the estimates it makes at different times.
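The idea of "reducing discrepancies between estimates made at different times" can be illustrated with the tabular TD(0) update V(s) ← V(s) + α[r + γV(s') − V(s)]. The sketch below evaluates a fixed policy on a toy deterministic chain; the environment, the step size α, and the discount γ are all made up for illustration and have nothing to do with the ferry problem itself.

```python
def td0_chain(n_states=3, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) on a toy chain 0 -> 1 -> ... -> terminal.

    The only reward (1.0) is received on the final transition into the
    terminal state, whose value is fixed at 0.
    """
    V = [0.0] * (n_states + 1)          # last entry is the terminal state
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0
            td_error = r + gamma * V[s_next] - V[s]   # discrepancy between estimates
            V[s] += alpha * td_error
            s = s_next
    return V[:n_states]

if __name__ == "__main__":
    print(td0_chain())
```

For this chain the true values are γ raised to the number of steps remaining before the rewarded transition, so the estimates can be checked against 0.81, 0.9, and 1.0 for the default settings.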
UA Path Planning - Complex
[Figure: nodes A, B, C, D with rates λ_A, λ_B, λ_C, λ_D; labels F and H]
- State: tuple of accumulated node traffic, here
- Actions: round trip through a subset of nodes, e.g., A, B, C, D, AB, AC, …, DCBA
- State: current ferry location, flow rates, accumulated traffic at each node, ferry buffer level
- Reward: number of packets delivered, keeping the buffer level low, packet queuing delay
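The action set can be enumerated directly. The sketch below assumes that an action is an ordered visiting sequence over a non-empty subset of the nodes, an interpretation suggested by examples such as AB and DCBA but not stated explicitly. The function name is hypothetical.

```python
from itertools import permutations

def round_trip_actions(nodes):
    """List every ordered, non-empty visiting sequence over the given nodes."""
    return ["".join(order)
            for r in range(1, len(nodes) + 1)   # subset sizes 1..n
            for order in permutations(nodes, r)]

if __name__ == "__main__":
    actions = round_trip_actions("ABCD")
    print(len(actions), actions[:6], actions[-1])
```

Under this interpretation four nodes give 4 + 12 + 24 + 24 = 64 actions, and the count grows quickly with the number of nodes, which limits how far a tabular method can scale.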
Reward Criterion Reward:
Ferry Path Planning Results
- TSP = Traveling Salesman solution
- RL = Reinforcement Learning
- RR = Round Robin (naive)
- STO = Stochastic Modeling