Using node mobility control to enhance network performance

Presentation transcript:

Using node mobility control to enhance network performance. Research Goal: use node mobility control to enhance network performance. Three communication modes are considered: direct, relay, and ferrying (figure: UAV1-UAV3 and ground stations GS1, GS2 illustrating the three modes). A combination of all three modes can optimize performance.

Direct Communication. Shannon capacity law: the data rate is R = W log2(1 + S/N), where S is the received signal strength and N is the (normalized) thermal noise.

Main Question: what is the best use of relays? The answer is trivial under the idealized "disc" communication model; however, disc ≠ reality, so the real question is: what is the best use of relays in a realistic wireless environment?

Problem Characteristics: the focus is on a single link of distance d between a source S and a destination D, with task vs. helper nodes and packet-based traffic (packet length L). The setting applies to ad hoc networks (how many intermediate hops/relays?), sensor networks and smart dust (potentially many relays), underwater and underground networks (high pathloss), and outdoor, space-based, and ship-based networks (long ranges).

Realistic Radio Environment, Part I: Shannon capacity relates distance and data rate, with W = channel bandwidth, a = radio parameters, and ε = pathloss exponent; this approximates reality well. (Figure: throughput vs. distance, comparing the Shannon capacity curve, measured 802.11g throughput, and the disc model.)

Estimating SNR and R(d): the Rappaport exponential/bi-linear pathloss model is used to estimate SNR, and hence R(d), in realistic scenarios.
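
To make the rate-distance relation R(d) concrete, here is a minimal Python sketch combining a simple power-law pathloss with the Shannon formula. The specific form R(d) = W log2(1 + d^-ε / (P_N/a)), the distance unit, and the parameter values are assumptions chosen to mirror the quantities named in the slides, not values taken from them.

```python
import math

def shannon_rate(d, W=20e6, eps=5.0, Pn_over_a=1e-15):
    """Shannon rate over a link of distance d (meters, assumed unit).

    Assumes a simple power-law pathloss, so the received SNR is d**(-eps) / (P_N/a).
    W          -- channel bandwidth in Hz (assumed value)
    eps        -- pathloss exponent (the slides use eps = 5 in one example)
    Pn_over_a  -- normalized noise P_N/a (the slides use 1e-15 W in one example)
    """
    snr = d ** (-eps) / Pn_over_a
    return W * math.log2(1.0 + snr)

if __name__ == "__main__":
    for d in (100, 500, 1_000, 2_000):
        print(f"d = {d:>5} m  ->  R(d) = {shannon_rate(d) / 1e6:8.2f} Mbit/s")
```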

Direct vs. relay transmission. (Figure: direct transmission from S to D over distance d with zero relays, and relay transmission through multiple relays spaced d_k apart.) In both cases the metrics of interest are the end-to-end data rate R and the packet delay τ = L/R.
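
As a small worked example of the delay formula τ = L/R; the packet length and end-to-end rate below are assumed values, purely for illustration.

```python
# Packet delay tau = L / R for a single packet; both values are assumed.
L_bits = 12_000        # 1500-byte packet
R_bps = 6e6            # 6 Mbit/s end-to-end data rate

tau = L_bits / R_bps
print(f"packet delay tau = {tau * 1e3:.1f} ms")   # -> 2.0 ms
```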

"Single Tx" Relay Model, a.k.a. the noise-limited case: only one node transmits at a time. (Figure: source S, relays spaced d_k apart, destination D, with a single transmission active at t = 0.)

"Parallel Tx" Relay Model, a.k.a. the interference-limited case: several relays transmit simultaneously, separated by a reuse factor ρ. What is the optimal distance between concurrent transmissions? (Figure: source S and destination D with multiple transmissions active at t = 0, spaced ρ apart.)
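
One simple way to see the interference-limited trade-off is to compute the SINR of a single hop when every ρ-th relay transmits simultaneously. This is only a sketch under assumed geometry (relays on a line with equal spacing, power-law pathloss with an assumed exponent); it is not the interference model used in the talk.

```python
import math

def sinr_parallel_tx(rho, hop=1_000.0, eps=3.0, Pn_over_a=1e-15, n_rings=20):
    """Approximate per-hop SINR when every rho-th relay transmits at once.

    Assumptions (not from the slides): relays sit on a line with spacing `hop`,
    the receiver is one hop from its transmitter, co-channel interferers are
    m*rho hops up/down the line, and power decays as distance**(-eps).
    """
    signal = hop ** (-eps)
    interference = 0.0
    for m in range(1, n_rings + 1):
        # distances from the receiver to the interferers on either side
        interference += ((m * rho - 1) * hop) ** (-eps)
        interference += ((m * rho + 1) * hop) ** (-eps)
    return signal / (Pn_over_a + interference)

for rho in (2, 3, 5, 8):
    print(f"rho = {rho}: SINR ~ {10 * math.log10(sinr_parallel_tx(rho)):.1f} dB")
```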

Performance Analysis. (Figure: results for link distances of 2, 4, 8, and 16 km, with ε = 5 and P_N/a = 10^-15 W.)

Optimal Number of Relays, Single Tx. (Figure: throughput R vs. number of relays.) Initially, each added relay increases the rate by more than its 'relaying cost'; beyond some point, an additional relay decreases R. The optimum places one relay every d_opt.
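
A minimal numerical sketch of this trade-off. It assumes that in the single-Tx (noise-limited) case only one hop is active at a time, so with k equally spaced relays the end-to-end rate is R(d/(k+1))/(k+1); the rate model and parameter values carry over the assumptions of the earlier shannon_rate sketch.

```python
import math

def shannon_rate(d, W=20e6, eps=5.0, Pn_over_a=1e-15):
    """Assumed Shannon-style rate model: R(d) = W * log2(1 + d**(-eps) / (P_N/a))."""
    return W * math.log2(1.0 + d ** (-eps) / Pn_over_a)

def end_to_end_rate(d, k):
    """Single-Tx (noise-limited) assumption: k relays give k+1 equal hops and
    only one hop is active at a time, so the end-to-end rate is the per-hop
    rate divided by the number of hops."""
    hop = d / (k + 1)
    return shannon_rate(hop) / (k + 1)

d = 8_000.0   # 8 km link, one of the distances shown in the slides
k_opt = max(range(50), key=lambda k: end_to_end_rate(d, k))
print(f"optimal number of relays for d = {d/1000:.0f} km: k_opt = {k_opt}")
print(f"end-to-end rate at k_opt: {end_to_end_rate(d, k_opt) / 1e6:.2f} Mbit/s")
```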

Optimal Number of Relays, Parallel Tx. (Figure annotations: k_opt = ∞ in one case, k_opt ≈ 9 in the other.)

Optimal Reuse Factor ρ (d = 10 km, P_N/a = 4.14·10^-15, based on 802.11). (Figure: ρ_opt plotted against k+1 = 2, 4, 8, ..., 256, ∞ for several pathloss exponents ε.) Rule of thumb: ρ_opt ≈ min{k+1, 5}.

Rate-Distance Phase Plot

Ferrying Analogy: Dial-A-Ride. Dial-A-Ride is a curb-to-curb, shared-ride transportation service: the bus receives calls, picks up and drops off passengers, and tries to transport people quickly. (Figure: a bus serving requests 1-5 at a hospital, a school, and a depot.) The optimal route is not trivial. Path planning outline: motivation, problems with TSP solutions, queueing-theoretical approach, formulation as an MDP and solution methods.

The Traveling Salesman's Problem. The TSP solution is a single cycle that visits every node, so the UAV has no option but to alternately visit A and B. Problem: far-away nodes with little data to send should be visited less often. New approach: define the cycle by per-node visit frequencies p_i. (Figure: a hub and nodes A, B at distances d_A, d_B with flow rates f_A, f_B and visit probabilities p_A, p_B.)

Stochastic Modeling. Goal: minimize the average delay. Idea: express the delay in terms of the visit probabilities p_i, then minimize over the set {p_i}, treating the p_i as a probability distribution. The inter-service time of node i is modeled as exponentially distributed with mean T/p_i. Since the inter-service time is exponentially distributed and all waiting packets are picked up when a node is visited, the average delay for a node is the mean of that exponential distribution; the weighted delay averages these per-node delays with weights f_i/F, node i's share of the total traffic. (Figure: a hub and nodes A-D with flow rates f_i, distances d_i, and visit probabilities p_i.)
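
A minimal sketch of how that weighted delay could be evaluated for a candidate visit distribution {p_i}. It takes the exponential inter-service time with mean T/p_i as stated above and treats the cycle-time constant T as given; the flow rates, the two candidate distributions, and T itself are invented values.

```python
# Sketch: average packet delay for a candidate visit distribution {p_i}.
# Assumes (as in the slides) inter-service time for node i is exponential
# with mean T / p_i, and weights each node by its share of the total traffic.
flows = {"A": 8.0, "B": 1.0, "C": 1.0, "D": 0.5}   # packets/s, assumed
T = 120.0                                          # cycle-time constant, assumed

def weighted_delay(p):
    F = sum(flows.values())
    return sum((flows[i] / F) * (T / p[i]) for i in flows)

uniform = {i: 1 / len(flows) for i in flows}
skewed  = {"A": 0.60, "B": 0.15, "C": 0.15, "D": 0.10}  # visit busy node A more often

print(f"uniform visits : {weighted_delay(uniform):7.1f} s")
print(f"skewed visits  : {weighted_delay(skewed):7.1f} s")
```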

Solution and Algorithm. The solution gives the probability p_i of choosing node i for the next visit. Implementation as a deterministic algorithm (a sketch implementation follows below):
1. Set c_i = 0 for every node i.
2. While max{c_i} < 1, set c_i = c_i + p_i for every node.
3. k = argmax{c_i}.
4. Visit node k; set c_k = c_k - 1.
5. Go to step 2.
This is an improvement over TSP, but it is still a pretty simplistic view of the world: the random selection ignores many parameters.
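
Here is a small Python sketch of that deterministic credit-accumulation scheme, following the five steps above; the visit probabilities used in the demo are made-up values.

```python
def ferry_schedule(p, n_visits=10):
    """Deterministic visit schedule from per-node visit probabilities p.

    Follows the algorithm in the slide: accumulate credits c_i += p_i until
    some credit reaches 1, visit the node with the largest credit, then
    subtract 1 from its credit.
    """
    c = {i: 0.0 for i in p}          # step 1
    schedule = []
    for _ in range(n_visits):
        while max(c.values()) < 1.0: # step 2
            for i in c:
                c[i] += p[i]
        k = max(c, key=c.get)        # step 3
        schedule.append(k)           # step 4: visit node k ...
        c[k] -= 1.0                  #         ... and charge its credit
    return schedule                  # step 5: repeat

# Example with assumed visit probabilities: A is visited most often.
print(ferry_schedule({"A": 0.5, "B": 0.3, "C": 0.2}, n_visits=10))
```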

Reinforcement Learning (an AI technique): learning what to do without prior training. Given a high-level goal, but NOT how to reach it, the agent improves its actions on the go. Features: interaction with the environment, the concept of rewards and punishments, and trial-and-error search. Example: learning to ride a bike.

Reinforcement Learning Approach. (Block diagram: the RL system receives inputs ("states") and training info in the form of evaluations ("rewards" / "penalties"), and produces outputs ("actions").) Objective: maximize reward over time.

The Framework. The agent performs actions; the environment gives rewards and puts the agent in situations called states. Goal: learn what to do in a given state (a policy). The beauty: the agent learns a model of the environment and retains it.

Markov Decision Process. The interaction is a series of states, actions, and rewards: ..., s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, ... Markov Property: the reward and the next state depend only on the current state and action, and not on the history of states or actions.

Measure of Goodness. Policy: a mapping from the set of states to the set of actions. Return: the sum of rewards from this time onwards. Value function of a state: the expected return when starting in s and following policy π. Solution methods: dynamic programming (small, easy problems), Monte Carlo simulation, and temporal difference learning (powerful, but slow).

MDP. If a reinforcement learning task has the Markov property, it is basically a Markov Decision Process (MDP). If the state and action sets are finite, it is a finite MDP. To define a finite MDP, you need to give: the state and action sets, the one-step "dynamics" defined by the transition probabilities P(s' | s, a) = Pr{s_{t+1} = s' | s_t = s, a_t = a}, and the reward expectation R(s, a, s') = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s'}.
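
As a concrete illustration, here is a minimal Python sketch of a finite MDP given exactly by those ingredients: state and action sets, transition probabilities, and expected rewards. The tiny two-state example is invented purely to show the structure.

```python
from dataclasses import dataclass

@dataclass
class FiniteMDP:
    states: list    # finite state set S
    actions: list   # finite action set A
    P: dict         # P[(s, a, s_next)] = Pr{s_{t+1}=s_next | s_t=s, a_t=a}
    R: dict         # R[(s, a, s_next)] = E{r_{t+1} | s_t=s, a_t=a, s_{t+1}=s_next}

# Invented two-state example: a ferry that is either "atHub" or "atNode".
mdp = FiniteMDP(
    states=["atHub", "atNode"],
    actions=["stay", "go"],
    P={("atHub", "stay", "atHub"): 1.0,
       ("atHub", "go", "atNode"): 1.0,
       ("atNode", "stay", "atNode"): 1.0,
       ("atNode", "go", "atHub"): 1.0},
    R={("atHub", "stay", "atHub"): 0.0,
       ("atHub", "go", "atNode"): 1.0,     # reward for delivering data
       ("atNode", "stay", "atNode"): 0.0,
       ("atNode", "go", "atHub"): 1.0},
)
print(mdp.states, mdp.actions)
```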

RL approach to solving MDPs. Policy: a mapping from the set of states to the set of actions, π : S → A. Return: the sum of rewards from this time onwards, r_{t+1} + r_{t+2} + r_{t+3} + ... Value function of a state: the expected return when starting in s and following policy π; for an MDP, V^π(s) = E_π{ r_{t+1} + r_{t+2} + ... | s_t = s }.

Bellman Equation for Policy π. Evaluating the expectation E{·} under a deterministic policy π yields a recursive (Bellman) equation whose solution expresses V^π(s) in terms of the values of the successor states. Action-value function Q^π(s, a): the value of taking action a in state s (and following π thereafter); for an MDP it satisfies an analogous recursion.
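
A minimal sketch of dynamic-programming policy evaluation built on that Bellman recursion, using the textbook backup V(s) <- sum_{s'} P(s'|s,π(s)) [R + γ V(s')]. The discount factor γ and the toy two-state example are assumptions for illustration, not details taken from the slides.

```python
def evaluate_policy(states, P, R, policy, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation for a deterministic policy.

    Repeatedly applies the Bellman backup
        V(s) <- sum_{s'} P(s'|s,pi(s)) * (R(s,pi(s),s') + gamma * V(s'))
    until the values stop changing. P and R are dicts keyed by (s, a, s').
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v_new = sum(prob * (R[(s, a, s2)] + gamma * V[s2])
                        for (s0, a0, s2), prob in P.items()
                        if s0 == s and a0 == a)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

# Assumed two-state ferry example (same shape as the earlier FiniteMDP sketch).
states = ["atHub", "atNode"]
P = {("atHub", "go", "atNode"): 1.0, ("atNode", "go", "atHub"): 1.0,
     ("atHub", "stay", "atHub"): 1.0, ("atNode", "stay", "atNode"): 1.0}
R = {("atHub", "go", "atNode"): 1.0, ("atNode", "go", "atHub"): 1.0,
     ("atHub", "stay", "atHub"): 0.0, ("atNode", "stay", "atNode"): 0.0}
policy = {"atHub": "go", "atNode": "go"}
print(evaluate_policy(states, P, R, policy))   # both values approach 1/(1-0.9) = 10
```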

Optimality. Value functions are partially ordered since they are real-valued, so policies are ordered as well: π is at least as good as π' if V^π(s) ≥ V^π'(s) for all states s. This leads to the concept of the optimal value function V*. We approximate the optimal action as the greedy action with respect to the current value estimates.

Temporal Difference Learning. TD methods approximate the value function V directly: they update the value estimates at each iteration, eventually reaching the optimal value function V*. A TD agent learns by reducing discrepancies between the estimates it makes at different times.
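
The classic way to reduce those temporal discrepancies is the TD(0) update V(s) <- V(s) + α [r + γ V(s') - V(s)]. Below is a minimal sketch of that rule on an invented two-state chain; the step size α, the discount γ, and the random environment are all assumptions for illustration.

```python
import random

# TD(0): move V(s) toward the one-step target r + gamma * V(s').
alpha, gamma = 0.1, 0.9
V = {"atHub": 0.0, "atNode": 0.0}

def step(state):
    """Invented environment: flipping between two states with a noisy reward."""
    next_state = "atNode" if state == "atHub" else "atHub"
    reward = 1.0 + random.gauss(0.0, 0.1)
    return reward, next_state

state = "atHub"
for _ in range(5_000):
    r, nxt = step(state)
    td_error = r + gamma * V[nxt] - V[state]   # discrepancy between estimates
    V[state] += alpha * td_error               # TD(0) update
    state = nxt

print(V)   # both values should approach 1 / (1 - 0.9) = 10
```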

UAV Path Planning - Complex Case. (Figure: ferry F and hub H serving nodes A-D with flow rates λ_A, ..., λ_D.) State: the current ferry location, the flow rates, the accumulated traffic at each node, and the ferry buffer level. Actions: a round trip through a subset of the nodes, e.g., A, B, C, D, AB, AC, ..., DCBA. Reward: based on the number of packets delivered, keeping the buffer level low, and the packet queuing delay.
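
To make the formulation concrete, here is a hedged Python sketch of what that state, action, and reward structure could look like; the field names, the weighting of the reward terms, and the enumeration of candidate round trips are all invented for illustration and are not taken from the slides.

```python
from dataclasses import dataclass
from itertools import permutations

NODES = ["A", "B", "C", "D"]

@dataclass(frozen=True)
class FerryState:
    location: str       # current ferry location (node or "hub")
    backlog: tuple      # accumulated traffic waiting at each node
    buffer_level: int   # packets currently carried by the ferry

# Actions: a round trip through any non-empty ordered subset of the nodes.
ACTIONS = [tuple(p) for r in range(1, len(NODES) + 1)
           for p in permutations(NODES, r)]

def reward(delivered, buffer_level, queuing_delay,
           w_deliver=1.0, w_buffer=0.1, w_delay=0.05):
    """Invented reward shaping: favor delivered packets, penalize a full
    ferry buffer and long packet queuing delays (weights are assumptions)."""
    return w_deliver * delivered - w_buffer * buffer_level - w_delay * queuing_delay

s0 = FerryState(location="hub", backlog=(3, 0, 5, 1), buffer_level=2)
print("example state:", s0)
print(len(ACTIONS), "candidate round trips; example action:", ACTIONS[4])
print("example reward:", reward(delivered=6, buffer_level=2, queuing_delay=12.0))
```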

Reward Criterion Reward:

Ferry Path Planning Results. Legend: TSP = Traveling Salesman solution, RL = Reinforcement Learning, RR = Round Robin (naive), STO = Stochastic Modeling.