Towards Autonomous Data Ferry Route Design in Delay Tolerant Networks


Towards Autonomous Data Ferry Route Design in Delay Tolerant Networks
Daniel Henkel, Timothy X Brown
University of Colorado at Boulder
WoWMoM/AOC '08, June 23, 2008
Actual title: Route Design for UAV-based Data Ferries in Delay Tolerant Wireless Networks

Familiar: Dial-A-Ride
Dial-A-Ride: a curb-to-curb, shared-ride transportation service. The bus receives calls, then picks up and drops off passengers; the goal is to transport people quickly. Even in this simple setting, the optimal route is not trivial!
[Figure: requests 1 through 5 scattered around a depot, a hospital, and a school.]
Outline: Path Planning Problem Motivation; Problems with TSP Solutions; Queueing Theoretical Approach; Formulation as MDP and Solution Methods.

In context: Dial-A-UAV
Sparsely distributed sensors with limited radios; a UAV ferries their data to a monitoring station. Complication: infinite data at the sensors and potentially two-way traffic, but the traffic is delay tolerant! The TSP solution is not optimal here. Our approach: queueing and MDP theory.
[Figure: Sensors 1 through 6 on one side, the monitoring station on the other; the UAV ferries data between them.]

TSP's Problem
Traveling Salesman solution: one cycle visits every node, so TSP has no option but to alternately visit A and B. Problem: far-away nodes with little data to send should be visited less often. New idea: define the cycle by visit frequencies p_i instead.
[Figure: UAV at a hub serving nodes A and B with distances d_A, d_B, flow rates f_A, f_B, and visit probabilities p_A, p_B.]

Queueing Approach
Goal: minimize average delay. Idea: express the delay in terms of the p_i, then minimize over the set {p_i}, treating the p_i as a probability distribution.
The inter-service time of node i is exponentially distributed with mean T/p_i. Since the inter-service time is exponentially distributed and the UAV picks up ALL waiting packets when visiting a node, the expected service time of any packet from node i is the mean of that distribution, T/p_i. Weighting node i by its fraction f_i/F of the total traffic gives the weighted delay

\bar{D} \;=\; \sum_i \frac{f_i}{F}\,\frac{T}{p_i}

[Figure: UAV at a hub serving nodes A through D with flow rates f_i, distances d_i, and visit probabilities p_i.]
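The minimization itself is not shown in the transcript. As a sketch, assuming the cycle time T is a constant (the talk's full solution presumably also involves the distances d_i shown in the figure, which couple T to the p_i), a Lagrange multiplier argument gives the visit distribution:

```latex
% Minimize the weighted delay over the visit distribution {p_i}:
%   min  \bar{D} = \sum_i (f_i/F)(T/p_i)   subject to   \sum_i p_i = 1
% Lagrangian: L = \sum_i \frac{f_i T}{F p_i} + \lambda \Big( \sum_i p_i - 1 \Big)
% Stationarity: \partial L / \partial p_i = -\frac{f_i T}{F p_i^2} + \lambda = 0
% Solving for p_i and normalizing:
p_i \;=\; \frac{\sqrt{f_i}}{\sum_j \sqrt{f_j}} \;\propto\; \sqrt{f_i}
```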

Solution and Algorithm
Probability of choosing node i for the next visit: the delay-minimizing p_i (see the sketch above).
Implementation: deterministic algorithm
1. Set c_i = 0 for all nodes i.
2. While max{c_i} < 1: c_i = c_i + p_i.
3. k = argmax{c_i}
4. Visit node k; c_k = c_k - 1.
5. Go to 2.
An improvement over TSP! Still, this is a pretty simplistic view of the world: random selection ignores many parameters.
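A minimal Python sketch of this credit-based scheduler (the function and variable names are mine; the deck gives only the pseudocode above):

```python
def ferry_schedule(p, num_visits):
    """Deterministic visit scheduler: each round, node i earns credit p[i];
    the node whose credit first reaches 1 is visited and pays back one unit."""
    credits = [0.0] * len(p)                   # step 1: c_i = 0
    schedule = []
    for _ in range(num_visits):
        while max(credits) < 1.0:              # step 2: accumulate credit
            credits = [c + pi for c, pi in zip(credits, p)]
        k = max(range(len(p)), key=lambda i: credits[i])  # step 3: argmax
        schedule.append(k)                     # step 4: visit node k ...
        credits[k] -= 1.0                      # ... and charge one credit
    return schedule

# Example: node 0 carries most traffic and is visited most often.
print(ferry_schedule([0.6, 0.3, 0.1], 10))    # node 0 gets ~6 of 10 visits
```

Over many visits the realized frequencies converge to the p_i, but deterministically rather than by random sampling.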

There's More to It!
New perspective:
States: the number of people waiting at each location, the varying number of calls (daytime), and the current bus location.
Actions: drive to a location.
Goal: short passenger wait time.
The environment is generally unknown.
[Figure: requests 1 through 5 and the depot from the Dial-A-Ride example.]

Promising Technique
Reinforcement Learning (an AI technique): learning what to do without prior training. The agent is given a high-level goal, NOT how to reach it, and improves its actions on the go.
Features: interaction with the environment, the concept of rewards and punishments, and trial-and-error search. Example: learning to ride a bike.

The Framework
The Agent performs actions. The Environment gives rewards and puts the agent in situations called states.
Goal: learn what to do in a given state (a policy).
The beauty: the agent learns a model of its environment and retains it.
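As an illustration of this loop (my sketch, not from the deck; `env` and `agent` are hypothetical objects with a conventional RL interface):

```python
# Generic agent-environment interaction loop (illustrative only).
def run_episode(env, agent, max_steps=1000):
    state = env.reset()                    # environment puts the agent in a state
    for _ in range(max_steps):
        action = agent.act(state)          # agent performs an action
        next_state, reward, done = env.step(action)     # environment rewards it
        agent.learn(state, action, reward, next_state)  # improve the policy
        state = next_state
        if done:
            break
```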

Markov Decision Process
Series of states, actions, and rewards: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, ...
Markov Property: the reward and next state depend only on the current state and action, and not on the history of states or actions.
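In symbols, using standard MDP notation (a reconstruction; the slide's own equation is not in the transcript):

```latex
\Pr\{ s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, \dots, s_0, a_0 \}
  \;=\; \Pr\{ s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t \}
```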

MDP Terms
Policy: a mapping from the set of states to the set of actions.
Return: the sum of rewards from this time onwards.
Value function (of a state): the expected return when starting in s and following policy π.
Solution methods for an MDP: dynamic programming, Monte Carlo simulation, and temporal difference learning.
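In the standard (Sutton & Barto) notation these definitions read, with discount factor γ:

```latex
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}
\qquad\qquad
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[ G_t \mid s_t = s \right]
```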

UAV Path Planning
State: the current ferry location, the flow rates λ_i, the accumulated traffic at each node, and the ferry buffer level.
Actions: a round trip through a subset of nodes, e.g., A, B, C, D, AB, AC, ..., DCBA.
Reward: packets delivered, keeping the buffer level low, and packet queuing delay.
[Figure: ferry F and hub H serving nodes A through D with flow rates λ_A through λ_D.]
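One way this state and action space could be represented (a sketch; the field names and node set are my assumptions, not from the deck):

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Tuple

@dataclass(frozen=True)
class FerryState:
    location: str                  # current ferry location, e.g. "H"
    flow_rates: Tuple[float, ...]  # per-node arrival rates lambda_i
    queued: Tuple[int, ...]        # accumulated traffic at each node
    buffer_level: int              # packets currently aboard the ferry

def round_trip_actions(nodes=("A", "B", "C", "D")):
    """Enumerate round trips through every non-empty ordered subset of
    nodes: ('A',), ('B',), ..., ('A','B'), ..., ('D','C','B','A')."""
    for r in range(1, len(nodes) + 1):
        yield from permutations(nodes, r)
```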

Reward Criterion
Reward: rewards packets delivered while penalizing ferry buffer occupancy and packet queuing delay (the criteria listed on the previous slide).

Temporal Difference Learning
Recursive state value approximation:
V(s_t) ← V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ]
Convergence to the "true value" as every state is visited infinitely often and the step size α is decayed appropriately.
Extract the policy from the learned value function.
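A minimal tabular TD(0) sketch of this update (illustrative; `env` and `policy` are hypothetical, with the same episodic interface as the loop sketched earlier):

```python
from collections import defaultdict

def td0_state_values(env, policy, episodes=1000, alpha=0.1, gamma=0.95):
    """Tabular TD(0): after each transition, move V(s) toward the
    one-step bootstrapped target r + gamma * V(s')."""
    V = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])   # the TD(0) update
            state = next_state
    return V
```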

Paths

Simulation Results
Legend: RR = Round Robin (naive), TSP = Traveling Salesman solution, STO = Stochastic Modeling, RL = Reinforcement Learning.

Conclusion/Extensions
We have shown two algorithms for routing UAVs; RL is a viable approach.
Extensions: a structured state space; the action space (options theory); hierarchical structure and peer-to-peer flows; interrupting the current action and starting over; adapting and optimizing the learning method.

Soccer - Euro Cup 2008
Germany – Turkey [ 4 : 1 ]
Wednesday, 11:45am (PST)

Research & Engineering Center for Unmanned Vehicles (RECUV)
Questions?
Research and Engineering Center for Unmanned Vehicles, University of Colorado at Boulder
http://recuv.colorado.edu
The Research and Engineering Center for Unmanned Vehicles at the University of Colorado at Boulder is a university, government, and industry partnership dedicated to advancing knowledge and capabilities in using unmanned vehicles for conducting scientific experiments, collecting geospatial data, mitigating natural and man-made disasters, and defending against terrorist and hostile military activities.