Berkeley UAV / UGV Testbed

Presentation transcript:

Berkeley UAV / UGV Testbed
Dr. Shankar Sastry, Chair, Electrical Engineering & Computer Sciences, University of California, Berkeley

Overview
- Overview of UAV system
- Selection of vehicle platform
- Sensor system
- Model identification
- Hierarchical control system
- Low-level vehicle stabilization and control
- Way-point navigation
- Further applications

UAV Design Procedure
- Objective Definition: What do we need the UAV for? What do we want to do with the UAV?
- Vehicle Selection: What functions do we want from the UAV? What size of UAV do we need?
- Avionics Selection: What sensors do we need? What kind of computer is good for us?
- Control System Implementation: helicopter model identification; controller design (classical/modern control); building hardware, mounting, enclosure…
- Flight Test: careful experiment planning is mandatory for safety

Key Points of a Successful RUAV Implementation
- Payload problem: find a helicopter powerful enough to carry the necessary sensors & computers
- Navigation sensor integration: implement an accurate combined INS/GPS sensor
- Helicopter control model identification: obtain a high-fidelity control model to design a stabilizing low-level controller
- Hardware/software/vehicle integration

Specifications of Berkeley RUAVs
- Kyosho Concept 60: Length 1.4 m, Height 0.47 m, Width 0.39 m; 4.5 kg airframe, 5 kg payload, 4.8 kg avionics; OS FX91 glow engine, 2.8 bhp; autonomy hardware: Boeing DQI, NovAtel RT-2, MediaGX233
- Bergen Industrial Twin: Length 1.5 m, Height 0.7 m, Width 0.3 m; 7 kg dry weight, 10 kg payload; Twin Genoa gas engine; autonomy hardware: N/A
- Yamaha R-50: Length 3.58 m, Height 1.08 m; 44 kg airframe, 20 kg payload, 10 kg avionics; water-cooled 2-stroke 1-cylinder gas engine, 98 cc, 12 ps; autonomy hardware: Pentium 233, 2 ultrasonic altimeters, vision processor
- RMAX: Length 3.63 m, Width 0.72 m; 58 kg dry weight, 30 kg payload, 15 kg avionics; water-cooled 2-cylinder gas engine, 256 cc, 21 ps; autonomy hardware: dual flight computer, dynamic network router, digital compass

Berkeley BEAR Fleet: Ursa Minor 3 (1999- )
Length: 1.4 m, Width: 0.39 m, Height: 0.47 m, Weight: 9.4 kg, Engine output: 2.8 bhp, Rotor diameter: 1.5 m, Flight time: 15 min, System operation time: 30 min
Labeled components: Boeing DQI-NP on gel mounting, GPS card, GPS antenna, wireless modem, navigation computer, radio receiver

Bergen with shock-absorbing landing gear
Pneumatically operated shock-absorbing landing gear
Length: 1.5 m, Width: 0.3 m, Height: 0.7 m, Dry weight: 8 kg, Payload: 10 kg

Berkeley BEAR Fleet: Ursa Magna 2 (1999- )
Based on the Yamaha R-50 industrial helicopter
Length: 3.5 m, Width: 0.7 m, Height: 1.08 m, Dry weight: 44 kg, Payload: 20 kg, Engine output: 12 hp, Rotor diameter: 3.070 m, Flight time: 60 min, System operation time: 60 min
Labeled components: camera, GPS antenna, WaveLAN antenna, ultrasonic height meter, integrated nav/comm module, Boeing DQI-NP on fluid mounting

Berkeley BEAR Fleet: Ursa Maxima 1 (2000- )
Based on the Yamaha RMAX industrial helicopter
Length: 3.63 m, Width: 0.72 m, Height: 1.08 m, Dry weight: 58 kg, Payload: 30 kg, Engine output: 21 hp, Rotor diameter: 3.115 m, Flight & system operation time: 60 min
Labeled component: integrated nav/comm module

Flight Control System signal flow (block diagram)
Flight data from the nav sensor suite and human pilot control inputs received through the Yamaha receiver feed the flight computer, which performs PWM reading (CTC #2, 8, 9, 10), feedforward/feedback control, and PWM generation (CTC #3, 4, 5, 6, 7). A channel-selection / take-over decision (control mode) chooses between computer-generated and pilot PWM CH1-5; the selected signals pass through a PWM driver, opto-isolator, and mechanical relay array to the YACS (Yamaha Attitude Control System) and the five servos. Full manual mode bypasses the flight computer.

Navigation Hardware: Ursa Maxima 1 (block diagram; constructed by Hoam Chung and David Shim, September 2000)
Components: NovAtel GPS RT-2; secondary nav computer (Win98, PC104 K6-400); digital compass; router (FreeBSD, MediaGX233); battery packs; Boeing DQI-NP with DC/DC converter (24 V); nav computer (QNX, PC104 K6-400); Lucent WaveLAN; Ethernet hub

Navigation Software: DQI-NP-based (processes running on QNX, communicating via shared memory, RS-232, and a radio link)
- DQICONT (periodic, 100 Hz): INS updates from the Boeing DQI-NP; control output at 50 Hz; flight status/commands; RX values from the Yamaha receiver (using HW interrupt & proxy)
- DQIGPS (anytime periodic): GPS updates from the NovAtel GPS RT-2; DGPS measurements (PRTK @ 5 Hz, PXY @ 1 Hz) from the ground station / ground computer (Win 98) over the radio link
- ULREAD (aperiodic): ultrasonic sensors @ 4±1 Hz, relative altitude
- VCOMM (periodic): nav data to the vision computer @ 10 Hz

Wireless Communication (diagram): the ground monitoring system, landing decks, ground mobile robots, and UAVs are linked by Lucent Orinoco (WaveLAN) in ad-hoc mode; DGPS corrections are broadcast via WaveLAN or wireless modem.

Hierarchy of the UAV Management System (diagram; Regulation layer highlighted)
Strategic Planner → Tactical Planner (with Detector and Discrete Event System) → Trajectory Generator → Regulation (control law) → Helicopter Platform; signals include continuous sensory information, tracking errors, flight modes, y_d, replan, control points, conflict notification, detect.

Flight Control System Experiments
- Position + heading lock (Dec 1999)
- Landing scenario with SAS (Dec 1999)
- Attitude control with mu-syn (July 2000)
- Position + heading lock (May 2000)

Hierarchy of the UAV Management System (same diagram; Trajectory Generator layer highlighted)

Vehicle Control Language
Objective: develop an abstract and unified UAV mission control language environment
Features:
- Mission-independent
- Executes in batch or interactive mode
- Seamlessly integrated with the existing hierarchy
- Can be integrated with a graphical interface via an automatic code generator

Flight control synthesis: way-point navigation
Helicopter mode-transition diagram: take-off, hover, forward flight, land, ascend/descend, with sideslip, pirouette, and bank-to-turn maneuvers.

Hierarchy of the UAV Management System (same diagram; Helicopter Platform highlighted)

VCL Execution Module Structure (block diagram): the VCL interpreter (batch or interactive mode) takes vehicle reference commands from the ground station, checks feasibility against the current flight state, and generates reference inputs for the feedforward/feedback control suite, which combines them with nav data from the nav sensor suite and outputs PWM CH1-5.

Waypoint Navigation using VCL (Aug 1, 2000)

Vision Based Motion Estimation for UAV Landing Cory Sharp, Omid Shakernia Department of EECS University of California at Berkeley

Outline
- Motivation
- Vision-based ego-motion estimation
- Evaluation of motion estimates
- Vision system hardware/software
- Landing target design/tracking
- Active camera control
- Flight videos

Motivation
Goal: autonomous UAV landing on a ship's flight deck.
Challenges: hostile operating environments (high winds, pitching flight deck, ground effect); a UAV undergoing changing nonlinear dynamics.
Why the vision sensor? It is a passive sensor (for stealth) and gives the UAV's motion relative to the flight deck. (U.S. Navy photo)

Objective for Vision Based Landing

Vision/Navigation System coordination (diagram): the vision computer runs the feature tracker and PTZ camera control on image features from a frame grabber; motion estimates are exchanged with the UAV controller, video is relayed over a 2.4 GHz link, and images & features reach the monitoring station over wireless Ethernet.

Vision in the Control Loop (diagram): on the vision computer, image processing and corner finding feed feature point correspondence, motion estimation, and camera pan/tilt control; the navigation computer exchanges helicopter state and the control strategy with the vision computer over RS-232 under the Vehicle Control Language.

Vision System Hardware
- Ampro embedded PC (Little Board P5/x): low-power Pentium 233 MHz running Linux; 440 MB flashdisk HD, robust to body vibration; runs the motion estimation algorithm and controls the PTZ camera
- Motion estimation algorithms: written and optimized in C++ using LAPACK; give motion estimates at 30 Hz

Pan/Tilt Camera Control
Feature tracking issue: feature points can leave the field of view. Pan/tilt increases the usable range of motion of the UAV. The pan/tilt control drives all feature points toward the center of the image.
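A minimal sketch of such a centering law, written as a proportional controller on the feature centroid; the gains, image size, and command convention are assumptions rather than the actual PTZ driver used on the helicopter.

```python
def pan_tilt_correction(features, width=640, height=480, k_pan=0.05, k_tilt=0.05):
    """Proportional pan/tilt commands that push the feature centroid toward the image centre.

    features: list of (u, v) pixel coordinates of the tracked corner points.
    """
    if not features:
        return 0.0, 0.0
    u_mean = sum(u for u, _ in features) / len(features)
    v_mean = sum(v for _, v in features) / len(features)
    d_pan = k_pan * (u_mean - width / 2.0)     # pan toward the horizontal offset
    d_tilt = k_tilt * (v_mean - height / 2.0)  # tilt toward the vertical offset
    return d_pan, d_tilt

print(pan_tilt_correction([(400, 200), (420, 260)]))
```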

Flight Video

Pitching Landing Deck
- Landing deck to simulate the motion of a ship at sea
- 6 electrically actuated cylindrical shafts
- Motion parameters: sea state (frequency and amplitude of waves), ship speed, direction into waves
- Stiffened aluminum construction; dimensions: 8' x 6'
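For intuition, a deck-pitch command can be generated from the sea-state parameters listed above; the single-sinusoid model below is purely an illustrative assumption, not the testbed's actual actuation profile.

```python
import numpy as np

def deck_pitch_deg(t, wave_freq_hz=0.2, wave_amp_deg=5.0):
    """Deck pitch angle (degrees) for an assumed single-frequency sea state."""
    return wave_amp_deg * np.sin(2 * np.pi * wave_freq_hz * t)

t = np.linspace(0.0, 10.0, 501)
print(deck_pitch_deg(t).max())   # peak pitch approaches the commanded wave amplitude (~5 deg)
```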

Moving Landing Pad

Landing on Deck

Probabilistic Pursuit-Evasion Games with UGVs and UAVs René Vidal C. Sharp, D. Shim, O. Shakernia, J. Hespanha, J. Kim, S. Rashid, S. Sastry University of California at Berkeley 04/05/01

Outline
- Introduction
- Pursuit Evasion Games: map building, pursuit policies
- Hierarchical Control Architecture: strategic planner, tactical planner, regulation, sensing, control system, agent and communication architectures
- Architecture Implementation: tactical layer (UGVs, UAVs, hardware, software, sensor fusion); strategic layer (map building, pursuit policies, visual interface)
- Experimental Results: evaluation of pursuit policies; pursuit evasion games with UGVs and UAVs
- Conclusions and Current Research

Introduction: The Pursuit-Evasion Scenario Evade!

Introduction: Theoretical Issues
- Probabilistic map building
- Coordinated multi-agent operation
- Networking and intelligent data sharing
- Path planning
- Identification of vehicle dynamics and control
- Sensor integration
- Vision system

Pursuit-Evasion Games
Approach of Hespanha, Kim, and Sastry:
- Multiple pursuers catching a single evader
- Pursuers can only move to adjacent empty cells
- Pursuers have perfect knowledge of their current location
- Sensor model: false positives (p) and false negatives (q) for evader detection
- Evader moves randomly to adjacent cells
Extensions by Rashid and Kim:
- Multiple evaders, each recognized individually
- Supervisory agents: can "fly" over obstacles and evaders, but cannot capture
- Sensor model for obstacle detection as well
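A minimal simulation sketch of these rules, under the stated sensor model (false-positive probability p, false-negative probability q) and the random-walk evader; function and variable names are illustrative, not the project's actual code.

```python
import random

def sense_evader(cell, evader_cell, p=0.1, q=0.1):
    """Noisy detector: misses a present evader with prob q, false-alarms with prob p."""
    if cell == evader_cell:
        return random.random() > q
    return random.random() < p

def random_evader_step(cell, free_cells):
    """Evader moves uniformly at random to an adjacent free cell."""
    r, c = cell
    options = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
               if (r + dr, c + dc) in free_cells]
    return random.choice(options) if options else cell

# One round on a small obstacle-free grid
free_cells = {(r, c) for r in range(5) for c in range(5)}
evader = random_evader_step((2, 2), free_cells)
print(evader, sense_evader((2, 2), evader))
```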

Map Building: Map of Obstacles
Sensor model: p = probability of a false positive, q = probability of a false negative.
For a map M, where M(x,y,t) is the probability that cell (x,y) contains an obstacle at time t:
If the sensor makes a positive reading: M(x,y,t) = (1-q)·M(x,y,t-1) / ((1-q)·M(x,y,t-1) + p·(1-M(x,y,t-1)))
If the sensor makes a negative reading: M(x,y,t) = q·M(x,y,t-1) / (q·M(x,y,t-1) + (1-p)·(1-M(x,y,t-1)))
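As a sketch, the update above can be applied cell by cell as each reading arrives; the helper below is a direct transcription of the two formulas, with placeholder p and q values.

```python
import numpy as np

def update_obstacle_cell(M, x, y, positive_reading, p=0.1, q=0.1):
    """Bayes update of the obstacle probability M[x, y] for one sensor reading."""
    prior = M[x, y]
    if positive_reading:
        M[x, y] = (1 - q) * prior / ((1 - q) * prior + p * (1 - prior))
    else:
        M[x, y] = q * prior / (q * prior + (1 - p) * (1 - prior))

M = np.full((10, 10), 0.5)                        # uninformative initial map
update_obstacle_cell(M, 3, 4, positive_reading=True)
print(M[3, 4])                                    # probability rises above 0.5
```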

Map Building: Map of Evaders
At each time t:
1. Measurement step: update the evader map with the current observation y(t) = {v(t), e(t), o(t)} using the sensor model.
2. Prediction step: propagate the map forward using the model for the evader's motion.
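A sketch of the prediction step under the random-walk evader model described earlier (obstacles ignored for brevity): E holds the probability that the evader occupies each cell, and the measurement step would apply a Bayes update analogous to the obstacle map above.

```python
import numpy as np

def predict_evader_map(E):
    """Diffuse the evader probability map one step under a uniform random-walk model."""
    rows, cols = E.shape
    E_next = np.zeros_like(E)
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            for r2, c2 in nbrs:
                E_next[r2, c2] += E[r, c] / len(nbrs)
    return E_next

E = np.zeros((5, 5)); E[2, 2] = 1.0               # evader known to be at (2, 2)
print(predict_evader_map(E))                       # probability spreads to the 4 neighbours
```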

Pursuit Policies
Greedy policy: the pursuer moves to the adjacent cell with the highest probability of containing an evader over all maps; the strategic planner assigns more importance to local measurements.
Global-max policy: the pursuer moves towards the cell with the highest probability of containing an evader in the whole map; it may not take advantage of multiple pursuers (they may all move to the same place).
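A sketch of the two policies, assuming evader_map is the current evader probability map (a 2D array) and pos is the pursuer's cell; these helpers are illustrative, not the planner's real interface.

```python
import numpy as np

def adjacent_cells(pos, shape):
    r, c = pos
    cells = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))]
    return [(r2, c2) for r2, c2 in cells if 0 <= r2 < shape[0] and 0 <= c2 < shape[1]]

def greedy_move(pos, evader_map):
    """Move to the adjacent cell with the highest local evader probability."""
    return max(adjacent_cells(pos, evader_map.shape), key=lambda c: evader_map[c])

def global_max_move(pos, evader_map):
    """Take one step towards the cell with the highest probability in the whole map."""
    target = np.unravel_index(np.argmax(evader_map), evader_map.shape)
    return min(adjacent_cells(pos, evader_map.shape),
               key=lambda c: abs(c[0] - target[0]) + abs(c[1] - target[1]))
```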

Pursuit Policies
Theorem 1 (Hespanha, Kim, Sastry): for a greedy policy, the probability of the capture time being finite is equal to one, and the expected value of the capture time is finite.
Theorem 2 (Hespanha, Kim, Sastry): for a stay-in-place policy, the expected capture time increases as the speed of the evader decreases; if the speed of the evader is zero, then the probability of the capture time being finite is less than one.

Hierarchical System Architecture (block diagram)
Strategic level: the map builder and strategy planner exchange positions of evaders, obstacles, and pursuers over the communications network; desired pursuer positions are sent to each agent.
Agent level: the tactical planner & regulation (tactical planner, trajectory regulation) produce control signals [4n] for the agent dynamics; vehicle-level sensor fusion combines actuator encoders, INS, GPS, ultrasonic altimeter, and vision (actuator positions [4n], linear accelerations & angular velocities [6n], inertial [3n], height over terrain [n]) into the state of the helicopter & height over terrain, obstacles detected, and evaders detected. Exogenous disturbances: terrain, evader.

Agent Architecture
Segments the control of each agent into different layers of abstraction, so the same high-level control strategies can be applied to all agents.
- Strategic Planner: mission planning, high-level control, communication
- Tactical Planner: trajectory planning, obstacle avoidance, regulation
- Regulation: low-level control and sensing

Communication Architecture
Map building and the Strategic Planner can be:
- Centralized: one agent receives all sensor information, builds the map, and broadcasts it
- Decentralized: each agent builds its own map and shares its readings with the rest of the team
The communication network can be:
- Perfect: no packet loss, no transmission time, no network delay; all pursuers have identical maps
- Imperfect: each agent updates its map and makes decisions with the information available to it

Architecture Implementation: Part I
Common platform for UGVs and UAVs: onboard computer (tactical planner and sensor fusion); GPS (positioning); vision system (obstacle and evader detection); WaveLAN and Ayllu (communication).
UGV-specific platform: Pioneer robot (sonars, dead reckoning, compass); micro-controller (regulation and low-level control); Saphira or Ayllu (tactical planning).
UAV-specific platform: Yamaha R-50 (INS, ultrasonic sensors, inertial sensors, compass); navigation computer (regulation and low-level control); David Shim's control system.

Vision System: PTZ & ACTS
Hardware: onboard computer (Linux), Sony pan/tilt/zoom camera, PXC200 frame grabber.
Camera control software (Linux): sends PTZ commands, receives camera state.
ACTS system: captures and processes video (32 color channels, 10 blobs per channel); extracts color information and sends it to a TCP socket (number of blobs, size and position of each blob).

Vision-based position estimation (diagram): an image model and a motion model combine the camera position and orientation (helicopter orientation relative to the ground, camera orientation relative to the helicopter) with the camera calibration (width, height, zoom) to produce a robot position estimate.
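One common way to realize this pipeline, shown purely as a hedged sketch: back-project the tracked robot's pixel through the calibrated camera and intersect the viewing ray with a flat ground plane. Here K, R_cam, and t_cam (camera pose in the world, assembled from helicopter state and pan/tilt angles) are assumed inputs; the estimator actually used on the testbed may differ.

```python
import numpy as np

def estimate_ground_position(pixel, K, R_cam, t_cam):
    """Back-project a pixel and intersect the viewing ray with the ground plane z = 0."""
    u, v = pixel
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing direction, camera frame
    ray_world = R_cam @ ray_cam                          # rotate into the world frame
    s = -t_cam[2] / ray_world[2]                         # ray parameter at the ground plane
    return t_cam + s * ray_world                         # world (x, y, 0) of the target

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])  # toy intrinsics
R_cam = np.diag([1.0, -1.0, -1.0])        # camera looking straight down
t_cam = np.array([0.0, 0.0, 10.0])        # camera 10 m above the ground
print(estimate_ground_position((320.0, 300.0), K, R_cam, t_cam))
```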

Communication
Hardware: Lucent WaveLAN wireless card (11 Mbps); network set up in ad-hoc mode; TCP/IP sockets; TBTRF (SRI mobile routing scheme).
Ayllu: a set of behaviors for distributed control of multiple mobile robots; messages can be passed among behaviors, and the output of a behavior can be connected to a local or remote input of another behavior.

Pioneer Ground Robots
Hardware: micro-controller (motion control); onboard computer (communication, video processing, camera control).
Sensors: sonars (obstacle avoidance, map building); GPS & compass (positioning); video camera (map building, navigation, tracking).
Communication: serial; WaveLAN (communication between robots and base station); radio modem (GPS communication).

Yamaha Aerial Robots
- Yamaha R-50 helicopter
- Navigation computer: Pentium 233 running QNX; low-level control and sensing (GPS, INS); UAV controller (David Shim's controller, Vehicle Control Language)
- Vision computer: serial communication to receive the state of the helicopter; we do not send commands yet

Architecture Implementation: Part II (diagram)
Strategic planner: map building, pursuit policies, communication; runs in Simulink, and the same code is used for simulation and experiments.
UAV pursuer: navigation computer (helicopter control; GPS: position; INS: orientation; communication) linked by serial to the vision computer (camera control, color tracking, UGV position estimation); TCP/IP link to the strategic planner.
UGV pursuers and UGV evader: robot micro-controller (robot control; dead reckoning: position; compass: heading) linked by serial to the robot computer (camera control, color tracking; GPS: position; communication).

Pursuit-Evasion Game Experiment using Simulink
PEG with four UGVs; global-max pursuit policy; simulated camera view (radius 7.5 m with a 50-degree conic view); pursuer speed 0.3 m/s, evader speed 0.1 m/s.

Experimental Results: Evaluation of Policies

Experimental Results: Evaluation of Policies

Experimental Results: Pursuit Evasion Games with 1 UAV and 2 UGVs (Summer '00)

Experimental Results: Pursuit Evasion Games with 4 UGVs and 1 UAV (Spring '01)

Experimental Results: Pursuit Evasion Games with 4 UGVs and 1 UAV (Spring '01)

Conclusions and Current Research
- The proposed architecture has been successfully applied to the control of multiple agents in the pursuit-evasion scenario.
- Experimental results confirm the theoretical results: global-max outperforms greedy in a real scenario and is robust to changes in evader motion.
- What's missing: the vision computer controlling the helicopter.
Current research: collision avoidance and UAV path planning; Monte Carlo-based learning of pursuit policies; communication.

A Probabilistic Framework for Pursuit-Evasion Games with Unmanned Air Vehicles
Maria Prandini, Univ. of Brescia & UC Berkeley
In collaboration with J. Hespanha, J. Kim, and S. Sastry

Key Ideas The “mission-level” control of Unmanned Air Vehicles requires a probabilistic framework. The problem of coordinating teams of autonomous agents is naturally formulated in a game theoretical setting. Exact solutions for these types of problems are often computationally intractable and, in some cases, open research problems.

The "rules" of the game (diagram: obstacles, UAVs, evader)

The "rules" of the game
Terrain: fixed obstacles, not accurately mapped.
UAVs (pursuers) capable of: flying between obstacles; seeing a region around them (limited by occlusions).
Evader capable of: moving between obstacles (possibly actively avoiding detection).
Objective: find the evader in minimum time.

Scenarios (diagrams)
- Search and rescue operations: UAVs, person in danger
- Finding parts in a warehouse: part
- Search and capture operations: UCAVs, enemy
- Monitoring environmental threats: fire

Strategies for pursuit-evasion games
LaValle, Latombe, Guibas, et al. considered a similar problem, but assumed the map of the region is known, the pursuers have perfect sensors, and the evader follows worst-case trajectories. They ask: how many UAVs are needed to win the game in finite time? In one example region a single agent is sufficient; in another, two agents are needed (no matter what strategy a single pursuer chooses, there is a trajectory for the evader that avoids detection).

Exploring a region to build a map
Deng, Papadimitriou, et al. study the problem of building a map (seeing all points in the region) while traversing the smallest possible distance: the standard "keep the wall to the right" algorithm versus an algorithm that takes better advantage of the camera's capabilities.

A two-step solution…
Exploration followed by pursuit is not efficient: the sensors are imprecise, and worst-case assumptions on the trajectories of the evaders lead to very conservative results.

A different approach…
Use a probabilistic framework to combine exploration and pursuit-evasion games. Nondeterminism comes from: the poorly mapped terrain; noise and uncertainty in the sensors; probabilistic models for the motion of the evader and the UAVs.

Markov Decision Processes (grid-world example with 16 cells)
- time t ∈ {1, 2, 3, …}
- state x_t ∈ X := {1, 2, …, 16}
- action u_t ∈ U := {up, down, left, right}
- transition probability function p(x, x', u) = P(x_{t+1} = x' | x_t = x, u_t = u); in the pictured grid, e.g., action "right" reaches the intended cell with probability 0.7 and neighbouring cells with probability 0.1

Markov Decision Processes (grid world with a goal cell x_goal)
- time t ∈ {1, 2, 3, …}; state x_t ∈ X := {1, 2, …, 16}; action u_t ∈ U := {up, down, left, right}
- transition probability function p(x, x', u) = P(x_{t+1} = x' | x_t = x, u_t = u)
- X* = set of all sequences of elements in X
- control policy (deterministic) μ : X* → U, u_t = μ(x_t, x_{t-1}, …, x_1) = μ(X_t)
- control policy (stochastic) μ : U × X* → [0,1], P(u_t = u | X_t = X, μ) = μ(u, X)

Markov Decision Processes (same slide, stochastic control policy highlighted)
μ : U × X* → [0,1], with P(u_t = u | X_t = X, μ) = μ(u, X) (almost surely)

Markov Decision Processes (goal cell x_goal)
cost J_μ = E[T_goal | μ] (to be minimized), where T_goal := min { t : x_t = x_goal }

Markov Decision Processes
cost J_μ = E[T_goal | μ] (to be minimized), where T_goal := min { t : x_t = x_goal }; one can also write J_μ in additive form as an expected sum of non-negative per-stage costs; the optimal control policy μ* achieves J_{μ*} = min_μ J_μ
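A sketch of value iteration for the minimum expected time-to-goal on a grid like the one pictured; the 0.7/0.1 slip model is an assumption chosen to match the probabilities shown in the figure, and the code is illustrative rather than the authors' implementation.

```python
# 4x4 grid MDP: the intended move succeeds with prob 0.7; with prob 0.1 each the
# vehicle stays put or slips to one of the two perpendicular neighbours (assumed model).
N = 4
GOAL = (3, 3)
ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}
PERP = {'up': ('left', 'right'), 'down': ('left', 'right'),
        'left': ('up', 'down'), 'right': ('up', 'down')}

def step(cell, move):
    r, c = cell[0] + ACTIONS[move][0], cell[1] + ACTIONS[move][1]
    return (r, c) if 0 <= r < N and 0 <= c < N else cell

def transitions(cell, u):
    """(next_cell, probability) pairs for taking action u in cell."""
    p1, p2 = PERP[u]
    return [(step(cell, u), 0.7), (cell, 0.1), (step(cell, p1), 0.1), (step(cell, p2), 0.1)]

# Value iteration on the expected time-to-goal:
#   V(x_goal) = 0,  V(x) = min_u [ 1 + sum_x' p(x, x', u) V(x') ]
V = {(r, c): 0.0 for r in range(N) for c in range(N)}
for _ in range(200):
    for x in V:
        if x != GOAL:
            V[x] = min(1.0 + sum(p * V[x2] for x2, p in transitions(x, u)) for u in ACTIONS)

policy = {x: min(ACTIONS, key=lambda u, x=x: sum(p * V[x2] for x2, p in transitions(x, u)))
          for x in V if x != GOAL}
print(round(V[(0, 0)], 2), policy[(0, 0)])  # expected steps from the far corner and the first move
```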

Competitive Markov Decision Processes (two players on the grid)
- state x_t ∈ X := {(1,1), (1,2), …, (1,16), …, (16,16), Ø}
- player U action u_t ∈ U := {up, down, left, right}; player D action d_t ∈ D := {up, down, left, right}
- transition probability function p(x, x', u, d) = P(x_{t+1} = x' | x_t = x, u_t = u, d_t = d)
- in pursuit-evasion games x_t = (a_t, b_t) ∈ S × S (pursuer and evader cells) or x_t = Ø, and the transition probability factors as p((a,b), (a',b'), u, d) = P(a_{t+1} = a' | a_t = a, u_t = u) · P(b_{t+1} = b' | b_t = b, d_t = d)

Competitive, Hidden MDPs
- state x_t ∈ X := {(1,1), (1,2), …, (16,16), Ø}; player U action u_t ∈ U; player D action d_t ∈ D; transition probability function p(x, x', u, d) as before
- player U observation y_t ∈ Y := {found, not-found, Ø}; player D observation z_t ∈ Z := {found, not-found, Ø}
- observation probability functions p_U(y, x) = P(y_t = y | x_t = x) and p_D(z, x) = P(z_t = z | x_t = x)
- the state Ø is not "hidden" to either player

Competitive, Hidden MDPs
Control policies (stochastic):
- for player U, μ : U × Y* × U* → [0,1] with P(u_t = u | Y_t = Y, U_{t-1} = U, μ) = μ(u, Y, U), where Y is the sequence of t observations and U the sequence of t-1 actions of player U
- for player D, δ : D × Z* × D* → [0,1] with P(d_t = d | Z_t = Z, D_{t-1} = D, δ) = δ(d, Z, D), where Z is the sequence of t observations and D the sequence of t-1 actions of player D

Competitive, Hidden MDPs
- cost (e.g., time to capture) J_{μ,δ} ≥ 0, to be minimized by player U and maximized by player D (zero sum)
- Stackelberg equilibria: the leader (D) chooses a worst-case policy δ* = argmax_δ min_μ J_{μ,δ} (max-min) and announces it; the follower (U) chooses its policy to best counteract the leader's policy, i.e., μ* = argmin_μ J_{μ,δ*} (min); vice versa if player U is the leader

Competitive, Hidden MDPs
- the leader's policy is "worst-case" in that he will never do worse than J* := max_δ min_μ J_{μ,δ}; the leader has nothing to fear from the follower discovering his strategy
- the follower's policy may be "fragile" in that he may do much worse if the leader deviates from the announced policy

Competitive, Hidden MDPs (rock-paper-scissors example)
X := {1, Ø}, U := {rock, paper, scissors}, D := {rock, paper, scissors}
Stackelberg equilibria: the leader (D) chooses the worst-case policy δ*(d, 1) = 1/3 for every d (max-min) and announces it; the follower (U) chooses its policy to best counteract the leader's policy (min); J_{μ*,δ*} = 1, but for this choice of μ* the follower will lose systematically if the leader discovers his policy and deviates from δ*.

Competitive, Hidden MDPs: not all Stackelberg equilibria were created equal
In the rock-paper-scissors example, the follower can instead best-counteract the leader with the uniform policy μ*(u, 1) = 1/3 for every u (min), which also achieves J_{μ*,δ*} = 1; for this choice of μ*, the leader will not benefit from learning the follower's policy.

Competitive, Hidden MDPs (rock-paper-scissors example)
Nash equilibria (or saddle points): a pair of policies (μ*, δ*) such that J_{μ*,δ} ≤ J_{μ*,δ*} ≤ J_{μ,δ*} for all μ, δ; here μ*(u, 1) = 1/3 for every u and δ*(d, 1) = 1/3 for every d.

Competitive, Hidden MDPs
Nash equilibria are "robust" in that any unilateral deviation from the equilibrium makes the deviating player do worse (or no better); a player playing at a Nash equilibrium has nothing to fear from the other discovering his strategy; but Nash equilibria don't always exist.
All Nash equilibria were created equal: if (μ*, δ*) and (μ+, δ+) are Nash equilibria, then (μ*, δ+) and (μ+, δ*) are also Nash equilibria, and J_{μ*,δ*} = J_{μ+,δ+} = J_{μ*,δ+} = J_{μ+,δ*} = min_μ max_δ J_{μ,δ} = max_δ min_μ J_{μ,δ} (the value of the game).
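A quick numerical check of the saddle-point inequalities for the rock-paper-scissors example, using the standard +1/0/-1 win-lose-draw cost to player U; this payoff convention is an assumption for illustration, not the exact time-to-capture cost used on the slides.

```python
import numpy as np

# Cost to the minimizer U (maximized by D); rows = U's action, cols = D's action,
# both ordered rock, paper, scissors.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

mu = np.ones(3) / 3      # U mixes uniformly
delta = np.ones(3) / 3   # D mixes uniformly

value = mu @ A @ delta                 # J(mu*, delta*)
worst_D = (mu @ A).max()               # best D can do against mu* with a pure deviation
best_U = (A @ delta).min()             # best U can do against delta* with a pure deviation

# Saddle point: J(mu*, delta) <= J(mu*, delta*) <= J(mu, delta*) for every mu, delta
print(value, worst_D, best_U)          # all 0.0: uniform mixing is a Nash equilibrium
assert worst_D <= value <= best_U
```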

Competitive, Hidden MDPs (back to the pursuit-evasion grid)
cost (e.g., time to capture) J_{μ,δ} ≥ 0, to be minimized by player U and maximized by player D (zero sum); Stackelberg equilibria: the leader (D) chooses δ* = argmax_δ min_μ J_{μ,δ} and announces it, and the follower (U) chooses μ* = argmin_μ J_{μ,δ*}; vice versa if player U is the leader.

Competitive, Hidden MDPs
The follower's optimal cost-to-go V(Y, U) is defined for a sequence Y ∈ Y* of τ observations and a sequence U ∈ U* of τ-1 actions. Bellman's equation, together with the boundary condition V(Y_t, U_{t-1}) = 0 a.s. conditioned on x_t = Ø, uniquely defines the cost-to-go V and the optimal follower's control policy μ*.

Competitive, Hidden MDPs
Follower's optimal policy μ* = argmin_μ J_{μ,δ*}:
- deterministic (at least one of the optimal policies is), i.e., μ*(u, Y, U) ∈ {0, 1}
- non-Markov, but μ*(u, Y, U) = f(u, I(Y, U)), where I(Y, U) is an information state whose size grows with t

Competitive, Hidden MDPs
Follower's optimal policy μ* = argmin_μ J_{μ,δ*}: the follower's optimal policy is finite-dimensional only if δ* uses finite memory; even when δ* is memoryless, the size of the information state tends to be prohibitively large, e.g., 2^N · N^(m+1) ≈ 2.6 × 10^69 for N = 200 cells (with uniformly distributed obstacles) and m = 3 pursuers.
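As a worked check of the figure quoted above (an arithmetic illustration only):

```python
# 2**N * N**(m+1) with N = 200 cells and m = 3 pursuers, as quoted on the slide
N, m = 200, 3
size = 2**N * N**(m + 1)
print(f"{float(size):.1e}")   # -> 2.6e+69
```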

Competitive, Hidden MDPs (summary)
cost (e.g., time to capture) J_{μ,δ} ≥ 0, to be minimized by player U and maximized by player D (zero sum); Stackelberg equilibria: leader (D) δ* = argmax_δ min_μ J_{μ,δ}, follower (U) μ* = argmin_μ J_{μ,δ*}; Nash equilibria: a pair of policies (μ*, δ*) such that J_{μ*,δ} ≤ J_{μ*,δ*} ≤ J_{μ,δ*} for all μ, δ; computing them for these games remains an open research problem.

Conclusions The “mission-level” control of Unmanned Air Vehicles requires a probabilistic framework. The problem of coordinating teams of autonomous agents is naturally formulated in a game theoretical setting. Exact solutions for these types of problems are often computationally intractable and, in some cases, open research problems.

Some References
- Kumar & Varaiya, Stochastic Systems. Prentice Hall, 1986.
- Fudenberg & Tirole, Game Theory. MIT Press, 1993.
- Basar & Olsder, Dynamic Noncooperative Game Theory, 2nd ed. SIAM, 1999.
- Bertsekas, Dynamic Programming and Optimal Control, vols. 1 & 2. Athena Scientific, 1995.
- Filar & Vrieze, Competitive Markov Decision Processes. Springer-Verlag, 1997.
- Patek & Bertsekas, Stochastic Shortest Path Games. SIAM J. Control Optim., 37(3):804-824, 1999.
- See also papers by Michael L. Littman, Junling Hu, and Michael P. Wellman on reinforcement learning in Markov games.
Intelligent Control Architectures for Unmanned Air Vehicles home page: http://robotics.eecs.berkeley.edu/~sastry/ONRhomepage.html
Software Enabled Control home page: http://sec.eecs.berkeley.edu/