1
Berkeley UAV / UGV Testbed
Dr. Shankar Sastry, Chair, Electrical Engineering & Computer Sciences, University of California, Berkeley
2
Overview
- Overview of the UAV system
- Selection of vehicle platform
- Sensor system
- Model identification
- Hierarchical control system
- Low-level vehicle stabilization and control
- Way-point navigation
- Further applications
3
UAV Design Procedure
- Objective definition: What do we need a UAV for? What do we want to do with it?
- Vehicle selection: What functions do we want from the UAV? What size of UAV do we need?
- Avionics selection: What sensors do we need? What kind of computer suits us?
- Control system implementation: helicopter model identification; controller design (classical/modern control); building hardware, mounting, enclosures…
- Flight test: careful experiment planning is mandatory for safety
4
Key Points of Successful RUAV Implementation
- Payload problem: find a helicopter powerful enough to carry the necessary sensors & computers
- Navigation sensor integration: implement an accurate combined INS/GPS sensor
- Helicopter control model identification: obtain a high-fidelity control model to design a stabilizing low-level controller
- Hardware/software/vehicle integration
5
Specifications of Berkeley RUAVs

| Name | Length | Height | Width | Weight | Payload | Engine | Autonomy |
|---|---|---|---|---|---|---|---|
| Kyosho Concept 60 | 1.4 m | 0.47 m | 0.39 m | 4.5 kg airframe, 4.8 kg avionics | 5 kg | OS FX91 glow engine, 2.8 bhp | Boeing DQI, NovAtel RT-2, MediaGX233 |
| Bergen Industrial Twin | 1.5 m | 0.7 m | 0.3 m | 7 kg dry weight | 10 kg | Twin Genoa gas engine | N/A |
| Yamaha R-50 | 3.58 m | 1.08 m | | 44 kg airframe, 10 kg avionics | 20 kg | Water-cooled 2-stroke 1-cylinder gas engine, 98 cc, 12 ps | Pentium 233, 2 ultrasonic altimeters, vision processor |
| RMAX | 3.63 m | | 0.72 m | 58 kg dry weight, 15 kg avionics | 30 kg | Water-cooled 2-cylinder gas engine, 256 cc, 21 ps | Dual flight computer, dynamic network router, digital compass |
6
Berkeley BEAR Fleet: Ursa Minor 3 (1999– )
Avionics: Boeing DQI-NP on gel mounting, GPS card, GPS antenna, wireless modem, navigation computer, radio receiver.
Length: 1.4 m | Width: 0.39 m | Height: 0.47 m | Weight: 9.4 kg | Engine output: 2.8 bhp | Rotor diameter: 1.5 m | Flight time: 15 min | System operation time: 30 min
7
Bergen with shock-absorbing landing gear
Pneumatically operated shock-absorbing landing gear.
Length: 1.5 m | Width: 0.3 m | Height: 0.7 m | Dry weight: 8 kg | Payload: 10 kg
8
Berkeley BEAR Fleet: Ursa Magna 2 (1999– )
Based on the Yamaha R-50 industrial helicopter. Avionics: Boeing DQI-NP on fluid mounting, camera, GPS antenna, WaveLAN antenna, ultrasonic height meter, integrated nav/comm module.
Length: 3.5 m | Width: 0.7 m | Height: 1.08 m | Dry weight: 44 kg | Payload: 20 kg | Engine output: 12 hp | Rotor diameter: 3.070 m | Flight time: 60 min | System operation time: 60 min
9
Berkeley BEAR Fleet: Ursa Maxima 1 (2000– )
Based on the Yamaha RMAX industrial helicopter, with an integrated nav/comm module.
Length: 3.63 m | Width: 0.72 m | Height: 1.08 m | Dry weight: 58 kg | Payload: 30 kg | Engine output: 21 hp | Rotor diameter: 3.115 m | Flight & system operation time: 60 min
10
Flight Control System signal flow
[Block diagram] Pilot inputs arrive through the Yamaha receiver and are read as PWM (CTC #2, 8, 9, 10). A channel-selection/take-over decision block chooses the control mode: in full manual mode the pilot's PWM channels 1–5 pass straight through; under computer control the feedforward/feedback controller generates PWM (CTC #3–7). Either way, the signals pass through the PWM driver, opto-isolator, and mechanical relay array to the YACS (Yamaha Attitude Control System) and the five servos. The nav sensor suite supplies flight data to the controller.
11
Navigation Hardware: Ursa Maxima 1
[Block diagram] The navigation computer (QNX, PC104 K6-400) reads the Boeing DQI-NP (through a 24 V DC/DC converter), the NovAtel GPS RT-2, and a digital compass. A secondary nav computer (Win98, PC104 K6-400) and a router (FreeBSD, MediaGX233) connect over an Ethernet hub, with Lucent WaveLAN for the wireless link; the system runs on four battery packs. Constructed by Hoam Chung and David Shim, September 2000.
12
Navigation Software: DQI-NP-Based
[Process diagram] Processes running on QNX: DQICONT (periodic, 100 Hz) performs the INS update from the Boeing DQI-NP and produces control output at 50 Hz; DQIGPS (anytime periodic, 5 Hz) applies the GPS update from the NovAtel GPS RT-2, with DGPS measurements relayed from the ground computer (Win98) over the radio link; ULREAD (aperiodic, 4±1 Hz) reads relative altitude from the ultrasonic sensor; VCOMM (periodic) sends nav data to the vision computer at 10 Hz. Processes exchange nav data, flight status, commands, and receiver values from the Yamaha receiver through shared memory and RS-232 (using hardware interrupts and a proxy).
13
Wireless Communication
The ground monitoring system, landing decks, ground mobile robots, and UAVs communicate over Lucent Orinoco (WaveLAN) in ad hoc mode; DGPS corrections are broadcast via WaveLAN or wireless modem.
14
Hierarchy of the UAV Management System
[Hierarchy diagram] Strategic planner → tactical planner → trajectory generator → regulation → helicopter platform, with a detector feeding a discrete-event system. Continuous sensory information, tracking errors, flight modes, replan and detect signals, control points, conflict notifications, and the reference y_d flow between layers. Highlighted layer: regulation (the control law).
15
Flight Control System Experiments
- Position + heading lock (Dec 1999)
- Landing scenario with SAS (Dec 1999)
- Attitude control with mu-synthesis (July 2000)
- Position + heading lock (May 2000)
16
Hierarchy of the UAV Management System
[Hierarchy diagram as on slide 14] Highlighted layer: the trajectory generator.
17
Vehicle Control Language
Objective: develop an abstract, unified UAV mission-control language environment.
Features:
- Mission-independent
- Executes in batch or interactive mode
- Integrates seamlessly with the existing hierarchy
- Can be coupled to a graphical interface via an automatic code generator
18
Flight control synthesis: way-point navigation
[Mode-transition diagram] Helicopter flight modes (take-off, hover, forward flight, ascend/descend, land), with transition maneuvers for sideslip, pirouette, and bank-to-turn; a sketch of the transition graph follows.
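The mode-transition logic can be captured as a small finite-state machine. Below is a minimal Python sketch; the allowed-transition set is an assumption read off the diagram, not the flight controller's actual implementation:

```python
# A minimal sketch of the flight-mode transition graph as a finite-state
# machine. The allowed-transition set is an assumption read off the
# slide's diagram, not the controller's actual implementation.
ALLOWED = {
    "take-off":       {"hover"},
    "hover":          {"forward-flight", "ascend/descend", "land",
                       "pirouette", "sideslip"},
    "forward-flight": {"hover", "bank-to-turn"},
    "ascend/descend": {"hover"},
    "bank-to-turn":   {"forward-flight"},
    "pirouette":      {"hover"},
    "sideslip":       {"hover"},
    "land":           set(),
}

def transition(mode, request):
    """Grant a mode change only if the transition graph allows it."""
    if request in ALLOWED.get(mode, set()):
        return request
    return mode  # reject: stay in the current mode

assert transition("hover", "forward-flight") == "forward-flight"
assert transition("take-off", "land") == "take-off"  # not allowed directly
```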
19
Hierarchy of the UAV Management System
[Hierarchy diagram as on slide 14] Highlighted layer: the helicopter platform.
20
VCL Execution Module Structure
[Block diagram] The VCL interpreter (batch or interactive mode) receives commands from the ground station, checks feasibility against the current flight status, and generates reference inputs; the vehicle reference commands drive the feedforward/feedback control suite, which reads the nav sensor suite and outputs PWM on channels 1–5.
21
Waypoint Navigation using VCL (Aug 1, 2000)
22
Vision Based Motion Estimation for UAV Landing
Cory Sharp, Omid Shakernia, Department of EECS, University of California at Berkeley
23
Outline
- Motivation
- Vision-based ego-motion estimation
- Evaluation of motion estimates
- Vision system hardware/software
- Landing target design/tracking
- Active camera control
- Flight videos
24
Motivation
Goal: autonomous UAV landing on a ship's flight deck. (U.S. Navy photo)
Challenges:
- Hostile operating environments: high winds, pitching flight deck, ground effect
- The UAV undergoes changing nonlinear dynamics
Why the vision sensor?
- It is passive (for stealth)
- It gives the UAV's motion relative to the flight deck
25
Objective for Vision Based Landing
26
Vision/Navigation System coordination
[Block diagram] The onboard vision computer runs the feature tracker and PTZ camera control, pulling image features from a frame grabber; a 2.4 GHz video transmitter sends the image & features to a receiver at the vision-system monitoring station. Motion estimates are exchanged with the UAV controller, which talks to its own monitoring station over wireless Ethernet.
27
Vision in the Control Loop
[Block diagram] The vision computer performs image processing and corner finding, feature-point correspondence, motion estimation, and camera pan/tilt control; over RS-232 it receives helicopter state from, and returns a control strategy to, the navigation computer running the Vehicle Control Language.
28
Vision System Hardware
Ampro embedded PC (Little Board P5/x):
- Low-power Pentium 233 MHz running Linux
- 440 MB flash-disk HD, robust to body vibration
- Runs the motion estimation algorithm and controls the PTZ camera
Motion estimation algorithms:
- Written and optimized in C++ using LAPACK
- Produce motion estimates at 30 Hz
29
Pan/Tilt Camera Control
Feature-tracking issue: feature points can leave the field of view. Pan/tilt control increases the usable range of motion of the UAV by driving all feature points toward the center of the image, as in the sketch below.
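A minimal sketch of that centering law, assuming a proportional controller on the mean feature location; the gain, image size, and the (pan, tilt) rate interface are illustrative, not the actual camera API:

```python
# A sketch of the pan/tilt centering idea: drive the mean feature point
# toward the image center with a proportional controller. Gain, image
# size, and the command interface are assumptions for the example.
def ptz_centering_command(features, width=640, height=480, k=0.05):
    """Return (pan_rate, tilt_rate) recentering the tracked features.

    `features` is a nonempty list of (u, v) pixel coordinates.
    """
    cx = sum(u for u, v in features) / len(features)
    cy = sum(v for u, v in features) / len(features)
    err_u = cx - width / 2.0    # +: features right of center -> pan right
    err_v = cy - height / 2.0   # +: features below center -> tilt down
    return k * err_u, k * err_v
```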
30
Flight Video
31
Pitching Landing Deck
A landing deck simulates the motion of a ship at sea:
- Six electrically actuated cylindrical shafts
- Motion parameters: sea state (frequency and amplitude of waves), ship speed, direction into waves
- Stiffened aluminum construction
- Dimensions: 8' × 6'
32
Moving Landing Pad
33
Landing on Deck
34
Probabilistic Pursuit-Evasion Games with UGVs and UAVs
René Vidal, with C. Sharp, D. Shim, O. Shakernia, J. Hespanha, J. Kim, S. Rashid, and S. Sastry. University of California at Berkeley, 04/05/01
35
Outline
- Introduction
- Pursuit-evasion games
- Map building
- Pursuit policies
- Hierarchical control architecture: strategic planner, tactical planner, regulation, sensing, control system, agent and communication architectures
- Architecture implementation: tactical layer (UGVs, UAVs, hardware, software, sensor fusion); strategic layer (map building, pursuit policies, visual interface)
- Experimental results: evaluation of pursuit policies; pursuit-evasion games with UGVs and UAVs
- Conclusions and current research
36
Introduction: The Pursuit-Evasion Scenario
Evade!
37
Introduction: Theoretical Issues
- Probabilistic map building
- Coordinated multi-agent operation
- Networking and intelligent data sharing
- Path planning
- Identification of vehicle dynamics and control
- Sensor integration
- Vision system
38
Pursuit-Evasion Games
Consider the approach of Hespanha, Kim, and Sastry:
- Multiple pursuers catch one single evader
- Pursuers can only move to adjacent empty cells
- Pursuers have perfect knowledge of their current location
- Sensor model: false positives (p) and false negatives (q) for evader detection
- The evader moves randomly to adjacent cells
Extensions by Rashid and Kim:
- Multiple evaders, each recognized individually
- Supervisory agents: can "fly" over obstacles and evaders, but cannot capture
- Sensor model for obstacle detection as well
39
Map Building: Map of Obstacles
Sensor model: p = probability of a false positive, q = probability of a false negative. For a map M storing the obstacle probability of each cell:
If the sensor makes a positive reading:
M(x,y,t) = (1−q)·M(x,y,t−1) / [(1−q)·M(x,y,t−1) + p·(1−M(x,y,t−1))]
If the sensor makes a negative reading:
M(x,y,t) = q·M(x,y,t−1) / [q·M(x,y,t−1) + (1−p)·(1−M(x,y,t−1))]
A runnable sketch of this update follows.
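As a concrete illustration, here is a minimal Python sketch of this Bayes update; the grid size, prior, and the particular p and q values are assumptions for the example:

```python
import numpy as np

# A minimal sketch of the slide's Bayesian obstacle-map update.
# p: probability of a false positive, q: probability of a false negative.
# M holds P(obstacle in cell) and is updated one sensed cell at a time.

def update_obstacle_map(M, x, y, positive_reading, p=0.1, q=0.2):
    """Bayes update of the obstacle probability for cell (x, y)."""
    prior = M[x, y]
    if positive_reading:
        # P(reading+ | obstacle) = 1 - q,  P(reading+ | free) = p
        M[x, y] = (1 - q) * prior / ((1 - q) * prior + p * (1 - prior))
    else:
        # P(reading- | obstacle) = q,  P(reading- | free) = 1 - p
        M[x, y] = q * prior / (q * prior + (1 - p) * (1 - prior))
    return M

# Example: start from a uniform 0.5 prior on a 10x10 grid.
M = np.full((10, 10), 0.5)
update_obstacle_map(M, 3, 4, positive_reading=True)
```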
40
Map Building: Map of Evaders
At each time t:
1. Measurement step: incorporate the observation y(t) = {v(t), e(t), o(t)} using the model for the sensor.
2. Prediction step: propagate the map forward using the model for the evader's motion (see the sketch below).
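A sketch of the prediction step under the slide's randomly moving evader model; the uniform stay-plus-4-neighbors kernel is an assumption of this example:

```python
# A sketch of the prediction step, assuming the evader moves to one of
# the 4 adjacent cells (or stays) with equal probability. E holds
# P(evader in cell); obstacles are ignored here for brevity.
import numpy as np
from scipy.ndimage import convolve

def predict_evader_map(E):
    kernel = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]], dtype=float)
    kernel /= kernel.sum()          # uniform over stay + 4 neighbors
    E = convolve(E, kernel, mode="constant", cval=0.0)
    return E / E.sum()              # renormalize to a probability map
```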
41
Pursuit Policies
Greedy policy:
- The pursuer moves to the adjacent cell with the highest probability of containing an evader, over all maps
- The strategic planner assigns more importance to local measurements
Global-max policy:
- The pursuer moves toward the cell with the highest probability of containing an evader anywhere in the map
- May not take advantage of multiple pursuers (they may move to the same place)
Both policies are sketched in code below.
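For illustration, minimal sketches of the two policies on a 4-connected grid; the evader map `E`, move set, and tie-breaking are assumptions of this example, not the experiment code:

```python
# Minimal sketches contrasting the two pursuit policies on an evader
# probability map E (a 2-D numpy array), assuming 4-connected moves.
import numpy as np

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def greedy_move(E, pos):
    """Move to the adjacent cell with the highest evader probability."""
    candidates = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES]
    candidates = [c for c in candidates
                  if 0 <= c[0] < E.shape[0] and 0 <= c[1] < E.shape[1]]
    return max(candidates, key=lambda c: E[c])

def global_max_move(E, pos):
    """Step toward the global maximum of the evader map."""
    target = np.unravel_index(np.argmax(E), E.shape)
    dx = np.sign(target[0] - pos[0])
    dy = np.sign(target[1] - pos[1])
    # Move along one axis at a time on a 4-connected grid.
    return (pos[0] + dx, pos[1]) if dx != 0 else (pos[0], pos[1] + dy)
```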
42
Pursuit Policies
Theorem 1 (Hespanha, Kim, Sastry): for the greedy policy,
- the probability that the capture time is finite equals one, and
- the expected value of the capture time is finite.
Theorem 2 (Hespanha, Kim, Sastry): for a stay-in-place policy,
- the expected capture time increases as the speed of the evader decreases, and
- if the speed of the evader is zero, the probability that the capture time is finite is less than one.
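Expected capture times like those in the theorems can also be estimated empirically. Below is a hedged Monte Carlo sketch on an empty grid; the pursuit policy shown is a toy stand-in that can see the evader (unlike the game's map-based greedy policy), and the grid size and trial count are illustrative:

```python
# A Monte Carlo sketch estimating expected capture time against a
# randomly moving evader (matching the slide's evader model). The
# `chase` policy is a simplified stand-in, not the map-based greedy
# policy; grid size and trial count are illustrative.
import random

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

def step(pos, n):
    dx, dy = random.choice(MOVES)
    return (min(max(pos[0] + dx, 0), n - 1), min(max(pos[1] + dy, 0), n - 1))

def capture_time(policy, n=10, max_t=10_000):
    pursuer, evader = (0, 0), (n - 1, n - 1)
    for t in range(1, max_t):
        pursuer = policy(pursuer, evader)
        evader = step(evader, n)
        if pursuer == evader:
            return t
    return max_t

def chase(pursuer, evader):   # toy stand-in: step toward the evader
    dx = (evader[0] > pursuer[0]) - (evader[0] < pursuer[0])
    dy = (evader[1] > pursuer[1]) - (evader[1] < pursuer[1])
    return (pursuer[0] + dx, pursuer[1]) if dx else (pursuer[0], pursuer[1] + dy)

mean_T = sum(capture_time(chase) for _ in range(200)) / 200
```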
43
Hierarchical System Architecture
[Architecture diagram] The strategy planner receives the positions of evaders, obstacles, and pursuers from the map builder through the communications network, and sends desired pursuer positions to each agent's tactical planner & regulation layer (tactical planner, trajectory regulation). Vehicle-level sensor fusion combines actuator encoders, INS, GPS, ultrasonic altimeter, and vision (actuator positions [4n], linear accelerations & angular velocities [6n], inertial measurements [3n], height over terrain [n]) into the state of the helicopter & height over terrain, detected obstacles, and detected evaders. Control signals [4n] drive the agent dynamics, which are subject to exogenous disturbances (terrain, evader).
44
Agent Architecture
Segments the control of each agent into different layers of abstraction, so the same high-level control strategies can be applied to all agents:
- Strategic planner: mission planning, high-level control, communication
- Tactical planner: trajectory planning, obstacle avoidance, regulation
- Regulation: low-level control and sensing
45
Communication Architecture
Map building and the strategic planner can be:
- Centralized: one agent receives all sensor information, builds the map, and broadcasts it
- Decentralized: each agent builds its own map and shares its readings with the rest of the team
The communication network can be:
- Perfect: no packet loss, no transmission time, no network delay; all pursuers then hold identical maps
- Imperfect: each agent updates its map and makes decisions with the information available to it
46
Architecture Implementation: Part I
Common platform for UGVs and UAVs:
- Onboard computer: tactical planner and sensor fusion
- GPS: positioning
- Vision system: obstacle and evader detection
- WaveLAN and Ayllu: communication
UGV-specific platform:
- Pioneer robot: sonars, dead reckoning, compass
- Micro-controller: regulation and low-level control
- Saphira or Ayllu: tactical planning
UAV-specific platform:
- Yamaha R-50: INS, ultrasonic sensors, inertial sensors, compass
- Navigation computer: regulation and low-level control (David Shim's control system)
47
Vision System: PTZ & ACTS
Hardware:
- Onboard computer running Linux
- Sony pan/tilt/zoom camera
- PXC200 frame grabber
Camera control software (Linux): sends PTZ commands and receives the camera state.
ACTS system: captures and processes video (32 color channels, 10 blobs per channel); extracts color information and sends it to a TCP socket (number of blobs, size and position of each blob).
48
Visual-based position estimation
The motion model and image model combine the camera position and orientation (helicopter orientation relative to the ground, camera orientation relative to the helicopter) with the camera calibration (width, height, zoom) to produce the robot position estimate, as sketched below.
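One common way to realize such an estimate is ray-ground intersection: back-project the tracked image point through the composed camera pose and intersect the ray with a flat ground plane. A minimal sketch with illustrative calibration and pose values (the actual system's models may differ):

```python
# A sketch of visual position estimation via ray-ground intersection:
# back-project the tracked blob through the camera pose (helicopter
# attitude composed with camera pan/tilt) and intersect with flat
# ground. K, R, and C below are assumed, illustrative values.
import numpy as np

def ground_position(u, v, K, R_cam_to_world, C):
    """Estimate the world (x, y) of an image point assuming ground z=0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_world = R_cam_to_world @ ray_cam
    s = -C[2] / ray_world[2]          # scale to hit the z = 0 plane
    return (C + s * ray_world)[:2]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # calibration
R = np.diag([1.0, -1.0, -1.0])        # camera looking straight down
C = np.array([0.0, 0.0, 10.0])        # helicopter 10 m above the ground
print(ground_position(320, 240, K, R, C))  # blob at image center -> (0, 0)
```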
49
Communication Hardware
- Lucent WaveLAN wireless cards (11 Mbps); the network is set up in ad hoc mode
- TCP/IP sockets
- Ayllu
- TBTRF: SRI's mobile routing scheme
Ayllu provides a set of behaviors for the distributed control of multiple mobile robots: messages can be passed among behaviors, and the output of a behavior can be connected to a local or remote input of another behavior.
50
Pioneer Ground Robots
Hardware:
- Micro-controller: motion control
- Onboard computer: communication, video processing, camera control
Sensors:
- Sonars: obstacle avoidance, map building
- GPS & compass: positioning
- Video camera: map building, navigation, tracking
Communication:
- Serial
- WaveLAN: communication between robots and the base station
- Radio modem: GPS communication
51
Yamaha Aerial Robots
- Yamaha R-50 helicopter
- Navigation computer: Pentium 233 running QNX; low-level control and sensing (GPS, INS); UAV controller (David Shim's controller); Vehicle Control Language
- Vision computer: serial communication to receive the state of the helicopter (we do not send commands yet)
52
Architecture Implementation: Part II
[Architecture diagram] The strategic planner (map building, pursuit policies, communication) runs in Simulink and is the same for simulation and experiments; it communicates with every agent over TCP/IP. UAV pursuer: the navigation computer (helicopter control; GPS for position, INS for orientation) connects by serial to the vision computer (camera control, color tracking, UGV position estimation, communication). UGV pursuers and evader: the robot micro-controller (robot control; dead reckoning for position, compass for heading) connects by serial to the robot computer (camera control, color tracking, GPS position, communication).
53
Pursuit-Evasion Game Experiment using Simulink
PEG with four UGVs running the global-max pursuit policy; simulated camera view (radius 7.5 m with a 50° conic view); pursuer speed 0.3 m/s, evader speed 0.1 m/s.
54
Experimental Results: Evaluation of Policies
55
Experimental Results: Evaluation of Policies
56
Experimental Results: Pursuit Evasion Games with 1 UAV and 2 UGVs (Summer '00)
57
Experimental Results: Pursuit Evasion Games with 4 UGVs and 1 UAV (Spring '01)
58
Experimental Results: Pursuit Evasion Games with 4 UGVs and 1 UAV (Spring '01)
59
Conclusions and Current Research
- The proposed architecture has been successfully applied to the control of multiple agents in the pursuit-evasion scenario.
- Experimental results confirm the theoretical results: global-max outperforms greedy in a real scenario and is robust to changes in the evader's motion.
- What's missing: the vision computer does not yet control the helicopter.
Current research:
- Collision avoidance and UAV path planning
- Monte Carlo-based learning of pursuit policies
- Communication
60
A Probabilistic Framework for Pursuit-Evasion Games with Unmanned Air Vehicles
Maria Prandini, Univ. of Brescia & UC Berkeley
In collaboration with J. Hespanha, J. Kim, and S. Sastry
61
Key Ideas
- The "mission-level" control of unmanned air vehicles requires a probabilistic framework.
- The problem of coordinating teams of autonomous agents is naturally formulated in a game-theoretic setting.
- Exact solutions for these types of problems are often computationally intractable and, in some cases, open research problems.
62
The "rules" of the game
[Figure: UAVs searching a region with obstacles for an evader]
63
The "rules" of the game
- Terrain: with fixed obstacles, not accurately mapped
- UAVs (pursuers) capable of: flying between obstacles; seeing a region around them (limited by occlusions)
- Evader capable of: moving between obstacles (possibly actively avoiding detection)
- Objective: find the evader in minimum time
64
Scenarios
- Search and rescue operations [figure: UAVs, obstacles, person in danger]
65
Scenarios
- Search and rescue operations
- Finding parts in a warehouse [figure: part]
66
Scenarios
- Search and rescue operations
- Finding parts in a warehouse
- Search and capture operations [figure: UCAVs, enemy]
67
Scenarios
- Search and rescue operations
- Finding parts in a warehouse
- Search and capture operations
- Monitoring environmental threats [figure: UCAVs, fire]
68
Strategies for pursuit-evasion games
LaValle, Latombe, Guibas, et al. considered a similar problem, but they assume the map of the region is known, the pursuers have perfect sensors, and worst-case trajectories for the evader. The question becomes: how many UAVs are needed to win the game in finite time? In one example region a single agent is sufficient; in another, two agents are needed, because no matter what strategy a single pursuer chooses, there is a trajectory for the evader that avoids detection.
69
Exploring a region to build a map
Deng, Papadimitriou, et al. study the problem of building a map (seeing all points in the region) while traversing the smallest possible distance, comparing the standard "keep the wall to the right" algorithm with an algorithm that takes better advantage of the camera's capabilities.
70
A two-step solution (exploration followed by pursuit) is not efficient: sensors are imprecise, and worst-case assumptions on the trajectories of the evaders lead to very conservative results.
71
A different approach: use a probabilistic framework to combine exploration and pursuit-evasion games. Non-determinism comes from:
- poorly mapped terrain
- noise and uncertainty in the sensors
- probabilistic models for the motion of the evader and the UAVs
72
Markov Decision Processes
[Grid-world figure: cells 1–16; edge labels such as ".7|right" and ".1|right" mark transition probabilities]
- time t ∈ {1, 2, 3, …}
- state x_t ∈ X := {1, 2, …, 16}
- action u_t ∈ U := {up, down, left, right}
- transition probability function p(x, x', u) = P(x_{t+1} = x' | x_t = x, u_t = u)
73
Markov Decision Processes
[Grid-world figure: cells 1–16, with x_goal marked]
- time t ∈ {1, 2, 3, …}; state x_t ∈ X := {1, 2, …, 16}; action u_t ∈ U := {up, down, left, right}
- transition probability function p(x, x', u) = P(x_{t+1} = x' | x_t = x, u_t = u)
- X* := set of all finite sequences of elements in X
- control policy (deterministic): π : X* → U, with u_t = π(x_t, x_{t-1}, …, x_1) = π(X_t)
- control policy (stochastic): π : U × X* → [0, 1], with P(u_t = u | X_t = X; π) = π(u, X)
74
Markov Decision Processes
[Grid-world figure: cells 1–16, with x_goal marked; under the policy, the goal is reached almost surely]
- time t ∈ {1, 2, 3, …}; state x_t ∈ X := {1, 2, …, 16}; action u_t ∈ U := {up, down, left, right}
- transition probability function p(x, x', u) = P(x_{t+1} = x' | x_t = x, u_t = u)
- X* := set of all finite sequences of elements in X
- control policy (deterministic): π : X* → U, with u_t = π(x_t, x_{t-1}, …, x_1) = π(X_t)
- control policy (stochastic): π : U × X* → [0, 1], with P(u_t = u | X_t = X; π) = π(u, X)
75
Markov Decision Processes
[Grid-world figure: cells 1–16, with x_goal marked]
cost: J_π = E[T_goal | π] (to be minimized), where T_goal := min {t : x_t = x_goal}
76
Markov Decision Processes
[Grid-world figure: cells 1–16, with x_goal marked and Ø an absorbing state]
cost: J_π = E[T_goal | π] (to be minimized), where T_goal := min {t : x_t = x_goal}; the cost can also be written in additive form, with a per-step cost of 1 until the goal is reached and 0 afterwards.
optimal control policy π*: J_{π*} = min_π J_π
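A minimal value-iteration sketch for this expected-time-to-goal problem, using the additive form of the cost above; the transition matrices, grid size, and iteration count are assumptions for the example:

```python
# Value iteration for the expected-time-to-goal MDP. P[u] is the
# |X| x |X| transition matrix for action u; the stage cost is 1 per
# step until x_goal is reached (matching the additive form of J).
import numpy as np

def value_iteration(P, goal, iters=500):
    n = next(iter(P.values())).shape[0]
    V = np.zeros(n)                          # V[x] ~ expected steps to goal
    for _ in range(iters):
        Q = {u: 1.0 + P[u] @ V for u in P}   # one step of cost + future
        V = np.min(np.stack(list(Q.values())), axis=0)
        V[goal] = 0.0                        # absorbing goal state
    policy = {x: min(P, key=lambda u: (1.0 + P[u] @ V)[x]) for x in range(n)}
    return V, policy
```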
77
Competitive Markov Decision Processes
[Grid-world figure: cells 1–16, plus the capture state Ø]
- state x_t ∈ X := {(1,1), (1,2), …, (1,16), …, (16,16), Ø}
- player U action u_t ∈ U := {up, down, left, right}
- player D action d_t ∈ D := {up, down, left, right}
- transition probability function p(x, x', u, d) = P(x_{t+1} = x' | x_t = x, u_t = u, d_t = d)
In pursuit-evasion games, x_t = (a_t, b_t) ∈ S × S (pursuer and evader positions) or x_t = Ø, and the kernel factors as
p((a, b), (a', b'), u, d) = P(a_{t+1} = a' | a_t = a, u_t = u) · P(b_{t+1} = b' | b_t = b, d_t = d)
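The factorization can be coded directly; a minimal sketch where `pu` and `pd` are the players' individual transition kernels (the names are illustrative):

```python
# A direct coding of the factored transition kernel for the pursuit-
# evasion game: pursuer and evader move independently, so the joint
# kernel is a product. `pu` and `pd` are the individual kernels
# (illustrative names); the capture state Ø would be handled separately.
def joint_transition(pu, pd, a, b, a_next, b_next, u, d):
    """p((a,b) -> (a_next,b_next) | u, d) = pu(a,a_next,u) * pd(b,b_next,d)."""
    return pu(a, a_next, u) * pd(b, b_next, d)
```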
78
Competitive, Hidden MDPs
[Grid-world figure: cells 1–16, plus the capture state Ø]
- state x_t ∈ X := {(1,1), (1,2), …, (1,16), …, (16,16), Ø}
- player U action u_t ∈ U := {up, down, left, right}; player D action d_t ∈ D := {up, down, left, right}
- transition probability function p(x, x', u, d) = P(x_{t+1} = x' | x_t = x, u_t = u, d_t = d)
- player U observation y_t ∈ Y := {found, not-found, Ø}; player D observation z_t ∈ Z := {found, not-found, Ø}
- observation probability functions p_U(y, x) = P(y_t = y | x_t = x) and p_D(z, x) = P(z_t = z | x_t = x)
(the state Ø is not "hidden" from either player)
79
Competitive, Hidden MDPs
[Grid-world figure: cells 1–16, plus the capture state Ø]
- state, actions, and observations as before
- Y_t = sequence of t observations for player U; U_{t-1} = sequence of t−1 actions for player U
- control policy (stochastic) for player U: π : U × Y* × U* → [0, 1], with P(u_t = u | Y_t = Y, U_{t-1} = U; π) = π(u, Y, U)
- Z_t = sequence of t observations for player D; D_{t-1} = sequence of t−1 actions for player D
- for player D: δ : D × Z* × D* → [0, 1], with P(d_t = d | Z_t = Z, D_{t-1} = D; δ) = δ(d, Z, D)
80
Competitive, Hidden MDPs
[Grid-world figure: cells 1–16, plus the capture state Ø]
cost (time to capture): to be minimized by player U and maximized by player D (zero sum).
Stackelberg equilibria:
- the leader (D) chooses a worst-case policy δ* = argmax_δ min_π J_{π,δ} (max-min) and announces it
- the follower (U) chooses its policy to best counteract the leader's policy, i.e., π* = argmin_π J_{π,δ*} (min)
(vice versa if player U is the leader)
81
Competitive, Hidden MDPs
[Stackelberg setup as on the previous slide]
- The leader's policy is "worst-case": he will never do worse than J* := max_δ min_π J_{π,δ}, so the leader has nothing to fear from the follower discovering his strategy.
- The follower's policy may be "fragile": he may do much worse if the leader deviates from the announced policy.
82
Competitive, Hidden MDPs
Example: rock-paper-scissors, with X := {1, Ø}, U := {rock, paper, scissors}, D := {rock, paper, scissors}.
Stackelberg equilibria:
- the leader (D) chooses the worst-case policy δ*(d, 1) = 1/3 for every d (max-min) and announces it
- the follower (U) chooses its policy to best counteract the leader's policy (min), giving J_{π*,δ*} = 1
But for this choice of π*, the follower will lose systematically if the leader discovers his policy and deviates from δ*.
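For the matrix-game case, the leader's max-min mixed policy can be computed by linear programming. A minimal sketch applied to rock-paper-scissors, recovering the uniform 1/3 policy from the slide; the standard win/lose payoff matrix is an assumption (the slide's capture-time bookkeeping differs slightly):

```python
# Compute the row player's worst-case (max-min) mixed strategy for a
# zero-sum matrix game by linear programming, applied to rock-paper-
# scissors. The standard win/lose payoff matrix is an assumption.
import numpy as np
from scipy.optimize import linprog

def maxmin_strategy(A):
    """Row player's max-min mixed strategy for payoff matrix A."""
    n, m = A.shape
    # Variables: [x_0..x_{n-1}, v]; maximize v  <=>  minimize -v.
    c = np.r_[np.zeros(n), -1.0]
    # For every column j:  v - sum_i A[i,j] x_i <= 0.
    A_ub = np.c_[-A.T, np.ones(m)]
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)   # probabilities sum to 1
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[n]

RPS = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock, paper, scissors
strategy, value = maxmin_strategy(RPS)
print(strategy, value)   # ~[1/3, 1/3, 1/3], value ~0
```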
83
Competitive, Hidden MDPs
Not all Stackelberg equilibria were created equal. In the same rock-paper-scissors example, with δ*(d, 1) = 1/3 for every d and π*(u, 1) = 1/3 for every u, J_{π*,δ*} = 1, but for this choice of π* the leader will not benefit from learning the follower's policy.
84
Competitive, Hidden MDPs
Nash equilibria (or saddle points): a pair of policies (π*, δ*) such that J_{π*,δ} ≤ J_{π*,δ*} ≤ J_{π,δ*} for all π, δ. In the rock-paper-scissors example: π*(u, 1) = 1/3 for every u and δ*(d, 1) = 1/3 for every d.
85
Competitive, Hidden MDPs
Nash equilibria are "robust": any unilateral deviation from the equilibrium makes that player do worse, so a player playing at a Nash equilibrium has nothing to fear from the other discovering his strategy. But Nash equilibria don't always exist.
All Nash equilibria were created equal: if (π*, δ*) and (π+, δ+) are Nash equilibria, then (π*, δ+) and (π+, δ*) are also Nash equilibria, and J_{π*,δ*} = J_{π+,δ+} = J_{π*,δ+} = J_{π+,δ*} = value of the game = min_π max_δ J_{π,δ} = max_δ min_π J_{π,δ}.
86
Competitive, Hidden MDPs
[Back to the grid pursuit-evasion game] cost (time to capture): to be minimized by player U and maximized by player D (zero sum). Stackelberg equilibria: the leader (D) chooses a worst-case policy δ* = argmax_δ min_π J_{π,δ} (max-min) and announces it; the follower (U) chooses π* = argmin_π J_{π,δ*} (min).
87
Competitive, Hidden MDPs
The follower's optimal cost-to-go V(Y_t, U_{t-1}), for a sequence Y ∈ Y* of t observations and a sequence U ∈ U* of t−1 actions, satisfies Bellman's equation with the boundary condition V(Y_t, U_{t-1}) = 0 a.s. conditioned on x_t = Ø. Bellman's equation uniquely defines the cost-to-go V and the optimal follower's control policy π*.
88
Competitive, Hidden MDPs
The follower's optimal policy π* = argmin_π J_{π,δ*} is:
- deterministic (at least one of the optimal policies is): π*(u, Y, U) ∈ {0, 1}
- non-Markov, but π*(u, Y, U) = f(u, I(Y, U)), where I(Y, U) is the information state (whose size grows with t)
89
Competitive, Hidden MDPs
For the follower's optimal policy π* = argmin_π J_{π,δ*}:
- the follower's optimal policy is finite-dimensional only if δ* uses finite memory
- even when δ* is memoryless, the size of the information state tends to be prohibitively large, e.g., 2^N · N^(m+1) ≈ 2.6 × 10^69 for N = 200 cells (with uniformly distributed obstacles) and m = 3 pursuers
90
Competitive, Hidden MDPs
[Grid pursuit-evasion game] cost (time to capture): to be minimized by player U and maximized by player D (zero sum).
Stackelberg equilibria: leader (D) δ* = argmax_δ min_π J_{π,δ}; follower (U) π* = argmin_π J_{π,δ*}.
Nash equilibria: a pair of policies (π*, δ*) such that J_{π*,δ} ≤ J_{π*,δ*} ≤ J_{π,δ*} for all π, δ.
These remain open research problems.
91
Conclusions
- The "mission-level" control of unmanned air vehicles requires a probabilistic framework.
- The problem of coordinating teams of autonomous agents is naturally formulated in a game-theoretic setting.
- Exact solutions for these types of problems are often computationally intractable and, in some cases, open research problems.
92
Some References
- Kumar & Varaiya, Stochastic Systems. Prentice Hall, 1986.
- Fudenberg & Tirole, Game Theory. MIT Press, 1993.
- Basar & Olsder, Dynamic Noncooperative Game Theory, 2nd ed. SIAM, 1999.
- Bertsekas, Dynamic Programming and Optimal Control, vols. 1 & 2. Athena Scientific, 1995.
- Filar & Vrieze, Competitive Markov Decision Processes. Springer-Verlag, 1997.
- Patek & Bertsekas, Stochastic Shortest Path Games. SIAM J. Control Optim., 37(3), 1999.
- See also papers by Michael L. Littman, Junling Hu, and Michael P. Wellman on reinforcement learning in Markov games.
Intelligent Control Architectures for Unmanned Air Vehicles Home Page:
Software Enabled Control Home Page: