1 University of Southern California Keep the Adversary Guessing: Agent Security by Policy Randomization Praveen Paruchuri University of Southern California paruchur@usc.edu
2 University of Southern California Motivation: The Prediction Game
A police vehicle patrols 4 regions (Region 1, Region 2, Region 3, Region 4). Can you predict the patrol pattern?
Two example patrol patterns illustrate the point: randomization decreases predictability and increases security.
3 University of Southern California Domains
Police patrolling groups of houses
Scheduled activities at airports, such as security checks and refueling
The adversary monitors these activities
Randomized policies
4 University of Southern California Problem Definition
Problem: security for agents in uncertain adversarial domains
Assumptions for the agent/agent-team: variable information about the adversary
– Adversary cannot be modeled (Part 1): action/payoff structure unavailable
– Adversary is partially modeled (Part 2): probability distribution over adversaries
Assumptions for the adversary:
– Knows the agents' plan/policy
– Exploits the agents' action predictability
5 University of Southern California Outline
Security via randomization:
– No adversary model: randomization + quality constraints (MDP/Dec-POMDP)
– Partial adversary model: mixed strategies via Bayesian Stackelberg games
Contributions: new, efficient algorithms
6 University of Southern California No Adversary Model: Solution Technique
Intentional policy randomization for security: an information-minimization game
MDP/POMDP: sequential decision making under uncertainty
– POMDP: Partially Observable Markov Decision Process
Maintain quality constraints:
– Resource constraints (time, fuel, etc.)
– Frequency constraints (likelihood of crime, property value)
7 University of Southern California Randomization with Quality Constraints
Example constraint: fuel used < threshold
8 University of Southern California No Adversary Model: Contributions
Two main contributions:
Single-agent case:
– Nonlinear program with an entropy-based metric: hard to solve (exponential)
– Convert to a linear program: BRLP (binary search for randomization)
Multi-agent case: RDR (Rolling Down Randomization)
– Randomized policies for decentralized POMDPs
9 University of Southern California MDP-Based Single-Agent Case
An MDP is a tuple (S, A, P, R):
– S: set of states
– A: set of actions
– P: transition function
– R: reward function
Basic terms used:
– x(s,a): expected number of times action a is taken in state s
– Policy, written as a function of the MDP flows (see the formula below)
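The flow-to-policy relation referred to in the last bullet is the standard one for occupation-measure formulations of MDPs; the slide's own notation is not preserved in this transcript, so this is a reconstruction:

```latex
\pi(a \mid s) \;=\; \frac{x(s,a)}{\sum_{a'} x(s,a')}
```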
10 University of Southern California Entropy: A Measure of Randomness
Randomness or information content is quantified using entropy (Shannon 1948)
Entropy for an MDP (two variants, sketched below):
– Additive entropy: add the entropies of the individual states
– Weighted entropy: weight each state's entropy by its contribution to the total flow
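One way to write the two variants in terms of the flow variables x(s,a); this is a reconstruction consistent with the definitions above, not necessarily the slide's exact weighting:

```latex
H(s) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s), \qquad
H_{\mathrm{add}}(x) = \sum_{s} H(s), \qquad
H_{\mathrm{wt}}(x) = \sum_{s} \frac{\sum_{a} x(s,a)}{\sum_{s',a'} x(s',a')}\, H(s)
```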
11 University of Southern California Randomized Policy Generation
Non-linear program: maximize entropy subject to reward above a threshold (sketched below)
– Entropy is a function of the flows, which makes the program non-linear
– Exponential algorithm
Linearize to obtain a poly-time algorithm: BRLP (Binary search for Randomization via LP)
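A sketch of the kind of non-linear program meant here, using the flow variables and weighted entropy from the previous slides. The flow-conservation constraints are the standard dual-LP constraints of an MDP; α denotes the initial state distribution, γ a discount factor (drop it for the finite-horizon case), and R_min the reward threshold. The exact constraint set from the slide is not preserved in the transcript:

```latex
\begin{aligned}
\max_{x \ge 0}\quad & H_{\mathrm{wt}}(x) \\
\text{s.t.}\quad & \sum_{a} x(s,a) \;-\; \gamma \sum_{s',a'} P(s \mid s',a')\, x(s',a') \;=\; \alpha(s) \qquad \forall s \\
& \sum_{s,a} R(s,a)\, x(s,a) \;\ge\; R_{\min}
\end{aligned}
```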
12 University of Southern California BRLP: Efficient Randomized Policy
Inputs: a base policy, which can be any high-entropy policy (e.g., the uniform policy), and a target reward
LP for BRLP: the degree of randomization is controlled with the parameter β (a sketch of the binary-search procedure follows)
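A minimal runnable sketch of the BRLP idea described on this slide: binary search over β, where each step solves an LP over MDP occupation measures whose flows are forced to stay at least a β-fraction of the way toward a high-entropy base policy. The toy MDP, the discounting, the specific lower-bound constraint x(s,a) ≥ β·x_base(s,a), and all helper names are assumptions made for illustration, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP (assumed for illustration): 2 states, 2 actions.
S, A, gamma = 2, 2, 0.9
alpha = np.array([0.5, 0.5])                  # initial state distribution
P = np.zeros((S, S, A))                       # P[s2, s1, a] = Pr(s2 | s1, a)
P[:, 0, 0] = [0.9, 0.1]; P[:, 0, 1] = [0.2, 0.8]
P[:, 1, 0] = [0.7, 0.3]; P[:, 1, 1] = [0.1, 0.9]
R = np.array([[1.0, 0.0],                     # R[s, a]
              [0.0, 2.0]])

def uniform_flows():
    """Occupation measures x(s,a) induced by the uniform policy."""
    P_pi = sum(P[:, :, a] for a in range(A)) / A           # S x S, [s2, s1]
    d = np.linalg.solve(np.eye(S) - gamma * P_pi, alpha)   # state flows
    return np.outer(d, np.ones(A)) / A

def solve_lp(beta, x_base):
    """Maximize expected reward s.t. flow conservation and x >= beta * x_base."""
    c = -R.flatten()                                       # linprog minimizes
    A_eq = np.zeros((S, S * A))
    for s2 in range(S):
        for s1 in range(S):
            for a in range(A):
                A_eq[s2, s1 * A + a] = float(s1 == s2) - gamma * P[s2, s1, a]
    bounds = [(beta * x_base[s, a], None) for s in range(S) for a in range(A)]
    return linprog(c, A_eq=A_eq, b_eq=alpha, bounds=bounds, method="highs")

def brlp(target_reward, tol=1e-4):
    """Binary search for the largest beta whose LP still meets the target reward."""
    x_base = uniform_flows()
    lo, hi = 0.0, 1.0
    best = solve_lp(0.0, x_base)                           # beta = 0: max-reward LP
    while hi - lo > tol:
        beta = (lo + hi) / 2.0
        res = solve_lp(beta, x_base)
        if res.status == 0 and -res.fun >= target_reward:
            lo, best = beta, res                           # feasible: randomize harder
        else:
            hi = beta
    x = best.x.reshape(S, A)
    policy = x / x.sum(axis=1, keepdims=True)              # pi(a|s) from flows
    return policy, -best.fun

max_reward = -solve_lp(0.0, uniform_flows()).fun
policy, reward = brlp(target_reward=0.9 * max_reward)
print(policy, reward)
```

With β = 1 the LP is pinned to the uniform flows (maximum entropy); with β = 0 it reduces to the standard max-reward LP, matching the scale on the next slide.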
13 University of Southern California BRLP in Action
(Figure: an increasing scale of β, from β = 0, the deterministic max-reward policy, to β = 1, the maximum-entropy policy; the binary search moves β, e.g., β = 0.5, until the target reward is met.)
14 University of Southern California Results (Averaged over 10 MDPs)
For a given reward threshold:
– Highest entropy: Weighted Entropy, with a 10% average gain over BRLP
– Fastest: BRLP, with a 7-fold average speedup over Expected Entropy
15 University of Southern California Multi-Agent Case: Problem
Maximize entropy for agent teams subject to a reward threshold
For the agent team: decentralized POMDP framework, with no communication between agents
For the adversary: knows the agents' policy and exploits the action predictability
16 University of Southern California Policy Trees: Deterministic vs. Randomized
(Figure: a deterministic policy tree, which commits to a single action A1 or A2 after each observation O1/O2, shown next to a randomized policy tree, which mixes over A1 and A2 at each node.)
17 University of Southern California RDR: Rolling Down Randomization
Input: the best (local or global) deterministic policy and a percent of reward loss
d parameter: sets the number of turns each agent gets
– Ex: d = 0.5 => number of steps = 1/d = 2
– Each agent gets one turn (for the 2-agent case)
– A single-agent MDP problem is solved at each step
18 University of Southern California RDR: d = 0.5
Let M = maximum joint reward (a schedule sketch follows).
Turn 1 (Agent 1): fix Agent 2's policy, maximize joint entropy subject to joint reward > 90% of M.
Turn 2 (Agent 2): fix Agent 1's policy, maximize joint entropy subject to joint reward > 80% of M.
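A tiny sketch of the "rolling down" reward schedule implied by this slide (illustration only; the function name, the even per-turn split, and the total-loss parameter are assumptions, and the real RDR step solves an entropy-maximizing single-agent problem at each threshold):

```python
def rdr_schedule(max_reward, total_loss_fraction, d):
    """Per-turn reward thresholds: one turn per agent, 1/d turns in total."""
    steps = int(round(1.0 / d))
    per_step = total_loss_fraction / steps
    return [max_reward * (1.0 - per_step * (i + 1)) for i in range(steps)]

# d = 0.5 with a 20% allowed loss reproduces the 90%-then-80% thresholds above.
print(rdr_schedule(max_reward=1.0, total_loss_fraction=0.2, d=0.5))  # [0.9, 0.8]
```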
19 University of Southern California RDR Details
To derive the single-agent MDP, new transition, observation, and belief-update rules are needed:
– Original belief update rule
– New belief update rule (for the single-agent MDP with the teammate's policy held fixed)
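For reference, the standard single-agent POMDP belief update that the "original" rule corresponds to; the multiagent formulas from the original slide are not preserved in this transcript, so only this baseline form is reconstructed here:

```latex
b'(s') \;\propto\; O(o \mid s', a) \sum_{s \in S} P(s' \mid s, a)\, b(s)
```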
20 University of Southern California Experimental Results: Reward Threshold vs. Weighted Entropy (averaged over 10 instances)
21 University of Southern California Security with a Partially Modeled Adversary
A police agent patrols a region with many adversaries (robbers)
– Different motivations, different times and places
The model (actions and payoffs) of each adversary is known
A probability distribution over the adversaries is known
Modeled as a Bayesian Stackelberg game
22 University of Southern California Bayesian Game
It contains:
– A set of agents N (police and robbers)
– A set of types θ_m (police and robber types)
– A set of strategies σ_i for each agent i
– A probability distribution over types, Π_j : θ_j → [0,1]
– A utility function U_i : θ_1 × θ_2 × σ_1 × σ_2 → R
23 University of Southern California Stackelberg Game
Agent as leader: commits to a strategy first (the patrol policy)
Adversaries as followers: optimize against the leader's fixed strategy
– Observe patrol patterns to leverage this information
Payoff matrix (rows: Agent/leader, columns: Adversary/follower; entries are leader, follower payoffs):
      a      b
a    2,1    4,0
b    1,0    3,2
Nash equilibrium: both play a, with payoffs [2,1]
If the leader instead commits to the uniform random strategy {0.5, 0.5}, the follower plays b, yielding payoffs [3.5, 1]
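A quick sanity check of the payoffs on this slide (a throwaway script; the matrices are just the 2x2 table above transcribed into arrays):

```python
import numpy as np

# Payoffs from the slide's 2x2 game: rows = leader actions (a, b),
# columns = follower actions (a, b).
R_leader   = np.array([[2.0, 4.0], [1.0, 3.0]])
R_follower = np.array([[1.0, 0.0], [0.0, 2.0]])

x = np.array([0.5, 0.5])                            # leader commits to uniform mixing
follower_payoffs = x @ R_follower                   # expected follower payoff per column
best_response = int(np.argmax(follower_payoffs))    # -> 1, i.e. action b
leader_payoff = (x @ R_leader)[best_response]       # -> 3.5

print(best_response, leader_payoff)                 # 1 3.5
```

This is the commitment advantage the rest of the talk builds on: the leader's expected payoff rises from 2 at the Nash equilibrium to 3.5 under the committed mixed strategy.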
24 University of Southern California Previous Work: Conitzer & Sandholm, AAAI'05 and EC'06
MIP-Nash (AAAI'05): efficient procedure for finding the best Nash equilibrium
Multiple-LPs method (EC'06): given a normal-form game, finds the optimal leader strategy to commit to
– One LP is solved for every joint pure strategy j of the adversary (R, C: agent and adversary payoff matrices); see the sketch below
Bayesian to normal-form game via the Harsanyi transformation: exponentially many adversary strategies, NP-hard
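The per-pure-strategy LP mentioned above has roughly this shape, reconstructed from the cited EC'06 method (i ranges over the agent's pure strategies, j and j' over the adversary's; this is a reconstruction, not the slide's own formula):

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i} R_{ij}\, x_i \\
\text{s.t.}\quad & \sum_{i} C_{ij}\, x_i \;\ge\; \sum_{i} C_{ij'}\, x_i \qquad \forall j' \\
& \sum_{i} x_i = 1, \qquad x_i \ge 0
\end{aligned}
```

The leader then keeps the best solution over all j for which the LP is feasible.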
25 University of Southern California Bayesian Stackelberg Game: Approach
Two approaches:
1. Heuristic solution, ASAP: Agent Security via Approximate Policies
2. Exact solution, DOBSS: Decomposed Optimal Bayesian Stackelberg Solver
Exponential savings:
– No Harsanyi transformation
– No exponential number of LPs: one MILP (Mixed Integer Linear Program)
26 University of Southern California ASAP vs. DOBSS
ASAP (heuristic):
– Controls the probability of each strategy over a discrete probability space
– Generates k-uniform policies, e.g., k = 3 => probabilities in {0, 1/3, 2/3, 1} (see the snippet below)
– Simple and easy to implement
DOBSS (exact):
– Modifies the ASAP algorithm from a discrete to a continuous probability space
– Focus of the rest of the talk
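For concreteness, the k-uniform probability vectors referred to above can be enumerated like this (an illustrative snippet, not the authors' code; the two-strategy leader is an assumption):

```python
from itertools import product

# k-uniform mixed strategies: every probability is a multiple of 1/k.
k, n_strategies = 3, 2
k_uniform = [tuple(c / k for c in counts)
             for counts in product(range(k + 1), repeat=n_strategies)
             if sum(counts) == k]
print(k_uniform)   # [(0.0, 1.0), (1/3, 2/3), (2/3, 1/3), (1.0, 0.0)]
```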
27 University of Southern California DOBSS Details
Previous work: fix an adversary (joint) pure strategy, then solve an LP to find the best agent strategy
My approach: for each agent mixed strategy, find the adversary's best response
Advantages:
– Decomposition technique: given the agent's strategy, each adversary type can find its best response independently
– A mathematical technique yields a single MILP
28 University of Southern California Obtaining the MILP
Decompose over adversary types, then substitute variables to linearize the leader-follower product terms (sketched below).
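A reconstruction of the general shape of this step, based on how DOBSS is usually described (leader mixed strategy x, adversary type l with prior p^l choosing a pure strategy q^l, payoff matrices R^l and C^l for leader and follower, a^l the follower's best-response value, M a large constant). The decomposed, bilinear problem is:

```latex
\begin{aligned}
\max_{x,\, q,\, a}\quad & \sum_{l} p^l \sum_{i,j} R^l_{ij}\, x_i\, q^l_j \\
\text{s.t.}\quad & \sum_i x_i = 1, \quad x_i \ge 0, \qquad \sum_j q^l_j = 1, \quad q^l_j \in \{0,1\} \\
& 0 \;\le\; a^l - \sum_i C^l_{ij}\, x_i \;\le\; (1 - q^l_j)\, M \qquad \forall\, j,\, l
\end{aligned}
```

Substituting z^l_{ij} = x_i q^l_j then removes the products of variables from the objective and yields a single mixed-integer linear program. This is a sketch of the standard derivation, not necessarily the exact program on the slide.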
29 University of Southern California Experiments: Domain
Patrolling domain: a security agent and robbers
– The security agent patrols houses, e.g., visits house a and observes that house and its neighbor
– Plans are for patrol length 2: 6 strategies for 3 houses, 12 strategies for 4 houses
Robbers can attack any house
– 3 possible choices per robber for 3 houses
– Rewards depend on the house and the agent's position
The joint strategy space of the robbers is exponential: 3^10 joint strategies for 3 houses and 10 robbers
30 University of Southern California Sample Patrolling Domain: 3 & 4 Houses
3 houses (LPs: 7 followers; DOBSS: 20 followers); 4 houses (LPs: 6 followers; DOBSS: 12 followers).
31 University of Southern California Conclusion
When the agent cannot model the adversary: intentional randomization algorithms for MDPs/Dec-POMDPs
When the agent has a partial model of the adversary: an efficient MILP solution for Bayesian Stackelberg games
32 University of Southern California Vision
Incorporating machine learning
Dynamic environments
Resource-constrained agents: constraints might be unknown in advance
Developing real-world applications: police patrolling, airport security
33 University of Southern California Thank You
Any comments/questions?