When Security Games Go Green

Presentation transcript:

When Security Games Go Green
Suraj Nair

Outline
Green Security Game model
Planning algorithms
Planning and learning
Results

I will go through the Green Security Game model, provide the high-level ideas of the planning and learning algorithms, and then present experimental results.

Green Security Domains: Protecting Fish and Wildlife
Application of traditional game-theoretic models to green security domains.

The work is motivated by green security problems, including protecting wildlife. A similar challenge is faced by officers who try to protect fish stocks from illegal fishing.

Features of green security games:
Generalized Stackelberg assumption
Repeated and frequent attacks
Significant amounts of data
Attacker bounded rationality
Limited surveillance/planning

Green Security Games can be viewed as using a generalized Stackelberg assumption. The standard Stackelberg assumption in infrastructure security games is that the defender commits to a strategy first, and the attacker then conducts surveillance to fully observe the defender's strategy before acting. However, this assumption may not always hold here: poachers and illegal fishermen attack frequently and may not have time to fully observe the defender's current strategy. When the defender changes her strategy one day, the poachers may not get a timely update about this change and may still respond to what they believe the defender strategy is, which is already outdated. So the defender can benefit from the strategy change if it is planned carefully.

Green Security Game Model
$T$-round game, $K$ defenders, $N$ targets, where $N \ge K$.
Coverage vector $c = \langle c_i \rangle$, where $c_i$ denotes the probability that target $i$ is covered.
$c^t$ denotes the defender strategy for round $t$.
(Slide table: example coverage probabilities for each target across Jan through May.)

To model the frequent and repeated interaction, we model the problem as a repeated game with $T$ rounds; each round corresponds to a period of time, say a month. The protected area is discretized into multiple subareas, each of which can be seen as a target. In each round, the defender chooses a mixed strategy. There are $K$ patrollers who decide where to patrol every day. In this talk, we consider the special case where each patroller chooses a subarea every day and does extensive patrols in that area. The defender faces a population of attackers, and each attacker chooses in which subarea to place snares every day.

Green Security Game Model
$L$ attackers who respond to a convex combination of the defender strategies in recent rounds.
$\eta^t$ denotes the attackers' belief about the defender strategy in round $t$ (e.g., $\eta^3 = 0.3\,c^1 + 0.7\,c^2$ when planning $c^3$ for March).
Payoff values for target $i$: $P_i^a, R_i^a, P_i^d, R_i^d$, where $P$ stands for penalty, $R$ for reward, $a$ for attacker, $d$ for defender.
Expected utility for the defender if the attacker targets $i$: $U_i^d(c) = c_i R_i^d + (1 - c_i) P_i^d$.

We know that the attackers may not have up-to-date information about the defender strategy. For example, on one day in March, when a poacher needs to decide where to place snares, he does not know which defender strategy is currently used in March. Naturally, he takes into account what he observed or learned from other poachers in the past. So we estimate the attackers' belief about the defender strategy in the current round, denoted $\eta^t$, as a convex combination of the defender strategies used in recent rounds. This is a generalization of the perfect Stackelberg assumption, which says the attackers' belief is always up-to-date. This generalized assumption provides more flexibility and better fits the situation in green security domains.

Green Security Game Model
The attacker chooses targets with bounded rationality, following the SUQR model: more promising targets are chosen with higher probability.
Probability that an attacker with parameters $\omega$ attacks target $i$:
$$q_i(\omega, \eta) = \frac{e^{\omega_1 \eta_i + \omega_2 R_i^a + \omega_3 P_i^a}}{\sum_j e^{\omega_1 \eta_j + \omega_2 R_j^a + \omega_3 P_j^a}}$$
The defender designs a strategy profile $c = \langle c^1, \dots, c^T \rangle$.
Expected utility of the defender in round $t$:
$$E_t(c) = \sum_l \sum_i q_i(\omega^l, \eta^t)\, U_i^d(c^t)$$

We also know that the attackers are boundedly rational in choosing targets, meaning they may not always go for the target with the highest expected utility. So we adopt the SUQR model, the best-performing model of attackers' bounded rationality when tested against human subjects. This model says that the attacker will choose more promising targets with higher probability, where the judgement of which target is more promising is based on the attacker's evaluation of key properties of the targets: the attacker chooses target $i$ with probability $q_i$, a non-convex function of the reward, penalty, and defender strategy. Note that $\omega_1, \omega_2, \omega_3$ are parameters that describe how the attacker evaluates different targets, and these parameters differ from attacker to attacker.
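To make these quantities concrete, here is a minimal sketch in Python, assuming a single attacker type and the notation from the slides; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def suqr_attack_probs(omega, eta, R_a, P_a):
    # SUQR: attack probability for each target given the attackers' belief eta
    # about coverage; omega = (w1, w2, w3) weighs coverage, reward, and penalty.
    w1, w2, w3 = omega
    logits = w1 * eta + w2 * R_a + w3 * P_a
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

def defender_round_utility(c_t, eta_t, omega, R_a, P_a, R_d, P_d):
    # E_t = sum_i q_i(omega, eta_t) * U_i^d(c_t), with
    # U_i^d(c) = c_i * R_i^d + (1 - c_i) * P_i^d.
    q = suqr_attack_probs(omega, eta_t, R_a, P_a)
    U_d = c_t * R_d + (1 - c_t) * P_d
    return float(q @ U_d)

# Example: the attackers' belief is a convex combination of recent strategies,
# e.g. eta^3 = 0.3 * c^1 + 0.7 * c^2 as on the slide (numbers are made up).
c1 = np.array([0.3, 0.4, 0.3])
c2 = np.array([0.1, 0.2, 0.7])
eta3 = 0.3 * c1 + 0.7 * c2
```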

Outline
Green Security Game Model
Planning algorithms (next)
Planning and Learning
Results

Based on this model, when all the parameters are known, we propose planning algorithms that design defender strategies for each round of the game. In this talk, I will use a special case as an example: the attackers always respond to the defender strategy in the last round, meaning the attackers' understanding of the defender strategy is always delayed by one round.

Planning
Exploit the attackers' delayed observation ($\eta^t = c^{t-1}$).
A simple example with two areas, N and S:
Patrol Plan A: always patrol uniformly at random.
Patrol Plan B: change the strategy deliberately between rounds; detects more snares overall.
(Slide figure: a map with areas N and S, and an example coverage table, e.g. area N covered 80% in Jan and 20% in Feb.)

In this case, we need to plan the defender's strategy change to exploit the attackers' delayed observation. Assume $\omega$ is fixed.

Planning: Solve Directly
Build a large mathematical program (MP) and optimize over all rounds (Jan through May) simultaneously.
Non-convex, with a large number of variables → computationally expensive when $T$ is large.

PlanAhead-M
Look ahead M steps: find an optimal strategy for the current round as if it were the M-th last round of the game, using a sliding window of size M.
Example with M = 2 over Jan through May: optimize $E_1 + E_2$ in the first round, then $E_2 + E_3$, then $E_3 + E_4$, then $E_4 + E_5$.
Add a discount factor $\gamma$ to compensate for the over-estimation.

We play as if on the M-th last round of the game.

PlanAhead-M Mathematical program
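The slide shows the mathematical program itself but the transcript does not reproduce it. A hedged sketch of its general shape, assembled from the notation defined earlier (a reconstruction under the delayed-observation special case, not the exact program from the slide):

$$\max_{c^t, \dots, c^{t+M-1}} \; \sum_{k=0}^{M-1} \gamma^k \, E_{t+k}(c)
\quad \text{s.t.} \quad \eta^{t+k} = c^{t+k-1}, \qquad 0 \le c_i^{t+k} \le 1, \qquad \sum_i c_i^{t+k} \le K,$$

where $E_{t+k}$ is the defender's expected utility defined above and the coverage constraint reflects the $K$ patrollers.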

FixedSequence-M
Require the defender to execute a fixed sequence of strategies of length M repeatedly.
Example with M = 2: find the best strategies A and B, then play A, B, A, B, ... over Jan through May.
Theoretical guarantee: a $1 - \frac{1}{M}$ approximation of the optimal strategy profile.

FixedSequence-M
We use an example to illustrate the problem: three targets move between terminals A and B. The schedule of a target F_q is a mapping from time to distance. A patroller moves along AB and provides protection to targets within a protection radius. In this example, patroller P1 is protecting target F2.

Outline
Green Security Game Model
Planning algorithms
Planning and Learning (next)
Results

Planning and Learning
Learn the parameters of the attackers' bounded rationality model from attack data.
Previous work: apply Maximum Likelihood Estimation (MLE), which may lead to highly biased results.
Proposed learning algorithm: calculate a posterior distribution for each data point.

We focus on learning the $\omega$ parameters that describe the attackers' characteristics in choosing which target to attack. Previous work on learning these parameters in green security domains focuses on maximum likelihood estimation, which may lead to highly biased results, as there may not be sufficient data, and in many cases the data in green security domains are anonymous. For example, the patrollers may only find the snares placed on the ground without knowing who placed them.

Planning and Learning
$\chi_i$: number of attacks on target $i$.
Discrete set of candidate parameter values $\{\omega^1, \dots, \omega^S\}$ with prior $p = \langle p_1, \dots, p_S \rangle$.

The learning algorithm calculates a posterior distribution for each data point in every round, and uses the average distribution to calculate the strategy for the next round.
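A minimal sketch of this idea in Python, assuming a discrete set of candidate $\omega$ values and anonymous attack counts $\chi_i$; the helper names are illustrative, and the exact update in the paper may differ.

```python
import numpy as np

def suqr_probs(omega, eta, R_a, P_a):
    # Attack probability for each target under SUQR with parameters omega.
    w1, w2, w3 = omega
    logits = w1 * eta + w2 * R_a + w3 * P_a
    e = np.exp(logits - logits.max())
    return e / e.sum()

def per_attack_posterior(target, omegas, prior, eta, R_a, P_a):
    # Posterior over the discrete omega set given one anonymous attack on `target`:
    # p(omega^s | attack) is proportional to p_s * q_target(omega^s, eta).
    likelihood = np.array([suqr_probs(w, eta, R_a, P_a)[target] for w in omegas])
    post = prior * likelihood
    return post / post.sum()

def average_posterior(chi, omegas, prior, eta, R_a, P_a):
    # Average the per-attack posteriors, weighting each target's posterior by its
    # attack count chi_i; this averaged distribution is used to plan the next round.
    posts = [chi[i] * per_attack_posterior(i, omegas, prior, eta, R_a, P_a)
             for i in range(len(chi)) if chi[i] > 0]
    avg = np.sum(posts, axis=0)
    return avg / avg.sum()
```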

General Framework of Green Security Games
Defender plans her strategy → local guards execute patrols → poachers attack targets → learn from data and improve the model → start new round.

So the framework of green security games is that the defender first plans her strategy, local guards then execute patrols, and the poachers choose where to attack based on a possibly delayed observation of the defender strategy. Then the defender can learn from the collected attack data, improve her model of the poachers' behavior, and re-plan her strategy for the next round.
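A compact sketch of this plan-execute-learn loop, reusing the idea of the earlier sketches; `plan_strategy` stands in for PlanAhead-M or FixedSequence-M and `simulate_attacks` stands in for field data collection (both are hypothetical callables, not from the paper).

```python
import numpy as np

def run_green_security_game(T, N, omegas, prior, R_a, P_a,
                            plan_strategy, simulate_attacks, average_posterior):
    # One round per time period: plan, patrol, observe attacks, update beliefs.
    belief = prior.copy()
    prev_c = np.full(N, 1.0 / N)          # start from a uniform coverage vector
    history = []
    for t in range(T):
        eta = prev_c                       # attackers respond to last round's strategy
        c_t = plan_strategy(belief, eta)   # plan using the current belief over omega
        chi = simulate_attacks(c_t, eta)   # attack counts per target from the field
        belief = average_posterior(chi, omegas, belief, eta, R_a, P_a)
        history.append((c_t, chi))
        prev_c = c_t
    return history
```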

Outline
Green Security Game Model
Planning algorithms
Planning and Learning
Results (next)

Experimental Results: Planning
Baselines: FS-1 (Stackelberg), PA-1 (Myopic).
Setting: attackers respond to the last round's strategy, 10 targets, 4 patrollers.
(Slide figure: plots comparing the planning algorithms against these baselines.)

Experimental Results: Planning and Learning
Baseline: Maximum Likelihood Estimation (MLE).
(Slide figures: solution quality and runtime comparisons.)

Thank you!