Markov Game Analysis for Attack and Defense of Power Networks Chris Y. T. Ma, David K. Y. Yau, Xin Lou, and Nageswara S. V. Rao.

Power Networks are Important Infrastructures (And Vulnerable to Attacks)
Growing reliance on electricity
Aging infrastructure
More connected digital sensing and control devices introduced (which attract attacks in cyberspace)
Hard and expensive to protect
Limited budget
How to allocate the limited resources?
  – Optimal deployment to maximize long-term payoff

Modeling the Interactions – Game Theoretic Approaches
Static game
  – Each player has a set of actions available
  – Players act simultaneously
  – Outcome and payoff are determined by the actions of all players

Static Game Example
The four possible joint outcomes:
  – Defend & Attack
  – Defend & No Attack
  – No defend & Attack
  – No defend & No Attack
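The payoff values from this slide are not preserved in the transcript, so the following is only a minimal sketch of how such a 2 x 2 static game could be analyzed. The loss numbers and the helper names `pure_minimax` and `pure_maximin` are assumptions made for illustration, not values or code from the presentation.

```python
# Hypothetical defender losses (the adversary gains the same amount).
# Rows: defender action; columns: adversary action.
DEFENDER_ACTIONS = ["defend", "no defend"]
ADVERSARY_ACTIONS = ["attack", "no attack"]
LOSS = [
    [2.0, 1.0],   # defend:    attack mostly blocked, but defending has a cost
    [10.0, 0.0],  # no defend: an attack causes heavy load shedding
]


def pure_minimax(loss):
    """Defender row that minimizes the worst-case loss, and that loss."""
    worst = [max(row) for row in loss]
    best_row = min(range(len(loss)), key=lambda i: worst[i])
    return best_row, worst[best_row]


def pure_maximin(loss):
    """Adversary column that maximizes the guaranteed loss, and that loss."""
    guaranteed = [min(row[j] for row in loss) for j in range(len(loss[0]))]
    best_col = max(range(len(guaranteed)), key=lambda j: guaranteed[j])
    return best_col, guaranteed[best_col]


row, upper = pure_minimax(LOSS)
col, lower = pure_maximin(LOSS)
# The game has a pure-strategy saddle point exactly when both values agree.
has_saddle_point = upper == lower
print(DEFENDER_ACTIONS[row], ADVERSARY_ACTIONS[col], has_saddle_point)
```

When the two values differ, no pure-strategy saddle point exists and the players would need mixed strategies.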

Modeling the Interactions – Game Theoretic Approaches
Leader-follower game (Stackelberg game)
  – Defender as the leader
  – Adversary as the follower
  – Bi-level optimization (minimax operation)
      Inner level: follower maximizes its payoff given the leader’s strategy
      Outer level: leader maximizes its payoff subject to the follower’s solution of the inner problem

Stackelberg Game Example
Game tree: the defender moves first (Defend or No defend); in each branch the adversary then chooses Attack or No Attack.
Limitation: such games only model one-time interactions.
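The bi-level structure can be sketched in a few lines. This is a simplified illustration that assumes the leader commits to a pure action (the general Stackelberg setting also allows mixed commitments). The payoff table and function names are hypothetical, since the slide's numbers are not in the transcript.

```python
# PAYOFFS[(defender_action, adversary_action)] = (defender payoff, adversary payoff)
# All values are hypothetical and chosen only to illustrate the procedure.
PAYOFFS = {
    ("defend", "attack"): (-2.0, -1.0),
    ("defend", "no attack"): (-1.0, 0.0),
    ("no defend", "attack"): (-10.0, 5.0),
    ("no defend", "no attack"): (0.0, 0.0),
}
DEFENDER_ACTIONS = ["defend", "no defend"]
ADVERSARY_ACTIONS = ["attack", "no attack"]


def follower_best_response(defender_action):
    """Inner level: the adversary maximizes its own payoff given the commitment."""
    return max(ADVERSARY_ACTIONS, key=lambda a: PAYOFFS[(defender_action, a)][1])


def leader_best_commitment():
    """Outer level: the defender maximizes its payoff, anticipating the response."""
    return max(
        DEFENDER_ACTIONS,
        key=lambda d: PAYOFFS[(d, follower_best_response(d))][0],
    )


commitment = leader_best_commitment()
response = follower_best_response(commitment)
print(commitment, response, PAYOFFS[(commitment, response)])
```

With these assumed numbers, committing to defend deters the attack, which is the kind of effect the leader-follower model is meant to capture.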

Modeling the Interactions – Markov Decision Process
Markov Decision Process (MDP)
  – System modeled as a set of states with Markov transitions between them
  – Transitions depend on the action of one player and on passive disruptors with known probabilistic behavior (acts of nature)

Markov Decision Process (MDP) Example (2 states, each with 2 actions available)
States: up, down
Actions in state up: Defend, No defend
Actions in state down: Recover, No recover
Limitation: an MDP only models one intelligent player.
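A two-state MDP of this shape can be solved with standard value iteration. The sketch below assumes transition probabilities, rewards, and a discount factor of 0.9, none of which come from the slide. Failures in state up stand in for the passive disruptor (acts of nature).

```python
GAMMA = 0.9  # assumed discount factor

# TRANSITIONS[state][action] = list of (probability, next_state, reward).
# All numbers are hypothetical: being down costs 5 per step, acting costs 1.
TRANSITIONS = {
    "up": {
        "defend": [(0.9, "up", -1.0), (0.1, "down", -1.0)],
        "no defend": [(0.6, "up", 0.0), (0.4, "down", 0.0)],
    },
    "down": {
        "recover": [(0.8, "up", -6.0), (0.2, "down", -6.0)],
        "no recover": [(1.0, "down", -5.0)],
    },
}


def q_value(values, outcomes):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(tol=1e-9):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new = {
            s: max(q_value(values, outcomes) for outcomes in actions.values())
            for s, actions in TRANSITIONS.items()
        }
        if max(abs(new[s] - values[s]) for s in values) < tol:
            return new
        values = new


def greedy_policy(values):
    """Pick, in every state, the action with the highest Q-value."""
    return {
        s: max(actions, key=lambda a: q_value(values, actions[a]))
        for s, actions in TRANSITIONS.items()
    }


VALUES = value_iteration()
POLICY = greedy_policy(VALUES)
print(VALUES, POLICY)
```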

Our Approach – Markov Game
Generalization of the MDP to an adversarial setting
  – Models the continual interactions between multiple players
      Players interact in the new state with different payoffs
  – Models probabilistic state transitions caused by inherent uncertainty in the system (e.g., random acts of nature)

Problem Formulation
Defender and adversary of a power network
Game formulation:
  – Adversary
      Actions: which link to attack
      Payoff: cost of load shedding incurred by the defender because of the attack
  – Defender
      Actions: which (up) link to reinforce or which (down) link to recover
      Payoff: cost of load shedding because of the attack (counted as a loss)
  – Two-player zero-sum game

Markov Game – Reward Overview
Assume five links; link 4 is both attacked and defended.
  – From state (u,u,u,u,u), the next state is either (u,u,u,u,u) or (u,u,u,d,u); the two transitions have probabilities p1 and 1-p1.
  – The immediate reward of these actions is the weighted sum of the rewards for a successful attack and a successful defense.
Assume that at state (u,u,u,d,u), link 4 is again both attacked and defended.
  – The two possible transitions have probabilities p2 and 1-p2.
  – The immediate reward at state (u,u,u,d,u) is then the weighted sum of the rewards for a successful recovery and a failed recovery.
  – This immediate reward is further “propagated” back to the original state (u,u,u,u,u) with a discount factor.
Hence, actions taken in a state accrue a long-term reward.

Solving the Markov Game – Definitions
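
The formulas on this slide are not preserved in the transcript. For orientation, the standard definitions for a two-player zero-sum Markov game are given below, under assumed notation: S is the set of states (link status vectors), A and O are the action sets of the defender and the adversary, T(s,a,o,s') is the transition probability, R(s,a,o) is the defender's immediate reward, γ in [0,1) is the discount factor, and π is a mixed strategy of the defender. The presentation may use different symbols.

```latex
\begin{align*}
Q(s,a,o) &= R(s,a,o) + \gamma \sum_{s' \in S} T(s,a,o,s')\, V(s') \\
V(s) &= \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi(a)\, Q(s,a,o)
\end{align*}
```

Here Δ(A) denotes the set of probability distributions over A. The second line is the value of the matrix game played in state s with payoffs Q(s,a,o).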

Finding the Optimal Strategy – Solving a Linear Program
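
The linear program itself is missing from the transcript. A standard formulation for the defender's optimal mixed strategy in a single state s is sketched below. It assumes that Q(s,a,o) denotes the long-term payoff to the defender when the defender plays a and the adversary plays o, and that π(a) is the probability with which the defender plays a. This notation is assumed and may differ from the slide.

```latex
\begin{align*}
\max_{v,\,\pi} \quad & v \\
\text{s.t.} \quad & \sum_{a \in A} \pi(a)\, Q(s,a,o) \ge v \quad \forall o \in O, \\
& \sum_{a \in A} \pi(a) = 1, \qquad \pi(a) \ge 0 \quad \forall a \in A.
\end{align*}
```

The optimal v is the value of the state, and the optimal π is a strategy that guarantees this value against every adversary action.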

Solving the Markov Game – Value Iteration
Dynamic program (value iteration) to solve the Markov game
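
A minimal sketch of value iteration for a zero-sum Markov game follows. It is not the authors' implementation: it assumes a toy network with only two links, hypothetical success probabilities and load-shedding costs, and it replaces the per-state linear program with the closed-form solution of a 2 x 2 matrix game so that it runs with the standard library alone. Only the discount factor 0.3 is taken from the results slide.

```python
from itertools import product

GAMMA = 0.3             # discount factor quoted on the results slide
LINK_COST = (1.0, 3.0)  # hypothetical load-shedding cost while each link is down
STATES = list(product("ud", repeat=2))  # status (up/down) of the two links
ACTIONS = (0, 1)        # index of the link a player acts on


def up_probability(status, defended, attacked):
    """Probability that one link is up after this step (assumed numbers)."""
    if status == "u":
        if not attacked:
            return 1.0
        return 0.8 if defended else 0.2
    if not defended:  # a down link stays down unless the defender recovers it
        return 0.0
    return 0.3 if attacked else 0.7


def transitions(state, d, a):
    """Yield (probability, next_state) for defender action d, adversary action a."""
    p_up = [up_probability(state[i], d == i, a == i) for i in range(2)]
    for nxt in STATES:
        prob = 1.0
        for i in range(2):
            prob *= p_up[i] if nxt[i] == "u" else 1.0 - p_up[i]
        if prob > 0.0:
            yield prob, nxt


def cost(state):
    return sum(c for c, s in zip(LINK_COST, state) if s == "d")


def solve_2x2(m):
    """Return (value, probability of row 0) for the maximizing row player."""
    row_mins = [min(row) for row in m]
    lower = max(row_mins)
    upper = min(max(m[0][j], m[1][j]) for j in range(2))
    if lower == upper:  # pure-strategy saddle point
        return lower, 1.0 if row_mins[0] == lower else 0.0
    (a, b), (c, d) = m
    p = (d - c) / ((a - b) + (d - c))  # makes the column player indifferent
    return p * a + (1.0 - p) * c, p


def q_matrix(values, state):
    """Defender payoffs: rows are defender actions, columns adversary actions."""
    return [
        [sum(p * (-cost(n) + GAMMA * values[n]) for p, n in transitions(state, d, a))
         for a in ACTIONS]
        for d in ACTIONS
    ]


def value_iteration(tol=1e-10):
    values = {s: 0.0 for s in STATES}
    while True:
        new = {s: solve_2x2(q_matrix(values, s))[0] for s in STATES}
        if max(abs(new[s] - values[s]) for s in STATES) < tol:
            return new
        values = new


VALUES = value_iteration()
for s in STATES:
    print(s, round(VALUES[s], 4), solve_2x2(q_matrix(VALUES, s))[1])
```

For larger action sets, the call to `solve_2x2` would be replaced by a linear-program solver applied to the same Q matrix.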

Experiment Results
Link diagram
State {u,u,u,u,u}
Links 4 and 5 each connect to a generator, and the generator at bus 4 has higher output

Experiment Results
Payoff matrix of state {u,u,u,u,u} for the static game.
Payoff matrix of state {u,u,u,u,u} for the Markov game (γ = 0.3).

Conclusions
Using a Markov game to model the attack and defense of a power network between two players
Results show that the players' actions depend not only on the current state but also on later states
  – To obtain the optimal long-term benefit