Regret Minimization in Stochastic Games. Shie Mannor and Nahum Shimkin, Technion, Israel Institute of Technology, Dept. of Electrical Engineering. UAI 2000, 6/30/00.

Presentation transcript:

Regret Minimization in Stochastic Games
Shie Mannor and Nahum Shimkin, Technion, Israel Institute of Technology, Dept. of Electrical Engineering

Introduction
- Modeling of a dynamic decision process as a stochastic game
- Non-stationarity of the environment
- Environments are not (necessarily) hostile
- Looking for the best possible strategy in light of the environment's actions

Repeated Matrix Games
- The sets of single-stage strategies P and Q are simplices
- Rewards are defined by a reward matrix G: r(p,q) = pGq
- Reward criterion: average reward
- The average reward need not converge, since stationarity is not assumed
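As a minimal illustration (not part of the talk), the sketch below computes the single-stage reward r(p,q) = pGq and a running average reward; the payoff matrix and strategy profiles are made up for the example.

```python
import numpy as np

# Hypothetical 2x2 payoff matrix for P1 (rows: P1 actions, columns: P2 actions).
G = np.array([[1.0, -1.0],
              [0.0,  0.5]])

def single_stage_reward(p, q, G):
    """Expected reward r(p, q) = p G q for mixed strategies p and q."""
    return float(np.asarray(p) @ G @ np.asarray(q))

# Running average reward over a made-up sequence of mixed strategy profiles.
history = [([0.5, 0.5], [1.0, 0.0]), ([1.0, 0.0], [0.2, 0.8])]
rewards = [single_stage_reward(p, q, G) for p, q in history]
print(np.mean(rewards))
```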

Regret for Repeated Matrix Games
- Suppose that by time t the average reward obtained so far is R_t and the opponent's empirical (frequency) strategy is q_t
- The regret is defined as L_t = r*(q_t) - R_t, where r*(q) = max_{p in P} pGq is the best (Bayes) reward against q
- A policy is called regret minimizing if limsup_{t→∞} L_t ≤ 0 almost surely, for any strategy of the opponent
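A sketch of how this regret could be computed from play data, assuming actions are logged as integer indices: the empirical strategy q_t is the frequency vector of P2's past actions, and r*(q) is a maximum over P1's pure actions.

```python
import numpy as np

def regret(G, p1_actions, p2_actions):
    """Hannan-style regret: best reward against P2's empirical strategy
    minus the realized average reward. Actions are integer indices."""
    T = len(p2_actions)
    q_hat = np.bincount(p2_actions, minlength=G.shape[1]) / T   # empirical strategy of P2
    bayes_reward = np.max(G @ q_hat)                            # r*(q_hat) = max_a (G q_hat)_a
    avg_reward = np.mean([G[a, b] for a, b in zip(p1_actions, p2_actions)])
    return bayes_reward - avg_reward

# Example with a made-up payoff matrix and histories.
G = np.array([[1.0, -1.0], [0.0, 0.5]])
print(regret(G, p1_actions=[0, 1, 1], p2_actions=[1, 0, 1]))
```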

Regret Minimization for Repeated Matrix Games
- Such policies do exist (Hannan, 1956)
- A proof using approachability theory (Blackwell, 1956)
- Also for games with partial observation (Auer et al., 1995; Rustichini, 1999)
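For concreteness, here is one standard no-regret rule, Hart and Mas-Colell's regret matching; it is included only as an illustration and is not the approachability-based policy discussed in the talk.

```python
import numpy as np

class RegretMatching:
    """Regret matching for a repeated matrix game with payoff matrix G for P1.
    Plays each action with probability proportional to its positive cumulative regret."""

    def __init__(self, G, rng=None):
        self.G = np.asarray(G)
        self.cum_regret = np.zeros(self.G.shape[0])
        self.rng = rng or np.random.default_rng()

    def act(self):
        positive = np.maximum(self.cum_regret, 0.0)
        if positive.sum() > 0:
            probs = positive / positive.sum()
        else:
            probs = np.full(len(positive), 1.0 / len(positive))
        return self.rng.choice(len(positive), p=probs)

    def update(self, my_action, opp_action):
        # Regret of not having played each action a instead of the action actually played.
        self.cum_regret += self.G[:, opp_action] - self.G[my_action, opp_action]
```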

Stochastic Games
Formal model:
- S = {1, ..., s}: state space
- A = A(s): actions of the regret-minimizing player, P1
- B = B(s): actions of the "environment", P2
- r: reward function, r(s,a,b)
- P: transition kernel, P(s'|s,a,b)
- The expected average reward for stationary strategies p in P, q in Q is r(p,q)
- Single-state recurrence assumption
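A minimal container for this model, assuming finite state and action sets indexed by integers; the field names and layout are my own, not from the talk.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StochasticGame:
    """Finite two-player stochastic game: S states, P1 actions A, P2 actions B."""
    reward: np.ndarray       # shape (S, A, B): r(s, a, b)
    transition: np.ndarray   # shape (S, A, B, S): P(s' | s, a, b), each row sums to 1

    def step(self, s, a, b, rng):
        """Sample the next state and return (stage reward, next state)."""
        s_next = rng.choice(self.transition.shape[3], p=self.transition[s, a, b])
        return self.reward[s, a, b], s_next
```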

Bayes Reward in Strategy Space
- For every stationary strategy q in Q, the Bayes reward is defined as r*(q) = max_{p in P} r(p,q)
- Problems:
  - P2's strategy is not completely observed
  - P1's observations may depend on the strategies of both players

Bayes Reward in State-Action Space
- Let f(s,b) be the observed frequency of P2's action b in state s
- A natural estimate of q is q̂(b|s) = f(s,b) / Σ_{b'} f(s,b')
- The associated Bayes envelope is the set of (frequency, reward) pairs whose reward is at least the Bayes reward of the estimated strategy: BE = {(f, r) : r ≥ r*(q̂(f))}
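A small sketch of the estimate, assuming P2's actions are logged as per-state counts:

```python
import numpy as np

def estimate_opponent_strategy(counts):
    """counts[s, b] = number of times P2 played action b in state s.
    Returns q_hat[s, b], the empirical conditional probability of b given s."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # States never visited get a uniform placeholder distribution.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / counts.shape[1])

print(estimate_opponent_strategy([[3, 1], [0, 0]]))
```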

Approachability Theory
- A standard tool in the theory of repeated matrix games (Blackwell, 1956)
- Consider a game with vector-valued stage rewards m_t and average reward M_t = (1/t) Σ_{τ≤t} m_τ
- A set C is approachable by P1 with a policy π if, under π, the distance d(M_t, C) → 0 almost surely, for any strategy of P2
- Extended to recurrent stochastic games (Shimkin and Shwartz, 1993)

The Convex Bayes Envelope
- In general BE is not approachable
- Define CBE = co(BE), the envelope obtained by replacing r* with its lower convex hull: CBE = {(f, r) : r ≥ conv(r*)(q̂(f))}, where conv(r*) is the lower convex hull of r*
- Theorem: CBE is approachable
- Since r*(q) ≥ val for every q, the same holds for its lower convex hull (val is the value of the game)
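A sketch of the convexification step in one dimension, assuming the opponent's strategy is parameterized by a scalar q in [0,1] and r*(q) has been sampled on a grid; the hull routine is Andrew's monotone chain, and the sampled r* is an arbitrary made-up function.

```python
import numpy as np

def lower_convex_hull(xs, ys):
    """Indices of the lower convex hull of points (xs[i], ys[i]), xs sorted increasing."""
    hull = []
    for i in range(len(xs)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # Cross product of (P_k - P_j) x (P_i - P_j); pop P_k unless it makes a strict left turn.
            cross = (xs[k] - xs[j]) * (ys[i] - ys[j]) - (ys[k] - ys[j]) * (xs[i] - xs[j])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

# Made-up sampled Bayes reward r*(q); conv_r is its lower convex hull, by interpolation.
qs = np.linspace(0.0, 1.0, 101)
r_star = 0.3 + 0.5 * np.abs(qs - 0.4) + 0.2 * np.sin(6 * qs) ** 2
idx = lower_convex_hull(qs, r_star)
conv_r = np.interp(qs, qs[idx], r_star[idx])
```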

Single Controller Games
Theorem: Assume that P2 alone controls the transitions, i.e. P(s'|s,a,b) = P(s'|s,b) for all s, a, b; then BE itself is approachable.

An Application to Prediction with Expert Advice
- Given a channel and a set of experts
- At each time epoch each expert states his prediction of the next symbol, and P1 has to choose his own prediction a
- Then a letter b appears in the channel and P1 receives the prediction reward r(a, b)
- The problem can be formulated as a stochastic game in which P2 stands for all the experts and the channel
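A toy simulation of this setup; the 0/1 prediction reward and the "follow the best expert so far" rule for P1 are illustrative choices of mine, not the strategy from the talk.

```python
import numpy as np

def prediction_game(channel, expert_predictions):
    """channel: array of T observed letters; expert_predictions: array (T, K) of expert advice.
    P1 follows the expert with the best score so far; reward is 1 for a correct prediction."""
    T, K = expert_predictions.shape
    scores = np.zeros(K)
    total = 0.0
    for t in range(T):
        leader = int(np.argmax(scores))            # P1's choice: trust the current best expert
        prediction = expert_predictions[t, leader]
        total += float(prediction == channel[t])   # r(a, b) = 1 if a == b, else 0
        scores += (expert_predictions[t] == channel[t])
    return total / T

rng = np.random.default_rng(0)
channel = rng.integers(0, 2, size=200)
experts = np.stack([channel, 1 - channel, rng.integers(0, 2, size=200)], axis=1)
print(prediction_game(channel, experts))
```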

Prediction Example (cont.)
Theorem: P1 has a zero-regret strategy.
[Figure: state diagram of the prediction game; states are labelled by the experts' recommendations, e.g. (0,0,0), ..., (k-1,k,k), (k,k,k), with prediction rewards r(a,b).]

An Example in which BE is Not Approachable
[Figure: a two-state game with states S0 and S1; in both states the reward is r = b, with P2's action sets B(0) = B(1) = {-1, 1}, and P1's actions a = 0, a = 1 govern the transitions (transition probability P = 0.99).]
It can be proved that BE for the above game is not approachable.

Example (cont.)
In r*(q) space the envelopes are:
[Figure: the Bayes envelope BE and its convexification CBE plotted in r*(q) space.]

Open Questions
- Characterization of minimal approachable sets in reward-state-action space
- On-line learning schemes for stochastic games with unknown parameters
- Other ways of formulating optimality with respect to observed state-action frequencies

Conclusions
- The problem of regret minimization for stochastic games was considered
- The proposed solution concept, CBE, is based on convexification of the Bayes envelope in the natural state-action space
- The CBE concept ensures an average reward that is higher than the value of the game when the opponent is suboptimal


Approachability Theory
- Let m(p,q) be the average vector-valued reward in the game when P1 and P2 play p and q
- Define M(q) = {m(p,q) : p in P}
- Theorem [Blackwell, 1956]: A convex set C is approachable if and only if, for every q in Q, M(q) intersects C (i.e., for every q there exists p with m(p,q) in C)
- Extended to stochastic games (Shimkin and Shwartz, 1993)
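A sketch of the corresponding approachability strategy for a convex target set, assuming finite action sets, a known vector payoff m(a,b), a user-supplied Euclidean projection onto C, and that scipy is available; the steering direction defines a scalar zero-sum game that is solved as a small LP.

```python
import numpy as np
from scipy.optimize import linprog

def solve_minimax(M):
    """Mixed strategy p for the row player minimizing max_b sum_a p[a] * M[a, b]."""
    nA, nB = M.shape
    c = np.r_[np.zeros(nA), 1.0]                      # minimize the scalar v
    A_ub = np.c_[M.T, -np.ones(nB)]                   # for each b: sum_a p[a] M[a,b] - v <= 0
    b_ub = np.zeros(nB)
    A_eq = np.r_[np.ones(nA), 0.0].reshape(1, -1)     # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * nA + [(None, None)])
    return res.x[:nA]

def blackwell_step(m, avg, project_C):
    """One steering step: m has shape (nA, nB, d); avg is the current average payoff vector.
    Returns a mixed action for P1 pushing the average toward the convex set C."""
    lam = avg - project_C(avg)                        # outward direction; zero if already in C
    if np.linalg.norm(lam) < 1e-12:
        return np.full(m.shape[0], 1.0 / m.shape[0])  # inside C: any action will do
    scalar_game = np.tensordot(m, lam, axes=([2], [0]))   # lam . m(a, b), shape (nA, nB)
    return solve_minimax(scalar_game)
```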

A Related Vector-Valued Game
Define the following vector-valued game: if in state s action b is played by P2 and a reward r is gained, then the vector-valued stage reward m_t has a 1 in the coordinate of the pair (s,b), zeros in all other state-action coordinates, and r in the reward coordinate.
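A small sketch showing how the average of this vector payoff recovers both P2's empirical state-action frequencies and the average reward; the coordinate layout is my own choice.

```python
import numpy as np

def average_vector_payoff(trajectory, n_states, n_b_actions):
    """trajectory: list of (state, p2_action, reward) tuples.
    Returns (freq, avg_reward), where freq[s, b] is the empirical frequency of (s, b)."""
    total = np.zeros(n_states * n_b_actions + 1)
    for s, b, r in trajectory:
        m_t = np.zeros_like(total)
        m_t[s * n_b_actions + b] = 1.0       # indicator of the observed (state, P2-action) pair
        m_t[-1] = r                          # realized stage reward
        total += m_t
    avg = total / len(trajectory)
    return avg[:-1].reshape(n_states, n_b_actions), avg[-1]

print(average_vector_payoff([(0, 1, 0.5), (1, 0, -1.0), (0, 1, 0.5)], n_states=2, n_b_actions=2))
```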