The Big Match in small space
Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Michal Koucký

Overview
- Playing the Big Match in small space
- Open problem: playing CMPGs in small space

The Big Match (Gillette '57)
- In each round, the column player secretly picks a bit and the row player tries to predict it; a stage payoff of 1 is written down for a correct prediction and 0 otherwise. As long as the row player predicts 0 the game continues, but the first time he predicts 1 the game absorbs: that round's payoff is repeated forever.
[Slide animation: the game matrix, with a sample play writing down the stage payoffs one round at a time: 1, then 1, 1, then 1, 1, 1, ...]

Outcome
- The column player pays the row player the limiting average (either lim sup or lim inf) of the numbers written down.

Value
- Each player can ensure ½ (in the limit) [Blackwell and Ferguson '68]
- The column player has a simple optimal strategy: play uniformly at random
- The row player's strategies are complicated and not optimal (only ε-optimal) [Blackwell and Ferguson '68]

The Big Match (Gillette '57)
[Slide: the game matrix annotated with the strategies; the column player mixes (½, ½) and the row player mixes (x, 1−x).]
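To make the two strategies concrete, here is a minimal sketch. The column player just flips a fair coin each round. For the row player, the Blackwell–Ferguson construction, as commonly presented with N ≈ 1/ε (exact normalization constants vary between write-ups), plays the absorbing action with probability 1/(k+N)², where k is the excess of 0s over 1s played by the column player so far. The class name and the simulation loop are mine, not the paper's.

```python
import random

def column_strategy():
    """Column player's simple optimal strategy: an i.i.d. uniform bit."""
    return random.randint(0, 1)

class BlackwellFerguson:
    """Sketch of the Blackwell-Ferguson eps-optimal row-player strategy.

    Plays the absorbing action 1 with probability 1/(k + N)^2, where k is
    the excess of 0s over 1s the column player has shown so far; N ~ 1/eps.
    The counter k is unbounded, so this strategy uses O(log T) space --
    exactly what the talk improves on.
    """

    def __init__(self, N):
        self.N = N
        self.k = 0  # (#rounds column played 0) - (#rounds column played 1)

    def act(self):
        denom = self.k + self.N
        p_absorb = 1.0 if denom <= 1 else 1.0 / denom ** 2  # guard small denominators
        return 1 if random.random() < p_absorb else 0

    def observe(self, column_bit):
        self.k += 1 if column_bit == 0 else -1

# A sample play: payoff 1 per correct prediction; absorbing once row plays 1.
row, score = BlackwellFerguson(N=20), 0
for t in range(10 ** 5):
    a, b = row.act(), column_strategy()
    score += int(a == b)
    if a == 1:  # absorbed: this round's payoff repeats forever
        break
    row.observe(b)
```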

Worthless strategies for the row player
- Worthless strategy = guarantees an outcome of only 0
- Markov (and stationary) strategies are worthless [Blackwell and Ferguson '68]
  - Markov = depends only on the length of the history
- Finite-memory strategies are worthless [Sorin '02]
- Markov strategies extended with deterministic-update finite memory are worthless [Hansen, I-J, Koucký]
  - Deterministic update: one history ⇒ one memory state

Memory-based strategies
- Memories: M ⊆ {0,1}*
- Strategy: σ_a : M × S → Δ(A) (action selection) and σ_u : M × S × A × A → Δ(M) (memory update)
- In round T, with memory m_T and state s_T:
  - Pr[a_T] is assigned by σ_a(m_T, s_T)
  - Play goes to s_{T+1}; if the other player used b_T, then Pr[m_{T+1}] is assigned by σ_u(m_T, s_{T+1}, a_T, b_T)
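A strategy in this model is just a pair of randomized maps. The sketch below fixes a minimal interface for them; the class and method names are mine, not the paper's.

```python
import random

def sample(dist):
    """Draw from a finite-support distribution given as {outcome: probability}."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding

class MemoryStrategy:
    """A memory-based strategy as on the slide: sigma_a selects actions,
    sigma_u updates the memory after seeing the new state and both actions."""

    def __init__(self, sigma_a, sigma_u, m0):
        self.sigma_a = sigma_a  # (m, s) -> distribution over actions A
        self.sigma_u = sigma_u  # (m, s_next, a, b) -> distribution over memories M
        self.m = m0

    def act(self, s):
        return sample(self.sigma_a(self.m, s))

    def update(self, s_next, a, b):
        self.m = sample(self.sigma_u(self.m, s_next, a, b))
```

In this vocabulary, a deterministic-update strategy is one where σ_u always returns a point mass, so each history maps to a single memory state.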

Asymptotic memory usage
- An MDP with states in ℕ uses log f(T) space whp. if, for all T and no matter the player's choices, only states numbered below f(T) have been visited before round T whp.
- A strategy uses f(T) space whp. if the MDP it defines on its memory states, with the other player as the decision maker, uses f(T) space whp.

Results
- An ε-optimal strategy using O(log T) space, for both lim inf and lim sup [Blackwell and Ferguson '68]
- An ε-optimal strategy using O(S(T)) space whp., for any given increasing unbounded function S; lim sup only [Hansen, I-J, Koucký]
- An ε-optimal strategy using O(log log T) space whp., for both lim inf and lim sup [Hansen, I-J, Koucký]

Simplified strategy construction
[Slide animation: a timeline of rounds split into epochs of lengths S(1), S(2), S(3), ..., with the sampled rounds marked.]
- Split the rounds into epochs such that |epoch i| = S(i)
- Sample i rounds in epoch i, uniformly at random
- Play like the classic ε-optimal strategy on the sampled rounds; otherwise play the top row (see the sketch below)
- The strategy is ε-optimal for the lim sup average (for the lim inf average if S(T) = 2^T) and uses space log S⁻¹(T) whp.
- (The real strategy is more messy.)
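The construction can be phrased as a wrapper around any classic strategy: feed it only the sampled rounds. A minimal sketch under my own naming (base is e.g. the BlackwellFerguson class above); note that for clarity this sketch keeps an explicit position counter, which the real construction must avoid, and the actual space accounting is more delicate:

```python
import random

class EpochSampler:
    """Plays the wrapped strategy `base` only on i uniformly chosen rounds of
    epoch i, where |epoch i| = S(i); plays the non-absorbing top row (0)
    in every other round. Call act() before observe() each round."""

    def __init__(self, base, S):
        self.base = base  # wrapped classic strategy, e.g. BlackwellFerguson
        self.S = S        # epoch length function: i -> S(i)
        self._start_epoch(1)

    def _start_epoch(self, i):
        self.i, self.pos, self.length = i, 0, self.S(i)
        # choose which i rounds of this epoch are the samples
        self.sampled = set(random.sample(range(self.length), min(i, self.length)))

    def act(self):
        self.is_sampled = self.pos in self.sampled
        return self.base.act() if self.is_sampled else 0

    def observe(self, column_bit):
        if self.is_sampled:
            self.base.observe(column_bit)  # the base strategy only sees samples
        self.pos += 1
        if self.pos == self.length:
            self._start_epoch(self.i + 1)
```

The intuition for the space saving: only i of the S(i) rounds in epoch i ever reach the base strategy, so its counter grows with the number of samples seen so far rather than with T.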

Extensions
- Concurrent mean-payoff games with one state that can be left [Hansen, I-J, Koucký]: the result is similar
- Any concurrent mean-payoff game (CMPG): open

Overview
- Playing the Big Match in small space
- Open problem: playing CMPGs in small space

Extension to CMPGs
- Known: O(log T) space [Mertens and Neyman '81]

Known strategy [Mertens and Neyman '81]
[Slide animation: a timeline of rounds split into epochs of lengths L(1), L(2), L(3), ..., with discount factors λ(s(0)), λ(s(1)), λ(s(2)), λ(s(3)) attached to the epochs.]
- Split the rounds into epochs such that |epoch i| = L(i)
- For each epoch i:
  - Let s(i) = …
  - Find the discount factor λ(s(i))
  - Play optimally in the game with discount factor λ(s(i)) during epoch i+1
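The epoch structure has the same wrapper shape as before. A schematic sketch of just that skeleton; since the defining formula for s(i) did not survive the transcript, update_statistic, lam, discounted_optimal_strategy and play_round below are all hypothetical placeholders, not the Mertens–Neyman definitions:

```python
def mertens_neyman_schematic(L, lam, update_statistic,
                             discounted_optimal_strategy, play_round,
                             num_epochs=100):
    """Skeleton of the epoch structure only: per epoch, refresh the
    statistic s(i) and switch to the strategy that is optimal for the
    discounted game with factor lam(s(i))."""
    s = 0  # s(0); the true initialization belongs to the elided formula
    for i in range(1, num_epochs + 1):
        sigma = discounted_optimal_strategy(lam(s))  # optimal for discount lam(s(i-1))
        payoffs = [play_round(sigma) for _ in range(L(i))]  # epoch i has length L(i)
        s = update_statistic(s, payoffs)  # s(i); formula not in the slides
```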

Problems
- Remembering enough for s(i) and the number of rounds requires O(log T) memory
  - Likely fixable: use an approximation, found by sampling
- The function L(i) can only increase linearly in i, whereas a similar result would require fast growth
  - Unsure whether this can be overcome

Thanks!