No-Regret Algorithms for Online Convex Programs Geoffrey J. Gordon Carnegie Mellon University Presented by Nicolas Chapados 21 February 2007

Outline
– Online learning setting
– Definition of regret
– Safe set
– Lagrangian Hedging (gradient form)
– Lagrangian Hedging (optimization form)
– Mention of theoretical results
– Application: one-card poker

Online Learning
A sequence of trials t = 1, 2, …
At each trial we must pick a hypothesis $y_t$.
The correct answer is then revealed in the form of a convex loss function $\ell_t(y_t)$.
Just before seeing the t-th example, the total loss so far is $\sum_{\tau=1}^{t-1} \ell_\tau(y_\tau)$.
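To make the protocol concrete, here is a minimal sketch of the trial loop in Python, assuming the linear losses used later in the talk; the names `learner`, `predict`, and `update` are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch of the online learning protocol with linear losses
# l_t(y) = c_t . y. The learner commits to y_t before seeing c_t.
def online_loop(learner, loss_vectors):
    total_loss = 0.0
    for c_t in loss_vectors:        # trials t = 1, 2, ...
        y_t = learner.predict()     # pick a hypothesis first
        total_loss += c_t @ y_t     # incur loss l_t(y_t) = c_t . y_t
        learner.update(c_t)         # then the loss function is revealed
    return total_loss
```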

Goal of Paper
Introduce the Lagrangian Hedging algorithm, a generalization of other algorithms:
– Hedge (Freund and Schapire)
– Weighted Majority (Littlestone and Warmuth)
– External-Regret Matching (Hart and Mas-Colell)
(The CMU technical report is much clearer than the NIPS paper.)

Regret
If we had used a fixed hypothesis y, the loss would have been $\sum_{t=1}^{T} \ell_t(y)$.
The regret is the difference between the total loss of the adaptive and fixed hypotheses:
$\rho_T(y) = \sum_{t=1}^{T} \ell_t(y_t) - \sum_{t=1}^{T} \ell_t(y)$
Positive regret means that we should have preferred the fixed hypothesis.
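As a worked example, here is a small hypothetical helper that computes this regret for linear losses over the probability simplex, where the best fixed hypothesis in hindsight is a single pure action:

```python
import numpy as np

# Regret = (loss of our adaptive plays) - (loss of the best fixed hypothesis).
# For linear losses over the simplex, the best fixed hypothesis is the
# corner (pure action) with the smallest cumulative loss.
def regret(loss_vectors, plays):
    loss_vectors = np.asarray(loss_vectors)   # shape (T, d)
    plays = np.asarray(plays)                 # shape (T, d), rows in the simplex
    adaptive_loss = np.sum(loss_vectors * plays)
    best_fixed_loss = loss_vectors.sum(axis=0).min()
    return adaptive_loss - best_fixed_loss
```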

Hypothesis Set
Assume that the hypothesis set Y is a convex subset of $\mathbb{R}^d$, for example the simplex of probability distributions.
The corners of Y represent pure actions; the interior represents probability distributions over actions.

Loss Function
Minimize a linear loss: $\ell_t(y_t) = c_t \cdot y_t$, where the loss vector $c_t$ is revealed after the hypothesis $y_t$ is chosen.

Regret Vector
Keeps the state of the learning algorithm: a vector that aggregates the actual losses and the gradient of the loss function.
Define the regret vector $s_t$ by the recursion $s_t = s_{t-1} + \ell_t(y_t)\,u - c_t$, with $s_0 = 0$,
where u is an arbitrary vector which satisfies $u \cdot y = 1$ for all $y \in Y$.
Example: if Y is the set of probability distributions, then u can be the vector of all ones.
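A sketch of this recursion in Python, under the assumptions above (linear losses, and $u \cdot y = 1$ for all y in Y); as the next slide notes, $s_t \cdot y$ then recovers the regret against any fixed hypothesis y:

```python
import numpy as np

# Regret-vector recursion for linear losses l_t(y) = c_t . y.
# Since u . y = 1 for every y in Y, expanding s_T . y gives
#   s_T . y = sum_t l_t(y_t) - sum_t l_t(y),
# i.e. the regret against the fixed hypothesis y.
def regret_vector(loss_vectors, plays, u):
    s = np.zeros_like(u, dtype=float)         # s_0 = 0
    for c_t, y_t in zip(loss_vectors, plays):
        s = s + (c_t @ y_t) * u - c_t         # s_t = s_{t-1} + l_t(y_t) u - c_t
    return s
```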

Use of Regret Vector
Given any hypothesis y, we can use the regret vector to compute its regret: $\rho_t(y) = s_t \cdot y$.

Safe Set
The region of regret space in which the regret is guaranteed to be nonpositive for all hypotheses.
The goal of the Lagrangian Hedging algorithm is to keep its regret vector "near" the safe set.

Safe Set (continued)
[Figure: the hypothesis set Y and the corresponding safe set S.]

Unnormalized Hypotheses
Consider the cone of unnormalized hypotheses: $\bar{Y} = \{\,\lambda y : y \in Y,\ \lambda \ge 0\,\}$.
The safe set is the cone that is polar to this cone of unnormalized hypotheses: $S = \{\,s : s \cdot \hat{y} \le 0 \ \text{for all}\ \hat{y} \in \bar{Y}\,\}$.
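Because a linear function is nonpositive on the whole cone exactly when it is nonpositive on Y itself (every cone element is a nonnegative multiple of some y in Y), and a linear function over a polytope attains its maximum at a corner, membership in the safe set can be checked on the pure actions alone. A small illustrative sketch:

```python
import numpy as np

# Check whether a regret vector s lies in the safe set S, the polar cone
# of the unnormalized hypotheses: s . y <= 0 for all y in Y. For a
# polytope Y it suffices to test the corners (pure actions).
def in_safe_set(s, corners, tol=1e-9):
    return all(s @ y <= tol for y in corners)
```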

Lagrangian Hedging (Setting)
At each step, the algorithm chooses its play according to the current regret vector and a closed convex potential function F(s).
Define the (sub)gradient of F(s) as f(s).
The potential function is what defines the problem to be solved.
E.g., for Hedge / Weighted Majority, the potential can be chosen so that its gradient is exponential in the regrets, $f(s)_i \propto e^{\eta s_i}$, yielding the usual exponential weights.

Lagrangian Hedging (Gradient)
[Pseudo-code figure; in outline: given the regret vector $s_{t-1}$, compute $\hat{y}_t = f(s_{t-1})$; if $u \cdot \hat{y}_t > 0$, play $y_t = \hat{y}_t / (u \cdot \hat{y}_t)$, otherwise play an arbitrary $y_t \in Y$; then observe the loss vector $c_t$ and update $s_t = s_{t-1} + (c_t \cdot y_t)\,u - c_t$.]
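Here is a compact Python sketch of this gradient form, specialized to the probability simplex with an exponential-weights gradient $f(s)_i = e^{\eta s_i}$ (so the sketch reduces to Hedge); the function and parameter names are illustrative:

```python
import numpy as np

# Gradient form of Lagrangian Hedging on the probability simplex.
# f(s)_i = exp(eta * s_i) is the gradient of an exponential potential,
# which makes the normalized play the usual Hedge distribution.
def lagrangian_hedging(loss_vectors, d, eta=0.1):
    u = np.ones(d)                   # u . y = 1 for all y in the simplex
    s = np.zeros(d)                  # regret vector, s_0 = 0
    total_loss = 0.0
    for c_t in loss_vectors:
        y_hat = np.exp(eta * s)      # y_hat = f(s_{t-1})
        z = u @ y_hat
        y_t = y_hat / z if z > 0 else u / d   # normalize into Y
        total_loss += c_t @ y_t
        s = s + (c_t @ y_t) * u - c_t         # regret-vector update
    return total_loss
```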

Optimization Form
In practice, it may be difficult to define, evaluate, and differentiate an appropriate potential function.
Optimization form: same pseudo-code as previously, but define F in terms of a simpler convex hedging function W.
[Formula figure: the hedging function W corresponding to the previous $F_1$.]

Optimization Form (cont’d)
Then we may obtain F as the convex conjugate of W over the cone of unnormalized hypotheses: $F(s) = \sup_{\hat{y} \in \bar{Y}} \big( s \cdot \hat{y} - W(\hat{y}) \big)$,
and the (sub)gradient as the maximizing argument: $f(s) = \arg\max_{\hat{y} \in \bar{Y}} \big( s \cdot \hat{y} - W(\hat{y}) \big)$,
which we may plug into the previous pseudo-code.
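As an illustrative numerical version of this construction, suppose W is an entropic hedging function (an assumption for this sketch, stated up to constants). Then F(s) and f(s) can be computed by maximizing $s \cdot \hat{y} - W(\hat{y})$ over the nonnegative orthant; for this particular W the maximizer also has the closed form $\hat{y}_i = e^{\eta s_i}$, matching the exponential weights above:

```python
import numpy as np
from scipy.optimize import minimize

# Optimization form: F(s) = max over unnormalized y of [s . y - W(y)],
# with f(s) the maximizing y. Here W(y) = (1/eta) * sum(y ln y - y),
# an entropic choice assumed for illustration.
def F_and_f(s, eta=0.1):
    s = np.asarray(s, dtype=float)
    def objective(y):
        y = np.maximum(y, 1e-12)                 # stay in the positive orthant
        W = np.sum(y * np.log(y) - y) / eta
        return -(s @ y - W)                      # minimize the negative
    res = minimize(objective, x0=np.ones_like(s),
                   bounds=[(1e-12, None)] * len(s))
    return -res.fun, res.x                       # F(s), f(s)
```

A quick check: setting the derivative $s_i - \frac{1}{\eta}\ln \hat{y}_i$ to zero gives $\hat{y}_i = e^{\eta s_i}$, so the numerical argmax agrees with the gradient used in the sketch after the gradient-form slide.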

Theoretical Results
(In a nutshell: it all works.) The bounds guarantee that the regret grows sublinearly in the number of trials, so the average regret per trial goes to zero, hence "no-regret".

One-Card Poker
The hypothesis space is the set of sequence weight vectors:
– information about when it is player i’s turn to move and the actions available at that time
Two players: gambler and dealer.
Ante is $1; each player is dealt 1 card from a 13-card deck.
Betting order: gambler bets / dealer bets / gambler bets.
A player may fold at any of these turns.
If neither folds, the player with the highest card wins the pot.

Why is it interesting? Elements of more complicated games: –Incomplete information –Chance events –Multiple stages Optimal play requires randomization and bluffing

Results in Self-Play
[Results figure.]

Results Against Fixed Opponent
[Results figure.]