Use of Markov Chains to Design an Agent Bidding Strategy for Continuous Double Auctions. Sunju Park, Management Science and Information Systems Department, Rutgers Business School.

Presentation transcript:

Use of Markov Chains to Design an Agent Bidding Strategy for Continuous Double Auctions Sunju Park Management Science and Information Systems Department Rutgers Business School, Rutgers University Edmund H. Durfee Artificial Intelligence Laboratory, University of Michigan William P. Birmingham Math & Computer Science Department, Grove City College Presenter: TinTin Yu

Introduction
Unlike traditional auctions with a single seller and multiple buyers (e.g., eBay), a Continuous Double Auction (CDA) works as follows:
- Buyers place bids, and sellers place offers, on the same items.
- A match occurs whenever a buyer's bid is higher than a seller's offer (e.g., "name your price" services; hotel.com?).
Goal
- Determine the optimal price/offer for a seller in order to gain the maximum profit.

Definitions
Notation:
- b: a buyer's bid; s: a seller's offer.
- s_p: the seller's offer that was just submitted.
- (b b s s_p): the queue of standing bids and offers, kept in ascending order of price.
Clearing Price (CP):
- A clearing happens when an offer is less than a bid.
- The clearing price then satisfies s_p <= CP <= b (the rightmost, i.e., highest, bid); this paper uses CP = s_p.
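To make the notation concrete, here is a minimal sketch of the standing queues and the matching rule in Python (the helper names and the two-list representation are my own illustration, not the paper's):

```python
import bisect

bids, offers = [], []          # standing bids and offers, each in ascending price order

def submit_offer(s_p):
    """A seller submits s_p. A match occurs when the offer is at or below a
    standing bid; the clearing price CP satisfies s_p <= CP <= b, and this
    paper uses CP = s_p."""
    if bids and s_p <= bids[-1]:
        bids.pop()             # the highest standing bid is matched
        return s_p             # clearing price
    bisect.insort(offers, s_p) # no match: s_p joins the standing offers
    return None

def submit_bid(b):
    """A buyer submits b; symmetric to submit_offer, clearing at the
    standing offer's price."""
    if offers and offers[0] <= b:
        return offers.pop(0)
    bisect.insort(bids, b)
    return None

# Example: bids of 4 and 7 are standing; a new offer of 5 matches the bid of 7 at CP = 5.
submit_bid(4); submit_bid(7)
print(submit_offer(5))         # -> 5
```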

Definitions
Markov chains (Markov state machines):
- A probabilistic finite state machine whose input is ignored.
- We use first-order Markov chains only: first-order means the probability distribution of the present state depends only on its direct predecessor state, not on any earlier history.
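A two-state toy chain (with made-up probabilities) shows what first-order buys us: sampling the next state requires only the current state, never the history.

```python
import random

# Transition probabilities P[current][next] of a first-order Markov chain.
# The numbers are illustrative only.
P = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.6, "B": 0.4},
}

def step(state):
    """Sample the successor of `state`; only P[state] is consulted."""
    r, cum = random.random(), 0.0
    for nxt, prob in P[state].items():
        cum += prob
        if r < cum:
            return nxt
    return nxt  # guard against floating-point rounding at the boundary

state, trajectory = "A", ["A"]
for _ in range(5):
    state = step(state)     # depends on `state` alone, never on `trajectory`
    trajectory.append(state)
print(trajectory)
```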

p-strategy Algorithm (1/2)

p-strategy Algorithm (2/2)
(Figure: the information used by the p-strategy.)

Step 1: Building the Markov Chain (1/3)
- Given a current auction state (b b s), when the p-seller (a seller using the p-strategy) submits its offer s_p, there are four possible next auction states, one for each position s_p can take in the price ordering.
- We make these states the initial states of the Markov chain.

Step 1: Building the Markov Chain (2/3)
- From the initial states, we keep populating the (b b s s_p) queue by submitting either a new buyer bid or a new seller offer.
- If a match occurs, the chain goes to the SUCCESS state.
- If the queue goes out of bound (exceeds the maximum number of standing offers), the chain goes to the FAIL state.
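A sketch of this expansion step (the state encoding, a pair of sorted price tuples, is my own illustration; note that the paper's SUCCESS state refers specifically to the p-seller's own offer s_p being matched):

```python
def successors(bids, offers, price_grid, cap=5):
    """Enumerate possible next states from (bids, offers), each a sorted
    tuple of standing prices. Every transition adds one new buyer bid or
    one new seller offer drawn from price_grid; a crossing of the books
    yields SUCCESS, and exceeding the cap on standing orders yields FAIL."""
    for p in price_grid:
        for nb, no in (
            (tuple(sorted(bids + (p,))), offers),   # a new buyer bids p
            (bids, tuple(sorted(offers + (p,)))),   # a new seller offers p
        ):
            if nb and no and no[0] <= nb[-1]:
                yield "SUCCESS"                     # lowest offer <= highest bid
            elif len(nb) > cap or len(no) > cap:
                yield "FAIL"                        # queue bound exceeded
            else:
                yield (nb, no)                      # a new transient state

# From standing bids (4, 7) and standing offer (9,), with candidate prices 3 and 8:
for nxt in successors((4, 7), (9,), (3, 8)):
    print(nxt)
```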

Step 1: Building the Markov Chain (3/3)
(Figure: the Markov-chain model of the CDA with starting state (b b s), where the numbers of standing bids and offers are each limited to 5.)

Step 2: Computing Utilities (1/5)
Step 2.1: Ingredients of the utility function:
- P_s(p): the probability of success at price p.
- U(Payoff_s(p)): the utility of the payoff if the offer receives a match.
- CP: the clearing price.
- C: the cost.
- TD(tau_s/f): the time-delay overhead (until success or failure).
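The equation itself did not survive the transcript; one plausible reconstruction from the ingredients above (an assumption on my part, the paper's exact form may differ) is:

```latex
EU(p) = P_s(p)\,\bigl[\,U(\mathrm{Payoff}_s(p)) - TD(\tau_s)\,\bigr]
      + \bigl(1 - P_s(p)\bigr)\,\bigl[\,U(0) - TD(\tau_f)\,\bigr],
\qquad \mathrm{Payoff}_s(p) = CP - C.
```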

Step 2: Computing Utilities (2/5)
The quantities we need to compute for each candidate price p.

Step 2: Computing Utilities (3/5)
Step 2.2.1: Transition probabilities
- Example: the transition from state (b b s) to (b b s s_p) at time step n, i.e., P(b b s s_p | b b s).
- Apply Bayes' rule to condition on the type of the next submission.
- Evaluate the result using the probability density function (PDF) of submitted prices, f(s).
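Schematically (a reconstruction of the garbled slide equation, so treat the exact form as an assumption), the transition probability factors into the chance that the next submission is a seller's offer and the chance that its price lands in the right interval:

```latex
P\bigl(b\,b\,s\,s_p \mid b\,b\,s\bigr)
  = P(\text{next submission is a seller's offer})
    \cdot \Pr\{a \le s \le b\}
  = P(\text{seller}) \int_a^b f(s)\,ds,
```

where [a, b] is the price interval that places the new offer at the required position in the queue.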

Step 2: Computing Utilities (4/5)
Step 2.2.2: TD(tau_s/f), the delay overhead
- Too complex to cover in detail here.
- It involves building a transition probability matrix P from the states of the Markov chain we built in Step 1.
- The equations involve: the per-state reward, equal to a constant c except for the initial states and the absorbing states; and the expected number of visits to each state until the chain reaches SUCCESS.
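The machinery alluded to is standard absorbing-Markov-chain theory; a sketch, assuming a constant per-step delay cost c and letting Q be the transient-to-transient block of the matrix P from Step 1:

```python
import numpy as np

def expected_delay(Q, c=1.0):
    """Expected delay overhead accumulated before absorption.

    The fundamental matrix N = (I - Q)^{-1} gives N[i, j] = the expected
    number of visits to transient state j when starting in transient state
    i; charging the constant reward/cost c per visit and summing over j
    yields the expected total delay from each possible start state.
    """
    n = Q.shape[0]
    N = np.linalg.inv(np.eye(n) - Q)   # expected visit counts
    return c * N.sum(axis=1)

# Toy example with two transient states; the remaining probability mass
# flows to the absorbing SUCCESS/FAIL states.
Q = np.array([[0.3, 0.2],
              [0.1, 0.4]])
print(expected_delay(Q))               # -> [2. 2.]
```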

Step 2: Computing Utilities (5/5)
- Plugging in the numbers yields the expected utility value associated with price p.
- The algorithm finds the optimal price p by looping through all p in the feasible range.
- The time complexity of the algorithm is O(k * n^3), where k is the number of possible prices and n is the number of Markov-chain states.
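The outer loop is then a plain argmax over candidate prices (a sketch; the hypothetical `expected_utility` callback stands in for all of Step 2, whose matrix inversion contributes the n^3 factor):

```python
def best_offer(price_grid, expected_utility):
    """Return the candidate price with the highest expected utility.
    With k candidate prices and n Markov-chain states this costs
    O(k * n^3), independent of the number of participating agents."""
    return max(price_grid, key=expected_utility)

# Example with a made-up concave utility peaking at p = 6:
print(best_offer(range(1, 11), lambda p: -(p - 6) ** 2))   # -> 6
```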

Benchmark (1/6)
Agents used for comparison:
- FM (Fixed-Markup): bids its cost plus some predefined markup.
- RM (Random-Markup): bids its cost plus some random markup.
- CP (Clearing-Price): obtains a clearing-price quote and bids around it (otherwise similar to the FM agent).
- OPT (Post-facto Optimal): our benchmark strategy. Since it "knows" everything about the future exactly (no uncertainty at all), it returns the maximum profit an agent could have achieved.
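For contrast, the baseline sellers fit in a few lines (an illustrative sketch; the markup values and the exact clearing-price rule are placeholders, not the paper's parameters):

```python
import random

def fm_offer(cost, markup=5.0):
    """Fixed-Markup: always cost plus a predefined markup."""
    return cost + markup

def rm_offer(cost, max_markup=10.0):
    """Random-Markup: cost plus a markup drawn uniformly at random."""
    return cost + random.uniform(0.0, max_markup)

def cp_offer(cost, cp_quote, markup=1.0):
    """Clearing-Price: anchor on the current clearing-price quote, but
    never offer below cost plus a small markup."""
    return max(cost + markup, cp_quote)
```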

Benchmark (2/6)

Benchmark (3/6): p-strategy vs. the other agents
Results for arrival rates of 0.4 (high) and 0.1 (low), with a narrow negotiation zone (delta = 5).

Benchmark (4/6): p-strategy vs. the other agents
Results for arrival rates of 0.4 (high) and 0.1 (low), with a wide negotiation zone (delta = 25).

Benchmark (5/6): p-strategy vs. itself
Results:
- The profit of each individual p-agent decreases as the number of p-agents increases.
- However, when there are more buyers, the p-agents are able to regain similar profits at the expense of the buyers.

Benchmark (6/6): CP vs. multiple p-agents and CP-agents
Results:
- CP-strategy agents are able to raise their profits as the number of mixed p-agents and CP-agents increases.

Conclusion
Summary:
- The p-strategy is based on stochastic modeling of the auction process.
- It works without needing to reason much about the other individual agents: its time complexity depends only on the number of Markov-chain states, not on the number of agents.
- It outperforms the other agents (FM, RM, CP).
Future work:
- A similar strategy can be applied to buyers.
- Analysis shows an average gap of 20% between the p-strategy and the post-facto optimum.
- Ongoing work: a hybrid strategy. This adaptive approach allows the agent to figure out when to use the stochastic model and when to use simpler strategies.

Question to think about
Humans can think very differently:
- e.g., selling a 50" plasma HDTV: set a very low asking price such as $1.00 with no hidden reserve, and let the shipping cost = $ ?!
- Can artificially intelligent agents think outside the box?

Your Questions

Bibliography
Park, S., Durfee, E.H., and Birmingham, W.P. (2004). "Use of Markov Chains to Design an Agent Bidding Strategy for Continuous Double Auctions." Journal of Artificial Intelligence Research, Volume 22, pages 175-214.