
1 George Boulougaris, Kostas Kolomvatsos, Stathes Hadjiefthymiades
Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques
Pervasive Computing Research Group, Department of Informatics and Telecommunications, University of Athens, Greece
WCCI – IJCNN 2010, Barcelona, Spain

2 Outline
- Introduction
- Market Members
- Scenario
- Buyer Q-Table
- Buyer Purchase Behavior
- Results

3 Introduction
- Intelligent agents: autonomous software components that represent users and learn from their owners
- Electronic markets: places where entities not known in advance can negotiate over the exchange of products
- Reinforcement learning: a general framework for sequential decision making, leading to the maximum long-term reward at every state of the world

4 Market Members
- Buyers
- Sellers
- Middle entities (matchmakers, brokers, market entities)
Intelligent agents may represent each of these entities. Entities have no prior information about the others in the market.

5 Scenario (1/2)
Buyers:
- can interact with sellers
- can interact with brokers or matchmakers (matchmakers cannot sell products)
- want to buy the most appropriate product at the most profitable price
We focus on the interaction between buyers and selling entities (sellers or brokers). Most research efforts focus only on the reputation of entities. We utilize Q-Learning, which is well suited to deriving actions that lead to the maximum long-term reward (based on a number of parameters) at every state of the world.

6 Scenario (2/2)
The product parameters for each selling entity are:
- ID
- Time validity
- Price
- Time availability
- Relevance
Each selling entity represents a state that the buyer can be in.
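The product parameters listed above can be captured in a simple record type. A minimal sketch in Python (field names and example values are illustrative, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class Product:
    """Parameters a selling entity advertises for a product (names illustrative)."""
    product_id: int
    price: float              # asking price
    time_validity: float      # how long the offer remains valid (s)
    time_availability: float  # how long the product stays available (s)
    relevance: float          # relevance to the buyer's request, in [0, 1]

offer = Product(product_id=42, price=19.99, time_validity=3600.0,
                time_availability=120.0, relevance=0.85)
print(offer.relevance)  # 0.85
```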

7 Buyer Q-Table (1/3)
- The buyer has one Q-Table per product
- Rows represent states and columns represent actions
- There are M+1 columns (M is the number of selling entities)
- Actions 1..M represent the transition to entity 1..M (row of the Q-Table); the transition to another entity corresponds to a 'not-buy-from-this-entity' action
- Action M+1 represents the purchase action (from the current entity)
- The buyer's final Q-Table is therefore a 3D table (one 2D table per product)
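The layout above (one M×(M+1) table per product, stacked across products) can be sketched as a 3-D array. A minimal illustration, with sizes chosen only for the example:

```python
import numpy as np

M = 5   # number of selling entities = number of states (illustrative)
P = 3   # number of products (illustrative)

# One M x (M+1) Q-table per product, stacked into a 3-D array.
# Columns 0..M-1: move to entity j (a 'not-buy-from-this-entity' action);
# column M: buy from the current entity.
q = np.zeros((P, M, M + 1))

BUY = M  # index of the purchase action
print(q.shape)        # (3, 5, 6)
print(q[0, 2, BUY])   # Q-value of buying product 0 at entity 2 -> 0.0
```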

8 Buyer Q-Table (2/3)
The buyer takes the following information into consideration when building the Q-Table:
- Relevancy factor
- Price
- Response time
- Number of transitions
The update equation used is the standard Q-Learning rule:
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + l \, [\, r + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \,]
where l is the learning rate, r is the reward, γ is the future-reward discount factor, and s_t and a_t are the state and the action at time t.
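The update rule with the symbols listed on the slide (l, r, γ, s_t, a_t) is the standard Q-Learning equation; a minimal sketch, with learning-rate and discount values chosen only for illustration:

```python
def q_update(Q, s, a, r, s_next, l=0.5, gamma=0.9):
    """Standard Q-Learning update:
    Q(s,a) <- Q(s,a) + l * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a state-by-action table (list of lists); l is the learning rate
    and gamma the future-reward discount factor (values illustrative)."""
    best_next = max(Q[s_next])                     # max_a' Q(s', a')
    Q[s][a] += l * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Tiny example: 2 states (entities), 3 actions each, all values zero.
Q = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
v = q_update(Q, s=0, a=2, r=1.0, s_next=1)
print(v)  # 0.5
```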

9 Buyer Q-Table (3/3)
Issues concerning the reward:
- It is decreased by 5% when the buyer deals with entities that do not have the product
- It is based on four components: the reward for the relevancy, for the price, for the response time, and for the required transitions
- The greater the relevancy, the greater the reward
- The smaller the price, the greater the reward
- The smaller the response time, the greater the reward
- The smaller the number of transitions, the greater the reward
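The slide names the four reward components and the 5% decrement but not their exact forms, so the following is an assumed sketch: equal weights and simple linear normalizations (all of which are assumptions, not from the paper), preserving only the monotonic directions stated above.

```python
def reward(relevance, price, response_time, transitions,
           max_price, max_time, max_transitions, has_product=True):
    """Composite reward: higher relevance -> higher reward; lower price,
    response time, and transition count -> higher reward.
    Equal weights and linear normalizations are ASSUMPTIONS for this sketch."""
    r = (relevance
         + (1 - price / max_price)
         + (1 - response_time / max_time)
         + (1 - transitions / max_transitions)) / 4.0
    if not has_product:
        r *= 0.95   # 5% decrement when the entity does not have the product
    return r

r1 = reward(0.8, 10.0, 2.0, 1, max_price=20.0, max_time=10.0, max_transitions=10)
r2 = reward(0.8, 10.0, 2.0, 1, max_price=20.0, max_time=10.0, max_transitions=10,
            has_product=False)
print(r1)       # 0.75
print(r1 > r2)  # True
```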

10 Buyer Purchase Behavior
The buyer relies on the Q-Table for the purchase action. Its behavior has two phases:
First phase:
- It creates the Q-Table
- It uses a specific number of episodes in the training phase
Second phase:
- It utilizes the Q-Table for its purchases
- It first randomly selects an entity (row) for a specific product
- It then repeatedly selects the action with the highest reward
- If the best action is to return to a previously visited entity that is unable to deliver, the purchase is not feasible
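The second phase can be sketched as a greedy walk over a trained per-product Q-table. This is a simplified sketch: it stops whenever the best action revisits an entity, a slightly stronger condition than the slide's "previously visited entity with inability to deliver".

```python
import random

def purchase(Q, start=None):
    """Greedy second-phase walk over one product's Q-table.
    Q is an M x (M+1) table; actions 0..M-1 move to entity j,
    action M buys from the current entity. Returns the entity bought
    from, or None if the purchase is not feasible."""
    M = len(Q)
    BUY = M
    s = random.randrange(M) if start is None else start  # random initial entity
    visited = {s}
    while True:
        a = max(range(M + 1), key=lambda a: Q[s][a])  # highest-reward action
        if a == BUY:
            return s        # buy from the current entity
        if a in visited:
            return None     # best move revisits an entity: give up
        visited.add(a)
        s = a               # transition to the chosen entity

# 3 entities; only entity 2 is worth buying from (values illustrative).
Q = [[0, 0, 0.9, 0.1],
     [0, 0, 0.9, 0.1],
     [0, 0, 0.0, 1.0]]
print(purchase(Q, start=0))  # 2
```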

11 Results (1/4)
We consider a dynamic market where the number and the characteristics of entities are not static. Our experiments use the following probabilities:
- 2% that a new product becomes available in an entity
- 5% that a product is totally new in the market
- 5% that a product is no longer available in an entity
- 2% that an entity is totally new in the market
- 1% that an entity is no longer available for negotiations
We examine the purchases of 400 products in each experiment.
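The dynamic-market events above can be sketched as independent per-step draws. The probabilities are from the experiments; the assumption that each event is sampled independently per step is ours, since the slide does not specify the sampling scheme.

```python
import random

# Per-step event probabilities used in the experiments.
EVENTS = {
    "new_product_in_entity": 0.02,
    "new_product_in_market": 0.05,
    "product_removed":       0.05,
    "new_entity":            0.02,
    "entity_leaves":         0.01,
}

def market_step(rng=random):
    """Return the list of dynamic-market events fired this step.
    Each event is drawn independently (an assumption of this sketch)."""
    return [name for name, p in EVENTS.items() if rng.random() < p]

random.seed(0)
steps = [market_step() for _ in range(1000)]
fired = sum(len(e) for e in steps)
print(fired > 0)  # with ~0.15 expected events/step, some events fire
```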

12 Results (2/4)
Table creation time results:

Entities number (5 products each) | First table creation time (ms) | Average table creation time, excluding first (ms)
4   | 150    |
15  | 125    | 17.86
50  | 1685   | 402.73
100 | 16520  | 3546.44
200 | 208088 | 41846.64

Entities number (40 products each) | First table creation time (ms) | Average table creation time, excluding first (ms)
6   | 156    | 15.50
15  | 561    | 114.33
30  | 3510   | 453.73
60  | 32667  | 2386.50
100 | 191303 | 8254.22

Products number (15 entities) | First table creation time (ms) | Average table creation time, excluding first (ms)
5    | 125    | 17.86
40   | 561    | 114.33
80   | 1029   | 210.60
150  | 1731   | 374.57
500  | 6155   | 1319.14
1000 | 14917  | 2940.86
5000 | 193644 | 14914.00

13 Results (3/4)
Q-Learning reduces the required purchase steps:

Entities number (5 products each) | Total moves for 400 products | Total moves for 400 products (without Q-Learning) | Moves reduction using Q-Learning
4   | 653 | 2000  | -67.35%
15  | 716 | 6400  | -88.81%
50  | 714 | 20400 | -96.50%
100 | 732 | 40400 | -98.19%
200 | 768 | 80400 | -99.04%

Entities number (40 products each) | Total moves for 400 products | Total moves for 400 products (without Q-Learning) | Moves reduction using Q-Learning
6   | 718 | 2800  | -74.36%
15  | 705 | 6400  | -88.98%
30  | 703 | 12400 | -94.33%
60  | 693 | 24400 | -97.16%
100 | 712 | 40400 | -98.24%

14 Results (4/4)
- Q-Learning reduces the average price and the average response time as the number of entities increases
- Q-Learning does not affect these basic parameters as the number of products increases

15 Thank you
http://p-comp.di.uoa.gr

