1
August 20, 2001 – 6th Computer Olympiad
Learning Opponent-type Probabilities for PrOM search
Jeroen Donkers, IKAT, Universiteit Maastricht
2
Contents
– OM search and PrOM search
– Learning for PrOM search
– Off-line learning
– On-line learning
– Conclusions & future research
3
OM search
– The MAX player uses evaluation function V0
– The opponent uses a different evaluation function (Vop)
– At MIN nodes: predict which move the opponent will select (using standard search and Vop)
– At MAX nodes: pick the move that maximizes the search value (based on V0)
– At leaf nodes: use V0
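To make the move-prediction step concrete, here is a minimal OM-search sketch on an explicit game tree, written in Python for illustration only. The node type with `children` and `to_move` attributes, and the evaluation callables `v0` and `v_op`, are assumptions not taken from the slides; `v_op` is assumed to score positions from the MAX player's point of view.

```python
# Illustrative OM-search sketch (not from the slides). Assumes a node type
# with `children` and `to_move` attributes, and evaluation functions v0 and
# v_op that score a position from the MAX player's point of view.

def minimax(node, v, depth):
    """Plain fixed-depth minimax with evaluation function v."""
    if depth == 0 or not node.children:
        return v(node)
    values = [minimax(c, v, depth - 1) for c in node.children]
    return max(values) if node.to_move == "MAX" else min(values)

def om_search(node, v0, v_op, depth):
    """Return the MAX player's value of `node` under opponent model Vop."""
    if depth == 0 or not node.children:
        return v0(node)                       # leaf: evaluate with V0
    if node.to_move == "MAX":
        # MAX node: pick the move that maximizes the OM-search value (V0).
        return max(om_search(c, v0, v_op, depth - 1) for c in node.children)
    # MIN node: predict the opponent's move with a standard search that
    # uses Vop, then return the MAX player's value of the predicted move.
    predicted = min(node.children, key=lambda c: minimax(c, v_op, depth - 1))
    return om_search(predicted, v0, v_op, depth - 1)
```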
4
PrOM search
Extended opponent model:
– a set of opponent types (e.g. evaluation functions)
– a probability distribution over this set
Interpretation: at every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.
5
PrOM search algorithm
– At MIN nodes: determine for every opponent type which move it would select; compute the MAX player's value of these moves; use the opponent-type probabilities to compute the expected value of the MIN node
– At MAX nodes: select the child with the maximum value
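A corresponding sketch of the PrOM-search recursion, again in Python and for illustration only. It reuses the hypothetical `minimax` helper from the OM-search sketch above; representing the opponent types as a list of evaluation functions paired with probabilities is an assumption.

```python
# Illustrative PrOM-search sketch (not from the slides). Opponent types are
# modelled as evaluation functions with probabilities; `minimax` is the
# helper from the OM-search sketch above.

def prom_search(node, v0, opp_types, probs, depth):
    if depth == 0 or not node.children:
        return v0(node)                       # leaf: evaluate with V0
    if node.to_move == "MAX":
        # MAX node: select the child with the maximum value.
        return max(prom_search(c, v0, opp_types, probs, depth - 1)
                   for c in node.children)
    # MIN node: for every opponent type, determine the move it would select,
    # take the MAX player's value of that move, and weight it with the
    # opponent-type probability to obtain the expected value.
    expected = 0.0
    for v_op, p in zip(opp_types, probs):
        predicted = min(node.children, key=lambda c: minimax(c, v_op, depth - 1))
        expected += p * prom_search(predicted, v0, opp_types, probs, depth - 1)
    return expected
```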
6
Learning in PrOM search
How do we assess the probabilities of the opponent types?
– Off-line: use games previously played by the opponent to estimate the probabilities (much time and, possibly, much data available)
– On-line: use the moves observed during a game to adjust the probabilities (little time and few observations; prior probabilities are needed)
7
Off-Line Learning
Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.
Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent's moves.
8
Off-Line Learning
How to obtain P*(opp)?
Input: a set of positions and the moves that the given opponent and all the given opponent types would select.
"Algorithm": P*(opp_i) = N_i / N
But: leave out all ambiguous positions (e.g. when more than one opponent type agrees with the opponent)!
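A minimal sketch of this counting estimator, for illustration only. The input format (pairs of the opponent's move and the moves each opponent type would select) and the handling of positions where no type matches are assumptions.

```python
# Sketch of the counting estimator P*(opp_i) = N_i / N with ambiguous
# positions left out (illustration only; the input format is assumed).

def estimate_probs(observations, n_types):
    """observations: iterable of (opponent_move, [move chosen by each type])."""
    counts = [0] * n_types
    total = 0
    for opp_move, type_moves in observations:
        matches = [i for i, m in enumerate(type_moves) if m == opp_move]
        if len(matches) != 1:
            continue        # ambiguous (or unexplained) position: leave it out
        counts[matches[0]] += 1
        total += 1
    if total == 0:
        return [1.0 / n_types] * n_types      # no usable evidence: stay uniform
    return [c / total for c in counts]
```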
9
Off-Line Learning
Case 1: the opponent is using a mixed strategy P#(opp) of the given opponent types
– Effective learning is possible (P*(opp) converges to P#(opp))
– More difficult if the opponent types are not independent
10
Experiment: 5 opponent types, P = (a, b, b, b, b), 20 moves, 100–100,000 runs, 100 samples; ambiguous events not left out.
11
Experiment: 5 opponent types, P = (a, b, b, b, b), 20 moves, 10–100,000 runs, 100 samples; ambiguous events left out.
12
Experiment: 2–20 opponent types, P = (a, b, b, b, b), 20 moves, 100,000 runs, 100 samples; varying number of opponent types.
13
Off-Line Learning
Case 2: the opponent is using a different strategy
– The opponent types behave randomly but dependently (the distribution of type i depends on type i−1)
– The real opponent selects a fixed move
14
[Figures: learning error; learned probabilities]
15
Fast On-Line Learning
– At the principal MIN node, only the best moves for every opponent type are needed
– Increase the probability of an opponent type slightly if the observed move is the same as the move selected by this opponent type only
– Normalize all probabilities
– Drift to one opponent type is possible
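A sketch of this fast update, for illustration only; the step size `delta` and the exact form of the increment are assumptions not fixed by the slides.

```python
# Sketch of the fast on-line update (illustration only; `delta` is an
# assumed step size).

def fast_update(probs, type_moves, observed_move, delta=0.05):
    """probs: current opponent-type probabilities;
    type_moves: the move selected by each type at the principal MIN node."""
    matches = [i for i, m in enumerate(type_moves) if m == observed_move]
    new_probs = list(probs)
    if len(matches) == 1:               # only one type predicted the observed move
        new_probs[matches[0]] += delta  # increase that type's probability slightly
    total = sum(new_probs)
    return [p / total for p in new_probs]   # normalize
```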
16
Slower On-Line Learning
Naive Bayesian (Duda & Hart, 1973)
– Compute the value of every move at the principal MIN node for every opponent type
– Transform these values into conditional probabilities P(move | opp)
– Compute P(opp | move_obs) using Bayes' rule with P*(opp) as the prior
– Update: P*(opp) ← a·P*(opp) + (1−a)·P(opp | move_obs)
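A sketch of the naive-Bayesian update, for illustration only. The slides do not specify how move values are turned into P(move | opp); a softmax over the values at the principal MIN node (taken here from the MAX player's point of view, so lower is better for the opponent) is an assumed choice, as are the parameter names `a` and `temperature`.

```python
import math

# Sketch of the naive-Bayesian on-line update (illustration only; the
# softmax likelihood and the parameters a and temperature are assumptions).

def bayes_update(probs, move_values, observed_move, a=0.9, temperature=1.0):
    """probs: current P*(opp); move_values: one dict {move: value} per opponent
    type, values from the MAX player's point of view at the principal MIN node."""
    posterior = []
    for p, values in zip(probs, move_values):
        # P(move | opp) via softmax: a lower MAX value is better for the opponent.
        z = sum(math.exp(-v / temperature) for v in values.values())
        likelihood = math.exp(-values[observed_move] / temperature) / z
        posterior.append(p * likelihood)          # Bayes' rule, unnormalized
    total = sum(posterior)
    posterior = [q / total for q in posterior]
    # P*(opp) <- a * P*(opp) + (1 - a) * P(opp | observed move)
    return [a * p + (1 - a) * q for p, q in zip(probs, posterior)]
```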
17
Naïve Bayesian Learning
– In the end, drifting to 1-0 probabilities will almost always occur
– The parameter a is very important for the actual performance:
  – amount of change in the probabilities
  – convergence
  – drifting speed
– It should be tuned in a real setting
18
Conclusions
– Effective off-line learning of the probabilities is possible when ambiguous events are disregarded
– Off-line learning also works if the opponent does not use a mixed strategy of known opponent types
– On-line learning must be tuned precisely to a given situation
19
Future Research
PrOM search and learning in real game playing:
– Zanzibar Bao (8x4 mancala)
– LOA (some experiments with OM search have been done)
– Chess endgames