Intelligence for Games and Puzzles

Berliner’s 1979 B* Algorithm: Motivations

The B* algorithm is primarily a best-first game-tree proof procedure.
Effort limits on alpha-beta and similar searches are artificial, and lead to horizon effects.
Depth-first searches tend to investigate huge trees, large parts of which have nothing to do with the solution. Skilled players do not do this.
Most game-tree searches do not hope to find a complete path to a solution, but only a best move toward a solution.
It is desirable to discontinue searching a branch whose value is already sufficient for the proof being attempted.
Grandmasters say things like “In case of time pressure I would play B-N5”.
If there is only one possible move, no search at all is needed.
Traditional best-first algorithms still spend considerable effort searching within the best branch before confirming that it is in fact best (with NegaMax, the lowest child is the best choice).
Dual evaluations: Optimistic and Pessimistic

The B* algorithm uses two evaluations for every node:
  An optimistic one, hopefully valid as an upper bound on the true value
  A pessimistic one, hopefully valid as a lower bound on the true value
But maybe only one evaluation function is needed, an optimistic one, because what is optimistic for one player can be treated as pessimistic for the other, and vice versa.
At leaf nodes, null moves will allow evaluation from the other point of view.
When computing values for interior nodes, from player A’s point of view:
  A’s optimistic value arises from the best-for-A of the children’s B-Pess values
  A’s pessimistic value arises from the best-for-A of the children’s B-Opt values
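The interior-node rule can be sketched in a few lines of Python. This is a minimal illustration, not Berliner's own formulation: the `backup` function and the `Bounds` alias are hypothetical names, the numbers are illustrative, and every pair is assumed to be stated from the point of view of the player to move at that node (NegaMax style).

```python
from typing import List, Tuple

# (opt, pess) for one node, from the point of view of the player to move there.
Bounds = Tuple[int, int]


def backup(children: List[Bounds]) -> Bounds:
    """Derive a parent's (opt, pess) from its children's bounds, NegaMax style."""
    # A child's value is negated when seen from the parent, so the parent's
    # optimistic bound comes from the children's pessimistic bounds, and vice versa.
    opt = max(-child_pess for _, child_pess in children)
    pess = max(-child_opt for child_opt, _ in children)
    return opt, pess


a_nodes = [(30, 15), (19, 10), (22, 8)]   # illustrative A-node bounds
b_node = backup(a_nodes)                   # bounds of their B-node parent
print(b_node)                              # (-8, -19)
```

Because each child satisfies Opt >= Pess, the backed-up parent pair does too.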
A’s Optimism from B’s Pessimism

(Note: Berliner’s paper does not present the B* algorithm in NegaMax style; rather, it speaks of players MAX and MIN, with all evaluations interpreted as bigger-numbers-better-for-MAX. I adapt it to NegaMax.)
A’s upper bound comes from the best-for-A of B’s lower bounds, and vice versa.
At A nodes, positive values are good for A; at B nodes, positive values are good for B. At every node, Opt >= Pess.

[Figure: example tree of alternating A nodes and B nodes, each labelled with an (Opt, Pess) pair. A-node labels shown include (+30,+15), (+19,+10), (+22,+8), (+39,+10) and (+42,+8); B-node labels shown include (-3,-39); the root A node is labelled (+39,+8).]
Dual values rather than single values

Evaluations are used to choose moves. What is to be proven about a chosen move?
With single-valued evaluations, minimaxing algorithms (including alpha-beta etc.) seek to prove that no other move at the root gives a better value than the chosen move.
With dual evaluations, the B* algorithm seeks to prove that no other move at the root has a better optimistic value than the pessimistic value of the chosen move. This is used as its terminating condition.
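The terminating condition can be written as a small check. The sketch below assumes, for readability, that each root move's (opt, pess) pair is already expressed from the root player's point of view; the function name `proven_best` and the sample numbers are illustrative only.

```python
from typing import Dict, Optional, Tuple


def proven_best(moves: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return a move whose pessimistic value no rival's optimistic value beats."""
    for name, (_, pess) in moves.items():
        rivals = (opt for other, (opt, _) in moves.items() if other != name)
        if all(opt <= pess for opt in rivals):
            return name
    return None   # the proof is not complete yet: more search is needed


print(proven_best({"a": (30, 20), "b": (18, 5)}))   # a
print(proven_best({"a": (30, 10), "b": (18, 5)}))   # None
print(proven_best({"only": (7, -7)}))               # only
```

The last call shows the single-move case: with no rivals the condition holds at once, so no search is needed.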
B*’s two proof strategies

When at the root of the game tree for choosing a move:
The Prove_Best strategy hopes to improve (from A’s point of view) the pessimistic bound of the most optimistic child node so that it is not worse than the optimistic bound of any of its siblings.
The Disprove_Rest strategy hopes to disimprove (from A’s point of view) the upper bounds of the competing siblings so that they are not better than the pessimistic bound of the most optimistic.
To change any bound of a node requires further search in its subtree.

[Figure: example root positions with (Opt, Pess) labels on the root and its children, illustrating the two strategies.]
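What each strategy has to achieve can be made concrete with a short sketch. As an assumption for readability, the (opt, pess) pairs below are stated from A's point of view; `strategy_targets` is a hypothetical helper and the numbers are illustrative.

```python
from typing import Dict, List, Tuple


def strategy_targets(moves: Dict[str, Tuple[int, int]]) -> Tuple[str, int, List[str]]:
    """Return (best move, gap Prove_Best must close, rivals Disprove_Rest must lower)."""
    ranked = sorted(moves, key=lambda m: moves[m][0], reverse=True)
    best, rivals = ranked[0], ranked[1:]
    best_pess = moves[best][1]
    # Prove_Best: raise best's pessimistic bound up to the highest rival optimistic bound.
    gap = max([moves[r][0] - best_pess for r in rivals] + [0])
    # Disprove_Rest: lower every rival whose optimistic bound still exceeds best_pess.
    to_lower = [r for r in rivals if moves[r][0] > best_pess]
    return best, gap, to_lower


print(strategy_targets({"a": (26, 18), "b": (24, 22), "c": (3, -17)}))
# ('a', 6, ['b'])
```

Here Prove_Best would have to lift the pessimistic bound of "a" by 6, while Disprove_Rest would only have to work on "b", since "c" is already no threat.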
B* PseudoCode

{ initialise;   /* CurrentNode := TopNode; TopNode.Opt := -∞; TopNode.Pess := +∞ */
  repeat {
    expand CurrentNode if necessary;
    assign BestNode, AltNode, MaxOpt, MaxPess;
        /* BestNode := child of CurrentNode with best Opt
           AltNode  := child of CurrentNode with 2nd best Opt
           MaxOpt   := BestNode.Opt
           MaxPess  := best Pess of children */
    while CurrentNode.BoundsAreWrong() {
        /* bounds are wrong when MaxOpt is worse than CurrentNode.Pess
           or MaxPess is better than CurrentNode.Opt */
      CurrentNode.UpdateBounds();
      if CurrentNode=TopNode
        then { if AltNode.Opt >= BestNode.Pess
                 then Return BestNode
                 else Break }   /* from While */
        else CurrentNode:=CurrentNode.Parent }
    if CurrentNode=TopNode then Strategy:=Decide();
    if Strategy=‘ProveBest’
      then CurrentNode:=BestNode
      else CurrentNode:=AltNode } }

(NegaMax assumed here)
(Thereby hangs a tale …)
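The pseudocode above can be turned into a small runnable Python sketch. It is a simplified variant under stated assumptions, not Berliner's exact procedure: the `Node` class, `expand`, `update_bounds`, `decide` and `b_star` are hypothetical names; the tree is pre-built and "expanding" merely reveals children; bounds are always backed up all the way to the root; below the root the search always descends into the child most optimistic for the player to move; and `decide` is an arbitrary placeholder rule. Every node's bounds are from the point of view of the player to move there, and any node with Opt > Pess that the search reaches is assumed to have children.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    opt: int     # optimistic bound for the player to move at this node
    pess: int    # pessimistic bound, same point of view (opt >= pess)
    hidden: List["Node"] = field(default_factory=list)    # not yet generated
    children: List["Node"] = field(default_factory=list)  # generated so far
    parent: Optional["Node"] = None


def expand(node: Node) -> None:
    """Generate the node's children (here: reveal the pre-built ones)."""
    node.children, node.hidden = node.hidden, []
    for child in node.children:
        child.parent = node


def update_bounds(node: Node) -> None:
    """NegaMax backup: parent Opt from children's Pess, parent Pess from their Opt."""
    if node.children:
        node.opt = max(-c.pess for c in node.children)
        node.pess = max(-c.opt for c in node.children)


def decide(best: Node, alt: Node) -> str:
    """Placeholder strategy choice (an assumption; B* variants differ here)."""
    if alt.opt > alt.pess and alt.opt - alt.pess < best.opt - best.pess:
        return "DisproveRest"
    return "ProveBest"


def b_star(root: Node) -> Tuple[Node, int]:
    """Return (chosen root child, number of node expansions)."""
    expand(root)
    update_bounds(root)
    expansions = 1
    while True:
        # Lowest child Pess is the root's best Opt; ties go to the lower child Opt.
        ranked = sorted(root.children, key=lambda c: (c.pess, c.opt))
        best = ranked[0]
        # Proof complete: no rival's optimistic value (for the root player)
        # beats best's pessimistic value, i.e. -alt.pess <= -best.opt.
        if len(ranked) == 1 or ranked[1].pess >= best.opt:
            return best, expansions
        alt = ranked[1]
        node = best if decide(best, alt) == "ProveBest" else alt
        while node.children:           # descend to the frontier
            node = min(node.children, key=lambda c: c.pess)
        if not node.hidden:
            raise ValueError(f"unresolved node {node.name} has no children")
        expand(node)
        expansions += 1
        while node is not None:        # back the new bounds up to the root
            update_bounds(node)
            node = node.parent


x = Node("X", -4, -10, hidden=[Node("X1", 10, 9), Node("X2", 12, 9)])
y = Node("Y", 0, -8, hidden=[Node("Y1", 8, 0)])
root = Node("root", 0, 0, hidden=[x, y])   # root bounds are set by the first backup
move, count = b_star(root)
print(move.name, count, (root.opt, root.pess))   # X 2 (10, 9)
```

In this illustrative run the search stops after two expansions: move X is proven best while the root's value is only known to lie between 9 and 10, and Y is never expanded.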
B* Observations

B* outperforms the conventional best-first algorithm.
Like the conventional best-first algorithm, it requires bookkeeping: nodes in the tree are generated but not destroyed.
The determination of when to pursue the DisproveRest strategy can be made on various criteria, from which different variants of the algorithm will follow.
The more complex this determination, in terms of number of parameters considered, the greater the reduction of search space achievable.
The larger the search tree, the more pronounced the effect of a good algorithm.
The DisproveRest strategy means that B* is not, strictly speaking, a best-first algorithm; sometimes it explores avenues believing them to be not best.
Terminating search without determining an exact value is reminiscent of a still earlier idea, that AlphaBeta search might terminate whenever Alpha and Beta get to be close enough to each other.

Reference: Hans Berliner, “The B* Tree Search Algorithm: A Best-First Proof Procedure”, Artificial Intelligence 12 (1979), pp. 23-40.