Othello Processor Bret Taylor, Olatunji Ruwase, and Jim Norris CS343, Spring 2003.


1 Othello Processor Bret Taylor, Olatunji Ruwase, and Jim Norris CS343, Spring 2003

2 Project Overview
- AI algorithms for games like Othello are highly parallelizable
  - Generally, algorithm performance is increased by adding more generic processors
- What price-performance can we achieve with a custom processor?
  - What granularity of custom instructions achieves the greatest price-performance?
  - How generic can we make the instructions?

3 Overview of Othello
- Goal: have more pieces than your opponent when the game ends
- You flip a line of your opponent’s pieces when you bracket it with your own pieces at both ends

6 Othello in Academia
- Software
  - Rosenbloom, Paul S.: A World-Championship-Level Othello Program
  - Lee, K.; Mahajan, S.: The Development of a World Class Othello Program
- No specific hardware implementations, but related work for MiniMax-style algorithms
  - Powley, Curtis Nelson: Parallel Tree Search on a Single-Instruction, Multiple-Data (SIMD) Machine

7 Common Algorithm Structure
- MiniMax search to a depth determined by a global or per-move time limit
- Heuristics evaluate the “value” of a move at the leaves
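The structure on this slide can be sketched as a depth-limited MiniMax. This is a minimal illustration, not the authors' implementation: the `successors` and `evaluate` callbacks and the toy nested-list game tree are assumptions made up for the example, and in a real player the depth would be chosen from the time limit.

```python
from typing import Any, Callable, Iterable


def minimax(state: Any, depth: int, maximizing: bool,
            successors: Callable[[Any], Iterable[Any]],
            evaluate: Callable[[Any], float]) -> float:
    """Depth-limited MiniMax: the heuristic is applied only at the leaves."""
    children = list(successors(state))
    if depth == 0 or not children:
        # Leaf of the search (depth cutoff or no moves): use the heuristic.
        return evaluate(state)
    values = [minimax(child, depth - 1, not maximizing, successors, evaluate)
              for child in children]
    return max(values) if maximizing else min(values)


# Hypothetical toy game: inner lists are positions, integers are final scores.
def toy_successors(state):
    return state if isinstance(state, list) else []


def toy_evaluate(state):
    # A cut-off inner position gets a neutral score of 0 in this toy example.
    return state if isinstance(state, int) else 0


tree = [[3, 5], [2, 9]]
best = minimax(tree, 2, True, toy_successors, toy_evaluate)
```

With a depth of 2 the maximizer sees both replies of the minimizer; with a depth of 1 the search stops one level earlier and falls back on the heuristic.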

8 Common Algorithm Properties
- The deeper the search, the better the player
  - Assumes the heuristic is “reasonable,” i.e., it does not get worse with more information
- Effectively infinitely parallelizable
- Many operations are expensive in software:
  - Determining successors (“is valid move”)
  - Calculating successors (“make move”)
  - Heuristic calculation

9 Our Othello Implementation
- Based on Iago, concentrating on high-quality heuristic variables:
  - Stability – Number of pieces that can never be flipped
  - Mobility – Number of available moves
  - Piece differential
  - Vulnerability – Entrance points to stable squares on the corners and sides
- Heuristic value is a weighted sum of the variables
  - Weights learned through reinforcement learning
  - Weights vary over the course of the game
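A weighted-sum heuristic with phase-dependent weights might look like the sketch below. The weight values, the two-phase split, and the 40-piece threshold are invented placeholders for illustration only; the slides say the real weights were learned and do not list them.

```python
# Hypothetical weights per game phase (placeholders, not the learned values).
WEIGHTS = {
    "opening": {"stability": 1.0, "mobility": 3.0,
                "piece_diff": 0.0, "vulnerability": -2.0},
    "endgame": {"stability": 2.0, "mobility": 1.0,
                "piece_diff": 4.0, "vulnerability": -1.0},
}


def game_phase(pieces_on_board: int) -> str:
    """Assumed phase split: fewer than 40 pieces counts as the opening."""
    return "opening" if pieces_on_board < 40 else "endgame"


def heuristic(features: dict, pieces_on_board: int) -> float:
    """Weighted sum of the heuristic variables for the current phase."""
    weights = WEIGHTS[game_phase(pieces_on_board)]
    return sum(weights[name] * features[name] for name in weights)


features = {"stability": 4, "mobility": 6, "piece_diff": -2, "vulnerability": 1}
early_value = heuristic(features, pieces_on_board=12)
late_value = heuristic(features, pieces_on_board=56)
```

The same feature values score differently early and late in the game, which is the effect of letting the weights vary over the course of play.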

10 Software Overview
- Boards are 128-bit entries (2 bits per piece)
- Lookup tables for things like stability
- Lookup tables are indexed by the ternary number represented by the row or column, e.g. $1 + 2 \cdot 3^2 + 2 \cdot 3^3 + 3^4 = 154$
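The board packing and the ternary index can be sketched as follows. The cell encoding (0 = empty, 1 = black, 2 = white), the bit layout, and the choice of the first cell as the least significant ternary digit are assumptions; the slides only give the 2-bits-per-piece size and the example sum.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # assumed cell encoding


def pack_board(cells):
    """Pack an 8x8 grid of cell values into one integer, 2 bits per square."""
    board = 0
    for row in range(8):
        for col in range(8):
            board |= cells[row][col] << (2 * (8 * row + col))
    return board


def row_of(board, row):
    """Extract the eight cell values of one row from the packed board."""
    return [(board >> (2 * (8 * row + col))) & 0b11 for col in range(8)]


def ternary_index(line):
    """Read a line of eight cells as a base-3 number, first cell = digit 0."""
    return sum(cell * 3 ** i for i, cell in enumerate(line))


cells = [[EMPTY] * 8 for _ in range(8)]
cells[2] = [BLACK, EMPTY, WHITE, WHITE, BLACK, EMPTY, EMPTY, EMPTY]
board = pack_board(cells)
index = ternary_index(row_of(board, 2))  # 1 + 2*3**2 + 2*3**3 + 3**4
```

Under these assumptions the row in the example reproduces the index from the slide, and any row or column maps to a value below 3**8 = 6561.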

11 Software Trace Profile
- Vast majority of CPU time is consumed in IsLegalMove and MakeMove
  - 53.21% DoOneDirection
  - 30.64% DoAllDirections
- These loop over all rows/diagonals to find/flip valid directions
- Called to find successors, to calculate mobility, and to make moves
- Operations are common to all Othello players (extensions are at least slightly generic)!

12 Flip Instruction Granularity
- Split a single MakeMove or IsLegalMove operation into a sequence of four operations corresponding to the four flip directions (row, column, rdiag, ldiag)
- Made a lookup table of the 3^8 row/column/diagonal configurations to look up which pieces get flipped on an axis given a piece placement
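A software model of such a per-axis lookup table is sketched below. The table layout (`FLIP_TABLE[config][position][player]`), the cell encoding, and the function names are assumptions for illustration; the slides do not describe how the hardware table is organized.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # assumed cell encoding


def decode(index):
    """Turn a ternary index back into eight cells, digit 0 first."""
    cells = []
    for _ in range(8):
        index, digit = divmod(index, 3)
        cells.append(digit)
    return cells


def flips_on_line(cells, pos, player):
    """Bitmask of cells flipped on one axis if `player` places at `pos`."""
    if cells[pos] != EMPTY:
        return 0
    opponent = BLACK + WHITE - player
    mask = 0
    for step in (-1, 1):  # scan both ways along the axis
        run = 0
        i = pos + step
        while 0 <= i < 8 and cells[i] == opponent:
            run |= 1 << i
            i += step
        # The run only flips if it is closed off by one of the player's pieces.
        if 0 <= i < 8 and cells[i] == player:
            mask |= run
    return mask


# One entry per configuration (3**8), placement position, and player.
FLIP_TABLE = []
for config in range(3 ** 8):
    line = decode(config)
    FLIP_TABLE.append([[flips_on_line(line, pos, player)
                        for player in (BLACK, WHITE)]
                       for pos in range(8)])

# Line "B W W . . . . ." has index 1 + 2*3 + 2*9 = 25.
black_flips = FLIP_TABLE[25][3][0]  # black plays at position 3
white_flips = FLIP_TABLE[25][3][1]  # white plays at position 3
```

Once the table is built, finding the flips on an axis is a single indexed read instead of a loop, which is the property a custom instruction can exploit.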

13 Reducing Die Area
- We only implemented FLIPROW and FLIPDIAGONAL instructions
  - We do a 90° rotation of the board and back again to flip the other two directions
- Saved on die area and cycle time; transposing and rotating are very cheap instructions

14 New DoAllDirections
- Output dependencies galore!
    B = FLIPROW(B, row, col);
    B = FLIPDIAG(B, row, col);
    B = ROTATEBOARD(B, CLW);
    B = FLIPROW(B, col, 7 - row);
    B = FLIPDIAG(B, col, 7 - row);
    B = ROTATEBOARD(B, ACLW);
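The coordinate change in the sequence above follows from how a clockwise rotation moves squares. The sketch below models the board as a list of rows, which is an assumption made for readability rather than the packed hardware format, and checks that square (row, col) lands on (col, 7 - row).

```python
def rotate_clockwise(board):
    """Rotate an 8x8 board 90 degrees clockwise: new[r][c] = old[7 - c][r]."""
    return [list(cells) for cells in zip(*board[::-1])]


def rotate_anticlockwise(board):
    """Rotate an 8x8 board 90 degrees anticlockwise: new[r][c] = old[c][7 - r]."""
    return [list(cells) for cells in zip(*board)][::-1]


# Label every square with its own coordinates so the movement is visible.
board = [[(row, col) for col in range(8)] for row in range(8)]
rotated = rotate_clockwise(board)

row, col = 2, 3
moved_to = rotated[col][7 - row]            # where square (2, 3) ends up
old_column = [board[r][col] for r in range(8)]
new_row = rotated[col]                      # old column 3, read bottom to top
restored = rotate_anticlockwise(rotated)
```

Because an old column becomes a row after rotation, a row-only flip unit can process columns as well, and the second rotation restores the original orientation.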

15 State Registers
- A 64-bit FLIPTABLE state register keeps track of which pieces should be flipped, so there are no output dependencies between instructions
- Added benefit: checking whether a move is valid simply amounts to (FLIPTABLE != 0) after the flip operations
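The idea of accumulating flips in a separate mask can be modeled in software as below. This is a sketch under assumptions: the board is a list of rows, bit `8 * row + col` of the mask stands for a square, and the axis scans are plain loops rather than the table-driven instructions from the slides.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # assumed cell encoding

# Each axis is scanned in its two opposite directions.
AXES = {
    "row": ((0, -1), (0, 1)),
    "column": ((-1, 0), (1, 0)),
    "diagonal": ((-1, -1), (1, 1)),
    "antidiagonal": ((-1, 1), (1, -1)),
}


def axis_flips(board, row, col, player, directions):
    """64-bit mask of pieces flipped along one axis; the board is only read."""
    opponent = BLACK + WHITE - player
    mask = 0
    for d_row, d_col in directions:
        run = 0
        r, c = row + d_row, col + d_col
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            run |= 1 << (8 * r + c)
            r, c = r + d_row, c + d_col
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            mask |= run
    return mask


def flip_table(board, row, col, player):
    """OR together the per-axis masks; no axis depends on another's result."""
    if board[row][col] != EMPTY:
        return 0
    table = 0
    for directions in AXES.values():
        table |= axis_flips(board, row, col, player, directions)
    return table


def is_legal_move(board, row, col, player):
    return flip_table(board, row, col, player) != 0


# Standard Othello starting position.
start = [[EMPTY] * 8 for _ in range(8)]
start[3][3], start[4][4] = WHITE, WHITE
start[3][4], start[4][3] = BLACK, BLACK
opening_mask = flip_table(start, 2, 3, BLACK)
```

Since every axis writes only into the accumulated mask and never rewrites the board, the four steps are independent of each other, and the legality test falls out as a non-zero check.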

16 Results
- New instructions are extremely effective, with relatively little complexity compared to optimizing for a multi-processor environment
- ~4.1 times better performance than the base processor

CPU       Cycles     Cycle Time  Area       Price-Perf
Base      492143906  ~10 ns      ~4.2 mm^2  ?
Extended  126832159  10.63 ns    20389?     ?

17 Conclusions
- Positives
  - Optimizations can be used for all Othello players
  - With very little work, we could reduce the cycle time to that of the base processor
- Negatives
  - Cost of custom processors is prohibitive
  - It may be more effective to exploit the parallelism of the search algorithm
- Combining custom processors with MP parallelism for best results?

