Published by Sydney Marshall. Modified over 9 years ago.
1
Othello Processor
Bret Taylor, Olatunji Ruwase, and Jim Norris
CS343, Spring 2003
2
Project Overview
- AI algorithms for games like Othello are highly parallelizable
- Typically, performance is improved by adding more general-purpose processors
- What price-performance can we achieve with a custom processor instead?
- What granularity of custom instructions achieves the best price-performance?
- How generic can we make the instructions?
3
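Overview of Othello
- Goal: have more pieces on the board than your opponent when the game ends
- When you place a piece so that a line (row, column, or diagonal) of your opponent's pieces is bracketed by your pieces on both ends, those pieces are flipped to your color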
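The flipping rule on a single line can be sketched in a few lines of Python. This is an illustration only: the square codes (0 = empty, 1 = the player placing the piece, 2 = the opponent) and the function name `flips_on_line` are assumptions, not something taken from the slides.

```python
from typing import List

EMPTY, ME, OPP = 0, 1, 2   # assumed codes; ME is the player placing a piece


def flips_on_line(line: List[int], pos: int) -> List[int]:
    """Indices on this line that flip when ME places a piece at line[pos]."""
    if line[pos] != EMPTY:
        return []                        # pieces can only go on empty squares
    flipped: List[int] = []
    for step in (-1, 1):                 # look toward both ends of the line
        run = []
        i = pos + step
        while 0 <= i < len(line) and line[i] == OPP:
            run.append(i)
            i += step
        # The opponent run flips only if one of our own pieces closes it off.
        if run and 0 <= i < len(line) and line[i] == ME:
            flipped.extend(run)
    return sorted(flipped)
```

For example, `flips_on_line([0, 2, 2, 1, 0, 0, 0, 0], 0)` returns `[1, 2]`; if the piece at index 3 were missing, nothing would flip.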
6
Othello in Academia
- Software:
  - Rosenbloom, Paul S.: A World-Championship-Level Othello Program
  - Lee, K.; Mahajan, S.: The Development of a World Class Othello Program
- No specific hardware implementations, but related work for MiniMax-style algorithms:
  - Powley, Curtis Nelson: Parallel Tree Search on a Single-Instruction, Multiple-Data (SIMD) Machine
7
Common Algorithm Structure
- MiniMax search to a depth determined by a global or per-move time limit
- Heuristics evaluate the "value" of the board position at each leaf
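A minimal sketch of this structure, assuming a generic game interface: `successors` and `evaluate` are hypothetical callbacks supplied by the caller. The time limit is handled by iterative deepening (searching one ply deeper while time remains), which is one common way to turn a time budget into a depth; the slides do not say which method the project used. The sketch omits alpha-beta pruning and Othello's pass rule, and it returns only the backed-up value, not the best move.

```python
import time
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # any game-state type


def minimax(state: S, depth: int, maximizing: bool,
            successors: Callable[[S], List[S]],
            evaluate: Callable[[S], float]) -> float:
    """Plain depth-limited MiniMax (no alpha-beta pruning)."""
    children = successors(state)
    if depth == 0 or not children:
        # Leaf of the search: fall back to the heuristic evaluation.
        return evaluate(state)
    values = [minimax(child, depth - 1, not maximizing, successors, evaluate)
              for child in children]
    return max(values) if maximizing else min(values)


def search_with_time_limit(state: S, seconds: float,
                           successors: Callable[[S], List[S]],
                           evaluate: Callable[[S], float],
                           max_depth: int = 64) -> Tuple[float, int]:
    """Iterative deepening: search one ply deeper while time remains.

    The deadline is only checked between iterations, so the last
    iteration may overrun it.
    """
    deadline = time.monotonic() + seconds
    value, depth = evaluate(state), 0
    while depth < max_depth and time.monotonic() < deadline:
        depth += 1
        value = minimax(state, depth, True, successors, evaluate)
    return value, depth
```

On a toy two-ply tree where the maximizer chooses between replies scored (3, 5) and (2, 9), `minimax` returns 3: the minimizer would answer the first branch with 3 and the second with 2.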
8
Common Algorithm Properties
- The deeper the search, the stronger the player, so a faster processor plays better
- Assumes the heuristic is "reasonable," i.e., it does not get worse with more information
- Effectively infinitely parallelizable
- Many operations are expensive in software:
  - Determining successors ("is valid move")
  - Calculating successors ("make move")
  - Heuristic calculation
9
Our Othello Implementation
- Based on Iago, concentrating on high-quality heuristic variables:
  - Stability: number of pieces that can never be flipped
  - Mobility: number of available moves
  - Piece differential
  - Vulnerability: entrance points to stable squares on the corners and sides
- Heuristic value is a weighted sum of the variables
- Weights learned through reinforcement learning
- Weights vary over the course of the game
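The weighted-sum heuristic with stage-dependent weights might look like the following sketch. Everything numeric here is hypothetical: the weight values, the three-stage split, and the move-number boundaries are invented for illustration, since the slides only say the real weights were learned by reinforcement learning and change during the game.

```python
from typing import Dict

# Hypothetical weights, made up for illustration; the project's learned
# weights are not given in the slides.
WEIGHTS_BY_PHASE: Dict[str, Dict[str, float]] = {
    "opening": {"stability": 1.0, "mobility": 3.0,
                "piece_diff": -0.5, "vulnerability": -1.0},
    "midgame": {"stability": 2.0, "mobility": 2.0,
                "piece_diff": 0.0, "vulnerability": -1.0},
    "endgame": {"stability": 2.0, "mobility": 0.5,
                "piece_diff": 1.0, "vulnerability": -0.5},
}


def phase(move_number: int) -> str:
    """Hypothetical split of a game (at most 60 moves) into three stages."""
    if move_number < 20:
        return "opening"
    if move_number < 45:
        return "midgame"
    return "endgame"


def heuristic(features: Dict[str, float], move_number: int) -> float:
    """Weighted sum of the heuristic variables for the current stage."""
    weights = WEIGHTS_BY_PHASE[phase(move_number)]
    return sum(weights[name] * features[name] for name in weights)
```

With these made-up weights, the same position can score very differently early and late in the game, which is the point of letting the weights vary.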
10
Software Overview
- Boards are 128-bit values (2 bits per square, 64 squares)
- Lookup tables for things like stability
- Lookup tables are indexed by the ternary number represented by the row or column, for example: $1 + 2 \cdot 3^2 + 2 \cdot 3^3 + 3^4 = 154$
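A sketch of the board layout and the ternary indexing, under stated assumptions: square (row, col) occupies bits `2*(row*8 + col)` and `2*(row*8 + col) + 1`, the 2-bit codes are 0 = empty, 1 = black, 2 = white, and column 0 is the least significant ternary digit. None of these choices is specified on the slide; with them, the slide's example corresponds to a row reading 1, 0, 2, 2, 1, 0, 0, 0.

```python
EMPTY, BLACK, WHITE = 0, 1, 2   # assumed 2-bit square codes (3 unused)


def set_square(board: int, row: int, col: int, value: int) -> int:
    """Return a copy of the 128-bit board with one 2-bit field replaced."""
    shift = 2 * (row * 8 + col)
    return (board & ~(0b11 << shift)) | (value << shift)


def get_square(board: int, row: int, col: int) -> int:
    return (board >> (2 * (row * 8 + col))) & 0b11


def row_index(board: int, row: int) -> int:
    """Read a row as a base-3 number (column 0 = least significant digit).

    The result lies in range(3 ** 8), so it can index a 6561-entry table.
    """
    index = 0
    for col in reversed(range(8)):      # Horner's rule, most significant first
        index = index * 3 + get_square(board, row, col)
    return index


def col_index(board: int, col: int) -> int:
    """Same encoding for a column (row 0 = least significant digit)."""
    index = 0
    for row in reversed(range(8)):
        index = index * 3 + get_square(board, row, col)
    return index
```

Building row 0 as 1, 0, 2, 2, 1, 0, 0, 0 with `set_square` and calling `row_index(board, 0)` gives 154, matching the worked sum.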
11
Software Trace Profile
- The vast majority of CPU time is consumed in IsLegalMove and MakeMove:
  - 53.21% in DoOneDirection
  - 30.64% in DoAllDirections
- These loop over all rows, columns, and diagonals to find and flip valid directions
- Called to find successors, to calculate mobility, and to make moves
- The operations are common to all Othello players (so the extensions are at least somewhat generic)!
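To show why these routines dominate the profile, here is a hypothetical software baseline in the same spirit. It is not the project's code: the function names and the list-of-lists board are assumptions. Every legality check walks up to eight directions, and `mobility` repeats that walk for all 64 squares, so the same inner loop runs for successor generation, mobility, and move making.

```python
from typing import List, Tuple

EMPTY, ME, OPP = 0, 1, 2          # assumed square codes; ME = side to move
Board = List[List[int]]           # board[row][col], 8x8
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def flips_in_direction(b: Board, r: int, c: int,
                       dr: int, dc: int) -> List[Tuple[int, int]]:
    """Walk from (r, c) in one direction; return the bracketed OPP run."""
    run = []
    i, j = r + dr, c + dc
    while 0 <= i < 8 and 0 <= j < 8 and b[i][j] == OPP:
        run.append((i, j))
        i, j = i + dr, j + dc
    if run and 0 <= i < 8 and 0 <= j < 8 and b[i][j] == ME:
        return run
    return []


def flips_for_move(b: Board, r: int, c: int) -> List[Tuple[int, int]]:
    """Try all eight directions and collect every square that would flip."""
    if b[r][c] != EMPTY:
        return []
    flips: List[Tuple[int, int]] = []
    for dr, dc in DIRECTIONS:
        flips.extend(flips_in_direction(b, r, c, dr, dc))
    return flips


def is_legal_move(b: Board, r: int, c: int) -> bool:
    return len(flips_for_move(b, r, c)) > 0


def make_move(b: Board, r: int, c: int) -> Board:
    """Return a new board with the move played (caller checks legality)."""
    new = [row[:] for row in b]
    for i, j in flips_for_move(b, r, c):
        new[i][j] = ME
    new[r][c] = ME
    return new


def mobility(b: Board) -> int:
    """Number of legal moves for ME: another full scan of the board."""
    return sum(is_legal_move(b, r, c) for r in range(8) for c in range(8))
```

From the standard starting position, `mobility` returns 4, the four usual opening moves.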
12
Flip Instruction Granularity
- Split a single MakeMove or IsLegalMove operation into a sequence of four operations corresponding to the four flip directions (row, column, rdiag, ldiag)
- Built a lookup table of the $3^8$ row/column/diagonal configurations to look up which pieces get flipped along an axis, given a piece placement
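One way such a table could be generated is sketched below, assuming the ternary encoding 0 = empty, 1 = side to move, 2 = opponent, with the least significant digit at position 0. The table layout (one 8-bit flip mask per configuration and placement position) is an assumption; the slides do not describe the entry format or how diagonals shorter than eight squares are handled.

```python
from typing import List

EMPTY, ME, OPP = 0, 1, 2   # assumed ternary digit codes; ME = side to move


def decode(index: int) -> List[int]:
    """Ternary index -> 8 square values, least significant digit first."""
    digits = []
    for _ in range(8):
        digits.append(index % 3)
        index //= 3
    return digits


def flip_mask(line: List[int], pos: int) -> int:
    """8-bit mask of the squares on this line flipped by ME playing at pos."""
    if line[pos] != EMPTY:
        return 0
    mask = 0
    for step in (-1, 1):
        run, i = 0, pos + step
        while 0 <= i < 8 and line[i] == OPP:
            run |= 1 << i
            i += step
        if run and 0 <= i < 8 and line[i] == ME:   # bracketed by our piece
            mask |= run
    return mask


# FLIP_TABLE[index][pos]: precomputed once, then each query is one lookup.
FLIP_TABLE = [[flip_mask(decode(index), pos) for pos in range(8)]
              for index in range(3 ** 8)]
```

Laid out this way, the table holds $3^8 \times 8 = 52{,}488$ one-byte masks; a flip along one axis then becomes an index computation plus a single table read.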
13
Reducing Die Area
- We only implemented FLIPROW and FLIPDIAGONAL instructions
- We rotate the board 90° and back again to flip along the other two directions
- This saved die area and cycle time; transposing and rotating are very cheap instructions
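The rotation trick can be checked in software on a single 64-bit bitboard plane (bit index `row * 8 + col`, an assumed layout; a 2-bit-per-square board can be handled as two such planes). A clockwise quarter turn sends square (r, c) to (c, 7 - r), so every column becomes a row and the main diagonal becomes the anti-diagonal, which lets a row unit and a single-direction diagonal unit cover all four axes. In hardware the rotation is just a fixed rewiring of bits; the loops below only model it.

```python
def rotate_cw(bb: int) -> int:
    """Rotate an 8x8 bitboard 90 degrees clockwise: (r, c) -> (c, 7 - r)."""
    out = 0
    for r in range(8):
        for c in range(8):
            if (bb >> (r * 8 + c)) & 1:
                out |= 1 << (c * 8 + (7 - r))
    return out


def rotate_acw(bb: int) -> int:
    """Inverse rotation (anticlockwise): (r, c) -> (7 - c, r)."""
    out = 0
    for r in range(8):
        for c in range(8):
            if (bb >> (r * 8 + c)) & 1:
                out |= 1 << ((7 - c) * 8 + r)
    return out
```

For instance, a full row 0 (`0xFF`) rotates clockwise into a full column 7 (`0x8080808080808080`), and `rotate_acw(rotate_cw(x)) == x` for any board `x`.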
14
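New DoAllDirections
- Output dependencies galore! Every instruction reads and rewrites B, so the six instructions must execute strictly in order:

    B = FLIPROW(B, row, col);        // flip along the row
    B = FLIPDIAG(B, row, col);       // flip along one diagonal
    B = ROTATEBOARD(B, CLW);         // rotate 90° clockwise: (row, col) -> (col, 7 - row)
    B = FLIPROW(B, col, 7 - row);    // the original column is now a row
    B = FLIPDIAG(B, col, 7 - row);   // the original other diagonal is now the handled one
    B = ROTATEBOARD(B, ACLW);        // rotate back (anticlockwise)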
15
State Registers
- Store flips in a 64-bit FLIPTABLE state register that tracks which pieces should be flipped, so there are no output dependencies between the flip instructions
- Added benefit: checking whether a move is valid simply amounts to (FLIPTABLE != 0) after the flip operations
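A software model of the FLIPTABLE idea, as a sketch under assumptions: the board is held as two 64-bit bitboards (`mine`, `theirs`, bit index `row * 8 + col`) rather than the 128-bit 2-bit-per-square format, and the eight directions are walked explicitly instead of using rotations, to keep the model short. The key point it illustrates is that each directional step only ORs into the register and never rewrites the board, so the steps are independent; the board is updated once at the end, and a zero register means the move is illegal.

```python
from typing import Tuple

DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0),
              (1, 1), (-1, -1), (1, -1), (-1, 1)]


def bit(r: int, c: int) -> int:
    return 1 << (r * 8 + c)


def direction_mask(mine: int, theirs: int, r: int, c: int,
                   dr: int, dc: int) -> int:
    """Mask of opponent pieces bracketed along one direction (0 if none)."""
    run, i, j = 0, r + dr, c + dc
    while 0 <= i < 8 and 0 <= j < 8 and theirs & bit(i, j):
        run |= bit(i, j)
        i, j = i + dr, j + dc
    if 0 <= i < 8 and 0 <= j < 8 and mine & bit(i, j):
        return run
    return 0


def compute_fliptable(mine: int, theirs: int, r: int, c: int) -> int:
    """Model of the FLIPTABLE register: each step ORs into it instead of
    rewriting the board, so the steps do not depend on each other."""
    if (mine | theirs) & bit(r, c):
        return 0                       # occupied square: nothing flips
    fliptable = 0
    for dr, dc in DIRECTIONS:
        fliptable |= direction_mask(mine, theirs, r, c, dr, dc)
    return fliptable


def apply_move(mine: int, theirs: int, r: int, c: int,
               fliptable: int) -> Tuple[int, int]:
    """Commit the move once, after all flip steps have filled FLIPTABLE."""
    return mine | fliptable | bit(r, c), theirs & ~fliptable
```

In the opening position, playing at row 2, column 3 yields a FLIPTABLE containing only the square at row 3, column 3, so the move is valid; a corner square yields 0.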
16
Results
- The new instructions are extremely effective, with relatively little complexity compared to optimizing for a multiprocessor environment
- ~4.1 times better performance than the base processor

    CPU        Cycles        Cycle Time   Area                       Price/Perf
    Base       492143906     ~10 ns       ~4.2 mm²                   ?
    Extended   126832159     10.63 ns     20389 (units not given)    ?
17
Conclusions
- Positives:
  - The optimizations can be used for all Othello players
  - With very little work, we could reduce the cycle time to that of the base processor
- Negatives:
  - The cost of custom processors is prohibitive
  - It may be more effective to exploit the parallelism of the search algorithm
- Combining custom processors with multiprocessor parallelism for the best results?