Solving Awari using Large-Scale Parallel Retrograde Analysis

Presentation transcript:

Solving Awari using Large-Scale Parallel Retrograde Analysis
John W. Romein, Henri E. Bal
Vrije Universiteit, Amsterdam
Notes: new cluster; talked with Henri about challenging apps; solve awari; enthusiastic; 3 weeks; let's do it.

introduction: awari
3500-year-old board game
best-known mancala variant (wari, owari, wale, awale, ...)
determine score for 889,063,398,406 positions
retrograde analysis
144 CPUs, 72 GB RAM, 1.4 TB disks, Myrinet
Notes: board game, 3500 years old, from Africa, played worldwide; mancala, many names; 889 billion positions; retrograde analysis; new cluster.

outline
rules of awari
databases
(parallel) retrograde analysis
performance
verification
new game insights
www: awari oracle

rules of awari
sow counterclockwise
capture if the last pit sown is an enemy pit containing 2 or 3 stones
goal: capture majority of stones
Notes: board; each player has 6 pits (plus auxiliary pits); move from a non-empty pit; sow; capture (repeat); goal: more than 24 stones (humiliate); game ends if a player cannot move; must give the opponent a move; repetition.
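The sowing and capture rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the board layout (a list of 12 pit counts, indices 0 to 5 for south and 6 to 11 for north, increasing counterclockwise) and the function name `sow_and_capture` are assumptions, as is the common convention that the origin pit is skipped when a move laps the board. The obligation to give the opponent a move and the repetition rule are deliberately left out.

```python
def sow_and_capture(board, pit):
    """Play south's move from `pit` (0..5); return (new_board, captured).

    Assumed layout: board[0..5] are south's pits, board[6..11] are north's,
    and increasing index means counterclockwise.
    """
    if not 0 <= pit <= 5 or board[pit] == 0:
        raise ValueError("south must move from one of its own non-empty pits")
    new = list(board)
    stones, new[pit] = new[pit], 0
    pos = pit
    while stones:
        pos = (pos + 1) % 12
        if pos == pit:
            continue  # assumption: the origin pit is skipped on long laps
        new[pos] += 1
        stones -= 1
    captured = 0
    # Capture the last pit sown if it is an enemy pit holding 2 or 3 stones,
    # then repeat for the preceding enemy pits while the condition holds.
    while 6 <= pos <= 11 and new[pos] in (2, 3):
        captured += new[pos]
        new[pos] = 0
        pos -= 1
    return new, captured
```

For example, from the position `[0, 0, 0, 0, 0, 2, 1, 2, 0, 0, 4, 0]`, moving from pit 5 sows into pits 6 and 7, which then hold 2 and 3 stones, so both are captured.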

awari databases
build n-stone databases (n = 0, 1, ..., 46, 48)
entry ⇔ board
entry contains score (-n ... +n)
south to move
Notes: construct databases; split w.r.t. stones on board; entry ⇔ board, functions; -48 <= score <= 48, so 7 bits; north to move -> rotate table; largest: 204 billion entries, 178 GB; largest DB for whichever game; cannot split; total.
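The "7 bits" and "178 GB" figures on this slide can be checked with a short calculation. The sketch below assumes scores are stored as a biased unsigned value (score + n) and that 1 GB means 10^9 bytes; the function names are illustrative only.

```python
def bits_per_score(n):
    """Bits needed to store a score in -n..+n as the biased value score + n."""
    return (2 * n).bit_length()

def database_size_gb(entries, bits_per_entry):
    """Size in GB (10**9 bytes, an assumed convention) of a packed database."""
    return entries * bits_per_entry / 8 / 1e9

bits = bits_per_score(48)                         # 97 distinct scores
size = database_size_gb(204_000_000_000, bits)    # largest database, rounded count
print(bits, size)
```

With the rounded count of 204 billion entries this gives about 178.5 GB, in line with the slide.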

scores
best move depends on remaining stones, not on captured stones!
final result = Δ captured stones + score
score = eventual division of remaining stones
example (south to move): score = +2 (8-6)
Notes: best move depends on remaining stones, not on captured ones; captured stones contribute to the final result; interesting: remaining stones after optimal play; score, stored in DB; example: 14 stones; DB: +2 (south 8, north 6); south advantage +4; eventually +6 (27-21).
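The slide's example can be worked through in code. The sketch below uses hypothetical function names; the captured counts 19 and 15 are not stated on the slide but follow from its numbers (27 - 8 and 21 - 6).

```python
def final_result(captured_south, captured_north, score):
    """Final result = difference in captured stones + database score."""
    return (captured_south - captured_north) + score

def division(remaining, score):
    """Split the remaining stones so that south's share minus north's equals score."""
    return (remaining + score) // 2, (remaining - score) // 2

south_rest, north_rest = division(14, 2)   # 14 stones on the board, score +2
result = final_result(19, 15, 2)           # south is 4 captured stones ahead
totals = (19 + south_rest, 15 + north_rest)
print(south_rest, north_rest, result, totals)
```

The remaining stones split 8-6, the final result is +6, and the totals are 27-21, matching the slide.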

database construction: retrograde analysis
MiniMax tree (DCG)
search state space bottom-up
[figure: state-space graph with numbered nodes, from the initial state down to the final states]
Notes: construct DB; RA; state space; nodes = positions = entries; edges; root; final states; values in final states; negamax; bottom-up; determine root pos -> win; DCG.

10-bit retrograde analysis
best score (7 bits) + nr. unknown children (3 bits)
inform parent if score becomes known
[figure: graph of nodes labeled with scores; unknown scores shown as "?"]
Notes: essentially simple (nontrivial issues); 7 + 3; inform parent.
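A 10-bit entry of this kind can be sketched as a packed integer. The bit layout below (score in the high 7 bits, biased by 48; counter in the low 3 bits) and the helper names are assumptions for illustration; the slide does not specify the layout. The update uses the negamax convention: a child's score is negated from the parent's point of view.

```python
SCORE_BITS, COUNT_BITS = 7, 3
BIAS = 48  # scores lie in -48..+48

def pack(score, unknown_children):
    """Pack a score and a count of unknown children into one 10-bit integer."""
    assert -BIAS <= score <= BIAS
    assert 0 <= unknown_children < (1 << COUNT_BITS)
    return ((score + BIAS) << COUNT_BITS) | unknown_children

def unpack(entry):
    """Return (score, unknown_children) for a packed entry."""
    return (entry >> COUNT_BITS) - BIAS, entry & ((1 << COUNT_BITS) - 1)

def child_resolved(entry, child_score):
    """Update a parent entry after one child's score becomes known."""
    best, unknown = unpack(entry)
    return pack(max(best, -child_score), unknown - 1)

entry = pack(-BIAS, 2)             # nothing known yet, two children pending
entry = child_resolved(entry, -3)  # the child loses by 3, so the parent can win by 3
entry = child_resolved(entry, 5)   # the other child is worse for the parent
print(unpack(entry))
```

Once the counter reaches zero the parent's score is final and it can in turn inform its own parents. A real implementation also has to deal with the "nontrivial issues" the notes mention, which this sketch ignores.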

2-bit retrograde analysis
2 bits/entry in RAM: Win/Draw/Loss/Unknown
search n times with widening window (-i, i)

PROCEDURE CreateDatabase(n) IS
  FOR i IN 1 ... n DO
    Window := (-i, i);
    SetLeaves(); // handle terminal states and captures
    BottomUpSearch();
    CollectScores();

Notes: tell more about the new algorithm; based on sequential algorithm by Lincke & Marzetta; 2 bits, 4 states.

bottom-up search

PROCEDURE CheckState(node) IS
  IF state[node] = unknown AND AllChildrenAreWins(node) THEN
    state[node] := loss;
    SetParentsToWin(node);
    CheckStateOfGrandParents(node);

[figure: graph of nodes labeled W (win), L (loss), and U (unknown)]
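The bottom-up search can be illustrated on a small generic game graph. The Python sketch below follows the same idea as CheckState (a node whose children are all wins is a loss, and every parent of a loss is a win), but it is a simplified sequential version under stated assumptions: the graph is given as a dictionary, only terminal losses seed the search, and the score window of the 2-bit algorithm is not modeled. All names are hypothetical.

```python
from collections import deque

WIN, LOSS, UNKNOWN = "W", "L", "U"

def retrograde(children, terminal_losses):
    """Label each node WIN, LOSS, or UNKNOWN from the side to move's view.

    children: dict mapping every node to its list of successor nodes.
    terminal_losses: nodes known to be lost for the side to move.
    """
    parents = {n: [] for n in children}
    for node, succ in children.items():
        for child in succ:
            parents[child].append(node)
    state = {n: UNKNOWN for n in children}
    queue = deque()
    for node in terminal_losses:
        state[node] = LOSS
        queue.append(node)
    while queue:
        lost = queue.popleft()
        for parent in parents[lost]:          # SetParentsToWin
            if state[parent] != UNKNOWN:
                continue
            state[parent] = WIN
            for grand in parents[parent]:     # CheckStateOfGrandParents
                if state[grand] == UNKNOWN and all(
                        state[c] == WIN for c in children[grand]):
                    state[grand] = LOSS
                    queue.append(grand)
    return state

graph = {"a": ["b"], "b": ["c"], "c": [],
         "d": ["e"], "e": ["d"], "f": ["b", "e"]}
labels = retrograde(graph, ["c"])
print(labels)
```

Nodes that are still unknown when the queue empties (here the cycle d, e and the node f that can escape into it) are neither wins nor losses, which in this simplified setting corresponds to a draw.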

parallel retrograde analysis
partition database
receive queue with work
migrate work (asynchronously)
global termination detection
[figure: partitioned graph of nodes labeled W (win), L (loss), and U (unknown)]

performance (1/3)
72 x dual 1.0 GHz Pentium III
1 GB RAM
20 GB disk
2.0 Gb/s Myrinet
Myrinet switch

performance (2/3)
48-stones: 15 hours
total: 51 hours
This figure shows the computation times for the 2-bit and the 10-bit algorithms. The 2-bit algorithm is slower, but is able to solve the larger databases, unlike the 10-bit algorithm. There is some noise in the sub-second range. We see that the execution times grow exponentially with the number of stones. Computation of the 48-stone database took a little over 15 hours, and using the fastest available algorithm, about 51 hours were needed to compute all databases.

performance (3/3)
communication:
20 - 30 MB/s send + receive per SMP node
1.4 - 2.1 GB/s through switch
130 TB in total = 1.0 Pb!
disk I/O:
~ 10 TB in total
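The communication figures are mutually consistent, as a quick calculation shows. The sketch assumes the 72 SMP nodes from the hardware slide and, as one reading that reproduces the rounded numbers, 1024 MB per GB; both the unit convention and the function names are assumptions.

```python
def switch_throughput_gb_s(nodes, mb_per_s_per_node, mb_per_gb=1024):
    """Aggregate traffic through the switch, in GB/s."""
    return nodes * mb_per_s_per_node / mb_per_gb

def terabytes_to_petabits(terabytes, tera_per_peta=1000):
    """Convert terabytes to petabits (8 bits per byte)."""
    return terabytes * 8 / tera_per_peta

low = switch_throughput_gb_s(72, 20)    # about 1.4 GB/s
high = switch_throughput_gb_s(72, 30)   # about 2.1 GB/s
total = terabytes_to_petabits(130)      # about 1.0 Pb
print(round(low, 1), round(high, 1), round(total, 1))
```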

verification
hardware:
ECC RAM, cache, and Myrinet memory
CRC communication and disk checksums
software:
2 algorithms give identical results (up to 41 stones)
recomputed using 64 SMPs
NegaMax integrity check
compared statistics with others (up to 36 stones)
We have executed quadrillions of instructions, sent terabytes of data, and stored hundreds of gigabytes of data on disk. How do we know that the databases are correct? The hardware, the application, and the operating system can fail, but there are several indications that errors are unlikely. The hardware uses error correcting codes on both the main memory and the memory on the Myrinet network card. Moreover, CRC checks are computed and verified for data that is sent over the network and data that is written to disk. But this does not protect us against errors in the software. During the development of the program, we discovered a few nasty race conditions
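A NegaMax integrity check of the kind listed above can be sketched as follows. This is a generic illustration, not the authors' checker: it assumes a small graph given as a dictionary and ignores the score adjustment for captures that a real awari check would need. It verifies that every non-terminal entry equals the best negated child value.

```python
def negamax_violations(children, value):
    """Return the nodes whose stored value is not the max of -value[child]."""
    bad = []
    for node, succ in children.items():
        if succ and value[node] != max(-value[c] for c in succ):
            bad.append(node)
    return bad

graph = {"a": ["b", "c"], "b": [], "c": []}
good = {"a": 2, "b": -2, "c": 1}
corrupt = {"a": 1, "b": -2, "c": 1}
print(negamax_violations(graph, good), negamax_violations(graph, corrupt))
```

Such a check is a necessary condition only: in a graph with cycles, a wrong but self-consistent set of values could still pass, which is presumably why the slide lists it alongside other checks.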

new awari insights
awari is a draw
best opening move: F4
other opening moves are losing!
to capture is not always the best choice: in 22% of cases, it is not

the awari oracle
web server (being worked on)
lookup positions
interactive play
download statistics
requires 5 x 160 GB disks
http://awari.cs.vu.nl/

conclusions
awari is solved and is a draw
parallel retrograde analysis
overlap computation, communication and disk I/O
required:
score determination of 889,063,398,406 positions
large parallel system
51 hours computation time
1.0 Pb communication