Parallel Nested Monte-Carlo Search


Parallel Nested Monte-Carlo Search
Tristan Cazenave (LAMSADE, Université Paris-Dauphine) and Nicolas Jouandeau (LIASD, Université Paris 8), Paris, France

Outline Nested Monte-Carlo Search Parallelization of Nested Monte-Carlo Search Experimental results Conclusion

Monte-Carlo Sampling
Play random games and retain the best:

  int sample (position)
    while not end of game
      play random move
    return score

Iteratively call the sample function.
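A minimal runnable sketch of this sampling loop in Python. The game interface (`is_terminal`, `legal_moves`, `play`, `score`) is a hypothetical stand-in, not part of the original slides:

```python
import random

def sample(position):
    """Level-0 search: play uniformly random moves to the end of the game
    and return the final score."""
    while not position.is_terminal():
        position = position.play(random.choice(position.legal_moves()))
    return position.score()

def best_of(position, iterations):
    """Iteratively call sample and retain the best score seen."""
    return max(sample(position) for _ in range(iterations))
```

Any game object exposing those four methods can be plugged in; `best_of` is the "iteratively call the sample function" step.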

Nested Monte-Carlo Search
A nested search of level 0 is a sample.
A nested search of level 1 plays a sample after each candidate move and chooses the best move.
A nested search of level 2 uses a nested search of level 1 to choose a move.
A nested search of level 3 uses a nested search of level 2 to choose a move.
etc.

Nested Monte-Carlo Search

  int nested (position, level)
    while not end of game
      if level is 1
        move = argmax_m (sample (play (position, m)))
      else
        move = argmax_m (nested (play (position, m), level - 1))
      bestMove = move of best sequence
      position = play (position, bestMove)
    return score
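The pseudocode can be sketched in Python under the same hypothetical game interface as before. For brevity this version greedily follows the best-scoring move at each step; the full algorithm also memorizes the best sequence found so far (the `bestMove = move of best sequence` line) and replays it when no subsearch improves on it:

```python
import random

def sample(position):
    # Level-0 search: random playout to the end, return the score.
    while not position.is_terminal():
        position = position.play(random.choice(position.legal_moves()))
    return position.score()

def nested(position, level):
    # At each step, evaluate every legal move with a (level - 1) search
    # (a plain sample when level is 1) and play the best-scoring move.
    while not position.is_terminal():
        def value(m):
            child = position.play(m)
            return sample(child) if level == 1 else nested(child, level - 1)
        position = position.play(max(position.legal_moves(), key=value))
    return position.score()
```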

Parallelization
The root process runs at the first level (the highest level of nesting) and asks the median processes to play games at level - 1.
The median processes ask the clients to play games at level - 2.
The client processes perform nested searches at level - 2.
The dispatcher process tells the median nodes which clients to use.
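A much-simplified, single-machine analogue of the root's role in this hierarchy, using Python threads in place of MPI processes and omitting the median and dispatcher layers: the root farms out one (level - 1) search per candidate move and plays the best. The game interface is the same hypothetical stand-in used above:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample(position):
    while not position.is_terminal():
        position = position.play(random.choice(position.legal_moves()))
    return position.score()

def nested(position, level):
    while not position.is_terminal():
        def value(m):
            child = position.play(m)
            return sample(child) if level == 1 else nested(child, level - 1)
        position = position.play(max(position.legal_moves(), key=value))
    return position.score()

def root_step(position, level, workers=4):
    """Root: launch one (level - 1) search per candidate move in parallel,
    then play the move whose subsearch scored highest."""
    moves = position.legal_moves()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(
            lambda m: nested(position.play(m), level - 1), moves))
    best = max(range(len(moves)), key=lambda i: scores[i])
    return position.play(moves[best])
```

In the actual system each subsearch runs on a remote client via Open MPI rather than in a local thread, and a second fan-out layer (the medians) sits between root and clients.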

Communications can occur in parallel

The last-minute algorithm
The dispatcher maintains a list of free clients and a list of jobs.
Jobs are ordered by expected computation time.
When the dispatcher receives a request, either a client is free and it sends back the first free client, or no client is free and the job is added to the list of jobs.
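A sketch of the dispatcher's bookkeeping in Python. The serving order is an assumption on my part (longest expected job first); the slide only says jobs are ordered by expected computation time:

```python
import heapq
from collections import deque

class Dispatcher:
    """Last-minute dispatcher sketch: a queue of free clients and a
    job queue ordered by expected computation time (longest first,
    by assumption)."""
    def __init__(self, clients):
        self.free = deque(clients)
        self.jobs = []       # min-heap on -expected_time
        self.counter = 0     # tie-breaker so the heap never compares jobs

    def request(self, job, expected_time):
        """A median node requests a client for `job`: return a client
        immediately if one is free, otherwise queue the job."""
        if self.free:
            return self.free.popleft()
        heapq.heappush(self.jobs, (-expected_time, self.counter, job))
        self.counter += 1
        return None

    def release(self, client):
        """A client finished: hand it the longest pending job,
        or mark it free if no job is waiting."""
        if self.jobs:
            _, _, job = heapq.heappop(self.jobs)
            return (client, job)
        self.free.append(client)
        return None
```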

Client response in the last minute algorithm

Communications for last minute

Experimental results
Cluster of 32 dual-core PCs connected with a Gigabit network.
Open MPI.
Each node runs two clients.
The server runs the root process as well as all the median processes and the dispatcher.
All experiments use the disjoint model of Morpion Solitaire.

Experimental results
All time results are means over multiple runs of each algorithm.
The standard deviation is given in parentheses after the time results.
The algorithms were tested on playing only the first move of a game, and on playing an entire game.

Times for the sequential algorithm

level   first move            one rollout
3       08m03s (19s)          1h07m33s (42s)
4       28h00m06s (58m55s)    (09d18h58m)

First move times for the Round-Robin algorithm

clients   level 3         level 4
64        10s (1s)        33m11s (1m33s)
32        20s (2s)        1h04m44s (3m02s)
16        37s (5s)        (2h10m)
8         01m11s (8s)     ---
4         02m22s (11s)    ---
1         09m07s (28s)    (29h56m14s)

Rollout times for the Round-Robin algorithm

clients   level 3           level 4
64        01m52s (8s)       5h09m16s (5m40s)
32        03m08s (26s)      (6h31m)
16        05m22s (29s)      ---
8         10m18s (1m21s)    ---
4         21m41s (3m13s)    ---
1         1h26m28s          ---

First move times for the Last-Minute algorithm

clients   level 3         level 4
64        09s (2s)        27m20s (1m22s)
32        19s (1s)        59m44s (2m21s)
16        37s (4s)        (2h05m17s)
8         01m12s (5s)     ---
4         02m23s (4s)     ---
1         09m30s (21s)    (33h06m57s)

Rollout times for the Last-Minute algorithm

clients   level 3           level 4
64        01m32s (5s)       4h10m09s (24m04s)
32        02m43s (16s)      6h58m21s (52m42s)
16        05m35s (40s)      ---
8         11m33s (1m34s)    ---
4         19m51s (3m34s)    ---
1         1h31m40s          ---

First move times on a heterogeneous cluster

clients     alg   level 3    level 4
16x4+16x2   LM    14s (2s)   28m37s (1m30s)
16x4+16x2   RR    16s (2s)   45m17s (1m19s)
8x4+8x2     LM    18s (3s)   58m21s (2m44s)
8x4+8x2     RR    25s (2s)   1h24m11s (3m24s)

80 moves world record

Conclusion
Two algorithms that parallelize Nested Monte-Carlo Search on a cluster.
The speedup with 64 clients is approximately 56 for Morpion Solitaire.
The Last-Minute algorithm is better suited to heterogeneous clusters.
Level 4 => 80-move world record in 5 hours.
Future work: apply to other problems.
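The quoted speedup can be checked against the level-4 rollout times reported earlier: roughly 09d18h58m sequentially versus 4h10m09s with 64 Last-Minute clients.

```python
# Level-4 rollout: sequential 09d18h58m vs 64 Last-Minute clients 4h10m09s.
sequential = (9 * 24 + 18) * 3600 + 58 * 60   # 845880 seconds
parallel_64 = 4 * 3600 + 10 * 60 + 9          # 15009 seconds
speedup = sequential / parallel_64            # ~56.4, consistent with the slide
```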