
Comparative Study of Techniques for Parallelization of the Grid Simulation Problem Prepared by Arthi Ramachandran Kansas State University

2 The Problem
Iterative schemes, e.g.:
– Jacobi method
– Kinetic Monte Carlo
– Gauss-Seidel method
Input data has a grid/matrix topology.
Computation of the value in a cell at time-step 't' requires the values of the neighboring cells at time-step 't-1'.
Parallelization of this problem.
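The update rule above can be made concrete with a small serial sketch in C. This is not code from the presentation; the grid size, boundary handling, and function names are assumptions chosen only to illustrate the t / t-1 dependence and the double buffering it forces.

    #include <string.h>

    #define N 100                         /* grid dimension – assumed for illustration */

    /* One Jacobi sweep: each interior cell of nxt becomes the average of the
     * four neighbours of cur; boundary rows/columns keep their fixed values. */
    static void jacobi_sweep(double cur[N][N], double nxt[N][N]) {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                nxt[i][j] = 0.25 * (cur[i-1][j] + cur[i+1][j] +
                                    cur[i][j-1] + cur[i][j+1]);
    }

    /* Double buffering: values at time-step t are computed only from values
     * at time-step t-1, then the buffers are swapped (here, by copying). */
    void jacobi(double grid[N][N], int iterations) {
        static double nxt[N][N];
        memcpy(nxt, grid, sizeof(nxt));   /* copy once so boundary values carry over */
        for (int t = 0; t < iterations; t++) {
            jacobi_sweep(grid, nxt);
            memcpy(grid, nxt, sizeof(nxt));
        }
    }

Parallelizing this loop nest is exactly the problem the rest of the deck addresses: every cell update needs its neighbours' previous values, so partitions must exchange their boundary rows/columns at each time-step.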

3 The Problem (contd…) Jacobi Simulation of Heat flow

4 Outline
Goal
Available Solutions
Parallel Adaptive Distributed Simulations – Architecture
Performance Comparison
Results
Conclusions
Future Work

5 The Goal
A framework in which the user/programmer codes only the problem-specific logic; the framework manages the parallelization of the user's application.
Load balancing
– required for some problems to achieve good performance
– the model we build achieves this by moving fixed-size jobs across machines
– a later slide shows that load balancing does yield a substantial improvement in performance.

6 Solution 1 – OptimalGrid
Developed by IBM Almaden Research Lab.
Specifically built to parallelize connected problems.
Built using Java.
The user only needs to supply:
– Java code for the problem to be parallelized
– changes to configuration files to fine-tune the behaviour of OptimalGrid (if required)

7 OptimalGrid Architecture
Manages the Compute Agents.
Invokes the Problem Builder if necessary, which automatically partitions the problem.
Distributes the problem among the agents.
Monitors the Agents – tracks their status and performs optimizations if necessary.

8 Data Units (diagram)
Original Problem Cell (OPC) – one OPC surrounded by its 4 neighbor cells
Variable Problem Partition (VPP)
VPP Edge

9 Implementation of Jacobi method of Heat Flow – OptimalGrid (1)
EntityJacobi extends EntityAbstract
Data members: double temperature; class name
Methods:
  double getTemperature()
  void setTemperature(double)
  void initFromXML(Element entity)
  Element getXML()

10 Implementation of Jacobi method of Heat Flow – OptimalGrid (2)
OPCJacobi extends OPCAbstract
Data members: version identifier, class name
Methods: propagate(), localInteraction()

    propagate() {
        ArrayList newOccupants = new ArrayList();
        ArrayList neibList = this.getAllOpcNeighbors();
        for (each neibListEntry in neibList) {
            newOccupants.add(neibListEntry.getOccupants());
        }
        return newOccupants;
    }

    localInteraction(ArrayList occupants) {
        double temperature = 0.0;
        if (this.loc.x == 0) {
            // cell is on the first row of the grid – fixed boundary temperature
            temperature = 5.0;
        } else if (this.loc.y == 0 || this.loc.y == Grid_Dimension - 1) {
            // cell is on the first or last column – fixed boundary temperature
            temperature = 0.0;
        } else {
            // inner cell – average the temperatures of the four neighbors
            for (each occupantsEntry in occupants) {
                temperature += occupantsEntry.getTemperature();
            }
            temperature /= 4;
        }
        occupants(0).setTemperature(temperature);   // write the result back to this cell's entity
        // remove all neighbor entries from the occupants list
    }

11 MPI Solution
Message Passing Interface (MPI) library – a specification.
Various implementations of this specification are available, e.g.:
– LAM/MPI – developed by the Ohio Supercomputer Center
– MPICH – developed by Argonne National Laboratory and Mississippi State University (used in this work)

12 Features of MPI
Flexible Send and Receive APIs:
– void Comm::Send(void *buf, int count, Datatype& datatype, int destination, int tag)
– void Comm::Recv(void *buf, int count, Datatype& datatype, int source, int tag)
Collective communications support:
– Broadcast
– Scatter and Gather operations between a set of processes
– Collective computation operations such as 'minimum', 'maximum', 'sum', etc.
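To make the "collective computation" bullet concrete, here is a hedged sketch using MPI's C bindings (the slide lists the C++-style bindings). The convergence-test use case is only an illustration, not something the slides prescribe.

    #include <mpi.h>

    /* Illustration only: returns non-zero on every rank once the largest
     * local change has fallen below tol on ALL ranks. */
    int converged(double local_max_delta, double tol) {
        double global_max_delta;
        /* collective computation: every process contributes its local maximum
         * and every process receives the global maximum */
        MPI_Allreduce(&local_max_delta, &global_max_delta, 1,
                      MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
        return global_max_delta < tol;
    }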

13 Features of MPI (contd.)
Virtual topologies
Communication modes:
– non-blocking versions of the Send/Receive APIs
– synchronous mode
– buffered mode
Debugging and profiling hooks

14 MPI Solution overview (diagram: the grid partitioned among Process #0, Process #1, Process #2 and Process #3)

15 MPI Implementation – Jacobi Iteration method
Inputs: number of processes (N), partition size, number of iterations.
A Cartesian matrix of processes spanning all processes (0…N-1) is created with Create_cart().
Using Comm::Shift(), each process obtains the ids of its left, top, right and bottom neighbor processes.
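A sketch of this set-up with the equivalent C calls MPI_Cart_create and MPI_Cart_shift (the slide names the C++-style Create_cart() and Comm::Shift()); the non-periodic 2-D layout and the function name are assumptions for illustration.

    #include <mpi.h>

    /* Arrange the processes in a 2-D Cartesian grid and let each one discover
     * its four neighbours (MPI_PROC_NULL at the edges, since the grid is not
     * periodic). Sketch only. */
    void build_topology(MPI_Comm *cart, int *left, int *right, int *top, int *bottom) {
        int nprocs;
        int dims[2]    = {0, 0};          /* let MPI choose the factorization */
        int periods[2] = {0, 0};          /* non-periodic in both dimensions  */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 2, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, cart);
        MPI_Cart_shift(*cart, 0, 1, top, bottom);   /* neighbours along rows    */
        MPI_Cart_shift(*cart, 1, 1, left, right);   /* neighbours along columns */
    }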

16 MPI Implementation – Jacobi Iteration method (contd.)
Process #0 computes the grid co-ordinates of the partition to be assigned to each process and sends them out.
Processes #1 … N-1 wait for their partition co-ordinates from Process #0.
Each process then allocates boundary buffers for its partition.
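A minimal sketch of this distribution step in C; the four-integer encoding of a partition and the compute_partition() helper are hypothetical, used only to show the rank-0 send / other-ranks receive pattern.

    #include <mpi.h>

    void compute_partition(int rank, int coords[4]);  /* hypothetical helper: maps a rank
                                                         to {row0, col0, rows, cols} */

    /* Rank 0 computes each process's partition rectangle and sends it out;
     * every other rank blocks until its co-ordinates arrive. */
    void distribute_partitions(int rank, int nprocs, int coords[4]) {
        if (rank == 0) {
            for (int p = 1; p < nprocs; p++) {
                int c[4];
                compute_partition(p, c);
                MPI_Send(c, 4, MPI_INT, p, 0, MPI_COMM_WORLD);
            }
            compute_partition(0, coords);             /* rank 0 keeps its own partition */
        } else {
            MPI_Recv(coords, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }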

17 MPI Implementation – Jacobi Iteration method (contd.)
Until the iterations are finished, each process repeats:
– issue calls to Isend and Irecv – non-blocking methods to send/receive boundary data
– compute the inner cells
– wait for the Isend and Irecv calls to complete
– compute the outer cells
When the iterations are finished, send the result data to Process #0.
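A hedged sketch of one such iteration in C. The buffer layout and the compute_inner_cells / compute_outer_cells helpers are placeholders; the point is the ordering — post Isend/Irecv, compute the cells that need no remote data, wait, then compute the cells on the partition boundary.

    #include <mpi.h>

    void compute_inner_cells(void);                   /* hypothetical: cells needing no remote data   */
    void compute_outer_cells(double *recv_bufs[4]);   /* hypothetical: border cells, using the halos  */

    /* One iteration of the pattern above: post non-blocking halo exchanges
     * with the four neighbours, update the interior while the messages are
     * in flight, wait, then update the border cells. */
    void jacobi_step(double *send_bufs[4], double *recv_bufs[4],
                     int halo_len[4], int nbr[4], MPI_Comm cart) {
        MPI_Request reqs[8];
        for (int d = 0; d < 4; d++) {
            MPI_Irecv(recv_bufs[d], halo_len[d], MPI_DOUBLE, nbr[d], 0, cart, &reqs[d]);
            MPI_Isend(send_bufs[d], halo_len[d], MPI_DOUBLE, nbr[d], 0, cart, &reqs[4 + d]);
        }
        compute_inner_cells();                        /* overlap computation with communication */
        MPI_Waitall(8, reqs, MPI_STATUSES_IGNORE);    /* all halos sent and received            */
        compute_outer_cells(recv_bufs);
    }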

18 Parallel Adaptive Distributed Simulations – A new model … But why?
– Experiment with bringing together the concepts of partitions, double buffers, thread pools, jobs, synchronization schemes in thread pools, and load balancing by moving fixed-size jobs across machines.
– OptimalGrid does some of this, but it is proprietary software – hence no access to its source.
– It is fairly easy to code the application using MPI; however, for problems such as atomistic motion simulation, load balancing is a required feature.
– Can we do better?

19 Thread Pool

20 Jobs and partitions
Data members: int phase
Methods: bool execute()

    public bool execute(phase) {
        switch (phase) {
            case 0:
                // sequential code for phase 0
                phase = 1;                        // about to enter the synchronization part: advance phase by 1
                if (!synchronizationMethod(this)) {
                    return false;                 // synchronization returned false: the job has to wait on some
                } else {                          // condition – the job thread should relinquish this job
                    return true;                  // synchronization returned true: the job can continue
                }
            case 1:
                …
        }
    }

21 PADS – Architecture (diagram: a controller and controller agents on the nodes, linked by connected sockets and a connection-less socket)
Controller responsibilities:
 Parsing input
 Opening and monitoring communication channels with the controller agents
 Partitioning the input grid
 Assigning the jobs to the hosts/nodes
 Co-ordinating load balancing
 Collecting results from all hosts after the iterations are completed
 Emitting output in the specified format
Controller agent responsibilities:
 Establish communication channels with the controller as well as with other controller agents
 Initialize the Thread Pool on the host
 Deploy jobs received from the controller in the Thread Pool
 Handle the communication requirements of each job
 Respond to controller messages (load balancing)
 Send results back to the controller after the iterations are complete

22 Communication between Controller Agents

23 Synchronization – Jobs and Controller agent
All communication between jobs goes through the controller agent(s); hence, synchronization is required only between the controller agent and the job.
Shared job data:
– time step of the job
– boundary buffers
– waiting flag
– frozen flag
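The slides do not show the data structure itself; purely as an illustration (in C, although PADS itself is not necessarily written in C), the state shared between a job and its controller agent might be laid out as below. All field names and the mutex/condition-variable protection are assumptions, not PADS source code.

    #include <pthread.h>

    typedef struct {
        int              time_step;      /* current time-step of the job                      */
        double          *boundary_in;    /* boundary buffers exchanged with neighbouring      */
        double          *boundary_out;   /*   partitions via the controller agent             */
        int              waiting;        /* set while the job waits for boundary data         */
        int              frozen;         /* set while the job is suspended (e.g. being moved) */
        pthread_mutex_t  lock;           /* guards the fields above                           */
        pthread_cond_t   ready;          /* signalled by the controller agent                 */
    } shared_job_state;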

24 Load Balancer (diagram: a job J moving between nodes, with messages exchanged through the controller)

25 Overview of Load Balancer Module

26 Overview of Load Balancer Module (2)

27 Overview of Load Balancer Module (3)

28 Overview of Load Balancer Module (4)

29 Performance Comparison Experiment 1

30 Performance Comparison Experiment 2

31 Performance Comparison Experiment 3

32 PADS – Performance comparison for varying number of threads per node (50 x 50 partition size)

33 PADS – Performance comparison for varying number of threads per node (25 x 25 partition size)

34 PADS – Performance comparison for varying number of threads per node (10 x 10 partition size)

35 Preliminary Results for Load Balancer

36 Conclusions
OptimalGrid seems to perform better than the PADS and MPI solutions for larger grain sizes (≥ 10 μs) (measured with System.nanoseconds() – accuracy?).
PADS and MPI perform better than OptimalGrid by an order of magnitude for small grain sizes (4 ~ 10 ns).
OptimalGrid provides features that can be used easily by the user.
MPI provides hooks for logging and debugging which can be used by the programmer.
OptimalGrid and PADS allow load balancing to be done automatically.
With PADS, the simulation results show that a good performance improvement is obtained with load balancing.
MPI does not provide dynamic load balancing.

37 Future Work
Formulation and implementation of policies for dynamic load balancing in PADS.
Experiment with flexibility – allow partitions to have variable dimensions; however, synchronization and communication will become more complex and might give rise to more overhead.
Allow heterogeneity in the duration of a time-step and in the computation among jobs, as required for the implementation of certain problems.
Develop a GUI for PADS.

38 Acknowledgements Dr. Virgil E. Wallentine Dr. Daniel A. Andresen Dr. Gurdip Singh Dr. Masaaki Mizuno

Questions ?