Fire Benchmark Parallelisation. Programming of Supercomputers, WS 11/12. Sam Maurus.


What is Fire Benchmark? A CFD solver for arbitrary geometries. This project concerned itself with the gccg solver.

How Fast is Fire Benchmark Sequentially?

What effect does the input file format have?

Data structures in gccg: points and elements. The points array stores the x, y and z coordinates of each mesh point; the elems array lists the point indices making up each element; the lcc array holds, for each cell, the indices of its neighbouring cells.
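To make the layout of these arrays concrete, here is a hedged sketch in C. The struct and the 8-points-per-element / 6-neighbours-per-cell layout are assumptions for illustration; the benchmark itself uses separate flat arrays.

```c
#include <stdlib.h>

/* Hypothetical sketch of the gccg mesh arrays described above.
 * Assumes hexahedral cells: 8 points per element, 6 face neighbours. */
typedef struct {
    int nPoints, nElems;
    double *x, *y, *z; /* points array: one coordinate triple per point */
    int *elems;        /* elems array: 8 point indices per element      */
    int *lcc;          /* lcc array: 6 neighbour cell indices per cell  */
} Mesh;

Mesh *mesh_alloc(int nPoints, int nElems) {
    Mesh *m = malloc(sizeof(Mesh));
    m->nPoints = nPoints;
    m->nElems = nElems;
    m->x = malloc(nPoints * sizeof(double));
    m->y = malloc(nPoints * sizeof(double));
    m->z = malloc(nPoints * sizeof(double));
    m->elems = malloc((size_t)nElems * 8 * sizeof(int));
    m->lcc = malloc((size_t)nElems * 6 * sizeof(int));
    return m;
}

/* Neighbour j (0..5) of cell i lives at lcc[i*6 + j]. */
int neighbour(const Mesh *m, int i, int j) {
    return m->lcc[i * 6 + j];
}
```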

Data distribution approach. The root process (process 0) reads the input file, partitions the elements using the chosen approach, creates and sends the relevant mapping arrays to each process, and broadcasts a common data package (lcc, ne, epart, countPart, bs_local, be_local, …) to every process.
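As an illustration of one possible "chosen approach", here is a minimal sketch of a simple block distribution of ne elements over nprocs processes (the project may equally have used a graph-based partitioning such as METIS). The array names epart and countPart are taken from the slide; the partitioning rule itself is an assumption.

```c
/* Simple block distribution of ne elements over nprocs processes.
 * The first (ne % nprocs) processes receive one extra element.
 * epart[i] = owning rank of element i; countPart[p] = elements on rank p. */
void block_partition(int ne, int nprocs, int *epart, int *countPart) {
    int base = ne / nprocs, rem = ne % nprocs;
    for (int p = 0; p < nprocs; p++)
        countPart[p] = base + (p < rem ? 1 : 0);
    int e = 0;
    for (int p = 0; p < nprocs; p++)
        for (int k = 0; k < countPart[p]; k++)
            epart[e++] = p;
}
```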

Communication model

Communication model: the has_ghost_neighbour array. For each local element of a process (e.g. P3), has_ghost_neighbour = 1 if the element borders a cell owned by another process (e.g. P5), and 0 otherwise.
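A hedged sketch of how such a flag array could be filled from lcc and epart (array names from the slides; the exact logic in the project may differ). An element has a ghost neighbour if any of its lcc neighbours belongs to another partition:

```c
/* For each of ne local elements, set has_ghost_neighbour[i] = 1 if any
 * of its 6 lcc neighbours is owned by a different rank, else 0.
 * Assumes lcc[i*6 + j] gives neighbour j of element i, with negative
 * entries marking external boundary faces, and epart[] mapping elements
 * to owning ranks (names taken from the slides). */
void mark_ghost_neighbours(int ne, const int *lcc, const int *epart,
                           int myrank, int *has_ghost_neighbour) {
    for (int i = 0; i < ne; i++) {
        has_ghost_neighbour[i] = 0;
        for (int j = 0; j < 6; j++) {
            int nb = lcc[i * 6 + j];
            if (nb >= 0 && epart[nb] != myrank) {
                has_ghost_neighbour[i] = 1;
                break;
            }
        }
    }
}
```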

Communication model: computational loop, phase one. Each process starts an Isend to every process it must send to (where cellCountsToSend[i] > 0) and an Irecv from every process it must receive from (where cellCountsToRecv[i] > 0), then processes the local elements that have no ghost neighbours, waits on all outstanding requests, and finally updates the remaining local elements.
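The phase-one overlap pattern above can be sketched roughly as follows. This is a hedged outline, not the project's actual code: the buffer layout and variable names (cellCountsToSend, cellCountsToRecv, sendbuf, recvbuf) are assumptions based on the slides, and the two commented computation steps stand in for the gccg element updates.

```c
#include <mpi.h>

/* Start all nonblocking sends/receives, compute interior elements while
 * messages are in flight, then wait once on all requests with
 * MPI_Waitall instead of per-element MPI_Wait calls. */
void exchange_and_compute(int nprocs, const int *cellCountsToSend,
                          const int *cellCountsToRecv,
                          double **sendbuf, double **recvbuf,
                          MPI_Request *reqs, int *nreqs) {
    *nreqs = 0;
    for (int p = 0; p < nprocs; p++) {
        if (cellCountsToSend[p] > 0)
            MPI_Isend(sendbuf[p], cellCountsToSend[p], MPI_DOUBLE,
                      p, 0, MPI_COMM_WORLD, &reqs[(*nreqs)++]);
        if (cellCountsToRecv[p] > 0)
            MPI_Irecv(recvbuf[p], cellCountsToRecv[p], MPI_DOUBLE,
                      p, 0, MPI_COMM_WORLD, &reqs[(*nreqs)++]);
    }
    /* ... process local elements with has_ghost_neighbour == 0 here ... */
    MPI_Waitall(*nreqs, reqs, MPI_STATUSES_IGNORE);
    /* ... update remaining local elements using the received ghost data ... */
}
```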

Communication model

Problems overcome: the MPI_Wait function. Problem: MPI_Wait was being executed for both the send and the receive request of every element processed. Solution: the has_ghost_neighbour array was introduced, allowing for intermediate computation; MPI_Wait is then called only once for each request.

Problems overcome: redundant reprocessing of the input file. Problem: the input file was being read once at initialisation and again when writing the result (redundant). Solution: the 'write solution' code was refactored to re-use the relevant file information obtained from the first read.

Speedup – cojack

Speedup – pent

Speedup – drall

Speedup – tjunc

Speedup – full execution
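The speedup plotted for cojack, pent, drall, tjunc and the full execution is presumably the standard ratio of sequential to parallel runtime; as a reminder, a minimal sketch of the definitions:

```c
/* Speedup S(p) = T_serial / T_parallel(p); parallel efficiency is
 * E(p) = S(p) / p. Standard definitions, independent of the benchmark. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

double efficiency(double t_serial, double t_parallel, int p) {
    return speedup(t_serial, t_parallel) / p;
}
```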

Thanks for listening. Discussion time!