ATmospheric, Meteorological, and Environmental Technologies RAMS Parallel Processing Techniques.

RAMS Parallel Design Considerations
- Concentrate on larger problems (3-dimensional only)
- No compromises in physics or numerics
- Target distributed-memory, MIMD architectures; a workstation cluster is the first platform
- Single code version for single-processor and parallel platforms
- No performance degradation on single-processor platforms
- Dynamic load balancing needed

RAMS Parallel Components and Structure
- Main structure
- Memory structure
- Variable tables
- Domain decomposition
- Communication procedures
- Concurrent computation/communication
- Bookkeeping arrays
- File output
- Nested grid considerations
- Dynamic load balancing

RAMS Parallel Structure
For parallel execution, the serial (single-processor) code is divided into two types of processes:
- Master process: initialization and all input/output functions
- Node ("compute") processes: all computation

RAMS Memory Structure
- Many options, array spaces, and nests with variable numbers of grid points
- Dynamically allocated "A" array: a single 1-D array holding all memory
- "Pointers" to the beginning of each field (integer indices stored in COMMON)
- Computational routines dynamically dimension their arrays
- Each node/master process allocates its own memory
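The single "A" array with integer field offsets can be illustrated with a short sketch. RAMS itself does this in Fortran with indices kept in COMMON; the field names here come from the VTABLE slide, but the grid sizes and helper function are hypothetical:

```python
# Sketch: one flat memory block with integer "pointers" (offsets)
# marking where each field begins, mimicking RAMS's 1-D "A" array.
# Grid dimensions are example values.

nx, ny, nz = 4, 4, 5                  # sub-domain dimensions (made up)
field_names = ["UP", "VP", "WP", "PP"]

offsets = {}
total = 0
for name in field_names:
    offsets[name] = total             # integer index where this field starts
    total += nx * ny * nz             # every field is one 3-D grid here

a = [0.0] * total                     # the single dynamically allocated array

def field_slice(name):
    """Return the portion of the flat array holding one field."""
    start = offsets[name]
    return a[start:start + nx * ny * nz]

print(offsets["VP"])                  # 80: VP begins right after UP's 4*4*5 values
```

Because every routine receives only an offset and the dimensions, the same computational code works for any nest size without recompilation.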

RAMS Variable Tables
A data file defines the characteristics of each model array:
- Existence (whether the array resides on node or master processes)
- Output
- Communication

RAMS VTABLE File

# Velocity, perturbation Exner function
UP : 1:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
UC : 2:3: 0:110000:vard:hist:mpti:mpt3
VP : 3:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
VC : 4:3: 0:110000:vard:hist:mpti:mpt3
WP : 5:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
WC : 6:3: 0:110000:vard:hist:mpti:mpt3
PP : 7:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
PC : 8:3: 0:110000:vard:hist:mpti:mpt3
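A table like this is easy to parse into per-variable flag sets. The sketch below follows the colon-separated layout shown on the slide; the interpretation of the numeric fields (variable id, dimensionality) is an assumption made for illustration:

```python
# Sketch: parsing VTABLE-style entries into a table of per-variable flags.
# Assumed layout: name : id : ndims : two numeric fields : flag tokens
# (hist/anal/lite = output streams, mpti/mpt3/mpt2 = communication types).

vtable_text = """\
UP : 1:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
UC : 2:3: 0:110000:vard:hist:mpti:mpt3
"""

variables = {}
for line in vtable_text.splitlines():
    fields = [f.strip() for f in line.split(":")]
    name, var_id, ndims = fields[0], int(fields[1]), int(fields[2])
    flags = set(fields[5:])           # everything after the numeric fields
    variables[name] = {"id": var_id, "ndims": ndims, "flags": flags}

# UP carries the 'anal' flag (written to analysis files); UC does not.
print("anal" in variables["UP"]["flags"], "anal" in variables["UC"]["flags"])
```

Driving existence, output, and communication from one data file means adding a model variable never requires touching the message-passing code.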

Domain Decomposition
Nodes exchange "overlap" rows (halo regions) with adjacent nodes.

Domain Decomposition
- Each column is assigned a "work" factor
- The factor is based on boundary conditions, physics, machine performance, etc.
- The domain is horizontally decomposed across processors
- A "2-D" decomposition is used with 3 or more nodes
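The idea of decomposing by cumulative work rather than by an equal column count can be sketched in one dimension. The work values and the midpoint heuristic below are illustrative, not the RAMS algorithm itself:

```python
# Sketch: assign columns to nodes by cumulative "work" factor, so each
# node receives roughly equal work rather than an equal column count.
# Columns 2-4 are pretend "expensive" columns (e.g. active physics).

work = [1.0, 1.0, 2.5, 2.5, 2.5, 1.0, 1.0, 1.0]   # per-column work factors
nnodes = 2
total = sum(work)

assignment = []                        # which node owns each column
cum = 0.0
for w in work:
    mid = cum + w / 2.0                # the column's "centre of work"
    node = min(nnodes - 1, int(nnodes * mid / total))
    assignment.append(node)
    cum += w

print(assignment)                      # [0, 0, 0, 0, 1, 1, 1, 1]
```

Node 0 gets four columns worth 7.0 units and node 1 gets four worth 5.5; an equal-column split of uniform factors would instead land at 6.25 each, so the work factors only shift boundaries when costs are uneven.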

Timestep Flow - Single Grid - Serial
[Diagram: one long timestep computed from Start to End, with the small (acoustic) timesteps 1-3 nested inside it]

Timestep Flow - Single Grid - Parallel
[Diagram: Nodes 1 and 2 each compute from Start to End on their own sub-domains, with communication overlapping the computation]

Concurrent Communication/Computation
- Overlap rows are sent at the very start of the timestep flow
- Model processes that can be computed without the overlap rows are done first:
  1. The sub-domain interior is computed for those terms
  2. Messages are received
  3. Sub-domain boundary terms are computed
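The ordering above is the whole trick: communication is posted first so it proceeds while interior terms are computed. A minimal sketch of that sequence, with the message passing replaced by stand-in functions so it runs anywhere (in RAMS these would be real sends and receives of the overlap rows):

```python
# Sketch of the overlap ordering only; communication is simulated.
log = []

def send_overlap_rows():    log.append("send halos")
def compute_interior():     log.append("compute interior terms")
def receive_overlap_rows(): log.append("receive halos")
def compute_boundary():     log.append("compute boundary terms")

def timestep():
    send_overlap_rows()        # halos leave first thing in the timestep
    compute_interior()         # communication is hidden behind this work
    receive_overlap_rows()     # halos must be in hand before boundary terms
    compute_boundary()         # only now are overlap-dependent terms done

timestep()
print(log)
```

If the interior computation takes longer than the message transit time, the communication cost disappears from the critical path entirely.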

Bookkeeping Arrays
Keep track of the communication "paths" between source and destination nodes:
- ipaths(info, itype, ngrids, nproc, nproc) — holds the message bounds, machine id, message "type", grid number, and processor number
- iget_paths(itype, ngrids, nproc, nproc)

Communication "Types"
- Initialization / re-decomposition: master to node; full sub-domains; most variables
- Long-timestep overlap region: node to node; 1 row; prognostic variables
- Small-timestep overlap region (2 events): node to node; 1 row; selected variables
- File output: node to master; full sub-domains; prognostic variables
All node-to-node types package their variables into a single message before sending.
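Packaging several variables into one message trades a little buffer copying for far fewer message latencies. A sketch of the pack/unpack idea, using variable names from the VTABLE slide but made-up row sizes:

```python
# Sketch: pack several variables' overlap rows into one contiguous buffer
# before sending, instead of sending one message per variable.

row_len = 6
overlap_rows = {
    "UP": [1.0] * row_len,
    "VP": [2.0] * row_len,
    "WP": [3.0] * row_len,
}

buffer, index = [], {}
for name, row in overlap_rows.items():
    index[name] = (len(buffer), len(row))   # where this variable sits
    buffer.extend(row)

# ...one send of `buffer` happens here instead of three separate sends...

def unpack(name):
    """Receiving node pulls one variable back out of the packed buffer."""
    start, length = index[name]
    return buffer[start:start + length]

print(unpack("VP"))                          # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

In practice the layout does not need to travel with the message: both sides can derive it from the shared variable table, which is exactly what the VTABLE communication flags enable.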

File Output
- Nodes transfer the necessary data to the master process
- The transfer is done with message passing
- No local disk space or NFS is needed for file output
- Nodes can continue computing
- Data are available on the master for other activities
- Parallel I/O is not needed

Nested Grid Considerations
- All grids are decomposed independently
- Two-way interactive communication
- Interpolation of nested-grid boundaries is performed on the fine-grid nodes; coarse-grid information is sent to the fine grid
- Feedback (averaging) of fine-grid information is performed on the fine-grid nodes; averaged values are sent to the coarse grid
- This minimizes the data that must be communicated

Dynamic Load Balancing
- Regions of the domain can come to require additional physics (microphysics, radiation) as the model progresses
- Nodes send timing information back to the master at the end of each timestep
- At opportune times (coinciding with file output times), the master process will:
  1. re-compute the "work" factors
  2. re-decompose the domain
  3. re-initialize the nodes
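One simple way to turn the reported timings into updated work factors is to compare each node's measured cost per column against the average. This is an illustrative heuristic with made-up numbers, not the specific RAMS formula:

```python
# Sketch: update per-node "work" factors from measured timings. A node
# whose region got expensive (e.g. microphysics switched on) ends up with
# a factor > 1, so the next re-decomposition hands it fewer columns.

columns_per_node = {0: 4, 1: 4}
time_per_node = {0: 2.0, 1: 6.0}     # node 1's region became expensive

mean_cost = sum(time_per_node.values()) / sum(columns_per_node.values())

work_factor = {}
for node, t in time_per_node.items():
    cost_per_column = t / columns_per_node[node]
    work_factor[node] = cost_per_column / mean_cost

print(work_factor)                   # {0: 0.5, 1: 1.5}
```

Re-decomposing only at file-output times keeps the cost of re-initializing the nodes off the per-timestep path, at the price of reacting to imbalance with some delay.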

Performance
- Performance varies with the architecture, communication hardware, and model configuration
- Results are reported as speedups and percent efficiencies
- Speedup: S = (time to run serial code) / (time to run parallel code)
- Efficiency: E = S / (number of processors)
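A worked example of the two definitions, with made-up timings:

```python
# Speedup and efficiency as defined on the slide, with illustrative numbers.

t_serial = 100.0        # time to run the serial code (s)
t_parallel = 12.5       # time to run the parallel code on 10 processors (s)
nproc = 10

speedup = t_serial / t_parallel      # S = T_serial / T_parallel
efficiency = speedup / nproc         # E = S / (number of processors)

print(speedup, efficiency)           # 8.0 0.8 -> 80% efficiency
```

An efficiency below 1 reflects time spent on communication, load imbalance, and any serial portions of the run that parallelism cannot remove.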