Flexible Control of Data Transfer between Parallel Programs
Joe Shang-chieh Wu, Alan Sussman
Department of Computer Science, University of Maryland, USA
[Diagram of coupled space-weather models: corona and solar wind, global magnetospheric MHD, thermosphere-ionosphere model, Rice convection model, particle and hybrid models]
What is the problem?
- Coupling existing (parallel) programs
  - for physical simulations, more accurate answers can be obtained
  - for visualization, flexible transmission of data between simulation and visualization codes
- Exchange data across shared or overlapped regions in multiple parallel programs
- Couple multi-scale (space & time) programs
- Focus on multiple time scale problems (when to exchange data)
Roadmap
- Motivation
- Approximate matching
- Matching properties
- Performance results
- Conclusions and future work
Is it important?
- Petroleum reservoir simulations: multi-scale, multi-resolution codes
- Special issue (May/Jun 2004) of IEEE Computing in Science & Engineering: "It's then possible to couple several existing calculations together through an interface and obtain accurate answers."
- Earth System Modeling Framework (ESMF): several US federal agencies and universities
Solving multiple space scales
1. Appropriate tools
2. Coordinate transformation
3. Domain knowledge
Matching is OUTSIDE the components
- Separate matching (coupling) information from the participating components
  - Maintainability: components can be developed and upgraded individually
  - Flexibility: participants/components can be changed easily
  - Functionality: supports variable-sized time interval numerical algorithms and visualizations
- Matching information is specified separately by the application integrator
- Matches are made at runtime via simulation time stamps
Separate codes from matching

Exporter Ap0:
    define region Sr12
    define region Sr4
    define region Sr5
    ...
    Do t = 1, N, Step0
       ...  // computation jobs
       export(Sr12, t)
       export(Sr4, t)
       export(Sr5, t)
    EndDo

Importer Ap1:
    define region Sr0
    ...
    Do t = 1, M, Step1
       import(Sr0, t)
       ...  // computation jobs
    EndDo

[Diagram: exported regions Ap0.Sr12, Ap0.Sr4, Ap0.Sr5 connected to imported regions Ap1.Sr0, Ap2.Sr0, Ap4.Sr0]

Configuration file:
    #
    Ap0  cluster0  /bin/Ap
    Ap1  cluster1  /bin/Ap
    Ap2  cluster2  /bin/Ap
    Ap4  cluster4  /bin/Ap4  4
    #
    Ap0.Sr12  Ap1.Sr0  REGL  0.05
    Ap0.Sr12  Ap2.Sr0  REGU  0.1
    Ap0.Sr4   Ap4.Sr0  REG   1.0
    #
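For concreteness, the following is a minimal, self-contained Python sketch of how a configuration file in the format above could be parsed into program placements and region connections. The file grammar (two sections delimited by '#' lines) is inferred from the example only, and the names Program, Connection, and parse_config are illustrative assumptions, not the actual library interface.

    # Sketch only: parse the '#'-delimited configuration format shown above
    # (assumed grammar inferred from the example, not the library's real parser).
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Program:             # one line of the first section
        name: str              # e.g. "Ap0"
        cluster: str           # e.g. "cluster0"
        executable: str        # e.g. "/bin/Ap"

    @dataclass
    class Connection:          # one line of the second section
        exported: str          # e.g. "Ap0.Sr12"
        imported: str          # e.g. "Ap1.Sr0"
        policy: str            # e.g. "REGL"
        precision: float       # e.g. 0.05

    def parse_config(text: str) -> Tuple[List[Program], List[Connection]]:
        sections, current = [], []
        for line in text.splitlines():
            if line.strip() == "#":        # '#' lines separate the sections
                if current:
                    sections.append(current)
                    current = []
            elif line.strip():
                current.append(line.split())
        if current:
            sections.append(current)
        programs = [Program(*f[:3]) for f in sections[0]]
        connections = [Connection(f[0], f[1], f[2], float(f[3]))
                       for f in sections[1]]
        return programs, connections

Each Connection binds one exported region to one imported region under a named matching policy and precision; this is exactly the coupling information that is kept outside the participating components.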
Matching implementation
- The library is implemented with POSIX threads
- Each process in each program uses library threads to exchange control information in the background, while the application computes in the foreground
- One process in each parallel program runs an extra representative thread to exchange control information between parallel programs
  - Minimizes communication between parallel programs
  - Keeps collective correctness within each parallel program
  - Improves overall performance
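The thread structure is not shown on the slide; as a rough structural sketch (assumed design, written in Python rather than the C/POSIX threads of the actual library), one designated process could run a background representative loop like the one below while the application keeps computing in the foreground.

    # Rough sketch only: a background "representative" thread drains control
    # requests (e.g., timestamp match requests) while the foreground loop
    # keeps computing; all names and the queue hand-off are assumptions.
    import threading
    import queue

    control_requests = queue.Queue()   # match requests from compute code
    resolved = {}                      # request timestamp -> matched timestamp
    stop = threading.Event()

    def representative():
        # In the real system this thread would negotiate with the other
        # program's representative; here it just records a trivial match.
        while not stop.is_set():
            try:
                t = control_requests.get(timeout=0.1)
            except queue.Empty:
                continue
            resolved[t] = t

    threading.Thread(target=representative, daemon=True).start()

    for t in range(1, 6):              # foreground computation loop
        control_requests.put(t)        # hand the request to the background thread
        # ... computation for step t ...

    stop.set()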
Approximate Matching
- Exporter Ap0 produces a sequence of data object A at simulation times 1.1, 1.2, 1.5, ...
- Importer Ap1 requests the same data object A at time 1.3
- Is there a match for this request? If yes, which export is chosen, and why?
Supported matching policies

  Policy   Matched timestamp f(x) for a request at time x
  LUB      minimum f(x) with f(x) ≥ x
  GLB      maximum f(x) with f(x) ≤ x
  REG      f(x) minimizing |f(x)-x|, with |f(x)-x| ≤ p
  REGU     f(x) minimizing f(x)-x, with 0 ≤ f(x)-x ≤ p
  REGL     f(x) minimizing x-f(x), with 0 ≤ x-f(x) ≤ p
  FASTR    any f(x) with |f(x)-x| ≤ p
  FASTU    any f(x) with 0 ≤ f(x)-x ≤ p
  FASTL    any f(x) with 0 ≤ x-f(x) ≤ p

Here x is the requested import timestamp, f(x) is the exported timestamp selected by the match, and p is the per-connection precision (e.g., the 0.05 in the configuration file above).
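To make the table concrete, here is a self-contained Python sketch of the eight policies as one lookup function, applied to the example from the Approximate Matching slide (exports at 1.1, 1.2, 1.5; request at 1.3). The function and its signature are illustrative assumptions, not the library's API.

    # Sketch only: pick an exported timestamp f(x) for a request at time x
    # under each policy; p is the precision bound from the configuration file.
    def match(policy, x, exported, p=None):
        if policy == "LUB":    # least upper bound
            cand = [e for e in exported if e >= x]
            return min(cand) if cand else None
        if policy == "GLB":    # greatest lower bound
            cand = [e for e in exported if e <= x]
            return max(cand) if cand else None
        if policy == "REG":    # closest in either direction, within p
            cand = [e for e in exported if abs(e - x) <= p]
            return min(cand, key=lambda e: abs(e - x)) if cand else None
        if policy == "REGU":   # closest from above, within p
            cand = [e for e in exported if 0 <= e - x <= p]
            return min(cand) if cand else None
        if policy == "REGL":   # closest from below, within p
            cand = [e for e in exported if 0 <= x - e <= p]
            return max(cand) if cand else None
        if policy == "FASTR":  # any acceptable export (first found), within p
            return next((e for e in exported if abs(e - x) <= p), None)
        if policy == "FASTU":  # any acceptable export from above
            return next((e for e in exported if 0 <= e - x <= p), None)
        if policy == "FASTL":  # any acceptable export from below
            return next((e for e in exported if 0 <= x - e <= p), None)
        raise ValueError(policy)

    exported = [1.1, 1.2, 1.5]                 # exports from the previous slide
    print(match("GLB",  1.3, exported))        # 1.2
    print(match("LUB",  1.3, exported))        # 1.5
    print(match("REG",  1.3, exported, 0.15))  # 1.2 (only export within 0.15)
    print(match("REGU", 1.3, exported, 0.1))   # None (no export in [1.3, 1.4])

So for the request at 1.3, GLB matches the export at 1.2, LUB matches 1.5, and REGU with p = 0.1 finds no acceptable export, so the import cannot be satisfied under that policy.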
Acceptable ≠ Matchable

[Figure: timeline with exported timestamps te' and te'']
Region-type matches

[Figure: timeline illustrating region-type matches around exported timestamp te']
Experimental setup
- Question: how much overhead is introduced by runtime matching?
- 6 PIII-600 processors, connected by channel-bonded Fast Ethernet
- Solve the 2-D diffusion equation u_t = u_xx + u_yy + f(t,x,y) by the finite element method
- u(t,x,y): 512x512 array, on 4 processors (Ap1)
- f(t,x,y): 32x512 array, on 2 processors (Ap2)
- All data in Ap2 is sent (exported) to Ap1 using a matching criterion
- Ap1 receives (imports) data under 3 different scenarios
- Matches are made for each scenario (results averaged over multiple runs)
Experiment result 1: Ap1 execution time (average)

           P10     P11     P12     P13
  Case A   341ms   336ms   610ms   614ms
  Case B   620ms   618ms
  Case C   624ms   612ms   340ms   339ms
Experiment result 2: Ap1 overhead in the slowest process

Ap1 pseudocode, and its expansion showing what import does internally:
    Do t = 1, N
       import(data, t)
       compute u
    EndDo

    Do t = 1, N
       Request a match for (data, t)
       Receive data
       compute u
    EndDo

           Matching time   Data transfer time   Computation time   Matching overhead
  Case A   944us           6.1ms                605ms              13%
  Case B   708us           2.9ms                613ms              20%
  Case C   535us           6.8ms                614ms              7%
Experiment result 3: comparison of matching time

           Slowest process   Fastest process
  Case A   944us (P13)       4394us (P11)
  Case B   708us (P10)       3468us (others)
  Case C   535us (P10)       3703us (P13)

- Fastest process (P11): high cost, remote match
- Slowest process (P13): low cost, local match
- A high-cost match can be hidden
Conclusions & future work
- Conclusions
  - A low-overhead approach for flexible data exchange between e-Science components with different time scales
- Ongoing & future work
  - Performance experiments in a Grid environment
  - Caching strategies to deal efficiently with slow importers
  - Real applications: space weather is the first one
End of Talk
Main components
Local and Remote requests
Space Science Application