Flexible and Efficient Control of Data Transfers for Loosely Coupled Components
Joe Shang-Chieh Wu
Department of Computer Science, University of Maryland, USA

What & How
– Obtain more accurate results by coupling existing (parallel) physical simulation components
– Different time and space scales for data produced in shared or overlapped regions
– Runtime decisions for which time-stamped data objects should be exchanged
– Performance might be a concern

Roadmap
– Approximate Match [Grid 2004]
– Collective Buffering [IPDPS 2007]
– Distributed Approximate Match + Eager Transfer [under submission]
– Conclusion

Matching is OUTSIDE components
Separate matching (coupling) information from the participating components:
– Maintainability – components can be developed/upgraded individually
– Flexibility – change participants/components easily
– Functionality – support variable-sized time interval numerical algorithms or visualizations

Distributed Array Transfer Library Basic Operation (diagram)
– The exporter component holds time-stamped exported distributed arrays for T = 1, 2, 3, 4
– The importer component requests an array for T = 2.5
– The runtime-based approximate match library returns the matched array for T = 3
– Arrays are distributed among multiple processes
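
A minimal Python sketch of this basic operation, assuming a hypothetical in-memory API rather than the real distributed-array transfer library; it reproduces the diagram's outcome using an LUB-style match ("next exported timestamp").

# Minimal sketch of the basic operation pictured above (hypothetical
# in-memory API, not the real distributed-array transfer library).
# The exporter tags arrays with timestamps; the importer requests
# T = 2.5 and the approximate match returns the array exported at T = 3.

class ExportBuffer:
    def __init__(self):
        self.snapshots = {}            # timestamp -> exported array

    def export(self, t, array):
        self.snapshots[t] = array      # keep the time-stamped version

    def approx_match_lub(self, t_req):
        # LUB policy: smallest exported timestamp >= requested timestamp
        later = [t for t in self.snapshots if t >= t_req]
        if not later:
            return None, None
        t_match = min(later)
        return t_match, self.snapshots[t_match]

exporter = ExportBuffer()
for t in (1, 2, 3, 4):
    exporter.export(t, [float(t)] * 8)   # stand-in for a distributed array

t_match, data = exporter.approx_match_lub(2.5)
print(t_match)                           # -> 3, as in the diagram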

Separate codes from matching

Exporter App0:
define region R1
define region R4
define region R5
...
Do t = 1, N, Step0
  ... // computation jobs
  export(R1,t)
  export(R4,t)
  export(R5,t)
EndDo

Importer App1:
define region R2
...
Do t = 1, M, Step1
  import(R2,t)
  ... // computation jobs
EndDo

Configuration file:
# applications: name, cluster, executable, number of processes
App0 cluster0 /bin/App0 …
App1 cluster1 /bin/App1 …
App2 cluster2 /bin/App2 …
App4 cluster4 /bin/App4 4
# connections: source, sink, approximate match policy, precision
App0.R1 App4.R0 REGL 0.05
App0.R1 App2.R0 REG 0.1
App0.R4 App1.R2 REGU 0.5

Example (the REGU 0.5 connection): for a sink request at time t, find t' in App0 such that (a) t ≤ t' ≤ t + 0.5 and (b) t' − t is minimized.
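
A small sketch of how the connection section of such a configuration file could be interpreted; the parser and field names are hypothetical, and only the line format (source, sink, policy, precision) comes from the slide.

# Sketch of reading the connection section of the configuration file
# (parser and field names are hypothetical; only the line format
# "source sink policy precision" is taken from the slide above).

from collections import namedtuple

Connection = namedtuple("Connection", "source sink policy precision")

def parse_connections(lines):
    connections = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip comments / separators
        src, dst, policy, precision = line.split()
        connections.append(Connection(src, dst, policy, float(precision)))
    return connections

spec = """
# source        sink         policy  precision
App0.R1         App4.R0      REGL    0.05
App0.R1         App2.R0      REG     0.1
App0.R4         App1.R2      REGU    0.5
"""

for c in parse_connections(spec.splitlines()):
    print(c.source, "->", c.sink, c.policy, c.precision)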

Dissection of Execution Time
Execution time is composed of:
– Computation time (Tcomp)
– Buffering time (Tbuf)
– Matched data transfer time (Ttran)
Tbuf matters when the exporter components (data sources) run more slowly.
Ttran matters when the importer components (data sinks) run more slowly.

Collective Buffering (when exporters run more slowly)
– The fastest export process sends runtime match results to the slower processes in the same program
– Unnecessary memory copies can be avoided in the slower processes
– Optimal state: only the required exported data are buffered
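
A rough sketch of the buffering decision this enables, assuming the fastest export process has already learned which timestamps were actually matched and shares that set with its slower peers; the function and variable names are illustrative, not the library's API.

# Rough sketch of collective buffering (illustrative only; the actual
# library uses its own runtime messages, not this API). Assumption:
# the fastest export process has already run the approximate match, so
# it knows which timestamps the sinks will actually consume.

def timestamps_to_buffer(produced_t, matched_t_from_fastest):
    """Return only the timestamps a slower export process must copy."""
    return [t for t in produced_t if t in matched_t_from_fastest]

# Fastest process: match results for this coupling window
matched = {2, 4}                      # e.g. only T=2 and T=4 were requested

# Slower process: produced T=1..4 but now copies only what is required,
# avoiding the unnecessary memory copies mentioned on the slide.
print(timestamps_to_buffer([1, 2, 3, 4], matched))   # -> [2, 4]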

Collective Buffering Result (chart): data exporting time for the slowest process, comparing Copy All, Copy Some, and Only Copy Required (the optimal state).

Eager Transfer + Distributed Match (when the importer runs more slowly)
– Both bandwidth and latency contribute to the matched data transfer time
– Eager transfer, i.e. transferring predicted data in advance, addresses the bandwidth cost
– Distributed approximate match, running on both the exporter and the importer, addresses the latency cost
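
A minimal importer-side sketch of the combined idea, assuming predicted data and the exporter's timestamp meta-data are pushed ahead of requests; the class and method names are hypothetical, not the library's API.

# Sketch of eager transfer with distributed match on the importer side
# (hypothetical data structures; the real library pushes distributed
# arrays plus meta-data, not Python dictionaries).

class ImporterCache:
    def __init__(self):
        self.prefetched = {}           # timestamp -> data pushed eagerly
        self.export_timestamps = set() # meta-data: what the exporter holds

    def receive_push(self, t, data):
        # Eager transfer: predicted data arrives before it is requested
        self.prefetched[t] = data
        self.export_timestamps.add(t)

    def local_import(self, t_req, precision):
        # Distributed match: run the approximate match locally against
        # the pushed meta-data, so no round trip to the exporter is needed
        candidates = [t for t in self.export_timestamps
                      if abs(t - t_req) <= precision]
        if not candidates:
            return None                # fall back to an on-demand request
        t_match = min(candidates, key=lambda t: abs(t - t_req))
        return self.prefetched.get(t_match)

cache = ImporterCache()
cache.receive_push(3.0, "array@T=3")            # pushed in advance
print(cache.local_import(2.5, precision=0.5))   # served locally: array@T=3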

Results (chart): comparison of the original on-demand approach, eager transfer only (ET Only), and eager transfer with distributed match (ET+DM).

Conclusion
– Runtime-based approximate match is a solution for coupling components with different time scales
– Performance can be improved:
  – When the exporter runs more slowly, avoid unnecessary memory copies
  – When the importer runs more slowly, transfer predicted data and meta-data in advance

The End

Questions?

On-Demand Approach
– Import component makes a request
– Approximate match is performed on the export component, and then the matched data is transferred
– Needs the data transfer time (T3 − T2) and 2 one-way delays (T2 − T1)

Eager Transfer Only
– Get permission to push predicted data
– Transfer predicted data in advance
– Import component makes a request
– Approximate match is performed on the export component
– Needs 2 one-way delays (T16 − T15)

Eager Transfer With Distributed Match
– Transfer predicted data + meta-data in advance
– The import component's request becomes a local operation
– Only the local operation time (T26 − T25) is needed, independent of the one-way delay

All Together
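
To make the three timing models above concrete, here is a back-of-the-envelope comparison in Python; the delay, transfer, and local-match numbers are illustrative assumptions, not measurements from this work.

# Back-of-the-envelope comparison of the three approaches above.
# All numbers are illustrative assumptions, not measured results.

one_way_delay = 50e-3      # network one-way latency (s)
transfer_time = 200e-3     # time to move the matched data (s)
local_match   = 1e-3       # local approximate-match cost (s)

# On-demand: request travels to the exporter, match runs there, data comes back
on_demand = 2 * one_way_delay + transfer_time

# Eager transfer only: data was pushed ahead of time, but the importer still
# asks the exporter which timestamp matches (one round trip)
et_only = 2 * one_way_delay

# Eager transfer + distributed match: meta-data was pushed too, so the
# match is a purely local operation on the importer
et_dm = local_match

print(f"on-demand      : {on_demand*1e3:.1f} ms")
print(f"ET only        : {et_only*1e3:.1f} ms")
print(f"ET + dist match: {et_dm*1e3:.1f} ms")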

Supported matching policies
– LUB: minimum f(x) with f(x) ≥ x
– GLB: maximum f(x) with f(x) ≤ x
– REG: f(x) minimizing |f(x) − x| with |f(x) − x| ≤ p
– REGU: f(x) minimizing f(x) − x with 0 ≤ f(x) − x ≤ p
– REGL: f(x) minimizing x − f(x) with 0 ≤ x − f(x) ≤ p
– FASTR: any f(x) with |f(x) − x| ≤ p
– FASTU: any f(x) with 0 ≤ f(x) − x ≤ p
– FASTL: any f(x) with 0 ≤ x − f(x) ≤ p
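
A straightforward Python reading of the table as timestamp-selection functions, where ts is the collection of exported timestamps, x the requested timestamp, and p the precision; this is a sketch of the definitions, not the library's implementation.

# Straightforward reading of the policy table as selection functions.
# Each function returns the matched exported timestamp, or None.

def lub(ts, x, p=None):
    c = [t for t in ts if t >= x]
    return min(c) if c else None                     # minimum f(x) >= x

def glb(ts, x, p=None):
    c = [t for t in ts if t <= x]
    return max(c) if c else None                     # maximum f(x) <= x

def reg(ts, x, p):
    c = [t for t in ts if abs(t - x) <= p]
    return min(c, key=lambda t: abs(t - x)) if c else None

def regu(ts, x, p):
    c = [t for t in ts if 0 <= t - x <= p]
    return min(c, key=lambda t: t - x) if c else None

def regl(ts, x, p):
    c = [t for t in ts if 0 <= x - t <= p]
    return min(c, key=lambda t: x - t) if c else None

def fastr(ts, x, p):
    return next((t for t in ts if abs(t - x) <= p), None)   # any match

def fastu(ts, x, p):
    return next((t for t in ts if 0 <= t - x <= p), None)

def fastl(ts, x, p):
    return next((t for t in ts if 0 <= x - t <= p), None)

ts = [1, 2, 3, 4]
print(lub(ts, 2.5), glb(ts, 2.5), regu(ts, 2.5, 0.5), regl(ts, 2.5, 0.5))
# -> 3 2 3 2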