
1 M. Tudruj, J. Borkowski, D. Kopanski Inter-Application Control Through Global States Monitoring On a Grid Polish-Japanese Institute of Information Technology, Koszykowa 86, Warsaw, Poland

2 Monitoring global states
[Figure: processes P1–P4 connected to synchronizers Sa and Sb; arrows represent reliable, asynchronous communication channels carrying state information to the synchronizers and control back to the processes.]
Processes can communicate with a number of synchronizers. Synchronizers learn state information from processes and send back control information.

3 Monitoring consistent global states
There is no global clock and no shared memory, so the synchronizer must be able to order incoming events properly to build Strongly Consistent Global States (SCGS). An SCGS is a combination of process local states, one state from each process, such that the local states are pairwise concurrent; in the figure, one combination of local states of P1 and P2 forms an SCGS while another does not.
[Figure: processes P1 and P2 with events e1, e2 and f1, f2, state messages s1, t1, t2, and messages m1, m2 reaching the synchronizer.]
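The pairwise-concurrency test behind an SCGS can be sketched with vector clocks. This is a minimal illustrative implementation, not PS-GRADE code; the function names and the two-process example are invented:

```python
# Minimal sketch: checking whether a set of local states forms an SCGS,
# assuming each local state carries the vector clock of its process at
# the moment the state was reported.

def concurrent(vc_a, vc_b):
    """Two vector clocks are concurrent iff neither happened-before the other."""
    le = all(a <= b for a, b in zip(vc_a, vc_b))
    ge = all(a >= b for a, b in zip(vc_a, vc_b))
    return not le and not ge

def is_scgs(states):
    """states[i] is the vector clock of process i's reported local state.
    They form a Strongly Consistent Global State iff pairwise concurrent."""
    return all(concurrent(states[i], states[j])
               for i in range(len(states))
               for j in range(i + 1, len(states)))

# P1's state after its first event vs. P2's state after its first event:
print(is_scgs([(1, 0), (0, 1)]))   # mutually concurrent -> True
# (1, 0) happened-before (2, 1), so the pair is not an SCGS:
print(is_scgs([(1, 0), (2, 1)]))   # -> False
```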

4 Strongly Consistent Global States
Events must carry timestamps so that messages can be ordered correctly. Logical vector clocks, or real-time intervals based on roughly synchronized local clocks, can be used. If process local clocks are synchronized with a known accuracy, then real-time interval timestamps can be used to identify SCGS.
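Under the interval scheme, a clock synchronized to within a known accuracy eps stamps each state with an uncertainty interval [t − eps, t + eps]; states whose intervals all mutually overlap cannot be ordered and can therefore be treated as pairwise concurrent. A hedged sketch (the parameter names and the overlap criterion as stated here are illustrative, not the paper's exact algorithm):

```python
# Illustrative interval-timestamp concurrency test for SCGS detection,
# assuming local clocks synchronized within a known accuracy eps.

def interval(t, eps):
    """Uncertainty interval of a timestamp t read from a clock with accuracy eps."""
    return (t - eps, t + eps)

def overlap(iv_a, iv_b):
    return iv_a[0] <= iv_b[1] and iv_b[0] <= iv_a[1]

def scgs_by_intervals(timestamps, eps):
    """Local states are considered pairwise concurrent when all their
    uncertainty intervals mutually overlap."""
    ivs = [interval(t, eps) for t in timestamps]
    return all(overlap(ivs[i], ivs[j])
               for i in range(len(ivs))
               for j in range(i + 1, len(ivs)))

print(scgs_by_intervals([10.0, 10.5], eps=1.0))  # intervals overlap -> True
print(scgs_by_intervals([10.0, 13.0], eps=1.0))  # disjoint intervals -> False
```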

5 Computation activation and cancellation caused by predicate evaluation

6 GRADE system
A complete graphical programming environment for developing message-passing applications, designed at the Laboratory of Parallel and Distributed Systems of the SZTAKI Institute of the Hungarian Academy of Sciences. The application level specifies processes and their interconnections. The process level defines the control-flow diagram of a process. The text level is used to enter sequential C code into elements of a flow diagram.

7 GRADE extension – PS-GRADE: state information monitoring
[Figure: standard message-passing channels, local state information transfer channels, and signal transfer channels.]

8 PS-GRADE - synchronizer

9 PS-GRADE – synchronizer: control-flow window and condition window
[Figure: reception of state variables, condition evaluation, and signal sending.]

10 PS-GRADE – process control-flow window
[Figure: blocks for starting and ending a signal-sensitive region ("watching-signal", "endwatching-signal"), starting and ending a signal-insensitive region, resuming interrupted computations, cancelling computations, and sending state.]

11 PS-GRADE – synchronizer hierarchy

12 The principles of Grid application control
A Grid application is controlled by:
Data/control flow (similarly to the P-GRADE Workflow implemented by SZTAKI), based on the input and output files of cluster applications.
A Grid Synchronizer, which collects information (state vectors) about application states, detects SCGS or OGS, evaluates conditions, and sends signals to the applications.
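The synchronizer's collect–detect–evaluate–signal cycle could be sketched as follows. This is a schematic, not the actual PS-GRADE API; the message format, predicate table, and helper names are all invented for illustration:

```python
# Schematic Grid synchronizer loop: gather state reports, assemble a
# consistent global state, evaluate control predicates, emit signals.

def synchronizer_loop(inbox, predicates, send_signal, build_scgs):
    latest = {}                         # last reported local state per application
    for report in inbox:                # state messages arriving from applications
        latest[report["app"]] = report
        gs = build_scgs(latest)         # None until a consistent state is found
        if gs is None:
            continue
        for name, (predicate, targets) in predicates.items():
            if predicate(gs):           # condition over the global state
                for app in targets:
                    send_signal(app, name)

# Toy usage: one predicate "FinishCond" fires once every app reports done.
signals = []
reports = [{"app": "A1", "done": False},
           {"app": "A2", "done": True},
           {"app": "A1", "done": True}]
synchronizer_loop(
    reports,
    {"FinishCond": (lambda gs: all(s["done"] for s in gs.values()),
                    ["A1", "A2"])},
    lambda app, sig: signals.append((app, sig)),
    lambda latest: dict(latest) if len(latest) == 2 else None,
)
print(signals)
```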

13 A Grid-level synchronizer inserted into a workflow graph
[Figure: workflow graph of applications A1–A5 with a Synch node inserted among them.]

14 A Grid-level synchronizer and an application
[Figure: a GRID Synchronizer connected to applications A2–A5.]

15 Grid PS-GRADE environment general structure

16 Organizing Grid-level program execution control
The following co-operation schemes, included in the proposed Grid environment, will be discussed:
Simple workflow: a selected application starts executing after a set of selected applications has completed. Example: complicated scientific computations performed layer by layer in different computer networks.

17 Alternative workflow: one of several applications is selected for execution depending on the results (state) of previous applications. Example: one of two available program packages is run depending on the computation results obtained so far.
Partial cancelling of workflow: applications that become superfluous from the point of view of the overall purpose of the computation are stopped. Example: an exhaustive parallel search of a solution space on a Grid is stopped or restricted when the search yields a satisfactory solution.

18 Supporting workflow: a set of currently executing applications requires activation of auxiliary applications that will provide useful results. Example: in a coarse-grain simulation of a system of moving objects, a collision triggers a change in the Grid application configuration; an application that models the collision in detail (with a fine granularity of events) is activated, and after the detailed simulation the coarse grain of the simulation process is restored.
Workflow coupling: a common (global) status of many applications is monitored and control directives are distributed to particular applications as needed. Applications compute parameters that are mutually exchanged; some parameters in meta-level applications are updated using the results of auxiliary computations.
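The partial-cancelling scheme, for instance, could be expressed as a synchronizer condition: once some application reports a satisfactory solution, the searchers that became superfluous are cancelled. A speculative sketch; the state format, threshold, and keep-one policy are invented for illustration:

```python
# Illustrative partial cancelling of a workflow: when any application reports
# a result at or below a target quality, the remaining searchers are cancelled.

def partial_cancel(states, target_cost):
    """states: {app_name: best_cost_found_so_far}.
    Returns the applications to cancel once a satisfactory solution
    (cost <= target_cost) has been found by some application."""
    satisfied = [app for app, cost in states.items() if cost <= target_cost]
    if not satisfied:
        return []                    # no one is satisfied yet; keep searching
    keep = satisfied[0]              # let one application finish the job
    return [app for app in states if app != keep]

# S2 found a good-enough solution, so S1 and S3 become superfluous:
print(partial_cancel({"S1": 120, "S2": 95, "S3": 140}, target_cost=100))
```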

19 Example – A Grid TSP application structure

20 The TSP application Synch1 conditions: DataRequest, MinDist, FinishCond (left to right)

21 The TSP application
[Figure panels: the application structure, the search-process structure, the B&B part's condition window, the B&B part's communication diagram, and the heuristic search.]
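The MinDist condition suggests the classic B&B coordination: searchers share the best tour length found so far and prune branches whose partial length already exceeds it, and a heuristic result can seed that bound. A simplified, self-contained sketch, not the PS-GRADE application; the tiny distance matrix is made up:

```python
# Simplified branch-and-bound TSP with a shared best bound (the "MinDist"
# idea): branches whose partial length already reaches the bound are pruned,
# and a heuristic result can seed the bound to speed up the exact search.

DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]

def tsp_bnb(dist, best=float("inf")):
    n = len(dist)

    def search(path, length):
        nonlocal best
        if length >= best:                 # prune using the shared bound
            return
        if len(path) == n:                 # close the tour back to city 0
            best = min(best, length + dist[path[-1]][0])
            return
        for city in range(n):
            if city not in path:
                search(path + [city], length + dist[path[-1]][city])

    search([0], 0)
    return best

print(tsp_bnb(DIST))             # exact optimum found from scratch
print(tsp_bnb(DIST, best=25))    # a heuristic bound of 25 seeds the search
```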

22 Conclusions
The paper has presented how synchronization-based parallel application control can be extended and ported to the Grid level. With the proposed method we can build advanced control of many applications running in a Grid environment. Inter-application coordination between programs executed on different Grid sites is supported. We identify five types of Grid-level program execution control. The presented example shows that the new programming environment provides convenient means for designing complicated Grid application control. On the Grid, the time-consuming parts of an application can be run on any available clusters during the middle stage of the algorithm. The best results from the heuristic part of the application, obtained in a shorter time than by the B&B computations, can speed up finding the exact solution.