Building the Computational Infrastructure for DART


David Abramson, Jagan Kommineni, Tim Ho, Ilkay Altintas

Outline
- The GriddLeS IO Library
- Kepler & Grid Workflows
- Kepler + GriddLeS = Flexible Workflows
- Transparent Data Replication (SI5)
- Active Data (SI6)

GriddLeS: Reusing Legacy Code
- Reuse legacy applications within the workflow rather than writing new programs.
- Existing programs:
  - are often written in a range of legacy languages such as Fortran and C
  - often use conventional file IO operations like READ and WRITE
  - may be old and are not well suited to modification

Excerpt of legacy Fortran code shown on the slide:

      end if
      deltt = deltt2 * 0.5
      do 100 m=1,mx
      do 200 j=1,jx
      if(j.eq.1.and.m.eq.1) go to 200
      l = j+m-2
      kl = float(l*(l+1))
      dkl = kl-2.
c Apply the horizontal diffusion
      pt(j,m) = pt(j,m) - dkl*hdiff*pm(j,m)
      ct(j,m) = ct(j,m) - dkl*hdiff*cm(j,m)
      zt(j,m) = zt(j,m) - dkl*hdiff*zm(j,m)
      ppv=pm(j,m)+deltt2*pt(j,m)
      if ( imp.eq.1 ) then
c Do a semi-implicit time step
      ccv = ( cm(j,m) + deltt2* ( ct(j,m) + kl*( zm(j,m) +
     & deltt*(zt(j,m)-zmean*cm(j,m)*.5))))/
     & ( 1. + deltt*deltt*kl*zmean )
      zzv = zm(j,m) + deltt2*( zt(j,m) - zmean*(cm(j,m)+ccv)*.5 )
      else
c Do an explicit time step
      ccv=cm(j,m)+deltt2*(ct(j,m)+kl*z(j,m))
      zzv = zm(j,m) + deltt2*( zt(j,m) - zmean*c(j,m) )
      if (ifirst.eq.0) then
c Here we do the Asselin time filtering. Note we filter AND update
c ( [alpha]m=[alpha] ), so the '...m' appears on lhs rather than the
c current values. nb that ppv is the future p value at this stage.
      pm(j,m)=p(j,m) + vnu*(pm(j,m)-2.*p(j,m)+ppv)
      cm(j,m)=c(j,m) + vnu*(cm(j,m)-2.*c(j,m)+ccv)
      zm(j,m)=z(j,m) + vnu*(zm(j,m)-2.*z(j,m)+zzv)
      p(j,m)=ppv
      c(j,m)=ccv
      z(j,m)=zzv
c Do a forward time step w/o updating the previous step values or time
c filtering
      c(j,m) = ccv
      z(j,m) = zzv
  200 continue
  100 continue
c
c turn off forward timestep flag (may already be off)
      ifirst=0
      return
      end

[Figure labels: legacy code + Workstations; The Grid]

GriddLeS
- Legacy applications need to be shielded from IO details in the Grid:
  - Local files
  - Remote files
  - Replicated files
  - Producer-consumer pipes
- Don’t want to lock in the IO model when the application is written (or even Grid-enabled)
- Choice of IO model should be:
  - Dynamic
  - Late bound

Flexible IO in GriddLeS
[Diagram. Labels: Legacy Application; open(), read(), write(), seek(), close(); FileMultiplexer; late bound decision; Local File; Remote File; Cache; Remote Application Process; GRS; Replica (four shown)]

Interprocess Communication in GriddLeS

Writer Application:
    fd = open("blah", "w");
    :
    write(fd, …..)

Reader Application:
    fd = open("blah", "r");
    :
    read(fd, …..)

[Diagram. Labels between writer and reader: blah; socket; cache]
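To make the writer/reader picture concrete, here is a minimal Python sketch (not GriddLeS code). The same two application functions are bound once to an ordinary file and once to a socket, without being changed. The function names, the record format and the use of `socket.socketpair` are assumptions made purely for illustration.

```python
import os
import socket
import tempfile
import threading


def writer(out):
    """Writer application: it only sees a writable file-like object."""
    for i in range(3):
        out.write(f"record {i}\n")
    out.close()


def reader(inp):
    """Reader application: it only sees a readable file-like object."""
    records = [line.rstrip("\n") for line in inp]
    inp.close()
    return records


def run_via_file():
    """Binding 1: 'blah' is a local file; the reader starts after the writer ends."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blah")
        writer(open(path, "w"))
        return reader(open(path, "r"))


def run_via_socket():
    """Binding 2: 'blah' is a socket; the reader consumes while the writer produces."""
    a, b = socket.socketpair()
    out, inp = a.makefile("w"), b.makefile("r")
    # Each socket really closes only when the file object made from it is closed.
    a.close()
    b.close()
    t = threading.Thread(target=writer, args=(out,))
    t.start()
    records = reader(inp)  # returns once the writer has closed its end
    t.join()
    return records
```

Both `run_via_file()` and `run_via_socket()` return the same three records; only the binding of "blah" differs, which is the point the slide makes.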

GriddLeS Implementation
- FileMultiplexer: a small piece of software placed between the application and the operating system.
- Trapping mechanism: open, read, write, seek, close, stat.
- When the application attempts certain system calls, the FileMultiplexer grabs control and manipulates the results by using client modules (such as the web service client, SRB client, Globus replica client and GFarm client).
[Diagram. Labels: Application; FileMultiplexer; Grid Buffer Client; GNS Client; SRB Client / Globus Replica Client; GFarm Client; Operating System]
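The trapping idea can be imitated in Python by temporarily replacing the built-in `open`. This is only an analogy under stated assumptions: the real FileMultiplexer sits between a compiled application and the operating system, and the names `file_multiplexer`, `legacy_read`, `input.dat` and `staged_copy.dat` below are hypothetical.

```python
import builtins
import contextlib
import os
import tempfile


@contextlib.contextmanager
def file_multiplexer(redirects):
    """Temporarily trap open() and reroute selected file names.

    `redirects` is a hypothetical table mapping the name the application
    uses to the location that should really be opened.
    """
    real_open = builtins.open

    def trapped_open(name, *args, **kwargs):
        # Names found in the table are rerouted; everything else passes through.
        target = redirects.get(name, name) if isinstance(name, str) else name
        return real_open(target, *args, **kwargs)

    builtins.open = trapped_open
    try:
        yield
    finally:
        builtins.open = real_open  # always restore the original call


def legacy_read():
    """Stands in for unmodified legacy code that reads a fixed file name."""
    with open("input.dat") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        actual = os.path.join(tmp, "staged_copy.dat")
        with open(actual, "w") as f:
            f.write("42\n")
        # The legacy code asks for "input.dat" but is served the staged copy.
        with file_multiplexer({"input.dat": actual}):
            return legacy_read()
```

`legacy_read` is never edited; the decision about where "input.dat" lives is made outside it, when the table is supplied.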

GriddLeS Architecture
[Diagram. Labels, shown once for each of two applications: Application; Write, Read, etc; File Multiplexer; Grid Buffer Client; Server; Grid FTP; Local File System; Remote File; GNS; GRS. Shared label: GriddLeS Name Server (GNS)]

Configuring an application
- The GriddLeS Name Service stores configuration information on a particular application
- Set of entries, keyed on file name and machine name
- Different behaviour:
  - Open local file
  - Open remote file
  - Open replicated file (performance-based sourcing)
  - Open pipe between applications
  - Locate cache file(s)
- No fixed number (or location) of GNSs
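A configuration store keyed on file name and machine name could look like the following Python sketch. It is not the GNS interface: the class names, the mode strings, the example entries and the fall-back to a local file are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Binding:
    mode: str    # assumed values: "local", "remote", "replicated", "pipe"
    target: str  # hypothetical: a path, host:path, logical name or peer address


class NameService:
    """Toy stand-in for a GNS: entries keyed on (file name, machine name)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Binding] = {}

    def register(self, file_name: str, machine: str, binding: Binding) -> None:
        self._entries[(file_name, machine)] = binding

    def lookup(self, file_name: str, machine: str) -> Binding:
        # Assumption: a file with no entry is opened locally under its own name.
        return self._entries.get((file_name, machine), Binding("local", file_name))


gns = NameService()
# The same file name resolves differently depending on the machine asking.
gns.register("winds.dat", "nodeA", Binding("pipe", "nodeB:5000"))
gns.register("winds.dat", "nodeB", Binding("pipe", "nodeA:5000"))
gns.register("topo.dat", "nodeA", Binding("replicated", "topography"))
```

Changing an entry changes how the application's next `open` is served, so the IO model can be chosen when the workflow is configured rather than when the program is written.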

Grid Workflows
- A workflow captures the linkage of constituent tasks in a hierarchical fashion to build larger, complex tasks.
- Workflow is concerned with the automation of procedures whereby files and data are passed between participants according to a defined set of rules to achieve an overall goal.
- It is possible to build Grid workflows in which a number of otherwise independent legacy applications are run in a “pipeline”.
- These workflows are called virtual applications and can run on virtual organizations.
- The individual components process data from an arbitrary source, ranging from:
  - databases, files, replicas, data from other processes
  - real-time data from scientific instruments
- Kepler workflow system

Genomics: Promoter Identification Workflow Source: Matt Coleman (LLNL)

Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Workflow diagram.
Steps: EcoGrid Query; Layer Integration; Sample Data; Calculation; Map Generation; Validation; Generate Metadata; Archive To Ecogrid.
Data items: (a) species presence & absence points (native range, invasion area); (b) environmental layers (native range, invasion area); (c) integrated layers (native range, invasion area); (d) training sample, test sample; (e) GARP rule set; (f) native range prediction map, invasion area prediction map; (g) model quality parameter; (h) selected prediction maps.
Other labels: User; Registered Ecogrid Database; +A1, +A2, +A3]
Source: NSF SEEK (Deana Pennington et al., UNM)

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Sample Atmospheric Science Workflow
- Drag and drop utilities from the Actor and Director libraries.
- The Graph Editor consists of a Director library and an Actor library.
- Director: governs the execution of a composite entity (model): scheduling, dispatching threads, generating code, etc.
- Actor: an encapsulation of parameterized actions.
- Building a workflow is as simple as dragging actors from the library.

Kepler Directors Orchestrate Workflow
- Synchronous Data Flow
  - Consumer actors not started until producer completes
  - Files copied from producer to consumer
- Process Networks
  - All actors execute concurrently
  - Communication through TCP/IP sockets
- Dedicated IO
- IO modes produce different performance results
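The contrast between the two execution styles, as this slide characterises them, can be sketched in plain Python. This is not Kepler's director API; the function names are hypothetical, a list stands in for the copied file, and a bounded `queue.Queue` stands in for the socket.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def produce(n):
    """Producer actor: emits the integers 0..n-1."""
    for i in range(n):
        yield i


def run_staged(n):
    """Staged style: the producer finishes before the consumer starts."""
    staged = list(produce(n))           # plays the role of the copied file
    return [x * x for x in staged]      # consumer runs only afterwards


def run_concurrent(n):
    """Concurrent style: both actors run at once, linked by a channel."""
    channel = queue.Queue(maxsize=1)    # stand-in for a socket
    results = []

    def producer():
        for item in produce(n):
            channel.put(item)           # blocks until the consumer catches up
        channel.put(DONE)

    def consumer():
        while True:
            item = channel.get()
            if item is DONE:
                break
            results.append(item * item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both styles compute the same result; what differs is when the consumer can start and how much intermediate data must be stored, which is why the choice of IO mode affects performance.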

Integrating Kepler & GriddLeS
[Diagram. Kepler side: Atmospheric Science Workflow; Make Gridlet Actor; Gridlet; Run Application; Update GNS. GriddLeS side: Application; Write, Read, etc; File Multiplexer; Grid Buffer Client; Server; Grid FTP; Local File System; Remote File; GNS; SRB; GriddLeS Name Server (GNS)]

Transparent Data Replication (SI5)

Transparent Replication
[Diagram. Labels: Real time data; Data Fusion; General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

The GriddLeS Replication Service
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server]
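The slides do not spell out how a replica is chosen. As one plausible reading of performance-based sourcing, the sketch below picks the replica with the lowest estimated transfer time from latency and bandwidth figures of the kind a network monitor could supply. The cost model, the function names, the host names and the numbers are all assumptions.

```python
from typing import Dict, Tuple


def estimated_seconds(size_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Simple cost model (an assumption): start-up latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bps


def choose_replica(size_bytes: float, measurements: Dict[str, Tuple[float, float]]) -> str:
    """Return the replica host with the lowest estimated cost.

    `measurements` maps a host name to (latency in seconds, bandwidth in bytes/s).
    """
    return min(
        measurements,
        key=lambda host: estimated_seconds(size_bytes, *measurements[host]),
    )


# Hypothetical measurements for two replicas of the same file.
links = {
    "replica-near": (0.01, 10e6),   # low latency, modest bandwidth
    "replica-far": (0.20, 50e6),    # high latency, high bandwidth
}
```

With these figures a 100 MB file is best fetched from `replica-far` (about 2.2 s against about 10 s), while a 1 kB file is best fetched from `replica-near`, so the preferred source can change from one open to the next.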

Architecture of the GRS
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server; RLS Client; RLS Server]

Architecture of the GRS (with GFarm)
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server; RLS Client; RLS Server; GFarm Client; GFarm Server]

Access to Metadata
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server; RLS Client; RLS Server; GFarm Client; GFarm Server. MD (metadata) labels appear on the Application and next to the SRB, RLS and GFarm servers.]

Active Data (SI6)

Active Data
[Diagram. Labels: General Circulation model; Topography Database; Regional weather model; Vegetation Database; Emissions Inventory; Photo-chemical pollution model; Particle dispersion model; Bushfire model]

Active Data – File Fault
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server; RLS Client; RLS Server; GFarm Client; GFarm Server; MD (metadata) labels]

Active Data – Resourcing
[Architecture diagram. Components: Application; Write, Read, etc; File Multiplexer; GNS Client; GriddLeS Name Server; Grid Buffer Client; Grid Buffer Server; Local File Client; Local File System; Remote File Client; Grid FTP Server; GRS; IO Network Monitor; NWS Client; NWS Server; SRB Client; SRB Server; RLS Client; RLS Server; GFarm Client; GFarm Server; MD (metadata) labels]

Conclusion & Further work
- Leverage existing workflow systems
- Flexible IO model allows dynamic decisions
- Developing pool of applications
- Requires some software modification!

Acknowledgements
- CSIRO Division of Atmospheric Sciences: John McGregor, Jack Katzfey and Martin Dix
- Funding & Support:
  - Australian Research Council
  - Australian Government (DCITA, DEST)
  - Hewlett Packard
  - US National Science Foundation
  - US Department of Energy

Questions? www.csse.monash.edu.au/~davida/griddles