San Diego Supercomputer Center National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center National Partnership for.

Presentation transcript:

San Diego Supercomputer Center, National Partnership for Advanced Computational Infrastructure, University of Florida

DGL: The Assembly Language for Grid Computing
Arun swaran Jagatheesan
San Diego Supercomputer Center (SDSC), University of California, San Diego
GriPhyN All Hands Meeting, May 17, 2004, University of Chicago

San Diego Supercomputer Center, Grid Physics Network (GriPhyN), University of Florida

Slide 2: Acknowledgement
Participants:
- Jonathan Weinberg
- Allen Ding
- Dipti Borkar
- Erik Vandekieft
- Reena Mathew
- Marcio Faerman (SCEC)
- Lucas Gilbert (BIRN)
Good-will Wishers:
- Reagan Moore and SDSC SRB Team
- Kim Baldridge
- You !!!
Also an out-sourced resource from the Gator’s Physics department. Thanks to Paul Avery for this important resource.

Slide 3: Talk Outline
- Problem: Gridflow Description and Querying
- Gridflow Description
- Gridflow Language Requirements
- Options
- Path we took
- Our success
- Our future
- Summary

Slide 4: SRB Data Grid Management Systems
- Southern California Earthquake Center
- NASA Data Grids
- NIH Biomedical Informatics Research Network
- National Science Digital Library
- Scripps Institution of Oceanography
This work is generic and not restricted to SRB alone.

Slide 5: Gridflow in SCEC (data → information pipeline)
- Metadata derivation
- Ingest metadata
- Ingest data
- Determine analysis pipeline
- Initiate automated analysis
- Organize result data into distributed data grid collections
- Use the optimal set of resources based on the task, on demand
- Pipeline could be triggered by input at the data source or by a data request from a user
- All gridflow activities are stored for data flow provenance

Slide 6: Data → Discovery
[Diagram labels: Digital entities, Meta-data, Services, State]
- New data updates relationships among data in collections
- Services are invoked to analyze new relationships
- DGMS applications get notified of state updates


Slide 8: What they want?
- We know the business (scientific) process
- CyberInfrastructure is all we care about (why bother about colliding atoms)

Slide 9: What they want?
- Use DGL to describe your process logic with abstract references to datagrid infrastructure dependencies
- Describe resource, site, VO or grid policy dependencies independently (UPL, CVF??)

Slide 10: Gridflows
- Grid Workflow (Gridflow) is the automation of an execution pipeline in which data or tasks are processed through multiple autonomous grid resources according to a set of procedural rules
- Gridflows are executed on resources that are dynamically obtained through the confluence of one or more autonomous administrative domains (peers)

Slide 11: Gridflow Language and CS Domains
- Compiler Design: variable scope definition, recursive grammar, execution stack management
- Data Modeling: schema definitions for gridflow patterns
- Grid Computing: Data Grid data types, Virtual Organization, basic operations, …
- Other concepts and standards: Rules, W3C XQuery, GGF JSDL?

Slide 12: Gridflow Language Requirements
- High level abstract descriptions
  - Abstract description of cyberinfrastructure dependencies
- Simple yet flexible
  - Flexible to describe complex requirements (no brute force)
- Gridflow dependency patterns
  - Based on execution structure and data semantics
  - (Parallel, Sequential, fork-new), (milestones, for-each, switch-case)
- Asynchronous execution
  - For long-run requests
- Querying using existing standard
  - XQuery
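
The dependency patterns named on this slide can be illustrated with a small Python sketch. This is not DGL syntax, which the slides do not show; the function names (`run_sequential`, `run_parallel`, `run_for_each`, `run_switch`) are hypothetical, and each step is assumed to be a plain zero-argument callable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(steps):
    """Sequential pattern: run each step in order, one after another."""
    return [step() for step in steps]


def run_parallel(steps):
    """Parallel pattern: run steps concurrently.

    Results are collected in submission order, not completion order.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(step) for step in steps]
        return [future.result() for future in futures]


def run_for_each(items, make_step):
    """For-each pattern: build one step per item and run them in order."""
    return [make_step(item)() for item in items]


def run_switch(key, cases, default=None):
    """Switch-case pattern: pick exactly one step based on a key."""
    step = cases.get(key, default)
    if step is None:
        raise KeyError(f"no case for {key!r} and no default step")
    return step()
```

For example, `run_switch("big", {"big": lambda: "archive"})` selects and runs the single matching step, while `run_parallel` leaves the scheduling of independent steps to the thread pool.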

Slide 13: Gridflow Language Requirements
- Process meta data and annotations
  - Runtime definition, update and querying of meta-data
- Runtime management of gridflows
  - Stop gridflow at run time
- Partitioning
  - Facility in the language to divide a gridflow request into multiple requests
- Import descriptions
  - Refer to other gridflows in execution

Slide 14: Data Grid Language (DGL)
- DGL is just a language specification
- Can be used in any commercial or academic data grid software
- DGL describes gridflow description and dependencies

Slide 15: Gridflow Process I
[Diagram labels: End User using DGBuilder; Gridflow Description; Data Grid Language]

Slide 16: Gridflow Process II
[Diagram labels: Abstract Gridflow using Data Grid Language; Planner; Concrete Gridflow using Data Grid Language]
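
One way to read this slide is that a planner takes an abstract gridflow and binds its abstract references to actual resources, producing a concrete gridflow. The Python sketch below illustrates that reading only. The `Step` class, the `plan` function, the catalog layout and the "first candidate wins" selection are all assumptions made for illustration; the slides do not describe how the planner chooses resources.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    operation: str
    resource: str  # logical name when abstract, physical name when concrete


def plan(abstract_steps, catalog):
    """Bind each logical resource name to a physical resource.

    `catalog` maps a logical name to a list of candidate physical resources.
    This sketch simply takes the first candidate (an assumed policy).
    """
    concrete = []
    for step in abstract_steps:
        candidates = catalog.get(step.resource)
        if not candidates:
            raise LookupError(f"no physical resource for {step.resource!r}")
        concrete.append(replace(step, resource=candidates[0]))
    return concrete
```

The abstract steps stay unchanged, so the same description could be planned again later against a different catalog.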

Slide 17: Gridflow Process III
[Diagram labels: Gridflow P2P Network; Gridflow Processor; Concrete Gridflow using Data Grid Language]

Slide 18: DGL - Hypothetical Picture
- DGL Compiler (at run time, late binding)
- SRB Operation
- GridFTP Operation
- Condor execution DAG
- TeraGrid Scheduler node?
- Capone?
- …

Slide 19: DGL Structure (data model)
- Runnable
- Pre-Process
- Post-Process
- ECA rule based definitions
- Meta-data
- Flow Logic
- Structure: parallel, sequential, etc.
- Recursive definition of runnables as either a data operation or an executable process (Job)
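
The recursive structure listed on this slide can be sketched as a small Python data model. Everything here is a hypothetical illustration rather than the DGL schema: the class names (`Rule`, `Step`, `Flow`), the field names, and the way metadata is inherited by children are assumptions. For brevity the walker visits children sequentially whatever the `structure` value is.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Rule:
    """Simplified event-condition-action rule: the event is implied by
    where the rule is attached (pre-process or post-process)."""
    condition: Callable[[Dict[str, str]], bool]
    action: Callable[[Dict[str, str]], None]


@dataclass
class Step:
    """Leaf runnable: a data operation or an executable process (job)."""
    name: str
    run: Callable[[Dict[str, str]], None]


@dataclass
class Flow:
    """Composite runnable: contains steps or further flows (recursion)."""
    name: str
    structure: str = "sequential"  # e.g. "parallel"; not interpreted here
    children: List[Union["Flow", Step]] = field(default_factory=list)
    pre_process: List[Rule] = field(default_factory=list)
    post_process: List[Rule] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def fire(rules, scope):
    """Run the action of every rule whose condition holds."""
    for rule in rules:
        if rule.condition(scope):
            rule.action(scope)


def execute(node, metadata=None):
    """Walk the tree: pre-process rules, children, then post-process rules."""
    if metadata is None:
        metadata = {}
    if isinstance(node, Step):
        node.run(metadata)
        return
    scope = {**metadata, **node.metadata}  # child scope extends parent scope
    fire(node.pre_process, scope)
    for child in node.children:
        execute(child, scope)
    fire(node.post_process, scope)
```

Because a `Flow` may contain other `Flow` objects, a nested flow sees the metadata of every enclosing flow, which is one possible interpretation of scoped metadata.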

Slide 20: Operations in DGL
- Execute Process (DAG, java, WSDL, etc)
  - Very generic
- Datagrid operations
  - Copy directories/files
  - Change Permissions (Chmod)
  - Create directory/file/archive
  - Delete directory/file/archive
  - Ingest/download URL or any data source
  - Replicate, Rename, List
  - SeekNWrite, SeekNRead
  - Ingest, Query any type of metadata

Slide 21: Components of DGL
- A DGL document is either a request or a response
- Data Grid Request
  - Could be a Flow (aggregation of operations)
  - Or could be a Status Query
- Data Grid Response
  - Could be a Flow Acknowledgement
  - Or could be a Status Response
- Can be made Synchronous or Asynchronous
- Flexibility for any type of implementation
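
The request and response pairing on this slide can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the DGL document format: the class names, the `Processor` class, the `req-N` identifiers and the state strings are hypothetical, and no operations are actually executed.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class FlowRequest:
    operations: List[str]       # aggregation of operations
    asynchronous: bool = True


@dataclass
class StatusQuery:
    request_id: str


@dataclass
class FlowAcknowledgement:
    request_id: str


@dataclass
class StatusResponse:
    request_id: str
    state: str


class Processor:
    """Toy processor: records requests and answers status queries."""

    def __init__(self):
        self._states = {}
        self._ids = itertools.count(1)

    def handle(self, request):
        if isinstance(request, FlowRequest):
            request_id = f"req-{next(self._ids)}"
            # Assumption: a synchronous flow is finished by the time the
            # acknowledgement returns; an asynchronous one is still pending.
            state = "pending" if request.asynchronous else "completed"
            self._states[request_id] = state
            return FlowAcknowledgement(request_id)
        if isinstance(request, StatusQuery):
            state = self._states.get(request.request_id, "unknown")
            return StatusResponse(request.request_id, state)
        raise TypeError(f"unsupported request type: {type(request).__name__}")
```

A client submitting a long-running flow would keep the `request_id` from the acknowledgement and poll with `StatusQuery` later, which matches the asynchronous execution requirement listed earlier in the talk.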

Slide 22: Summary
- A standard description language is needed
- Requirements of the language
- Data Grid Language (DGL)
  - Recursive definition of flows and steps
  - Metadata or variable scopes
  - Rules
  - Can be partitioned (sub-divided)
- Components of Data Grid Language
- Next step: Talk to Scheduling or Heuristics people