San Diego Supercomputer Center University of California, San Diego

Presentation transcript:

What is SRB Matrix? Data Grid Automation. Arun Jagatheesan et al., San Diego Supercomputer Center, University of California, San Diego. VLDB Workshop on Data Management in Grids, Trondheim, Norway, 2-3 September 2005.

Talk Outline: Data Grid landscape; long-run data management processes (Data Grid ILM, Data Grid triggers, dataflow pipelines); execution logic (Data Grid Language); end-to-end infrastructure deployment (API, user GUI, service-oriented *infrastructure*).

Data Grid Landscape

The “Grid” Vision: Building an electrical grid is complex (transmission, fluctuation, etc.), but once it is done, devices plug and play into the grid; similarly … grids in computing.

Data Grid Resource Providers. I start by introducing the reality. We have some data and storage on some disk or hierarchical resource. It can have its own directories or physical names for the data. These are Grid Resource Providers from the GFS perspective. Later, these resource providers could be other sources, such as service registries or anything dealing with a naming system or catalog. It is highly tempting to say that GFS Resource Providers are nothing but SRM; at least it looks that way to me. But please refrain from doing that. We need to have the GSM also agree to this picture. Grid Resource Providers (GRP) provide content and/or storage. [Diagram labels: /txt3.txt, GRP, GRP]

Data Grid Administrative Domain. An administrative domain has one or more GFS Resource Providers and could include their data centers. Multiple such resource providers could be present in an organization, or autonomous administrative domain as we called it before. [Diagram labels: Research Lab, /txt3.txt, GRP, GRP]

Data Grid Administrative Domains. There could be multiple administrative domains like this that make up a virtual enterprise, or what we call the Grid. [Diagram labels: University, data + storage (10); Storage-R-Us Resource Providers, data + storage (50); Research lab, Taiwan, data + storage (40); GRP; /txt3.txt; /…/text1.txt; /…//text2.txt]

Data Grid (Enterprise Utility): Physical resources managed by autonomous administrative domains of the same enterprise (ABCZ.com). [Diagram labels: IT Department US, IT Department Asia, 3rd Party, ABCZ.com US, Data center, ABCZ.com Asia]

Data Grid (Enterprise Utility): Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department. [Diagram labels: Project 1, Project 2, IT Department US, IT Department Asia, 3rd Party, ABCZ.com US, Data center, ABCZ.com Asia]

Data Grid (Enterprise Utility). [Diagram labels: Project1, Project2, Project3, Project4, IT Department US, IT Department Asia, 3rd Party, ABCZ.com US, Data center, ABCZ.com Asia]

Long-run Processes in Data Grid: Data Grid ILM, Data Grid Triggers, Data Gridflows

Data Grid ILM

Change is Constant. Changes in access patterns change the data value, which is based on the number of users accessing the data and on the domains that want to access it. Data Value: the value of a data set (collections?) for a particular domain, based on its business model and its users’ access patterns. Each domain will have a different value based on its users and its role in a data grid.

“Data Value” based on users: When more users access a project’s data, its data value increases, so move that data to a faster storage type. [Diagram labels: Project1, Project2, Project3, Project4, IT Department US, IT Department Asia, 3rd Party, ABCZ.com US, Data center, ABCZ.com Asia]

“Data Value” based on domain: When more users from the same domain access the data, the value of that data in that particular domain increases, so replicate the data to resources in that domain. (The converse is also true.) [Diagram labels as on the previous slide]

“Data Value” based on role: The 3rd party data center has no users who use the data, but is interested in keeping a replica of any data (or deleted data) for long-term preservation. [Diagram labels as on the previous slide]

Data Grid ILM. ILM = Information Lifecycle Management: dynamic re-orientation of data placement and data retention policies (rules), based on the “business value of data” and storage cost. HSM = Hierarchical Storage Management, based on “data freshness”; ILM goes one step further. Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules.
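
The three notions of data value on the preceding slides (by users, by domain, by role) can be sketched as a small placement policy. This is a hypothetical illustration only, not part of SRB Matrix: the names `AccessLog` and `plan_placement`, the action labels, the `third-party` archive domain, and both threshold values are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AccessLog:
    """Hypothetical record of how often each domain accessed one data set."""
    by_domain: Counter = field(default_factory=Counter)

    def record(self, domain: str, count: int = 1) -> None:
        self.by_domain[domain] += count


def plan_placement(
    log: AccessLog,
    home_domain: str,
    archive_domains: Tuple[str, ...] = ("third-party",),
    hot_threshold: int = 100,        # assumed value, not from the talk
    replicate_threshold: int = 25,   # assumed value, not from the talk
) -> List[Tuple[str, str]]:
    """Return (action, target domain) pairs for one data set."""
    actions: List[Tuple[str, str]] = []

    # Value based on users: many accesses overall -> faster storage.
    if sum(log.by_domain.values()) >= hot_threshold:
        actions.append(("move_to_fast_storage", home_domain))

    # Value based on domain: heavy use from another domain -> replicate there.
    for domain, count in sorted(log.by_domain.items()):
        if domain != home_domain and count >= replicate_threshold:
            actions.append(("replicate", domain))

    # Value based on role: archive domains keep a replica regardless of use.
    for domain in archive_domains:
        if ("replicate", domain) not in actions:
            actions.append(("replicate", domain))

    return actions


log = AccessLog()
log.record("us", 80)
log.record("asia", 30)
print(plan_placement(log, home_domain="us"))
```

Each autonomous domain would supply its own thresholds and rules, which is the difficulty the slide points out; the single shared policy here is a simplification.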

Data Grid Triggers

Data Grid Triggers: Similar to triggers in databases. Based on ECA concepts: Event, Condition, Action. Example: Event = Insert new file in collection (“/ourProject/data”); Condition = (color = “blue” && galaxy = “Andromeda”); Action = Run ( selectiveDataReplicator.dgl )
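
The Event-Condition-Action example on this slide can be sketched as a minimal dispatcher. The classes `Event` and `Trigger` and the function `dispatch` are hypothetical names for illustration, and the action only returns a string naming the DGL document it would run rather than executing anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str                  # e.g. "insert"
    collection: str            # logical collection the event occurred in
    metadata: Dict[str, str]   # metadata of the affected file


@dataclass
class Trigger:
    event_kind: str
    collection: str
    condition: Callable[[Dict[str, str]], bool]
    action: Callable[[Event], str]


def dispatch(event: Event, triggers: List[Trigger]) -> List[str]:
    """Run the action of every trigger whose event and condition match."""
    results = []
    for trigger in triggers:
        if (
            trigger.event_kind == event.kind
            and trigger.collection == event.collection
            and trigger.condition(event.metadata)
        ):
            results.append(trigger.action(event))
    return results


# The trigger from the slide, expressed with the hypothetical classes above.
replicate_blue_andromeda = Trigger(
    event_kind="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=lambda e: "run selectiveDataReplicator.dgl",
)

new_file = Event("insert", "/ourProject/data", {"color": "blue", "galaxy": "Andromeda"})
print(dispatch(new_file, [replicate_blue_andromeda]))
```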

Data  Discovery New data Digital entities Meta-data Services State updates relationships among data in collections Meta-data Services invoked to analyze new relationships Services DGMS applications get notified of state updates Generalizing the pipelines used in all these projects Heavy use of Animation: Take care not to press <enter> or click to many times as it will skip the next slides State

Data Gridflows

Gridflow in SCEC (data → information pipeline). Metadata derivation. The pipeline could be triggered by input at the data source or by a data request from a user. Steps: ingest data; ingest metadata; determine the analysis pipeline; initiate automated analysis; use the optimal set of resources based on the task, on demand; organize result data into distributed data grid collections. All gridflow activities are stored for data flow provenance.
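
A pipeline of this shape, with every activity recorded for provenance, can be sketched as follows. This is a toy illustration under stated assumptions: `run_gridflow` and the three step functions are hypothetical, the "analysis" is just a mean over a list of numbers, and a real gridflow would schedule steps on distributed resources rather than call local functions.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_gridflow(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, str]]]:
    """Run the steps in order and keep a provenance record for each one."""
    provenance: List[Dict[str, str]] = []
    for name, step in steps:
        result = step(data)
        provenance.append({"step": name, "input": repr(data), "output": repr(result)})
        data = result
    return data, provenance


# Hypothetical stand-ins for the ingest and analysis stages on the slide.
def ingest_data(raw):
    return {"data": list(raw)}


def ingest_metadata(record):
    return {**record, "meta": {"count": len(record["data"])}}


def analyze(record):
    return {**record, "mean": sum(record["data"]) / record["meta"]["count"]}


steps = [
    ("ingest_data", ingest_data),
    ("ingest_metadata", ingest_metadata),
    ("analyze", analyze),
]
result, provenance = run_gridflow(steps, [1.0, 2.0, 3.0])
print(result["mean"])
print([entry["step"] for entry in provenance])
```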

Data Grid Language (DGL)

Data Grid Language Requirement. Data Grid ILM process: the long-run process that has to be run is described in DGL. Data Grid Triggers: the action part of the ECA (Event-Condition-Action) logic. Data Gridflows: step-by-step execution of a long-run process on the Data Grid. Analogy of SQL in relational databases: long-run process procedures are stored and executed in the Data Grid itself. DGL captures the “Infrastructure Execution Logic”.

DGL Request: Annotations about the Data Grid Request. Can be either a Flow or a Status Query.

DGL Requests (2 types): Data Grid Flow and Status Query. Data Grid Flow: an XML structure that describes the execution logic, associated procedural rules, and DGL variables. It can be a synchronous or an asynchronous flow. Status Query: an XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status queries can be made for both synchronous and asynchronous flows.
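
The relationship between the two request types can be sketched with a small in-memory model. Everything here is a hypothetical illustration and not the Matrix API: `Flow`, `FlowServer`, `submit`, `run_pending`, and `status` are invented names, the real requests are XML documents, and the asynchronous case is simulated with a queue that runs only when `run_pending` is called so the example stays deterministic.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Flow:
    name: str
    steps: List[Callable[[], None]] = field(default_factory=list)
    children: List["Flow"] = field(default_factory=list)
    status: str = "pending"


class FlowServer:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._flows: Dict[int, Flow] = {}
        self._queue: List[Flow] = []

    def submit(self, flow: Flow, asynchronous: bool = False) -> int:
        """Accept a flow request and return an id usable in status queries."""
        request_id = next(self._ids)
        self._flows[request_id] = flow
        if asynchronous:
            self._queue.append(flow)   # caller gets the id back immediately
        else:
            self._run(flow)            # caller waits until the flow finishes
        return request_id

    def run_pending(self) -> None:
        """Stand-in for the background execution of asynchronous flows."""
        while self._queue:
            self._run(self._queue.pop(0))

    def _run(self, flow: Flow) -> None:
        flow.status = "running"
        for step in flow.steps:
            step()
        for child in flow.children:
            self._run(child)
        flow.status = "completed"

    def status(self, request_id: int, path: Sequence[str] = ()) -> str:
        """Status query for a whole flow, or for a sub-flow named by path."""
        flow = self._flows[request_id]
        for name in path:
            flow = next(child for child in flow.children if child.name == name)
        return flow.status


server = FlowServer()
flow = Flow("archive", children=[Flow("replicate"), Flow("verify")])
request_id = server.submit(flow, asynchronous=True)
print(server.status(request_id), server.status(request_id, ["replicate"]))
server.run_pending()
print(server.status(request_id), server.status(request_id, ["replicate"]))
```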

Flow: scoped variables that can control the flow; logic used by the sub-members; sub-members that are the real execution statements.

Flow Logic (How a flow executes)

…
<userDefinedRule name="beforeEntry">
  <condition>
    <simpleQuery>$numVar == 1</simpleQuery>
  </condition>
  <action name="true">
    <actionString>SET var1 = 1</actionString>
    <actionString>SET var2 = "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 = 0</actionString>
  </action>
</userDefinedRule>
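
One way to read the rule on this slide is: evaluate the condition, then run the action strings of the branch named "true" or "false". The sketch below implements only that reading, and it rests on assumptions: the XML is a well-formed reconstruction of the slide fragment, the evaluator handles only the `$name == literal` and `SET name = literal` forms shown, and the function `apply_rule` is hypothetical rather than part of DGL or Matrix.

```python
import ast
import re
import xml.etree.ElementTree as ET

# Well-formed reconstruction of the slide's fragment (an assumption).
RULE_XML = """
<userDefinedRule name="beforeEntry">
  <condition><simpleQuery>$numVar == 1</simpleQuery></condition>
  <action name="true">
    <actionString>SET var1 = 1</actionString>
    <actionString>SET var2 = "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 = 0</actionString>
  </action>
</userDefinedRule>
"""


def apply_rule(rule_xml: str, variables: dict) -> dict:
    """Return a copy of variables after applying the matching branch."""
    rule = ET.fromstring(rule_xml)

    # Condition: only the "$name == literal" form is supported here.
    query = rule.findtext("condition/simpleQuery", default="").strip()
    match = re.fullmatch(r"\$(\w+)\s*==\s*(.+)", query)
    if match is None:
        raise ValueError(f"unsupported condition: {query!r}")
    outcome = variables.get(match.group(1)) == ast.literal_eval(match.group(2))
    branch = "true" if outcome else "false"

    # Actions: only the "SET name = literal" form is supported here.
    result = dict(variables)
    for action in rule.findall("action"):
        if action.get("name") != branch:
            continue
        for node in action.findall("actionString"):
            text = (node.text or "").strip()
            assignment = re.fullmatch(r"SET\s+(\w+)\s*=\s*(.+)", text)
            if assignment is None:
                raise ValueError(f"unsupported action: {text!r}")
            result[assignment.group(1)] = ast.literal_eval(assignment.group(2))
    return result


print(apply_rule(RULE_XML, {"numVar": 1}))
print(apply_rule(RULE_XML, {"numVar": 2}))
```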

What is SRB Matrix? Matrix provides the SRB as a Web Service (a Web Service based on Data Grid Language). SOA for Data Grid or Digital Library (service-oriented *infrastructure*). Asynchronous end-user facing applications (long-run operations presented to users as portlets). Data Grid Automation and ILM (file triggers on unstructured data; automated movement or management of data).

Matrix Gridflow Server Architecture. [Diagram components: JAXM Wrapper; WSDL Description; JMS Messaging Interface; Event Publish Subscribe, Notification; SOAP Service for Matrix Clients; Matrix Data Grid Request Processor; Sangam P2P Gridflow Broker and Protocols; Transaction Handler; Status Query Handler; Workflow Query Processor; Flow Handler and Execution Manager; XQuery Processor; ECA rules Handler; Gridflow Meta data Manager; Persistence (Store) Abstraction; Matrix Agent Abstraction; SDSC SRB Agents; Other SDSC Data Services; Agents for java, WSDL and other grid executables; JDBC; In Memory Store]

Conclusion: Data Grids are evolving. Data Grid automation of long-run processes is essential. We need a language for Data Grid automation; Data Grid Language is one such effort, as part of the SRB Matrix Project. It is an open source project for anyone to use (or join): talk2matrix@sdsc.edu (or arun@sdsc.edu)