SAN DIEGO SUPERCOMPUTER CENTER An Intelligent Rule-Oriented Data Management System DataGrid Wayne Schroeder San Diego Supercomputer Center, University.

Slides:



Advertisements
Similar presentations
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
Advertisements

GFS OGF-22 Global Resource Naming Developers: Reagan Moore Arcot Mike.
OGF-23 iRODS Metadata Grid File System Reagan Moore San Diego Supercomputer Center.
The Storage Resource Broker and.
The Storage Resource Broker and.
Overview of the SDSC Storage Resource Broker Wayne Schroeder (and other SRB team members) May, 2004 San Diego Supercomputer Center, University of California.
Peter Berrisford RAL – Data Management Group SRB Services.
Data Grid: Storage Resource Broker Mike Smorul. SRB Overview Developed at San Diego Supercomputing Center. Provides the abstraction mechanisms needed.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids Reagan W. Moore San Diego Supercomputer Center.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Particle Physics Data Grid PPDG Data Handling System Reagan.
San Diego Supercomputer Center, University of California at San Diego Grid Physics Network (GriPhyN) University of Florida A Data Storage Language for.
San Diego Supercomputer Center NARA Research Prototype Persistent Archive Building Preservation Environments with Data Grid Technology (NARA Research Prototype.
San Diego Supercomputer CenterNational Partnership for Advanced Computational Infrastructure1 Grid Based Solutions for Distributed Data Management Reagan.
A Very Brief Introduction to iRODS
Security Requirements for Shared Collections Storage Resource Broker Reagan W. Moore
Complex Systems Engineering for the Global information Grid Bob Marcus
Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.
Jean-Yves Nief, CC-IN2P3 Wilko Kroeger, SCCS/SLAC Adil Hasan, CCLRC/RAL HEPiX, SLAC October 11th – 13th, 2005 BaBar data distribution using the Storage.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
Modern Data Management Overview Storage Resource Broker Reagan W. Moore
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Data Grid Interactions with Firewalls Michael Wan Reagan Moore SDSC/UCSD/NPACI.
National Partnership for Advanced Computational Infrastructure Digital Library Architecture Reagan Moore Chaitan Baru Amarnath Gupta George Kremenek Bertram.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Information Management and Distributed Data Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
Core SRB Technology for 2005 NCOIC Workshop By Michael Wan And Wayne Schroeder SDSC SDSC/UCSD/NPACI.
Jan Storage Resource Broker Managing Distributed Data in a Grid A discussion of a paper published by a group of researchers at the San Diego Supercomputer.
Managing Simulation Output Storage Resource Broker Reagan W. Moore
San Diego Supercomputer Center Grid Physics Network (GriPhyN) University of Florida Dataflows in SRB using SDSC Matrix Arun Jagatheesan Architect & Team.
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
San Diego Supercomputer Center SDSC Storage Resource Broker Data Grid Automation Arun Jagatheesan et al., San Diego Supercomputer Center University of.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Data Grid Services/SRB/SRM & Practical Hai-Ning Wu Academia Sinica Grid Computing.
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Rule-Based Programming for VORBs Bertram Ludaescher Arcot Rajasekar Data and Knowledge Systems San Diego Supercomputer Center U.C. San Diego.
San Diego Supercomputer Center National Partnership for Advanced Computational Infrastructure SRB + Web Services = Datagrid Management System (DGMS) Arcot.
Part Four: The LSC DataGrid Part Four: LSC DataGrid A: Data Replication B: What is the LSC DataGrid? C: The LSCDataFind tool.
Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Archive for the NSDL Reagan W. Moore Charlie Cowart.
SRB 1 & iRODS 2 Arcot Rajasekar Reagan Moore Mike Wan SDSC/UCSD Pathways to OOI-CI CyberData Architecture 1 Storage Resource Broker 2 integrated Rule Oriented.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
Policy Based Data Management Data-Intensive Computing Distributed Collections Grid-Enabled Storage iRODS Reagan W. Moore 1.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
San Diego Supercomputer CenterNational Partnership for Advanced Computational Infrastructure1 Data Grids, Digital Libraries, and Persistent Archives Reagan.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
SAN DIEGO SUPERCOMPUTER CENTER By: Roman Olschanowsky An Introduction to the.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
1 iRODS: A Rule Oriented Data ManagementSystem SRB Space.
Michael Doherty RAL UK e-Science AHM 2-4 September 2003 SRB in Action.
From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center.
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
Introduction to The Storage Resource.
Biomedical Informatics Research Network The Storage Resource Broker & Integration with NMI Middleware Arcot Rajasekar, BIRN-CC SDSC October 9th 2002 BIRN.
SDSC Storage Resource Broker & Meta-data Catalog SRB Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Sybase File Systems Unix, NT, Mac OSX Application.
Partnerships in Innovation: Serving a Networked Nation Grid Technologies: Foundations for Preservation Environments Portals for managing user interactions.
National Archives and Records Administration1 Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
The Storage Resource Broker and.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
Building Preservation Environments from Federated Data Grids Reagan W. Moore San Diego Supercomputer Center Storage.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
An Overview of iRODS Integrated Rule-Oriented Data System
Policy-Based Data Management integrated Rule Oriented Data System
Arcot Rajasekar Michael Wan Reagan Moore (sekar, mwan,
VORB Virtual Object Ring Buffers
Presentation transcript:

SAN DIEGO SUPERCOMPUTER CENTER An Intelligent Rule-Oriented Data Management System DataGrid Wayne Schroeder San Diego Supercomputer Center, University of California San Diego

SAN DIEGO SUPERCOMPUTER CENTER Talk Outline Background Brief Overview of the SDSC SRB Current Projects/Usage Activities/Plans Rule-Oriented Data Management System iRODS Requirements/Planning Architecture Infrastructure Development Collaborations/Plans

SAN DIEGO SUPERCOMPUTER CENTER Data Grid Using a Data Grid – in Abstract Ask for data User asks for data from the data grid Data delivered The data is found and returned Where & how details are hidden

SAN DIEGO SUPERCOMPUTER CENTER User asks for data Using a Data Grid - Details Storage Resource Broker Data request goes to SRB Server Storage Resource Broker Metadata Catalog DB Server looks up data in catalogCatalog tells which SRB server has data1 st server asks 2 nd for dataThe data is found and returned

SAN DIEGO SUPERCOMPUTER CENTER Using a Data Grid - Details SRB MCAT DB SRB Data Grid has arbitrary number of servers Complexity is hidden from users

SAN DIEGO SUPERCOMPUTER CENTER Storage Resource Broker A Data Grid Solution Collaborative client-server system that federates distributed heterogeneous resources using uniform interfaces and metadata Provides a simple tool to integrate data and metadata handling – attribute-based access Blends browsing and searching Developed at SDSC -Operational for 7+ years; -Under continual development since 1997; -Customer-driven

SAN DIEGO SUPERCOMPUTER CENTER Some SRB Features The SRB is an integrated solution which includes: a logical namespace, interfaces to a wide variety of storage systems, high performance data movement (including parallel I/O), fault-tolerance and fail-over, WAN-aware performance enhancements (bulk operations), storage-system-aware performance enhancements ('containers' to aggregate files), metadata ingestion and queries (a MetaData Catalog (MCAT)), user accounts, groups, access control, audit trails, GUI administration tool data management features, replication user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web (mySRB)), and APIs (including C, C++, Java, and Python). SRB Scales Well (many millions of files, terabytes) Supports Multiple Administrative Domains / MCATs (srbZones) And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines.

SAN DIEGO SUPERCOMPUTER CENTER Recent SRB Release, April 28 Any valid ASCII characters are now acceptable in SRB filenames, except a string of two quotes in a row Data integrity and vault management Quota System SRB Web Perl Portal SRB account management via grid-mapfile Real time data management New driver for NCAR MSS Completely reworked web site/documentation system (MediaWiki) Other new features Critical bug patches for in included Other bugzilla fixes (about 35) MCAT Patch

SAN DIEGO SUPERCOMPUTER CENTER Recent SRB Releases April 28, October 31, April 6, February 18, August 13, July 2, April 19, December 19, October 1, August 12, July 14, June 3, May 1, March 14, February 18, 2003

SAN DIEGO SUPERCOMPUTER CENTER SRB Projects Astronomy National Virtual Observatory Data Grids UK e-Science CCLRC Teragrid Digital Libraries and Archives National Archives and Records Administration National Science Digital Library Persistent Archive Testbed Ecological, Environmental, Oceanographic ROADnet Southern California Earthquake Center SIO Digital Libraries Molecular Sciences Synchrotron Data Repository Alliance for Cellular Signaling Neuro Sciences Biomedical Information Research Network Physics and Chemistry BaBar Many others Over 650 Tera Bytes in 106 million files

SAN DIEGO SUPERCOMPUTER CENTER SRB Scalability Over 2 Petabytes World-wide Major SRB instances in the UK, Australia, Taiwan, US United Kingdom - UK e-Science Australia - APAC Taiwan - Academia Sinica, NCHC Europe -IN2P3, Italy, Norway United States 660 Terabytes at SDSC 100 Million files SAM QFS, HPSS, Unix file system, SRB Bricks

SAN DIEGO SUPERCOMPUTER CENTER SDSC Hosted SRB Data

SAN DIEGO SUPERCOMPUTER CENTER Case Study: SRB in BIRN BIRN Toolkit Mediator Viewing/Visualization Queries/ResultsApplications Data Management File System MCAT HPSS Data Model Data Access Data Grid Computational Grid Collaboration NMI Grid Management Globus GridPort Scheduler Distributed Resources Database SRB Database

SAN DIEGO SUPERCOMPUTER CENTER SRB server SRB agent SRB server Federated SRB Operation MCAT Read Application in Boston SRB agent /6 Logical Name Or Attribute Condition 1.Logical-to-Physical mapping 2. Identification of Replicas 3.Access & Audit Control Peer-to-peer Brokering Server(s) Spawning Data Access Parallel Data Access R1 R2 San Diego Durham

SAN DIEGO SUPERCOMPUTER CENTER SDSC Storage Resource Broker & Meta-data Catalog SRB Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Sybase File Systems Unix, NT, Mac OSX Application C, C++, Linux I/O Unix Shell Dublin Core Resource, User Defined Application Meta-data Remote Proxies DataCutter Third-party copy Java, NT Browsers Web Prolog Predicate MCAT HRM

SAN DIEGO SUPERCOMPUTER CENTER IRODS - the Next Generation of Data Grid Technology

17 Moving Forward, a Two-Prong Plan Maintain and Adapt SRB to New Usages: SRB has reached a Stable Plateau Bug Fixes Some New Features Merge Features Developed by others Continue Testing Improve Documentation Continue Application Support Existing and new Projects Continue Answering User Queries Chart New Areas Federation Research - ZoneSRB Collaborative Data Grids Real-time Data Grids - Virtual Object Ring Buffer Sensors and Video Streams Collaborating Observatories SRB Workflows - New UI for Admins and users Kepler actors, Matrix, etc iRODS - Adaptive Middleware Architecture MCAT1 MCAT2 MCAT3 Server1.1 Server1.2 Server2.1 Server2.2 Server3.1

SAN DIEGO SUPERCOMPUTER CENTER Continuing SRB Support 10 FTEs SRB 5 FTEs iRODS iRODS Developers Support SRB

19 Next generation Data Architecture SRB is quite complex – with too many functions and operations The intelligence is hard-coded extensions/modifications require extreme care But, the modules are fairly robust and reusable AIM: Can we make SRB more flexible Easy to customize at finer level Example: Higher authentication for a particular collection Example: Can we use stricter authorization for a collection Example: Can we treat a particular resource differently Currently- needs code changes Solution: Use rule-based architecture to provide flexibility

SAN DIEGO SUPERCOMPUTER CENTER iRODS A New Paradigm in Middleware Development Flexible Collection management Can be customized at user/collection-levels, … Language for Collection management As in stored procedures, triggers (RDB) Administrative ease Lot of potential beyond SRB adaptive middleware architectures This will be a fully Open Source effort

SAN DIEGO SUPERCOMPUTER CENTER Rule-Oriented Data Systems Framework Resources Client InterfaceAdmin Interface Metadata Modifier Module Config Modifier Module Rule Modifier Module Consistency Check Module Confs Rule Base Meta Data Base Engine Rule Curren t State Rule Invoker Micro Service Modules Resource-based Services Micro Service Modules Metadata-based Services Service Manager Consistency Check Module Consistency Check Module

SAN DIEGO SUPERCOMPUTER CENTER Rule Checking Establish State Data Movement CleanUp Condition checking, rule firing Backend Processing Micro Services Setup state and interact with RCAT – updates and modifications to persistent state Cleanup state and interact with RCAT – updates and modifications to persistent state Client Operation such as srbObjCreate Server-side Client-side Rule-oriented Data System (Phase I Operational Model)

23 Rules and Constraints Rule-based Lower-level Functions are composed of micro-services Higher-level Functions are composed of rules of lower-level micro- services Rules are interpreted using a rule engine Customizability Problems with rule composition Integrity checks to make sure rules do not break higher-level functionalities Declarative programming Rules define semantics Operational programming Rule invocation provides procedural interpretation Rules can be used as “checks and balances” to make sure that collections are self-consistent Example: Rule makes two copies of each files Constraint checking: can be used to see if the collection is consistent with this rule

24 Rule Scalability and Decidability Distinct Sets of Rules Applied in Different Ways Atomic Deferred (state flags) Compound Applied Using Micro-services Granularity User Input to Influence Rule Expression Administration Enforcement Collection Consistency Management Rule Properties Metadata Managing Execution (granularity, periodicity) Metadata Defining Result of Rule Execution

25 ingestInCollection(S) :- /* store & backup */ chkCond1(S), ingest(S), register(S) findBackUpRsrc(S.Coll, R), replicate(S,R). ingestInCollection(S) :- /*store & check */ chkCond2(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S). ingestInCollection(S) :- /* store, chk, backup & chk */ chkCond3(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), replicate(S,R) computeSerChkSum(S,C3), checkAndRegisterChkSum(C2,C3,S). ingestInCollection(S) :- /*store,check, backup & extract metadata */ chkCond4(S),computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), [replicate(S,R) || extractRegisterMetadata(S)]. ingestInCollection(S) :- /* just store */ ingest(S), register(S). Sample Rules chkCond1(S) :- user(S) == chkCond1(S) :- coll(S) like ‘*/scec.sdsc/img/*’. chkCond2(S) :- user(S) == chkCond3(S) :- user(S) == chkCond4(S) :- user(S) == datatype(S) == ‘DICOM’. [OprList] implies delay for later or send to a CronJobManager Opr||Opr implies do them in parallel Opr, Opr implies do them serially

SAN DIEGO SUPERCOMPUTER CENTER New DataGrid Technology Next Generation SRB -- iRODS: Intelligent Rule-Oriented Data Systems Customizable and Flexible – User Configurable Administratively Simpler – Admin Configurable Build upon the experience of SRB Data Grid Transition from SRB to iRODS Client-level similarity Meta Catalog transition Current NSF Funding Information Technology Research 2 years ~ 2 FTEs Simple proto-type in a year Started September 2004 Rule-based architecture Follow-on funding NARA NSF

SAN DIEGO SUPERCOMPUTER CENTER iRODS Collaborations SRB/iRODS Developers Arcot Rajasekar Michael Wan Wayne Schroeder Other SRB Team Members Collaborative Development UK e-Science University of Queensland University of Maryland Others