San Diego Supercomputer Center & National Partnership for Advanced Computational Infrastructure Storage Resource Broker Reagan W. Moore San Diego Supercomputer.

Presentation transcript:

San Diego Supercomputer Center & National Partnership for Advanced Computational Infrastructure
Storage Resource Broker
Reagan W. Moore
San Diego Supercomputer Center

Data Management Objectives
Automate all aspects of data management
–Discovery (without knowing the file name)
–Access (without knowing its location)
–Retrieval (using your preferred API)
–Control (without having a personal account at the remote storage system)
–Performance (use latency management mechanisms to minimize impact of wide-area networks)

Collections Replicated via SRB onto TeraGrid
2MASS
–10 TBs, 5 million images
DPOSS
–3 TBs, 6000 images
USNO-B
–In progress
SDSS
–In progress
MACHO
–In negotiation

SRB Implementations
Data collecting
–Sensor systems, object ring buffers and portals
Data organization
–Collections, manage data context
Data sharing
–Data grids, manage heterogeneity
Data publication
–Digital libraries, support discovery
Data preservation
–Persistent archives, manage technology evolution
Data analysis
–Processing pipelines, manage knowledge extraction

NSF Infrastructure Projects Using SRB
Partnership for Advanced Computational Infrastructure - PACI
–Data grid - Storage Resource Broker
Distributed Terascale Facility - DTF/ETF
–Compute, storage, network resources
Digital Library Initiative, Phase II - DLI2
–Publication, discovery, access
Information Technology Research projects - ITR
–SCEC Southern California Earthquake Center
–GEON GeoSciences Network
–SEEK Science Environment for Ecological Knowledge
–GriPhyN Grid Physics Network
–NVO National Virtual Observatory
National Science Digital Library - NSDL
–Support for education curricula modules

Federal Infrastructure Projects Using SRB
NASA
–Information Power Grid - IPG
–Advanced Data Grid - ADG
–Data Management System - Data Assimilation Office
  –Integration of DODS with Storage Resource Broker data grid
–Earth Observing Satellite EOS data pools
–Consortium of Earth Observing Satellites CEOS data grid
Library of Congress
–National Digital Information Infrastructure and Preservation Program - NDIIPP
National Archives and Records Administration and National Historical Public Records Commission
–Prototype persistent archives
NIH
–Biomedical Informatics Research Network data grid
DOE
–Particle Physics Data Grid - Babar, CMS

SDSC Collaborations
Hayden Planetarium Simulation & Visualization
Knowledge Network for BioComplexity (NSF)
Mol Science – JCSG, AfCS
Visual Embryo Project (NLM)
RoadNet (NSF)
Earth System Sciences – CEED, Bionome, SIO Explorer
Hyper LTER Grid Portal (NPACI)
Tera Scale Computing (NSF)
Long Term Archiving Project (NARA)
Education – Transana (NPACI)
NSDL – National Science Digital Library (NSF)
Digital Libraries – ADL, Stanford, UMichigan, UBerkeley, CDL
… 31 additional collaborations

Approach
Use collections to organize digital entities
–Digital entity - file, URL, SQL, directory, table, …
Create logical name space
–Location independent naming convention
–Map state information created by data access services to the logical name space
–Manage consistency constraints on the metadata update
Build an interoperability mechanism
–Map from storage repository protocols to preferred APIs

Basic Concepts
Logical name space
–Map administrative, descriptive, authenticity, consistency metadata onto the logical name
Storage repository abstraction
–Standard operations performed at remote storage
Information repository abstraction
–Standard operations to manage collection in a database
Access abstraction
–Standard operations supported for metadata and data access
Authentication abstraction
–Collection-owned data, ACLs for data and metadata
Latency management mechanisms
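
The storage repository abstraction can be pictured as a small driver interface that every storage system implements, so that callers use the same operations whatever sits underneath. The following is only a sketch under that reading: the names StorageDriver, InMemoryFileSystem, InMemoryArchive and copy_between are hypothetical and are not part of the SRB API, and both back ends are in-memory stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set


class StorageDriver(ABC):
    """Hypothetical set of standard operations offered by any storage system."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, path: str) -> bytes:
        ...


class InMemoryFileSystem(StorageDriver):
    """Stand-in for a file system back end."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]


class InMemoryArchive(StorageDriver):
    """Stand-in for a tape archive: reads are recorded as staging requests."""

    def __init__(self) -> None:
        self._tape: Dict[str, bytes] = {}
        self.staged: Set[str] = set()

    def put(self, path: str, data: bytes) -> None:
        self._tape[path] = data

    def get(self, path: str) -> bytes:
        self.staged.add(path)  # pretend the file was staged from tape to disk
        return self._tape[path]


def copy_between(src: StorageDriver, src_path: str,
                 dst: StorageDriver, dst_path: str) -> None:
    """Copy a file using only the standard operations, ignoring back-end type."""
    dst.put(dst_path, src.get(src_path))
```

Because copy_between only sees the abstract interface, the same call works for file system to archive, archive to file system, or any other pairing.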

SDSC Storage Resource Broker & Meta-data Catalog (architecture diagram)
Application
Access APIs
–C, C++, Libraries
–Unix Shell
–Linux I/O
–DLL / Python
–Java, NT Browsers
–OAI, WSDL
–GridFTP
Servers
–Consistency Management / Authorization-Authentication
–Logical Name Space
–Latency Management
–Data Transport
–Metadata Transport
–Prime Server
Catalog Abstraction
–Databases: DB2, Oracle, Postgres, SQLServer, Informix
Storage Abstraction
–Archives: HPSS, ADSM, UniTree, DMF
–Databases: DB2, Oracle, Postgres
–File Systems: Unix, NT, Mac OSX
–HRM
–ORB

Production Data Grid
SDSC Storage Resource Broker
–Federated client-server system, managing
  –Over 70 TBs of data at SDSC
  –Over 10 million files
–Manages data collections stored in
  –Archives (HPSS, UniTree, ADSM, DMF)
  –Hierarchical Resource Managers
  –Tapes, tape robots
  –File systems (Unix, Linux, Mac OS X, Windows)
  –FTP sites
  –Databases (Oracle, DB2, Postgres, SQLserver, Sybase, Informix)
  –Virtual Object Ring Buffers

Federated SRB server model (diagram)
Components: Application, SRB agent and SRB server (two of each), MCAT, storage resources R1 and R2
Read request by Logical Name Or Attribute Condition
MCAT
1.Logical-to-Physical mapping
2.Identification of Replicas
3.Access & Audit Control
Peer-to-peer Brokering
Server(s) Spawning
Data Access
Parallel Data Access
Other label in the diagram: 5/6
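
The three catalog steps named on this slide (logical-to-physical mapping, identification of replicas, access and audit control) can be sketched as a single lookup on a toy catalog. This is an illustration only: the Catalog class and its register and open_for_read methods are hypothetical stand-ins, not the MCAT schema or the SRB server protocol.

```python
from typing import Dict, List, Set, Tuple

Replica = Tuple[str, str]  # (resource name, physical path)


class Catalog:
    """Toy stand-in for the metadata catalog consulted on every read."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Replica]] = {}
        self._readers: Dict[str, Set[str]] = {}
        self.audit: List[Tuple[str, str, str, bool]] = []

    def register(self, logical_name: str, replicas: List[Replica],
                 readers: Set[str]) -> None:
        self._replicas[logical_name] = list(replicas)
        self._readers[logical_name] = set(readers)

    def open_for_read(self, user: str, logical_name: str) -> List[Replica]:
        # Steps 1 and 2: map the logical name to every physical replica.
        replicas = self._replicas.get(logical_name)
        if replicas is None:
            raise FileNotFoundError(logical_name)
        # Step 3: check the access control list and record the attempt.
        allowed = user in self._readers[logical_name]
        self.audit.append((user, logical_name, "read", allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        return list(replicas)
```

A server that receives the returned list could then read from whichever replica is closest or fetch from several in parallel; that choice is outside this sketch.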

Logical Name Space Example - Hayden Planetarium
Generate fly-through of the evolution of the solar system
Access data distributed across multiple administration domains
Gigabyte files, total data size was 7 TBytes
Very tight production schedule - 3 months

Hayden Data Flow (diagram)
Sites: NCSA, SDSC, AMNH NYC, UVa, NY, CalTech, BIRN
Systems and storage: GPFS 7.5 TB, IBM SP2, SGI, HPSS 7.5 TB, 2.5 TB UniTree
Flows: simulation, visualization, data; production parameters, movies, images

Logical Name Space
Global, location-independent identifiers for digital entities
–Organized as collection hierarchy
–Attributes mapped to logical name space
Attributes managed in a database
Types of system metadata
–Physical location of file
–Owner, size, creation time, update time
–Access controls
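
A minimal sketch of such a name space, assuming a simple in-memory table in place of the database: logical names form a collection hierarchy, and the physical location is just one attribute stored against each name. The class and field names (LogicalNameSpace, Entry, Replica) and the example paths and resource names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """One physical copy of a digital entity."""
    resource: str       # storage system holding the copy
    physical_path: str  # path understood by that storage system


@dataclass
class Entry:
    """System metadata kept for one logical name."""
    owner: str
    size: int
    replicas: List[Replica] = field(default_factory=list)


class LogicalNameSpace:
    """Toy catalog keyed by global, location-independent names."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def register(self, logical_name: str, owner: str, size: int,
                 replica: Replica) -> None:
        # The first registration creates the entry; later ones add replicas.
        entry = self._entries.setdefault(logical_name, Entry(owner, size))
        entry.replicas.append(replica)

    def resolve(self, logical_name: str) -> List[Replica]:
        """Return every physical location recorded for a logical name."""
        return list(self._entries[logical_name].replicas)

    def list_collection(self, collection: str) -> List[str]:
        """List logical names under a collection, wherever the data lives."""
        prefix = collection.rstrip("/") + "/"
        return sorted(n for n in self._entries if n.startswith(prefix))


ns = LogicalNameSpace()
ns.register("/home/demo/sky/img001.fits", "demo", 2_000_000,
            Replica("unix-sdsc", "/data/a/001"))
ns.register("/home/demo/sky/img001.fits", "demo", 2_000_000,
            Replica("hpss-sdsc", "/arch/x/001"))
ns.register("/home/demo/sky/img002.fits", "demo", 2_100_000,
            Replica("unix-caltech", "/d/002"))
```

Here the two images appear in the same collection even though their copies sit on different resources, which is the point of keeping location as an attribute rather than as part of the name.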

Mappings on Name Space
Define logical resource name
–List of physical resources
Replication
–Write to a logical resource completes when all physical resources have a copy
Load balancing
–Write to a logical resource completes when a copy exists on the next physical resource in the list
Fault tolerance
–Write to a logical resource completes when copies exist on k of n physical resources
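
The three write-completion rules can be compared side by side in a short sketch. It assumes a logical resource is an ordered list of physical resource names and that a caller-supplied store function reports whether the copy to one resource succeeded; LogicalResource, Store and the method names are hypothetical, not SRB calls.

```python
from typing import Callable, List

# store(resource) returns True when the copy on that resource succeeded.
Store = Callable[[str], bool]


class LogicalResource:
    """A logical resource name mapped onto a list of physical resources."""

    def __init__(self, physical: List[str]) -> None:
        if not physical:
            raise ValueError("need at least one physical resource")
        self.physical = list(physical)
        self._cursor = 0  # next resource to use for load balancing

    def write_replicated(self, store: Store) -> List[str]:
        """Replication: complete only when all physical resources have a copy."""
        written = [r for r in self.physical if store(r)]
        if len(written) < len(self.physical):
            raise IOError("replication needs a copy on every physical resource")
        return written

    def write_load_balanced(self, store: Store) -> List[str]:
        """Load balancing: complete when the next resource in the list has a copy."""
        target = self.physical[self._cursor]
        if not store(target):
            raise IOError(f"write to {target} failed")
        self._cursor = (self._cursor + 1) % len(self.physical)
        return [target]

    def write_fault_tolerant(self, store: Store, k: int) -> List[str]:
        """Fault tolerance: complete when k of the n resources have a copy."""
        if not 1 <= k <= len(self.physical):
            raise ValueError("k must be between 1 and n")
        written: List[str] = []
        for resource in self.physical:
            if store(resource):
                written.append(resource)
                if len(written) == k:
                    return written
        raise IOError(f"only {len(written)} of the required {k} copies succeeded")
```

With one of three resources unavailable, write_fault_tolerant with k=2 still completes, while write_replicated on the same list raises an error.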

Latency Management Example - Digital Sky Project
2MASS (2 Micron All Sky Survey):
–Bruce Berriman, IPAC, Caltech; John Good, IPAC, Caltech; Wen-Piao Lee, IPAC, Caltech
NVO (National Virtual Observatory):
–Tom Prince, Caltech; Roy Williams, CACR, Caltech; John Good, IPAC, Caltech
SDSC – SRB:
–Arcot Rajasekar, Mike Wan, George Kremenek, Reagan Moore

Digital Sky - 2MASS
The input data was originally written to DLT tapes in the order seen by the telescope
–10 TBytes of data, 5 million files
Ingestion took nearly 1.5 years - almost daily reading of tapes, one at a time
Images aggregated into 147,000 containers by SRB

Digital Sky Data Ingestion (diagram)
Sites: IPAC CALTECH, SDSC
Components: input tapes from telescopes, star catalog, Informix, SUN, SRB, SUN E10K, Data Cache, HPSS
Capacities shown: 800 GB, 10 TB

SRB Latency Management (diagram)
Mechanisms shown between Source, Network and Destination:
–Replication
–Server-initiated I/O
–Streaming
–Parallel I/O
–Caching
–Client-initiated I/O
–Remote Proxies, Staging
–Data Aggregation, Containers
–Prefetch

Containers
Images sorted by spatial location
–Retrieving one container accesses related images
Minimizes impact on archive name space
–HPSS stores 680 TBytes in 17 million files
Minimizes distribution of images across tapes
Bulk unload by transport of containers
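
The container idea can be sketched as sorting images by sky position and cutting the sorted list into fixed-size groups, so that one retrieval brings back spatial neighbours. The sort key used here (declination, then right ascension) and the function name pack_into_containers are assumptions made for illustration; the slides do not say how SRB ordered the 2MASS images.

```python
from typing import List, Tuple

Image = Tuple[str, float, float]  # (name, right ascension, declination)


def pack_into_containers(images: List[Image],
                         per_container: int) -> List[List[Image]]:
    """Group spatially adjacent images into containers of bounded size."""
    if per_container < 1:
        raise ValueError("per_container must be at least 1")
    # Assumed ordering: by declination band, then by right ascension.
    ordered = sorted(images, key=lambda im: (im[2], im[1]))
    return [ordered[i:i + per_container]
            for i in range(0, len(ordered), per_container)]


# Figures quoted in the 2MASS slides: 5 million files in 147,000 containers,
# which works out to roughly 34 files per container on average.
average_files_per_container = 5_000_000 / 147_000
```

The archive then only has to track one name per container instead of one per image, which is how aggregation reduces the load on the archive name space.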

SRB Development
Peer-to-peer federation
–Support multiple independent MCAT catalogs
–Replicate metadata
mySQL/BerkeleyDB port
OGSA/OGSI compliant interface
GridFTP interfaces
–Waiting for next release of the software (4th Q)

MySRB Features
Data & File Management
Collection Creation and Management
Collection of Varied Objects
–Files, SQL Objects, Databases, URLs, directories, archives, …
Metadata Handling
Browsing & Querying Interface
Access Control
Version Control (soon)
Support proxy (remote) operations

MySRB
Web-based Access to the SRB
Secure HTTP
Uses Cookies for Session Control
Self Registration of Users Supported
–Currently limited to SDSC users
Self Registration of Resources (soon)
Access to Both Data and Metadata

Data Management
Browse in Hierarchical Collections
Registration of (remote) Legacy Files & Directories
Registration of SQL Objects
Registration of URLs
Data Movement Operations
–Ingest & Re-Ingest, Delete, Unlink
–Replicate, Copy, Move, S-Link
Access Control Operations
–Read, Write, Own, Curate, Annotate, …
–Ticket-based Access
Version Control Operations (soon)
–Read Lock, Write Lock, Unlock
–Check In, Check Out

Types of Metadata
System-level Metadata
–Size, resource, owner, date, access control, …
User-defined Metadata
–for data & collections – triples
–No limit on the amount of metadata
–Support for Collection-level schemas
  –Comments, default values, drop-down lists
–Support for Standardized Schemas (e.g. Dublin Core)
Annotations
–Supports textual annotations
–Annotator, date, context also registered

Metadata Management
Insert, Update and Delete of Metadata
Access Control for Metadata (soon in mySRB)
Querying across system-level, user-defined metadata and annotations
–Query under collections & across collections
Browsing on user-defined metadata
Metadata supported for legacy files & directories
Extract Metadata (using proxy operations)
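
The insert, update, delete and query operations listed above can be sketched on a toy in-memory catalog. The sketch assumes each user-defined metadata item is an (attribute, value, units) triple attached to a logical name, which is an assumption about what the triples contain; MetadataCatalog and its methods are hypothetical names, not the MySRB interface.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (attribute, value, units)


class MetadataCatalog:
    """Toy store of user-defined metadata keyed by logical name."""

    def __init__(self) -> None:
        self._meta: DefaultDict[str, List[Triple]] = defaultdict(list)

    def insert(self, name: str, attribute: str, value: str,
               units: str = "") -> None:
        self._meta[name].append((attribute, value, units))

    def delete(self, name: str, attribute: str) -> None:
        self._meta[name] = [t for t in self._meta[name] if t[0] != attribute]

    def update(self, name: str, attribute: str, value: str,
               units: str = "") -> None:
        # Replace any existing triples for the attribute with the new one.
        self.delete(name, attribute)
        self.insert(name, attribute, value, units)

    def query(self, attribute: str, value: str,
              collection: str = "/") -> List[str]:
        """Find names under a collection (default: all) that carry the triple."""
        prefix = collection.rstrip("/") + "/"
        return sorted(
            name for name, triples in self._meta.items()
            if name.startswith(prefix)
            and any(a == attribute and v == value for a, v, _ in triples)
        )
```

Passing a collection restricts the search to that part of the hierarchy, while the default of "/" searches across all collections, mirroring the two query scopes on the slide.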