Building Shared Collections Using the Storage Resource Broker
Reagan W. Moore


Storage Resource Broker
- Data grid middleware
- Organizes distributed data into shared collections
- Supports access through:
  - C library calls
  - Java class libraries and the GridSphere portal
  - Python/Perl load libraries
  - Interactive browsers (Web, Perl, PHP, Windows)
  - Digital libraries (DSpace, Fedora)
- Manages the properties of the shared collection needed by:
  - Preservation environments
  - Digital libraries
  - Real-time sensor systems
  - Secure data management environments
- Used in production:
  - SDSC collections
  - Internationally shared collections

Using a Data Grid: In Abstract
- Ask for data: the user asks the data grid for data
- Data delivered: the data is found and returned
- The details of where and how the data is found are hidden from the user

Using a Data Grid: Details
- The user asks for data
- The data request goes to an SRB server
- The server looks up the data in the Metadata Catalog database
- The catalog tells which SRB server holds the data
- The first server asks the second server for the data
- The data is found and returned

Using a Data Grid: Details (continued)
- A data grid has an arbitrary number of SRB servers sharing one metadata catalog (MCAT) database
- This complexity is hidden from users
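The request path above can be modeled in a few lines. What follows is a hypothetical in-memory sketch and not the SRB API. `Catalog` stands in for the MCAT and maps a logical name to the server that holds the data. Any `Server` can receive a request. If the data is stored on a different server, the receiving server forwards the request to it. All class, server, and path names here are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for the metadata catalog: logical name -> server name."""
    locations: dict[str, str] = field(default_factory=dict)

    def lookup(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class Server:
    name: str
    catalog: Catalog
    # Every server in the grid, by name (excluded from repr/eq to avoid recursion).
    grid: dict[str, "Server"] = field(repr=False, compare=False)
    local_files: dict[str, bytes] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # The server that receives the request looks the data up in the catalog.
        owner = self.catalog.lookup(logical_name)
        if owner == self.name:
            return self.local_files[logical_name]
        # The catalog names another server, so this server asks that one for the data.
        return self.grid[owner].get(logical_name)
```

The user only ever calls `get` on whichever server is nearest. Which server actually stores the bytes stays hidden behind the catalog lookup.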

Shared Collections
- The purpose of the SRB data grid is to enable the creation of a collection that is shared between academic institutions
- Register digital entities into the shared collection
- Assign an owner and access controls
- Assign descriptive and provenance metadata
- Manage state information:
  - Audit trails, versions, replicas, backups, locks
  - Size, checksum, validation date, synchronization date, …
- Manage interactions with storage systems:
  - Unix file systems, Windows file systems, tape archives, …
- Manage interactions with preferred access mechanisms:
  - Web browser, Java, WSDL, C library, …

Shared Collections
- Data grids support the creation of shared collections that may be distributed across multiple institutions, sites, and storage systems
- Digital libraries publish data and provide services for discovery and display
- Persistent archives preserve data, managing the migration to new technology
- Real-time sensor systems federate name spaces across independent environments

BIRN Data Grid
Biomedical Informatics Research Network (Mark Ellisman)

Mark Ellisman

National Science Digital Library
- URLs for educational material for all grade levels are registered into a repository at Cornell
- SDSC crawls the URLs, registers the web pages into an SRB data grid, and builds a persistent archive
- 750,000 URLs
- 13 million web pages
- About 3 TB of data
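A quick back-of-the-envelope check on these figures shows the granularity of the archive. It assumes decimal units (1 TB = 10^12 bytes) and takes "about 3 TB" at face value, so the results are approximate averages.

```latex
\frac{3\times10^{12}\ \text{bytes}}{1.3\times10^{7}\ \text{pages}} \approx 2.3\times10^{5}\ \text{bytes} \approx 230\ \text{KB per page}
\qquad
\frac{1.3\times10^{7}\ \text{pages}}{7.5\times10^{5}\ \text{URLs}} \approx 17\ \text{pages per URL}
```

The collection is therefore dominated by a very large number of small objects. That places more load on the catalog and on bulk registration than on raw transfer bandwidth.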

Southern California Earthquake Center: SCEC Community Library
[Figure: Select Scenario (Fault Model, Source Model); Select Receiver (Lat/Lon); Output Time History Seismograms]
- Intuitive user interface:
  - Pull-down query menus
  - Graphical selection of the source model
  - Clickable LA Basin map (Olsen)
  - Seismogram/history extraction (Olsen)
- Access to the SCEC Digital Library:
  - Data stored in a data grid
  - Annotated by modelers
  - Standard naming convention
  - Automated extraction of selected data and metadata
  - Management of visualizations

Terashake Data Handling
- Simulate a magnitude 7.7 earthquake on the San Andreas fault
- 50 terabytes in a simulation
- Move 10 terabytes per day
- Post-processing of the wave field:
  - Movies of seismic wave propagation
  - Seismogram formatting for interactive online analysis
  - Velocity magnitude
  - Displacement vector field
  - Cumulative peak maps
  - Statistics used in visualizations
- Register derived data products into the SCEC digital library
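The stated rates translate into a sustained bandwidth requirement. The calculation below assumes decimal units and transfers spread evenly over 24 hours. It also assumes that the full 50 TB of one simulation is moved at that rate.

```latex
\frac{10\times10^{12}\ \text{bytes/day}}{86{,}400\ \text{s/day}} \approx 1.16\times10^{8}\ \text{bytes/s} \approx 116\ \text{MB/s} \approx 0.93\ \text{Gb/s}
\qquad
\frac{50\ \text{TB}}{10\ \text{TB/day}} = 5\ \text{days}
```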

ROADNet Sensor Network Data Integration (Frank Vernon, UCSD/SIO)
[Figure: sensor streams from seismic/geophysics, oceanography, climate, ecological, and wireless networks (including humidity and wind speed), with "fire start" and "rain start" events]

NARA Persistent Archive: Federation of Three Independent Data Grids
[Figure: data grids at NARA, U Md, and SDSC, each with its own MCAT]
- Original data at NARA, replicated to U Md and SDSC
- Replicated copy at U Md for improved access, load balancing, and disaster recovery
- Active archive at SDSC for user access
- Demonstrate a preservation environment:
  - Authenticity
  - Integrity
  - Management of technology evolution
  - Mitigation of the risk of data loss:
    - Replication of data
    - Federation of catalogs
  - Management of preservation metadata
  - Scalability:
    - Types of data collections
    - Size of data collections

Logical Name Spaces
Data is organized as a shared collection. The data grid maps each property of the physical storage repository onto a logical name space:
- Storage location → logical resource name space
- User name → logical user name space
- File name → logical file name space
- File context (creation date, …) → logical context (metadata)
- Access constraints → control/consistency constraints
Data access methods (C library, Unix, Web browser) operate on the data collection through these logical name spaces.
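The sketch below covers two of the five name spaces: logical resources and logical file names. It shows why the indirection matters for preservation. When data is migrated to new storage technology, only the mapping changes, and users keep using the same logical name. It is a hypothetical model and not SRB code. The resource and path names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    system: str   # physical storage system backing a logical resource
    path: str     # file name on that storage system


class LogicalNameSpace:
    def __init__(self) -> None:
        # Logical resource name -> physical storage system.
        self.resources: dict[str, str] = {}
        # Logical file name -> (logical resource name, physical path).
        self.files: dict[str, tuple[str, str]] = {}

    def define_resource(self, logical_resource: str, physical_system: str) -> None:
        self.resources[logical_resource] = physical_system

    def register(self, logical_name: str, logical_resource: str, physical_path: str) -> None:
        self.files[logical_name] = (logical_resource, physical_path)

    def resolve(self, logical_name: str) -> PhysicalLocation:
        logical_resource, path = self.files[logical_name]
        return PhysicalLocation(self.resources[logical_resource], path)

    def migrate(self, logical_name: str, new_resource: str, new_path: str) -> None:
        # Moving data to new technology updates only the mapping;
        # the logical file name that users see does not change.
        self.files[logical_name] = (new_resource, new_path)
```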

Federation Between Data Grids
- Each data grid (holding Data Collection A or Data Collection B) maintains its own logical resource, user, and file name spaces, logical context (metadata), and control/consistency constraints
- Data access methods include Web browser, DSpace, and OAI-PMH
- Access controls and consistency constraints govern the cross-registration of digital entities between data grids
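Here is one way to picture cross-registration under access controls and consistency constraints. Zone B registers a name for an entity held by zone A only if B's policy trusts A. It also refuses to overwrite an existing name. The entity itself stays in A. The `Zone` class and the trust-list policy are hypothetical simplifications, not SRB's federation mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    # Logical name -> zone that actually holds the data.
    entries: dict[str, str] = field(default_factory=dict)
    # Hypothetical access-control policy: zones allowed to cross-register into this one.
    trusted_zones: set[str] = field(default_factory=set)

    def cross_register(self, remote: "Zone", logical_name: str, local_name: str) -> None:
        # Access control: only accept entities from trusted zones.
        if remote.name not in self.trusted_zones:
            raise PermissionError(f"{self.name} does not accept entities from {remote.name}")
        if logical_name not in remote.entries:
            raise KeyError(f"{logical_name} is not registered in {remote.name}")
        # Consistency constraint: never silently overwrite an existing local name.
        if local_name in self.entries:
            raise ValueError(f"{local_name} already exists in {self.name}")
        # Register only the name, pointing at the zone that holds the data.
        self.entries[local_name] = remote.entries[logical_name]
```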

NOAO Astronomy Data Grid
- Sites: Chile; Tucson, Arizona; NCSA, Illinois
- A functioning international data grid for astronomy
- Manchester-SDSC mirror
- Moved over 400,000 images

Irene Barg

Worldwide University Network Data Grid
- Sites: SDSC, Manchester, Southampton, White Rose, NCSA, U. Bergen
- A functioning, general-purpose international data grid for academic collaborations
- Manchester-SDSC mirror

WUNGrid Collections
- BioSimGrid: molecular structure collaborations
- White Rose Grid: Distributed Aircraft Maintenance Environment
- Medieval Studies
- Music Grid
- e-Print collections
- DSpace
- Astronomy

BaBar High-Energy Physics
- Sites: Stanford Linear Accelerator; Lyon, France; Rome, Italy; San Diego; RAL, UK
- A functioning international data grid for high-energy physics
- Manchester-SDSC mirror
- Moved over 170 TB of data

SRB Objectives
- Automate all aspects of data discovery, access, management, analysis, and preservation
- Security is paramount
- Distributed data
- Provide distributed data support for:
  - Data sharing: data grids
  - Data publication: digital libraries
  - Data preservation: persistent archives
  - Data collections: real-time sensor data

Storage Resource Broker 3.3.1 Architecture
- Access interfaces: Unix shell; NT browser; Kepler actors; http, portlet, WSDL, OAI-PMH; DSpace, OpenDAP, GridFTP, Fedora; C library, Java; Linux I/O; C++ DLL; Python, Perl, Windows
- Federation management
- Consistency and metadata management; authorization, authentication, audit
- Logical name space; latency management; data transport; metadata transport
- Storage repository abstraction:
  - Archives: tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS
  - File systems: Unix, NT, Mac OSX
  - Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix
  - Application, ORB
- Database abstraction:
  - Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix

Data Grid Operations: Operations for Access, Management, and Transport at Remote Sites
- File access:
  - Open, close, read, write, seek, stat, synch, …
  - Audit, versions, pinning, checksums, synchronize, …
  - Parallel I/O and firewall interactions
  - Versions, backups, replicas
- Latency management:
  - Bulk operations: register, load, unload, delete, …
  - Remote procedures: HDFv5, data filtering, file parsing, replicate, aggregate
- Metadata management:
  - SQL generation, schema extension, XML import and export, browsing, queries, GGF, …
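Checksums and replicas work together in practice. A new replica should only be trusted after its checksum matches the source. The sketch below does this with local files. It copies the file, verifies SHA-256 digests of both copies, and returns the state information that would be recorded for the replica. It uses only the Python standard library and is not SRB's replicate operation. The function names and the record layout are assumptions.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, target: Path) -> dict:
    """Copy source to target, then confirm both copies have the same checksum."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    source_sum, target_sum = sha256_of(source), sha256_of(target)
    if source_sum != target_sum:
        target.unlink()  # discard a corrupt replica rather than register it
        raise OSError(f"checksum mismatch replicating {source} -> {target}")
    # State information for the new replica (hypothetical record layout).
    return {
        "replica": str(target),
        "size": target.stat().st_size,
        "checksum": target_sum,
        "validation_date": date.today().isoformat(),
    }
```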

Types of Risk
- Media failure: replicate data onto multiple media
- Vendor-specific systemic errors: replicate data onto multiple vendor products
- Operational error: replicate data onto a second administrative domain
- Natural disaster: replicate data to a geographically remote site
- Malicious user: replicate data to a deep archive
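The risk list can be read as a checklist against a set of replicas. The hypothetical sketch below reports which of the five risks a replica set does not yet mitigate. It makes simplifying assumptions that go beyond the slide. Each replica counts as a separate media copy. Distinct vendors, administrative domains, and sites count as coverage for the other risks. A flag marks the deep archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    media: str           # e.g. "disk" or "tape"
    vendor: str          # storage product vendor
    admin_domain: str    # organization that operates the storage
    site: str            # geographic location
    deep_archive: bool = False


def uncovered_risks(replicas: list[Replica]) -> list[str]:
    """Return the risks from the slide that this replica set does not mitigate."""
    covered = {
        # Assumption: each replica is a separate media copy.
        "media failure": len(replicas) >= 2,
        "vendor-specific systemic errors": len({r.vendor for r in replicas}) >= 2,
        "operational error": len({r.admin_domain for r in replicas}) >= 2,
        # Assumption: distinct site names stand in for geographic remoteness.
        "natural disaster": len({r.site for r in replicas}) >= 2,
        "malicious user": any(r.deep_archive for r in replicas),
    }
    return [risk for risk, ok in covered.items() if not ok]
```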

How Many Replicas
Three sites minimize risk:
- Primary site: supports interactive user access to data
- Secondary site: supports interactive user access when the first site is down; provides a second media copy, located at a remote site, using a different vendor product and independent administrative procedures
- Deep archive: provides a third media copy and a staging environment for data ingestion; no user access
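The access rules for the three sites can be expressed as a small selection function. User reads go to the primary site, then to the secondary site when the primary is down. They never go to the deep archive. This is a hypothetical sketch of the policy described on the slide, not SRB's replica-selection logic. The role strings and site names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    role: str          # "primary", "secondary", or "deep_archive"
    up: bool = True


def choose_read_site(sites: list[Site]) -> Site:
    """Pick the site that serves an interactive user read.

    The deep archive is deliberately excluded: it has no user access.
    """
    for role in ("primary", "secondary"):
        for site in sites:
            if site.role == role and site.up:
                return site
    raise RuntimeError("no site is available for interactive user access")
```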

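Deep Archive
[Figure: three zones (Z1, Z2, Z3) labeled deep archive, staging zone, and remote zone, with a PVN. Cross-zone identities Z2:D2:U2 and Z3:D3:U3 are registered; data moves between zones by pull, using server-initiated I/O through a firewall. Remote zones have no access to the deep archive.]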
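The pull model in the figure can be sketched as follows. Nothing is pushed into the deep archive. The archive itself decides when to copy new objects from the staging zone, which models server-initiated I/O. It refuses reads from any other zone. This is a hypothetical in-memory model, and the class and zone names are illustrative. It does not model the firewall or the PVN.

```python
class StagingZone:
    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}


class DeepArchive:
    """A deep archive that only pulls data; the staging zone has no way to push."""

    def __init__(self, zone_name: str) -> None:
        self.zone_name = zone_name
        self.objects: dict[str, bytes] = {}

    def pull_from(self, staging: StagingZone) -> list[str]:
        # Models server-initiated I/O: the archive chooses when to copy,
        # and copies only objects it does not already hold.
        new_names = [n for n in staging.objects if n not in self.objects]
        for name in new_names:
            self.objects[name] = staging.objects[name]
        return new_names

    def read(self, requester_zone: str, name: str) -> bytes:
        # No access by remote zones.
        if requester_zone != self.zone_name:
            raise PermissionError("remote zones have no access to the deep archive")
        return self.objects[name]
```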

SRB Developers
- Reagan Moore: PI
- Michael Wan: SRB architect
- Arcot Rajasekar: SRB manager
- Wayne Schroeder: SRB productization
- Charlie Cowart: inQ
- Lucas Gilbert: Jargon
- Bing Zhu: Perl, Python, Windows
- Antoine de Torcy: mySRB web browser
- Sheau-Yen Chen: SRB administration
- George Kremenek: SRB collections
- Arun Jagatheesan: Matrix workflow
- Marcio Faerman: SCEC application
- Sifang Lu: ROADnet application
- Richard Marciano: SALT persistent archives
- Contributors from UK e-Science, Academia Sinica, Ohio State University, Aerospace Corporation, …
- 75 FTE-years of support
- About 300,000 lines of C

Development
- SRB, December 15, 2000: basic distributed data management system; metadata catalog
- SRB, February 18, 2003: parallel I/O support; bulk operations
- SRB, August 30, 2003: federation of data grids
- SRB, April 30, 2006: feature requests (quotas)

For More Information
Reagan W. Moore
San Diego Supercomputer Center