Federating Archives in the DELAMAN Network Reagan W. Moore San Diego Supercomputer Center Storage Resource.

Presentation transcript:

Federating Archives in the DELAMAN Network Reagan W. Moore San Diego Supercomputer Center Storage Resource Broker

Distributed Data Management Using Data Grids
- Build a shared collection
- Authenticate users independently of the storage systems
- Control access independently of the storage systems
- Organize the file name space independently of the storage systems
- Manage context (metadata) independently of content (files)
- Maintain consistency between context and operations on content

Storage Resource Broker
- Generic distributed data management technology
  - Data grids - sharing
  - Digital libraries - publication
  - Persistent archives - preservation
- Federated server architecture / thin client
- 250,000 lines of “C” code
- Supports all major compute and storage platforms
- All requirements listed on the following Scenario slides are supported

Scenario 1 - Data Migration
- Provide URIDs (logical file names) that are independent of storage system
- Provide metadata for each file
- Support browse and discovery on collection hierarchy
- Support access interfaces to the data
- Support registration of existing files into a shared collection
- Single sign-on environment
  - GSI / challenge response / tickets

Managing Distributed Data
- Storage Repository
  - Storage location
  - User name
  - File name
  - File context (creation date,…)
  - Access constraints
- Data Access Methods (Web Browser, DSpace, OAI-PMH)
- Naming conventions provided by storage systems

Data Grids Provide a Level of Indirection for Each Naming Convention
- Storage Repository
  - Storage location
  - User name
  - File name
  - File context (creation date,…)
  - Access constraints
- Data Grid
  - Logical resource name space
  - Logical user name space
  - Logical file name space (URID)
  - Logical context (metadata)
  - Control/consistency constraints
- Data Collection
- Data Access Methods (C library, Unix, Web Browser)
- Data is organized as a shared collection
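The indirection on this slide can be illustrated with a minimal sketch, assuming a simple in-memory catalog. The names LogicalCatalog, Entry, Replica, and the example paths are hypothetical and are not part of the SRB API; the point is only that a file can move between storage systems while its logical name stays fixed.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # logical resource name, not a host name
    physical_path: str  # path as the storage system knows it


@dataclass
class Entry:
    owner: str                                    # logical user name
    replicas: list = field(default_factory=list)  # physical copies
    metadata: dict = field(default_factory=dict)  # logical context


class LogicalCatalog:
    """Maps logical file names to physical replicas (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, owner, resource, physical_path):
        # Registering a second copy under the same logical name adds a replica.
        entry = self._entries.setdefault(logical_name, Entry(owner=owner))
        entry.replicas.append(Replica(resource, physical_path))

    def resolve(self, logical_name):
        return list(self._entries[logical_name].replicas)

    def move(self, logical_name, resource, new_path):
        # Migration changes only the physical path; the logical name is stable.
        for replica in self._entries[logical_name].replicas:
            if replica.resource == resource:
                replica.physical_path = new_path


catalog = LogicalCatalog()
catalog.register("/delaman/audio/rec01.wav", "moore", "tape-archive", "/hpss/a/0001")
catalog.move("/delaman/audio/rec01.wav", "tape-archive", "/hpss/b/0001")
```

Users and applications keep referring to "/delaman/audio/rec01.wav" before and after the move; only the catalog entry changes.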

Provide Context for Data
- Properties of files
  - Provenance - source
  - Descriptive attributes
  - State information resulting from operations on files
- Organize properties as metadata in a collection hierarchy
- Define operations on file properties
  - Manage state information - location, replicas, containers, checksums
- Separate context management from content management
  - Maintain consistency of context as operations are done on content
- Support context management
  - Schema extension, automated SQL generation, bulk metadata load
  - Metadata extraction through a remote procedure parsing the file

Federated Server Architecture
[Diagram components: Application, two SRB servers, two SRB agents, MCAT, resources R1 and R2. Labels: Read; Logical Name or Attribute Condition; Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access.]
1. Logical-to-physical mapping
2. Identification of replicas
3. Access & audit control
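The three numbered catalog steps on this slide can be sketched as a single function. This is a simplified illustration, assuming plain dictionaries stand in for the MCAT catalog; the function name broker_read and the data layout are hypothetical, not the SRB implementation.

```python
def broker_read(catalog, acl, audit_log, user, logical_name):
    """Resolve a logical name to one replica, enforcing access and logging."""
    # Steps 1 and 2: map the logical name to its physical replicas.
    replicas = catalog.get(logical_name, [])
    # Step 3: check access, and record the attempt whether or not it succeeds.
    allowed = user in acl.get(logical_name, set())
    audit_log.append((user, logical_name, "read", allowed))
    if not allowed:
        raise PermissionError(f"{user} may not read {logical_name}")
    if not replicas:
        raise FileNotFoundError(logical_name)
    # A real broker could choose by load or locality; here the first one wins.
    return replicas[0]


catalog = {"/coll/a.dat": [("R1", "/disk1/a.dat"), ("R2", "/tape/a.dat")]}
acl = {"/coll/a.dat": {"alice"}}
audit_log = []
chosen = broker_read(catalog, acl, audit_log, "alice", "/coll/a.dat")
```

Logging denied requests as well as granted ones is a design choice of this sketch; the slide only states that access and audit control happen at this point.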

Storage Resource Broker - Data Grid
[Architecture diagram, listed from the application down to storage]
- Application
- Access interfaces: C, C++, Java Libraries; Unix Shell; Java, NT Browser; Kepler Actors; OAI, WSDL, WSRF; HTTP, DSpace, OpenDAP; Linux I/O; DLL / Python, Perl
- Federation Management
- Consistency & Metadata Management / Authorization, Authentication, Audit
- Logical Name Space; Latency Management; Data Transport; Metadata Transport
- Catalog Abstraction
  - Databases - DB2, Oracle, Sybase, Postgres, mySQL, Informix
- Storage Repository Virtualization
  - Archives - Tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS
  - Databases - DB2, Oracle, Sybase, SQLserver, Postgres, mySQL, Informix
  - File Systems - Unix, NT, Mac OSX
  - ORB

Scenario 2 - Data Exchange
- Support access controls on the URIDs
  - Java administration GUI to support owner control of access controls
  - Can delegate permission to set access controls
  - Access controls apply on all replicas independent of storage system
- Support latency management for moving files across wide area networks
  - Parallel I/O, replication, staging, aggregation of data / metadata / I/O commands
- Support integrity validation
  - Manage checksums for each file
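Integrity validation with per-file checksums can be sketched as follows. The slide does not say which checksum algorithm is used, so SHA-256 from Python's standard hashlib module is an assumption here, and the function names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slide does not name an algorithm.
    return hashlib.sha256(data).hexdigest()


def validate_replicas(recorded: str, replicas: dict) -> dict:
    """Compare each replica's checksum with the value stored at ingest."""
    return {name: checksum(data) == recorded for name, data in replicas.items()}


original = b"field recording"
recorded = checksum(original)  # stored in the catalog when the file is loaded
report = validate_replicas(
    recorded,
    {"site-a": original, "site-b": b"field recordinG"},  # site-b is corrupted
)
```

A replica whose entry in the report is False would need to be replaced from a copy that still matches the recorded checksum.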

Latency Management - Bulk Operations
- Bulk register
  - Create a logical name for a file
- Bulk load
  - Create a copy of the file on a data grid storage repository
- Bulk unload
  - Provide containers to hold small files and pointers to each file location
- Bulk delete
  - Mark as deleted in metadata catalog
  - After specified interval, delete file
- Bulk metadata load
  - Support parsing of metadata from a remote file at remote storage
- Requests for bulk operations for access control setting, …
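The container idea (many small files stored as one object, with a pointer to each file's location) can be sketched in a few lines. This is an in-memory illustration under assumed names; the Container class is hypothetical and does not reflect the SRB container format.

```python
class Container:
    """Aggregates small files into one blob and keeps a pointer to each."""

    def __init__(self):
        self._blob = bytearray()
        self._index = {}  # logical name -> (offset, length)

    def add(self, logical_name, data: bytes):
        # Append the file and remember where it starts and how long it is.
        self._index[logical_name] = (len(self._blob), len(data))
        self._blob.extend(data)

    def read(self, logical_name) -> bytes:
        offset, length = self._index[logical_name]
        return bytes(self._blob[offset:offset + length])

    def size(self) -> int:
        return len(self._blob)


box = Container()
box.add("/coll/note1.txt", b"first")
box.add("/coll/note2.txt", b"second note")
```

Moving the single blob across a wide area network costs one transfer instead of one per file, which is the latency benefit the slide refers to.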

Scenario 3 - Community Access
- Within the shared collection, the digital entities are owned and managed by the data grid
  - Files, URLs, SQL commands, database binary large objects can be registered into the shared collection
- Access controls for files / metadata / storage systems
- Access controls are defined for multiple roles
  - Schema extension, create new metadata
  - Modify metadata
  - Add annotations
  - Turn on audit trails
  - Write data
  - Read data

Scenario 4 - Explorative Studies
- Uniform access mechanisms to data across all storage systems
  - Support for queries on databases
  - Support for formatting results (XML, HTML)
  - Support audit trails, encryption
- Support user-defined collection hierarchy
  - Soft links (build a logical collection of pointers to data within the data grid)
- Support for multiple types of discovery
  - By URID (Logical File Name)
  - By query on metadata (may be unique to a single file)
  - By GUID (handle system)
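Discovery by query on metadata, as opposed to lookup by logical file name, can be sketched as a filter over catalog attributes. The function name, the attribute names, and the sample entries below are all hypothetical and chosen only for illustration.

```python
def find_by_metadata(catalog, **conditions):
    """Return logical names whose metadata matches every condition."""
    return sorted(
        name
        for name, attributes in catalog.items()
        if all(attributes.get(key) == value for key, value in conditions.items())
    )


# Hypothetical collection: logical file name -> metadata attributes.
catalog = {
    "/delaman/audio/rec01.wav": {"language": "lang-a", "year": 2001},
    "/delaman/audio/rec02.wav": {"language": "lang-a", "year": 2003},
    "/delaman/text/notes01.txt": {"language": "lang-b", "year": 2001},
}

matches = find_by_metadata(catalog, language="lang-a", year=2001)
```

With enough conditions a query narrows to a single file, which is the sense in which a metadata query "may be unique to a single file".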

Scenario 5 - Education
- SRB is used to build digital libraries
  - Assemble class material
  - Manage student reports
  - Display material through web browsers
- Federation of digital libraries
  - Controlled sharing across independent data grids or digital libraries
  - Support for cross-registration of logical name spaces
  - Authentication done by “home” data grid
  - Access controls managed by both data grids

Federation
- Data Grid (Data Collection A)
  - Logical resource name space
  - Logical user name space
  - Logical file name space
  - Logical context (metadata)
  - Control/consistency constraints
- Data Grid (Data Collection B)
  - Logical resource name space
  - Logical user name space
  - Logical file name space
  - Logical context (metadata)
  - Control/consistency constraints
- Data Access Methods (Web Browser, DSpace, OAI-PMH)
- Access controls and consistency constraints on cross registration of digital entities
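A rough sketch of federated access between two data grids follows, assuming the user's home grid authenticates the user and the grid holding the file checks its own access controls against the cross-registered identity. The Zone class, the "user@zone" identity format, and the password check are hypothetical simplifications; the slides do not specify how the two grids divide responsibility for access controls, so only the remote grid's check is modeled.

```python
class Zone:
    """One data grid with its own users and access controls (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.passwords = {}  # local user name -> secret
        self.acl = {}        # logical file name -> set of "user@zone" identities

    def authenticate(self, user, secret):
        return user in self.passwords and self.passwords[user] == secret

    def grant(self, logical_name, user, home_zone_name):
        # Cross-registration: the remote user is known by a qualified identity.
        self.acl.setdefault(logical_name, set()).add(f"{user}@{home_zone_name}")


def federated_read_allowed(user, secret, home, remote, logical_name):
    # The home zone vouches for the user's identity.
    if not home.authenticate(user, secret):
        return False
    # The remote zone decides whether that identity may read its file.
    return f"{user}@{home.name}" in remote.acl.get(logical_name, set())


grid_a = Zone("gridA")
grid_b = Zone("gridB")
grid_a.passwords["alice"] = "s3cret"
grid_b.grant("/gridB/coll/file1", "alice", "gridA")
ok = federated_read_allowed("alice", "s3cret", grid_a, grid_b, "/gridB/coll/file1")
```

Neither grid needs a copy of the other's password table; the remote grid stores only the qualified identity it has agreed to trust.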

Scenario 6 - Updating Resources
- Maintain system level metadata
  - Owner of registered file
  - Creation time, modification time, size, audit trails
  - Replica locations
- Support for synchronization of replicas
  - Can modify a replica, subsequent reads are to the modified copy
  - Can synchronize copies to the modified version
- Support for physical file containers
  - Aggregate small files before storage
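The replica behavior described here (a write to one replica redirects later reads to it until the copies are synchronized) can be sketched by tracking which replicas are current. The ReplicatedFile class is a hypothetical in-memory model, not the SRB mechanism.

```python
class ReplicatedFile:
    """Tracks which replicas hold the latest version (illustrative only)."""

    def __init__(self, data, resources):
        self.copies = {resource: data for resource in resources}
        self.current = set(resources)  # resources holding the latest version

    def write(self, resource, data):
        self.copies[resource] = data
        self.current = {resource}      # every other replica is now stale

    def read(self):
        # Reads go only to a replica that holds the latest version.
        resource = sorted(self.current)[0]
        return self.copies[resource]

    def synchronize(self):
        latest = self.read()
        for resource in self.copies:
            self.copies[resource] = latest
        self.current = set(self.copies)


f = ReplicatedFile(b"v1", ["disk", "tape"])
f.write("tape", b"v2")
after_write = f.read()  # served from "tape", the only current replica
f.synchronize()
```

Choosing the alphabetically first current replica is an arbitrary tie-break for the sketch.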

Scenario 7 - Web-based Editions
- Support for digital library interfaces on top of the data grid
  - Transana - technology to manipulate, edit, and manage classroom video (University of Wisconsin)
  - DSpace - digital library system to manage ingestion of material into a collection
  - OAI-PMH - Open Archives Initiative protocol for metadata harvesting
  - OpenDAP - Data Access Protocol that supports both semantic and structural manipulation of registered files
  - Windows browser, Web browser, Java, WSDL interfaces
- Collaborating on development of portlet interface

Scenario 8 - Unconnected Editions
- Ability to download data from shared collection to local resource
  - Support for PCs, workstations, supercomputers
- Generalization of anonymous FTP
  - Can issue a ticket permitting a limited number of read accesses, valid for a specified time interval
  - Can set public access to a sub-collection
  - Can restrict access by user name/domain/zone
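A ticket that permits a limited number of reads within a time window can be sketched as below. The Ticket class and its parameters are hypothetical; the optional now argument exists only so the behavior can be checked without waiting for real time to pass.

```python
import time


class Ticket:
    """Grants a bounded number of reads of one object until an expiry time."""

    def __init__(self, logical_name, max_reads, valid_seconds, now=None):
        start = time.time() if now is None else now
        self.logical_name = logical_name
        self.reads_left = max_reads
        self.expires_at = start + valid_seconds

    def redeem(self, logical_name, now=None):
        current = time.time() if now is None else now
        if logical_name != self.logical_name:
            return False  # ticket was issued for a different object
        if current > self.expires_at or self.reads_left <= 0:
            return False  # expired, or all reads used up
        self.reads_left -= 1
        return True


ticket = Ticket("/coll/public/report.pdf", max_reads=2, valid_seconds=3600, now=0.0)
first = ticket.redeem("/coll/public/report.pdf", now=10.0)
second = ticket.redeem("/coll/public/report.pdf", now=20.0)
third = ticket.redeem("/coll/public/report.pdf", now=30.0)  # reads exhausted
```

The holder of the ticket needs no account in the data grid, which is what makes this a generalization of anonymous FTP.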

Local Archives
- Maintain files in local file system
  - Register existence of the files into the data grid
  - Issue synchronization command to replicate into the archive
- Maintain a data grid on the local system
  - Entire environment can be installed on a Mac in 15 minutes (Perl install script)
  - Use data grid federation to synchronize name spaces, files, metadata from local data grid to archives data grid

Scenario 9 - Collaborative Commentary
- Comments can be added by owner
- Annotations can be added by authorized persons
  - Annotations marked by person name, date
  - Can restrict annotation right by group
- Can choose to create explicit metadata attributes to manage comments
  - Can store multiple comments per object
  - Can search across metadata
- Or can use digital library interfaces to manage comments

Sites Using the SRB

Generic Infrastructure
- SDSC developed the Storage Resource Broker (SRB) to support access to distributed data
- Effort started in 1996 as a DARPA funded project
- Now supports over 30 national/international projects
- Development team of 12 staff is led by
  - Michael Wan, data management systems
  - Arcot Rajasekar, information management systems

SDSC SRB Team (left to right)
- Arun Jagatheesan
- George Kremenek
- Sheau-Yen Chen
- Arcot Rajasekar (SRB development lead)
- Reagan Moore (SRB PI)
- Michael Wan (SRB architect)
- Roman Olschanowsky (BIRN)
- Bing Zhu
- Charlie Cowart
- Lucas Gilbert
- Tim Warnock
- Wayne Schroeder (SRB product)
- Adam Birnbaum (SRB production)
- Antoine De Torcy
- Vicky Rowley (BIRN)
- Marcio Faerman (SCEC)
- Students & emeritus
  - Erik Vandekieft
  - Reena Mathew
  - Xi (Cynthia) Sheng
  - Allen Ding
  - Grace Lin
  - Qiao Xin
  - Daniel Moore
  - Ethan Chen
  - Jon Weinburg
- Supported by over 20 projects (NSF, DOE, NASA, NARA, NIH, LOC, NHPRC)

Data Grid Capabilities
- Data manipulation
  - Containers
  - Parallel I/O
  - Firewall interactions
- Resource interactions
  - Fault tolerance
  - Load leveling
  - Replication
- HIPAA security requirements
  - Authentication of all users
  - Access controls on data and metadata
  - Audit trails
  - Data encryption
  - Centralized control
- Application interfaces
  - C library, Shell commands, Java, Perl, Python, WSDL, workflow

Data Management System Features
- Data grid for managing distributed data
  - Latency management for bulk analyses of collections
  - Infrastructure independent name spaces for describing data, resources, users, and state information
- Digital library for managing data context
  - Curation services for managing collections
  - Descriptive metadata for discovery
- Persistent archive to manage technology evolution
  - Interoperability mechanisms between heterogeneous storage systems and user access mechanisms

BIRN - Biomedical Informatics Research Network Data Grid
[Diagram labels: Duke, UCLA, Cal Tech, Wash U., Harvard, NIH/NCRR Centers for Imaging and Computing, Cal-(IT)2, NPACI/SDSC; “Deep Web”, “Surface Web”; Wireless “Pad” Web Interface]
- Integrating Cyber Infrastructure to Link:
  - Advanced Imaging Instruments
  - Data Intensive Computing
  - Multi-Scale Brain Databases

Digital Library
- Collection hierarchy for organizing data
  - User-defined metadata
  - Collection level metadata
- Metadata manipulation
  - Schema extension
  - Bulk metadata processing
  - Queries on metadata
  - Access controls on metadata
  - Views on collections
- Digital library APIs
  - DSpace, Fedora, OAI-PMH, web browsers
  - METS metadata XML schema

Southern California Earthquake Center
- Store seismic data
  - Managing over 90 TBs, over 1.7 million files
- Store community models for seismic velocity
  - Data distributed between USC, SDSC
- SCEC community digital library
  - Storage Resource Broker data grid technology
  - NMI portal interface
- Digital library services to display seismograms
  - Visualizations of seismic waves at the surface
  - Visualization of seismic wave propagation through the volume
[Screenshot: SCEC Community Library, with controls Select Receiver (Lat/Lon), Output Time History Seismograms, Select Scenario, Fault Model, Source Model]

National Virtual Observatory
- Provide access to large star catalogs and large image sky surveys
  - 2MASS, SDSS, DPOSS, USNO-B, Macho
[Figure: Virtual Observatory Architecture, showing Portals, User Interfaces, Tools; a Registry Layer; Data Services; Compute Services; a Digital Library; Grid Middleware (SRB, Globus, OGSA); and Existing Data Centers]

National Science Digital Library
[Screenshot: Web Interface to Persistent Archive]
- Preserve educational material that has been registered into a central repository at Cornell through URLs
  - Crawl web and retrieve material, 10 levels of indirection
  - Convert internal URLs into data grid handles
  - Aggregate files into containers for storage
- Preserve using SRB data grid technology
  - Currently housing over 26 million files

National Archives and Records Administration - Research Prototype Persistent Archive
Federation of Three Independent Data Grids
[Diagram: NARA, U Md, and SDSC, each with an MCAT catalog]
- NARA: Principal copy stored at NARA with complete metadata catalog
- U Md: Replicated copy at U Md for improved access, load balancing and disaster recovery
- SDSC: Deep Archive at SDSC, no user access, but complete copy
- Demonstrate preservation environment
  - Authenticity
  - Integrity
  - Management of technology evolution
  - Mitigation of risk of data loss
    - Replication of data
    - Federation of catalogs
  - Management of preservation metadata
  - Scalability
    - EAP collection: 350,000 files, 1.2 TBs in size

For More Information Reagan W. Moore San Diego Supercomputer Center