Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,

Presentation transcript:

Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar, M. Wan Presenter: Yedugani Pawan Kumar

Overview
- Introduction
- Grid Evolution
- Integrating Digital Libraries and Data Grids: Spanning the Information Divide
- Storage Resource Broker – Integration of Digital Libraries and Data Grids
- Data Management Concepts
- Grid Implementations
- Grids and Digital Libraries

Introduction
Data grids support massive data collections that are distributed across multiple institutions. For example:
- The Worldwide Universities Network supports the sharing of data between academic institutions in the United States and the United Kingdom.
- The SIOExplorer project manages an archive of ship logs from oceanographic research vessels.
These projects use collections to provide a context for interpreting their digital entities. Both are built on a generic data management infrastructure, the San Diego Supercomputer Center Storage Resource Broker (SDSC SRB).

Storage Resource Broker
The Storage Resource Broker manages distributed data. Data grid technology provides the fundamental mechanisms for managing distributed data. Digital libraries can be implemented on top of data grids by adding mechanisms that support collection creation, browsing and discovery. Persistent archives can be implemented on data grids by adding the integrity metadata needed to assert the invariance of the deposited material.
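To make "integrity metadata" concrete, the following is a minimal sketch in Python. It assumes that a size and a SHA-256 checksum recorded at deposit time are enough to assert invariance; the slides do not say which integrity attributes the SRB records. The class name, function names and logical path are hypothetical and are not part of any SRB interface.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrityRecord:
    """Hypothetical integrity metadata kept alongside a deposited entity."""
    logical_name: str
    size_bytes: int
    sha256: str


def deposit(logical_name: str, content: bytes) -> IntegrityRecord:
    """Record the size and checksum of the content at deposit time."""
    return IntegrityRecord(
        logical_name=logical_name,
        size_bytes=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
    )


def is_unchanged(record: IntegrityRecord, content: bytes) -> bool:
    """Recompute the checksum and compare it with the recorded value."""
    return (
        len(content) == record.size_bytes
        and hashlib.sha256(content).hexdigest() == record.sha256
    )


original = b"ship log, cruise 42"
record = deposit("/archive/logs/cruise42.txt", original)
print(is_unchanged(record, original))         # True
print(is_unchanged(record, original + b"!"))  # False
```

A persistent archive would rerun a check like `is_unchanged` periodically and after every migration between storage systems.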

Definitions of Data and Information
The data grid community defines "data" as the strings of bits that comprise a digital entity. "Information" is the set of semantic labels assigned to those strings of bits. The combination of a semantic label and its associated data is termed metadata.
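The three definitions can be sketched as a small data structure. This is an illustrative model only: the class names, the label names and the sample byte string are assumptions made for the example and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SemanticLabel:
    """Information: a named label assigned to a string of bits."""
    name: str
    value: str


@dataclass
class DigitalEntity:
    """Data: the string of bits, plus the labels assigned to it."""
    bits: bytes
    labels: List[SemanticLabel] = field(default_factory=list)

    def metadata(self) -> List[Tuple[SemanticLabel, bytes]]:
        """Metadata: each semantic label paired with the data it describes."""
        return [(label, self.bits) for label in self.labels]


# Hypothetical entity with two hypothetical labels.
entity = DigitalEntity(bits=b"\x00\x01\x02\x03")
entity.labels.append(SemanticLabel("physical_variable", "sea_surface_temperature"))
entity.labels.append(SemanticLabel("units", "kelvin"))

for label, data in entity.metadata():
    print(f"{label.name}={label.value} describes {len(data)} bytes")
```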

Grid Evolution
Each evolutionary step required new naming conventions for resources, users, files, etc. The naming conventions made it possible to create uniform labels.
Virtual Organization
The assignment of naming conventions depends on the following:
- Cultural considerations
- Organizational considerations
- Choice of infrastructure

Grid Evolution

INTEGRATING DIGITAL LIBRARIES AND DATA GRIDS: SPANNING THE INFORMATION DIVIDE
Knowledge management is needed to support constraints in the federation of data grids and semantic crosswalks between digital libraries. Integrating constraint-based knowledge management technology with data grids, digital libraries and persistent archives requires relationship-based constraints for all three environments. Assigning a semantic label to a digital entity requires a processing step. For scientific data, one might attach the following semantic labels to a file:
- Name of the physical variable represented by the file.
- Units associated with the physical variable.
- Data model by which the bits are organized.
- Structural mapping implied by the data model.
- Procedural mapping imposed on the data model.

Knowledge is the expression of relationships between semantic labels. Relationships are typed as:
- logical ("is a", "has")
- structural (existence of a structure within the string of bits)
- spatial (mapping of a string of bits to a coordinate system)
- temporal (mapping to a point in time)
- procedural (mapping to process results)
- functional (mapping of features to an evaluation algorithm)
- systemic (properties that cover all members of a collection)
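One way to picture typed relationships between semantic labels is as tagged triples. The sketch below is an assumption about representation, not the authors' design: the seven relationship types are taken from the list above, while the `Relationship` tuple, the helper function and the sample labels are hypothetical.

```python
from enum import Enum
from typing import List, NamedTuple


class RelationType(Enum):
    """The seven relationship types named in the presentation."""
    LOGICAL = "logical"
    STRUCTURAL = "structural"
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    PROCEDURAL = "procedural"
    FUNCTIONAL = "functional"
    SYSTEMIC = "systemic"


class Relationship(NamedTuple):
    """A typed relationship between two semantic labels (hypothetical model)."""
    subject: str
    kind: RelationType
    predicate: str
    target: str


# Hypothetical knowledge about one oceanographic data file.
knowledge = [
    Relationship("sea_surface_temperature", RelationType.LOGICAL, "is a", "physical_variable"),
    Relationship("sea_surface_temperature", RelationType.LOGICAL, "has", "units"),
    Relationship("grid_cells", RelationType.SPATIAL, "maps to", "latitude_longitude"),
    Relationship("observation", RelationType.TEMPORAL, "recorded at", "2003-10-01"),
]


def relations_of_kind(relations: List[Relationship], kind: RelationType) -> List[Relationship]:
    """Select the relationships of one type."""
    return [r for r in relations if r.kind is kind]


for r in relations_of_kind(knowledge, RelationType.LOGICAL):
    print(f"{r.subject} {r.predicate} {r.target}")
```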

Each semantic label is the result of applying a process. Information is created by applying the constraints appropriate to a given community. Each type of knowledge constraint can be given a name and associated with a digital entity as a semantic label. The DLib community encapsulates the knowledge constraints in the curation processes that are applied when a collection is assembled. The preservation community encapsulates them in the archival processes that are applied when the archival collection is created. The data grid community characterizes knowledge constraints as applied processes or functions that transform digital entities into derived data products. A major change in perspective is needed when dealing with the sociological imperatives that arise from interaction between independent groups of researchers.

Requirements for the management of knowledge constraints are pervasive. Constraints are needed to enforce controls on interactions between federated data management systems, both for access and for consistency. Constraints constitute relationships or rules that must be evaluated each time an item is shared or accessed. For the DLib community, access constraints control which semantic labels within one community can be mapped to semantic labels used by another community. The preservation community associates authenticity metadata with each digital entity, asserting the archival processes that have been applied to it.
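The idea that constraints are rules evaluated on every access can be sketched as a list of predicates attached to a shared item. Everything named here (the request fields, the grid names, the two rules) is a hypothetical illustration; the presentation does not specify how constraints are expressed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class AccessRequest:
    user: str
    home_grid: str  # grid the request originates from
    action: str     # "read" or "write"


# A constraint is any rule that accepts or rejects a request.
Constraint = Callable[[AccessRequest], bool]

FEDERATED_GRIDS = {"grid-a", "grid-b"}  # hypothetical federation members


def from_federated_grid(request: AccessRequest) -> bool:
    """Access constraint: only members of the federation may touch the item."""
    return request.home_grid in FEDERATED_GRIDS


def writes_only_from_owning_grid(request: AccessRequest) -> bool:
    """Consistency constraint: updates are accepted only from the owning grid."""
    return request.action != "write" or request.home_grid == "grid-a"


@dataclass
class SharedItem:
    logical_name: str
    constraints: List[Constraint] = field(default_factory=list)

    def is_allowed(self, request: AccessRequest) -> bool:
        """Evaluate every constraint each time the item is accessed."""
        return all(rule(request) for rule in self.constraints)


item = SharedItem(
    "/federation/catalog/entry-1",
    [from_federated_grid, writes_only_from_owning_grid],
)
print(item.is_allowed(AccessRequest("alice", "grid-a", "write")))  # True
print(item.is_allowed(AccessRequest("bob", "grid-b", "write")))    # False
print(item.is_allowed(AccessRequest("bob", "grid-b", "read")))     # True
```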

STORAGE RESOURCE BROKER – INTEGRATION OF DIGITAL LIBRARIES AND DATA GRIDS
The SRB is a generic data management system used to build digital libraries for publishing data, data grids for sharing data, and persistent archives for preserving data. The SRB is used extensively within the NPACI project: over 350 terabytes of data, comprising over 50 million files, are stored under SRB management at SDSC. Implementing the SRB technology for use within the NPACI grid required the development of fundamental virtualizations:
- Storage repository virtualization.
- Data virtualization mechanisms.
- Information repository virtualization.
- Service virtualization.

Projects Using SRB Technology

SRB
The SRB is the underlying data management technology in each of the projects in Table 2. The resulting architectures have components similar to those used in the National Virtual Observatory (NVO), which has the following components:
- Portals that provide a user interface to the NVO services
- Registry for publishing the existence of NVO services
- Web-based services that implement interactive data manipulation or analysis tasks
- Workflow environments for support of processing pipelines
- SRB data grid for access to the storage repositories
- Grid software for distributed computation
- Catalogs and image archives of sky surveys
- Storage systems and archives

NVO Architecture

SRB
The concepts implemented in the SRB are now being used by all other data grid implementations. The concepts include:
- Use of a federated client-server architecture.
- Use of a logical name space.
- Mapping of attributes onto the logical name space.
- Use of access controls on digital entities.
Explicit services developed within the SRB for replication, aggregation of data into containers, support for user-defined metadata, role-based access controls, and ticket-based authentication are now being implemented in other data grids.
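A toy model can show how a logical name space, attributes mapped onto it, access controls and replication fit together. This is a sketch under stated assumptions and is not the SRB API: the class, its methods, the host names and the paths are all hypothetical, and a real data grid keeps this state in a catalog database rather than in memory.

```python
from typing import Dict, List, Set


class LogicalNameSpace:
    """Hypothetical in-memory model of a data grid's logical name space."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}         # logical name -> physical locations
        self._attributes: Dict[str, Dict[str, str]] = {}  # logical name -> user-defined metadata
        self._readers: Dict[str, Set[str]] = {}           # logical name -> users allowed to read

    def register(self, logical_name: str, physical_location: str, owner: str) -> None:
        """Create a logical name that points at its first physical copy."""
        self._replicas[logical_name] = [physical_location]
        self._attributes[logical_name] = {}
        self._readers[logical_name] = {owner}

    def replicate(self, logical_name: str, physical_location: str) -> None:
        """Record another physical copy under the same logical name."""
        self._replicas[logical_name].append(physical_location)

    def set_attribute(self, logical_name: str, key: str, value: str) -> None:
        """Map an attribute onto the logical name."""
        self._attributes[logical_name][key] = value

    def grant_read(self, logical_name: str, user: str) -> None:
        """Extend the access control list of the digital entity."""
        self._readers[logical_name].add(user)

    def resolve(self, logical_name: str, user: str) -> List[str]:
        """Return all physical locations, enforcing the access control first."""
        if user not in self._readers[logical_name]:
            raise PermissionError(f"{user} may not read {logical_name}")
        return list(self._replicas[logical_name])

    def find(self, key: str, value: str) -> List[str]:
        """Discover logical names by attribute value."""
        return sorted(
            name for name, attrs in self._attributes.items() if attrs.get(key) == value
        )


ns = LogicalNameSpace()
name = "/home/survey/image001.fits"
ns.register(name, "archive.site-a.example:/tape/0001", owner="alice")
ns.replicate(name, "disk.site-b.example:/cache/0001")
ns.set_attribute(name, "survey", "sky-survey-1")
ns.grant_read(name, "bob")

print(ns.resolve(name, "bob"))
print(ns.find("survey", "sky-survey-1"))
```

Because users address only the logical name, copies can be added or moved between storage systems without changing any reference held by a user or an application.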

Data Management Concepts
- Creation of a logical name space.
- Mapping of state information onto logical names as attributes.
- Consistency is maintained by imposing multiple levels of constraints on the logical name space.
Middleware
- Data grids manage the consistency of distributed state information.
- DLibs add mappings to manage user-defined metadata that supports discovery and browsing.
- Persistent archives add mappings to manage the authenticity of the deposited digital entities.

Data Management Concepts

Grid Implementation
Middleware is an infrastructure that manages the information flow between processes and distributed collections. Organizing computational results makes it possible to associate a context with them. A digital entity becomes useless without a context. Grids focus on the execution of access services. DLibs focus on the management of the results.

Grid Implementation

Grids and Digital Libraries
- Federating Name Spaces
- Replica Location Service
- Community Authorization
- Metadata Catalog Services
- Processing Pipeline
- Dataflow Environment
- Workflow Environment
- Consistency Management
- Information Flow

Conclusion
By integrating data grids, digital libraries, and persistent archives, we will be able to maintain the consistency of federated data collections while flowing information and data from digital libraries through grid services into preservation environments.

Thank You