Mike Smorul, Saurabh Channan
Digital Preservation and Archiving at the Institute for Advanced Computer Studies
University of Maryland, College Park

Overview
Digital Preservation Research
– ADAPT Project and Components
– Pilot Persistent Archive
Digital Library and Production Data Distribution
– Global Land Cover Facility
Conclusion

A Digital Approach to Preservation Technology (ADAPT)
Premise:
– Preservation of digital entities as self-describing objects
    – OAIS Information Packet model as a framework
– Separation of management into three layers: bitstream, semantic, and access/discovery
– Distributed and Secure Infrastructure
    – Automatic ingestion and replication
    – Policy-Driven Management of Preservation Processes
    – Global Format Registry
    – Separate Peer-to-Peer Deep Archive
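The three-layer separation and the idea of a self-describing object can be illustrated with a small data model. This is a hypothetical Python sketch, not ADAPT's implementation: the class names (`InformationPackage`, `BitstreamLayer`, `SemanticLayer`, `AccessLayer`), their fields, and the JSON manifest layout are all assumptions. It shows a package that carries its own fixity, semantic metadata, and discovery metadata next to the bitstream description.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BitstreamLayer:
    """Bit-level view: identity and fixity of the raw bytes (hypothetical fields)."""
    filename: str
    size: int
    sha256: str


@dataclass
class SemanticLayer:
    """What the bits mean: format and description (hypothetical fields)."""
    format_id: str
    description: str


@dataclass
class AccessLayer:
    """How the object is found: discovery keywords (hypothetical field)."""
    keywords: list = field(default_factory=list)


@dataclass
class InformationPackage:
    """A self-describing package that keeps the three management layers separate."""
    package_id: str
    bitstream: BitstreamLayer
    semantic: SemanticLayer
    access: AccessLayer

    def manifest(self) -> str:
        # Serialize every layer together so the package describes itself.
        return json.dumps(asdict(self), sort_keys=True)


def make_package(package_id, filename, data: bytes, format_id, description, keywords):
    """Build a package, computing size and checksum from the bytes themselves."""
    bitstream = BitstreamLayer(filename, len(data), hashlib.sha256(data).hexdigest())
    return InformationPackage(
        package_id,
        bitstream,
        SemanticLayer(format_id, description),
        AccessLayer(list(keywords)),
    )


pkg = make_package("pkg-001", "scene.tif", b"example bytes", "image/tiff",
                   "Sample scene", ["landsat", "example"])
print(pkg.manifest())
```

Keeping the layers as separate objects means, for example, that a format migration can replace the bitstream and semantic layers while the access layer stays unchanged.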

ADAPT Architecture

ADAPT Components
Ingestion
– Producer-Archive Workflow Network (PAWN)
Management of Preservation Processes
– Lightweight Preservation Environment (LPE)
Access and Discovery
– Grid Retrieval and Search Platform (GRASP)
– EAP Collection browser

Overall Principles (PAWN)
Distributed, secure ingestion
OAIS-based Information Packet creation
Use of web/grid technologies; platform independent
Minimal client-side requirements
Ease of integration with archive and data grid systems
Designed to satisfy the data integrity requirements of scientific collections and digital preservation

Distributed Ingestion (PAWN)

Ingestion Workflow (PAWN)
1. Negotiate Submission Agreement.
2. Workflow initialization and Submission Information Packet (SIP) creation.
3. Transfer of SIPs to Data Grid site.
4. Validation of SIP transfer.
5. Organization of data into collections and transfer into Data Grid.
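The five workflow steps can be sketched as an ordered pipeline in which step 4 checks every received file against a checksum manifest built in step 2. This is a hypothetical Python illustration, not PAWN's code: the `Step` names, the SIP dictionary layout, the choice of SHA-256, and the pluggable `transfer` callable are assumptions, and steps 1 and 5 are placeholders.

```python
import hashlib
from enum import Enum


class Step(Enum):
    """The five ingestion steps, in order (names are illustrative)."""
    NEGOTIATE_AGREEMENT = 1
    CREATE_SIP = 2
    TRANSFER_SIP = 3
    VALIDATE_TRANSFER = 4
    ORGANIZE_INTO_GRID = 5


def build_sip(files: dict) -> dict:
    """Create a hypothetical SIP: file contents plus a checksum manifest."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    return {"files": dict(files), "manifest": manifest}


def validate_transfer(received: dict, manifest: dict) -> list:
    """Return names that are missing or whose checksum differs from the manifest."""
    bad = []
    for name, expected in manifest.items():
        data = received.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            bad.append(name)
    return sorted(bad)


def run_ingestion(files: dict, transfer) -> list:
    """Run the steps in order; raise at validation if any file is missing or corrupt."""
    completed = [Step.NEGOTIATE_AGREEMENT]  # placeholder: agreement assumed in place
    sip = build_sip(files)
    completed.append(Step.CREATE_SIP)
    received = transfer(sip["files"])  # stands in for the copy to the data grid site
    completed.append(Step.TRANSFER_SIP)
    bad = validate_transfer(received, sip["manifest"])
    if bad:
        raise ValueError(f"transfer validation failed for: {bad}")
    completed.append(Step.VALIDATE_TRANSFER)
    completed.append(Step.ORGANIZE_INTO_GRID)  # placeholder for collection placement
    return completed


files = {"a.txt": b"alpha", "b.txt": b"beta"}
steps = run_ingestion(files, lambda f: dict(f))  # a lossless "transfer"
print([s.name for s in steps])
```

Building the manifest on the producer side, before transfer, is what lets the archive detect corruption introduced in transit rather than trusting whatever arrives.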

Component Overview (PAWN)

Target Collections (PAWN)
Digital Image Collection
– Rich metadata in various formats
Web site crawling
– Online and interactive content
GLCF Landsat data
– Spatial and temporal metadata
– Large quantity (over 15,000 objects)

Lightweight Preservation Environment (LPE)
The Lightweight Preservation Environment is an archival system with a modular design built on grid and web services. The current implementation relies mostly on Globus technologies; our work has focused primarily on wrapping preservation logic around those components.

Developed Components (LPE)
Data Manager (DM): Organizes data and routes queries between the user and the other components
Policy Manager (PM): Ensures that a minimum number of copies exists for any given file
Transformation Manager (TM): Executes specific transformations on a named file on a given storage node and returns the results
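The Policy Manager's rule (a minimum number of copies per file) can be sketched as a replication planner. This is a hypothetical Python illustration, not LPE's code: the function name `plan_replication`, the in-memory replica map, and the "least-loaded node first" placement heuristic are assumptions. In a deployed system the replica locations would come from the data grid, and the copies themselves would be carried out by the grid.

```python
def plan_replication(replicas: dict, nodes: list, min_copies: int) -> dict:
    """Return, per under-replicated file, the nodes that should receive a new copy.

    replicas maps a file name to the set of nodes that already hold it.
    Placement heuristic (an assumption): prefer nodes holding the fewest files,
    breaking ties by node name so the plan is deterministic.
    """
    # Current load: how many files each known node already holds.
    load = {n: 0 for n in nodes}
    for holders in replicas.values():
        for n in holders:
            if n in load:
                load[n] += 1

    plan = {}
    for name in sorted(replicas):
        holders = replicas[name]
        missing = min_copies - len(holders)
        if missing <= 0:
            continue  # policy already satisfied for this file
        candidates = sorted((n for n in nodes if n not in holders),
                            key=lambda n: (load[n], n))
        if len(candidates) < missing:
            raise ValueError(f"{name}: only {len(candidates)} nodes for {missing} copies")
        chosen = candidates[:missing]
        for n in chosen:
            load[n] += 1  # account for planned copies when placing later files
        plan[name] = chosen
    return plan


replicas = {"a.tif": {"node1"}, "b.tif": {"node1", "node2"}, "c.tif": set()}
nodes = ["node1", "node2", "node3"]
print(plan_replication(replicas, nodes, min_copies=2))
```

Updating the load as copies are planned spreads new replicas across nodes instead of piling them onto whichever node started out emptiest.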

Grid Retrieval and Search Platform (GRASP)
Based on concepts from the Earth Science Data Interface (ESDI), developed at the UMIACS GLCF.
Provides a graphical interface into data grid holdings.
Provides access to the entire GLCF holdings through the Storage Resource Broker (SRB).

GRASP Architecture

GRASP uses a data grid as an abstract storage repository. Metadata about the grid holdings is mined from the grid itself or from external sources and published in a browsable form.
– Data grids may allow for platform-independent metadata, but they may not be optimal for access.
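One way to picture "mined from the grid or external sources and published in a browsable form" is a harvest step that merges grid-side metadata with external records, followed by a simple spatial search over the resulting index. This is a hypothetical Python sketch: the `Record` fields, the example paths, and the bounding-box query are assumptions chosen to suit geospatial collections like the GLCF's, and the code does not call any SRB API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Browsable metadata for one grid object (hypothetical fields)."""
    path: str
    sensor: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)


def harvest(grid_listing: dict, external: dict = None) -> list:
    """Build records from grid-side metadata; external sources override fields."""
    external = external or {}
    records = []
    for path, meta in grid_listing.items():
        merged = {**meta, **external.get(path, {})}
        records.append(Record(path, merged["sensor"], tuple(merged["bbox"])))
    return records


def intersects(a: tuple, b: tuple) -> bool:
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records: list, bbox: tuple, sensor: str = None) -> list:
    """Return sorted paths of records overlapping bbox, optionally filtered by sensor."""
    return sorted(r.path for r in records
                  if intersects(r.bbox, bbox) and (sensor is None or r.sensor == sensor))


grid = {
    "landsat/p1.tif": {"sensor": "Landsat", "bbox": [-77.5, 38.5, -76.5, 39.5]},
    "modis/m1.hdf": {"sensor": "MODIS", "bbox": [-80.0, 30.0, -70.0, 40.0]},
}
external = {"modis/m1.hdf": {"sensor": "MODIS Terra"}}  # e.g. a richer catalog entry
records = harvest(grid, external)
print(search(records, (-77.0, 39.0, -76.0, 40.0)))
```

Keeping the index outside the grid reflects the slide's caveat: the grid stays the storage of record, while access goes through a structure built for querying.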

GRASP Screenshot

Global Land Cover Facility
Mission: “The GLCF Mission is to encourage the use of remotely sensed imagery, derived products and applications within a broad range of science communities in a manner that improves comprehension of the nature and causes of land cover change and its impact on the Earth.”
Goal: “The GLCF Goal is to provide free access to an integrated collection of critical land cover and Earth science data through systems that are designed to maximize user outreach and that promote development of novel tools for ordering, visualizing and manipulating spatial data.”

Data Collections
The majority of the holdings are Landsat and MODIS data.

Data Distribution
Data at the GLCF
– Approximately 5.1 TB compressed
– Approximately 13 TB uncompressed
Anticipated Production Rate
– Triple or quadruple the current data holdings within the next two years
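The growth estimate can be turned into concrete storage targets. The figures 5.1 TB and 13 TB come from the slide; the assumption (not stated in the source) is that the roughly 2.5:1 compression ratio stays constant as holdings grow. Under that assumption, tripling or quadrupling means roughly 15 to 20 TB compressed, or 39 to 52 TB uncompressed.

```python
def project_holdings(compressed_tb, uncompressed_tb, factors=(3, 4)):
    """Return the compression ratio and projected (compressed, uncompressed) TB per factor.

    Assumes the compression ratio does not change as the collection grows.
    """
    ratio = uncompressed_tb / compressed_tb
    projections = {
        f: (round(compressed_tb * f, 1), round(uncompressed_tb * f, 1)) for f in factors
    }
    return round(ratio, 2), projections


ratio, proj = project_holdings(5.1, 13.0)
print(f"compression ratio ~{ratio}:1")
for factor, (comp, uncomp) in proj.items():
    print(f"x{factor}: ~{comp} TB compressed, ~{uncomp} TB uncompressed")
```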

Data Discovery Applications
ESDI
– Web interface
– User friendly
– Search, retrieve, discover
– Scalable: over 9 TB a month!
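To put the 9 TB-per-month figure in perspective, it can be converted into an average sustained rate. This sketch assumes decimal units (1 TB = 10^6 MB) and a 30-day month; it gives a monthly average only, so peak rates during busy periods would be higher.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(tb_per_month, days_per_month=30):
    """Average sustained rate in MB/s, using decimal units (1 TB = 10**6 MB)."""
    megabytes = tb_per_month * 10**6
    return megabytes / (days_per_month * SECONDS_PER_DAY)


rate = average_rate_mb_per_s(9)
print(f"{rate:.2f} MB/s average, {rate * 8:.1f} Mbit/s")
```

At about 3.5 MB/s (roughly 28 Mbit/s) around the clock, the average load is modest; the scalability challenge lies more in handling request bursts and very large individual downloads.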

GLCF Architecture
Scalable and Reliable

Participation Possibilities
PAWN ingestion component
– Minimal geospatial metadata support is planned; it can be expanded to support an NGDA endpoint
GRASP display component
– Solid core components; end-user interfaces need additional polishing
GLCF data holdings
– Additional hardware would be required if additional data and access mechanisms (grid, etc.) are needed
Other possibilities include grid infrastructure, GSI security, a format registry, etc.

Questions