Towards a Federated Infrastructure for the Preservation and Analysis of Archival Data. Chien-Yi HOU, Richard MARCIANO.


Towards a Federated Infrastructure for the Preservation and Analysis of Archival Data
Chien-Yi HOU, Richard MARCIANO
School of Information and Library Science (SILS)
Sustainable Archives & Leveraging Technologies Group (SALT)
University of North Carolina at Chapel Hill

Preservation with Data Grid
Data grid systems provide data virtualization:
- They allow users to access files seamlessly across distributed environments.
- They replicate, synchronize, and archive data, connecting heterogeneous resources in a logical and abstracted manner.
In addition to the capabilities above, iRODS, the Integrated Rule-Oriented Data System, provides policy/service virtualization:
- The Rule Engine applies user-defined policies and rules.
- Policies can be coded as functions (micro-services).
- Remote micro-services can be chained.
- The chains can be triggered on an event and condition (rules).
- Micro-services communicate through parameters, shared contexts, and out-of-band message queues.
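The event/condition/chain idea on this slide can be illustrated with a small toy model. This is only a sketch in plain Python, not the iRODS rule language or any iRODS API: the names `Rule`, `RuleEngine`, `compute_checksum`, `replicate`, the `"put"` event, and the resource names are all hypothetical. Micro-services here communicate through a shared context dictionary, which stands in for one of the communication channels the slide lists.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
MicroService = Callable[[Context], None]


@dataclass
class Rule:
    """A chain of micro-services triggered by an event when a condition holds."""
    event: str
    condition: Callable[[Context], bool]
    chain: List[MicroService] = field(default_factory=list)


class RuleEngine:
    """Toy rule engine (hypothetical, not the iRODS implementation)."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, context: Context) -> Context:
        # Run every chain whose event matches and whose condition is true.
        for rule in self.rules:
            if rule.event == event and rule.condition(context):
                for micro_service in rule.chain:
                    micro_service(context)
        return context


def compute_checksum(ctx: Context) -> None:
    # Hypothetical micro-service: record a SHA-256 digest in the shared context.
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


def replicate(ctx: Context) -> None:
    # Hypothetical micro-service: pretend to replicate to two storage resources.
    ctx["replicas"] = ["resourceA", "resourceB"]


engine = RuleEngine()
engine.add_rule(
    Rule(
        event="put",
        condition=lambda c: str(c["path"]).endswith(".shp"),
        chain=[compute_checksum, replicate],
    )
)

matched = engine.fire("put", {"path": "/zone/home/roads.shp", "data": b"abc"})
skipped = engine.fire("put", {"path": "/zone/home/notes.txt", "data": b"abc"})
```

In this sketch, `matched` gains `checksum` and `replicas` entries because the rule's condition holds, while `skipped` is returned unchanged.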

Overview of iRODS Architecture
Users can search, access, add, and manage data and metadata. They access data with a web-based browser, the iRODS GUI, or command-line clients.
iRODS Data System components:
- iRODS Data Server: disk, tape, etc.
- iRODS Metadata Catalog: tracks information
- iRODS Rule Engine: tracks policies

Interoperability & Reproducibility
Projects: T-RACES, PoDRI, e-Legacy
Technologies: Data Grid, RSS Feed Reader, Digital Repository, Digital Library, GIS

e-Legacy
CA’s Geospatial Records: Archival Appraisal, Accessioning, and Preservation
Technologies: RSS Feed Reader, Data Grid
- Dealing with GIS data.
- Moving from “Expert Appraisal & Accessioning” to “Social Appraising and Automated Accessioning”.
- Incorporating RSS subscription into the appraisal process to allow archivists to work together on deciding what to preserve.
- Ingesting data into the archive automatically once the criteria are satisfied.
Collaborators:
- California State Archives: Chris Garmire
- CERES: David Harris
- SALT/UNC: Richard Marciano, Chien-Yi Hou
Funded by NHPRC

e-Legacy: Shared Infrastructure
- CSA: California State Archives
- CaSIL: California Spatial Information Library
- DICE (UNC/ INC) Archive
- ICAT
- Local Storage Resources
- Shared Preservation Environment: Metadata Catalog (Oracle), Archival Storage (HPSS, Sam-QFS)

Where Is the Data?

Old Approach: Formulating Appraisal Rules
Retrieve root webpage
For each entry:
  Create a “matching entry” collection on iRODS
  Add ‘entry description’ metadata to that collection
  Create “Documentation” subcollection
    Load web page
    Load all “.gif” | “.jpg” | “.jpeg” files
    Load all “.doc” files
    Load metadata file
  Create “ArcINFO” subcollection
    Load all “.e00” | “.clr” | “.asc” | “.nit” | “.dlg” | “.txt” files
  Create “Shape” subcollection
    Load all “.shp” files
  Create “SDTS” subcollection
    Load all “.sdts” files
  Create “Others” subcollection
    Load “.tfw” | “.rdb” | “.clr” | “.asc” | “.prj” files
  DECOMPRESS & LOAD “.zip” | “.gz” | “.tgz” | “.tar” | “.tar.gz” files
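The extension-based routing in the appraisal rules above can be sketched as a lookup table. This is an illustrative assumption about how such rules might be coded, not the project's implementation: the function names `classify` and `is_archive` are hypothetical, and because “.clr” and “.asc” appear under both “ArcINFO” and “Others” on the slide, the sketch assumes the first matching rule wins. The web page and metadata file are selected by role rather than by extension, so they are not covered here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Rules in slide order; the first match wins (an assumption, since ".clr"
# and ".asc" are listed under both "ArcINFO" and "Others").
SUBCOLLECTION_RULES = [
    ("Documentation", {".gif", ".jpg", ".jpeg", ".doc"}),
    ("ArcINFO", {".e00", ".clr", ".asc", ".nit", ".dlg", ".txt"}),
    ("Shape", {".shp"}),
    ("SDTS", {".sdts"}),
    ("Others", {".tfw", ".rdb", ".clr", ".asc", ".prj"}),
]

# Compressed or bundled files that must be unpacked before loading.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tgz", ".tar")


def is_archive(filename: str) -> bool:
    """Return True if the file should be decompressed before loading."""
    return filename.lower().endswith(ARCHIVE_SUFFIXES)


def classify(filename: str) -> Optional[str]:
    """Return the target subcollection for a file, or None if no rule matches."""
    suffix = PurePosixPath(filename.lower()).suffix
    for subcollection, extensions in SUBCOLLECTION_RULES:
        if suffix in extensions:
            return subcollection
    return None


for name in ["roads.shp", "dem.asc", "index.TFW", "bundle.tar.gz"]:
    print(name, classify(name), is_archive(name))
```

A caller would first test `is_archive`, unpack if needed, and then run `classify` on each extracted file.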

What is RSS?
RSS is a web feed format: a standardized XML file format that content providers use to publish their content.
Example feed:
- Channel: CERES Geospatial Service (language: en-us; date: Wed, 23 Jul :08:18 EDT)
- Entry: CA Digital Raster Graphics 1x2 degree series (1:250K scale); Tue, 22 Jul :07:00 EDT
- Entry: CA Digital Raster Graphics 30x60 minute series (1:100K scale); Tue, 22 Jul :58:00 EDT
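As a rough illustration of reading such a feed, the sketch below parses a small RSS 2.0 document with Python's standard `xml.etree.ElementTree`. The element layout follows the RSS 2.0 convention (`channel`, `item`, `title`, `description`); which of the slide's strings were titles and which were descriptions is an assumption, and dates are left out because they are incomplete in the transcript. `parse_feed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample feed assembled from the slide's text; the tag placement is assumed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>CERES Geospatial Service</title>
    <description>CERES Geospatial Service</description>
    <language>en-us</language>
    <item>
      <title>CA Digital Raster Graphics 1x2 degree series</title>
      <description>CA Digital Raster Graphics 1x2 degree series(1:250K scale)</description>
    </item>
    <item>
      <title>CA Digital Raster Graphics 30x60 minute series</title>
      <description>CA Digital Raster Graphics 30x60 minute series(1:100K scale)</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text: str) -> dict:
    """Extract the channel title, language, and item titles/descriptions."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")
    items = [
        {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return {
        "title": channel.findtext("title", default=""),
        "language": channel.findtext("language", default=""),
        "items": items,
    }


feed = parse_feed(SAMPLE_FEED)
for entry in feed["items"]:
    print(entry["title"])
```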

RSS Feed Reader

e-Legacy Workflow
Phases: Appraisal, Description, Arrangement, Preservation
Steps:
1. Subscribe to RSS
2. Review Received Entry
3. Share and Tag
4. Meet Preservation Criteria?
5. If yes: Preserve to iRODS
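The workflow steps can be modelled in a few lines. The slides do not say what the preservation criteria are, so this sketch assumes a purely hypothetical one (a minimum number of archivists endorsing an entry). `FeedEntry`, `share_and_tag`, `meets_preservation_criteria`, and `run_workflow` are illustrative names, and the `preserve` callback stands in for whatever actually ingests data into iRODS.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class FeedEntry:
    """An entry received from a subscribed RSS feed."""
    title: str
    tags: Set[str] = field(default_factory=set)
    endorsed_by: Set[str] = field(default_factory=set)


def share_and_tag(entry: FeedEntry, archivist: str, tag: str) -> None:
    # An archivist reviews the entry, tags it, and thereby endorses it.
    entry.tags.add(tag)
    entry.endorsed_by.add(archivist)


def meets_preservation_criteria(entry: FeedEntry, min_endorsements: int = 2) -> bool:
    # Hypothetical criterion: enough distinct archivists endorsed the entry.
    return len(entry.endorsed_by) >= min_endorsements


def run_workflow(entries: List[FeedEntry],
                 preserve: Callable[[FeedEntry], None]) -> List[str]:
    """Preserve every entry that meets the criteria; return preserved titles."""
    preserved = []
    for entry in entries:
        if meets_preservation_criteria(entry):
            preserve(entry)
            preserved.append(entry.title)
    return preserved


drg_250k = FeedEntry("CA Digital Raster Graphics 1x2 degree series")
drg_100k = FeedEntry("CA Digital Raster Graphics 30x60 minute series")
share_and_tag(drg_250k, "archivist_a", "topographic")
share_and_tag(drg_250k, "archivist_b", "statewide")
share_and_tag(drg_100k, "archivist_a", "topographic")

archive: List[FeedEntry] = []  # stand-in for the preservation environment
preserved_titles = run_workflow([drg_250k, drg_100k], archive.append)
```

Only the first entry is preserved here, because it is the only one endorsed by two archivists under the assumed threshold.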

e-Legacy RSS Tool

PoDRI: Policy-Driven Repository Interoperability
Technologies: Digital Repository, Data Grid
The PoDRI project investigates the requirements for policy-aware interoperability and demonstrates key features needed for its implementation. Combining iRODS and its rule engine with Fedora’s rich semantic object model for digital objects enables use of the best features of both products.
Collaborators:
- UNC: Richard Marciano, David Poclar, Alex Chassanoff, Chien-Yi Hou
- Duraspace/Cornell: Daniel Davis
- DICE/UCSD: Bing Zhu
Funded by IMLS

PoDRI Use Cases
Systems: Fedora (Digital Repository), iRODS (Data Grid)
- New content ingested via Fedora
- New content ingested via iRODS
- Bulk registration from iRODS into Fedora
- Update of content or metadata via Fedora
- Update of content or metadata via iRODS

PoDRI Use Case 1: New content ingested via Fedora

T-RACES: Testbed for the Redlining Archives of California’s Exclusionary Spaces
Technologies: Digital Library, GIS, Data Grid
- Making 1930s redlining files of eight California cities from the Federal Home Owners’ Loan Corporation accessible to the public.
- Integrating the data and maps and providing an interface for users to access and query the data easily.
- One of the first projects to use the HASS (Humanities, Arts, and Social Sciences) Grid, a cyberinfrastructure initiative organized by the University of California Humanities Research Institute (UCHRI) and partners.
Collaborators:
- SALT/UNC: Richard Marciano, Chien-Yi Hou
- UCHRI: David Theo Goldberg
Funded by IMLS

HASS Data Grid

HOLC Area Description Example

Redlining Map

How to use the data?
1. Query the database
2. Browse the map
3. Browse the PDF
Diagram labels:
- Sources and products: Original Paper Documents, Database, GIS Maps, PDF Documents
- Processing: Scan, OCR, Parse (database); Scan, OCR (PDF documents)
- User Queries: Get Results on maps, Get Results on docs, Get Corresponding docs
- User Browsing

The Interface 1939

Thank you!