Towards a Federated Infrastructure for the Preservation and Analysis Archival Data
Chien-Yi HOU, Richard MARCIANO


1 Towards a Federated Infrastructure for the Preservation and Analysis Archival Data
Chien-Yi HOU, Richard MARCIANO
{chienyi, richard_marciano}@unc.edu
School of Information and Library Science (SILS)
Sustainable Archives & Leveraging Technologies Group (SALT)
University of North Carolina at Chapel Hill

2 Preservation with Data Grid
Data grid systems provide data virtualization:
– They allow users to access files seamlessly across a distributed environment.
– They replicate, synchronize, and archive data, connecting heterogeneous resources in a logical and abstracted manner.
In addition to the capabilities above, iRODS, the Integrated Rule-Oriented Data System, provides policy/service virtualization:
– The Rule Engine applies user-defined policies and rules.
– Policies can be coded as functions (micro-services).
– Remote micro-services can be chained.
– Chains can be triggered by an event and a condition (rules).
– Micro-services communicate through parameters, shared contexts, and out-of-band message queues.
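The event/condition/chain idea on this slide can be illustrated with a small Python sketch. This is not the iRODS rule language or its API: `Rule`, `RuleEngine`, `checksum`, `replicate`, the event name "put", and the resource names are all hypothetical, chosen only to show how a rule might fire a chain of micro-services that communicate through a shared context.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]               # shared context passed along the chain
MicroService = Callable[[Context], None]  # a policy step coded as a function


@dataclass
class Rule:
    event: str                            # event that triggers the rule
    condition: Callable[[Context], bool]  # guard evaluated against the context
    chain: List[MicroService]             # micro-services run in order


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def fire(self, event: str, ctx: Context) -> Context:
        # Run the chain of every rule whose event and condition both match.
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                for micro_service in rule.chain:
                    micro_service(ctx)
        return ctx


def checksum(ctx: Context) -> None:
    # Hypothetical micro-service: record a SHA-256 digest of the payload.
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


def replicate(ctx: Context) -> None:
    # Hypothetical micro-service: pretend to replicate to two resources.
    ctx["replicas"] = ["resourceA", "resourceB"]


engine = RuleEngine([
    Rule("put", lambda c: str(c["path"]).endswith(".shp"), [checksum, replicate]),
])
result = engine.fire("put", {"path": "/zone/home/map.shp", "data": b"abc"})
ignored = engine.fire("get", {"path": "/zone/home/map.shp", "data": b"abc"})
```

Here `result` carries the keys added by both micro-services, while `ignored` is returned unchanged because no rule is registered for the "get" event.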

3 Overview of iRODS Architecture
Users can search, access, add, and manage data and metadata.*
iRODS Data System:
– iRODS Data Server: disk, tape, etc.
– iRODS Metadata Catalog: tracks information
– iRODS Rule Engine: tracks policies
*Access data with a Web-based browser, the iRODS GUI, or command-line clients.

4 Interoperability & Reproducibility
– e-Legacy: RSS Feed Reader, Data Grid
– PoDRI: Digital Repository, Data Grid
– T-RACES: Digital Library, GIS, Data Grid

5 CA’s Geospatial Records: Archival Appraisal, Accessioning, and Preservation
e-Legacy (RSS Feed Reader, Data Grid)
– Dealing with GIS data.
– Moving from “Expert Appraisal & Accessioning” to “Social Appraising and Automated Accessioning”.
– Incorporating RSS subscription into the appraisal process so that archivists can work together on deciding what to preserve.
– Ingesting data into the archive automatically once the criteria are satisfied.
Collaborators:
– California State Archives: Chris Garmire
– CERES: David Harris
– SALT/UNC: Richard Marciano, Chien-Yi Hou
Funded by NHPRC

6 e-Legacy: Shared Infrastructure
Diagram labels:
– CSA: California State Archives
– CaSIL: California Spatial Information Library
– DICE (UNC/ INC)
– Archive
– ICAT
– Local Storage Resources
– Shared Preservation Environment: Metadata Catalog (Oracle), Archival Storage (HPSS, Sam-QFS)

7 Where is the Data?

8 Old Approach: Formulating Appraisal Rules
Retrieve root webpage ‘http://water.usgs.gov/lookup/getgislist’
For each entry:
– Create a “matching entry” collection on iRODS
– Add ‘entry description’ metadata to that collection
– Create “Documentation” subcollection
  – Load web page
  – Load all “.gif” | “.jpg” | “.jpeg” files
  – Load all “.doc”
  – Load metadata file
– Create “ArcINFO” subcollection
  – Load all “.e00” | “.clr” | “.asc” | “.nit” | “.dlg” | “.txt” files
– Create “Shape” subcollection
  – Load all “.shp” files
– Create “SDTS” subcollection
  – Load all “.sdts” files
– Create “Others” subcollection
  – Load “.tfw” | “.rdb” | “.clr” | “.asc” | “.prj” files
– DECOMPRESS & LOAD “.zip” | “.gz” | “.tgz” | “.tar” | “.tar.gz” files
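The extension-based appraisal rules above amount to a routing table from file suffix to subcollection. A minimal Python sketch of that routing follows; the function names are hypothetical, and because the slide lists “.clr” and “.asc” under both “ArcINFO” and “Others” without saying which wins, the sketch assumes the first match in slide order applies. It only classifies file names and does not talk to iRODS.

```python
from pathlib import PurePosixPath
from typing import Optional

# Subcollections and extensions as listed on the slide, in slide order.
SUBCOLLECTIONS = [
    ("Documentation", {".gif", ".jpg", ".jpeg", ".doc"}),
    ("ArcINFO", {".e00", ".clr", ".asc", ".nit", ".dlg", ".txt"}),
    ("Shape", {".shp"}),
    ("SDTS", {".sdts"}),
    ("Others", {".tfw", ".rdb", ".clr", ".asc", ".prj"}),
]

# Suffixes the slide says to decompress before loading.
ARCHIVE_SUFFIXES = (".zip", ".gz", ".tgz", ".tar", ".tar.gz")


def is_archive(name: str) -> bool:
    """Return True if the file should be decompressed before loading."""
    return name.lower().endswith(ARCHIVE_SUFFIXES)


def subcollection_for(name: str) -> Optional[str]:
    """Return the first subcollection whose extension set matches, else None."""
    ext = PurePosixPath(name.lower()).suffix
    for subcollection, extensions in SUBCOLLECTIONS:
        if ext in extensions:
            return subcollection
    return None
```

Under the first-match assumption, `subcollection_for("legend.clr")` resolves to “ArcINFO”; reversing the table order would send it to “Others” instead.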

9 What is RSS?
RSS is a web feed format: a standardized XML file format that content providers use to publish their content.
Example feed (XML markup lost in the transcript):
CERES Geospatial Service
http://salt.unc.edu/eLegacy/RSS/CERES.xml
CERES Geospatial Service
en-us
Wed, 23 Jul 2008 02:08:18 EDT
– CA Digital Raster Graphics 1x2 degree series
  http://salt.unc.edu/eLegacy/data/1x2_degree_serie.zip
  CA Digital Raster Graphics 1x2 degree series(1:250K scale)
  Tue, 22 Jul 2008 17:07:00 EDT
– CA Digital Raster Graphics 30x60 minute series
  http://salt.unc.edu/eLegacy/data/30x60_minute_series.zip
  CA Digital Raster Graphics 30x60 minute series(1:100K scale)
  Tue, 22 Jul 2008 14:58:00 EDT
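To make the feed format concrete, here is a hedged Python sketch that parses a feed like the one on this slide with the standard library. The transcript lost the XML tags, so the markup in `FEED` is a reconstruction that assumes the usual RSS 2.0 element names (`title`, `link`, `description`, `language`, `pubDate`); the channel-level date is left out because the slide does not show which element carried it. `parse_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Reconstructed feed: values come from the slide, element names are assumed.
FEED = """<rss version="2.0">
  <channel>
    <title>CERES Geospatial Service</title>
    <link>http://salt.unc.edu/eLegacy/RSS/CERES.xml</link>
    <description>CERES Geospatial Service</description>
    <language>en-us</language>
    <item>
      <title>CA Digital Raster Graphics 1x2 degree series</title>
      <link>http://salt.unc.edu/eLegacy/data/1x2_degree_serie.zip</link>
      <description>CA Digital Raster Graphics 1x2 degree series(1:250K scale)</description>
      <pubDate>Tue, 22 Jul 2008 17:07:00 EDT</pubDate>
    </item>
    <item>
      <title>CA Digital Raster Graphics 30x60 minute series</title>
      <link>http://salt.unc.edu/eLegacy/data/30x60_minute_series.zip</link>
      <description>CA Digital Raster Graphics 30x60 minute series(1:100K scale)</description>
      <pubDate>Tue, 22 Jul 2008 14:58:00 EDT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list:
    """Return one dict per <item>, mapping child tag name to its text."""
    channel = ET.fromstring(xml_text).find("channel")
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in channel.findall("item")
    ]


items = parse_items(FEED)
for entry in items:
    print(entry["pubDate"], entry["title"], entry["link"])
```

Each parsed entry carries the download link an archivist would review before deciding whether the dataset should be preserved.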

10 RSS Feed Reader

11 e-Legacy Workflow
Stages: Appraisal, Description, Arrangement, Preservation
Steps: Subscribe to RSS → Review Received Entry → Share and Tag → Meet Preservation Criteria? → (Yes) Preserve to iRODS
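The decision step in this workflow can be sketched in Python as follows. The slides do not state what the preservation criteria are, so the threshold used here (an entry shared by at least two archivists) is purely an assumption for illustration; `FeedEntry`, `meets_criteria`, `run_workflow`, and the `preserve` callback are hypothetical names, and the callback stands in for whatever actually ingests the data into iRODS.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class FeedEntry:
    title: str
    link: str
    tags: Set[str] = field(default_factory=set)       # tags added during review
    shared_by: Set[str] = field(default_factory=set)  # archivists who shared it


def meets_criteria(entry: FeedEntry, min_archivists: int = 2) -> bool:
    # Assumed criterion: enough archivists have shared the entry.
    return len(entry.shared_by) >= min_archivists


def run_workflow(
    entries: Iterable[FeedEntry],
    preserve: Callable[[FeedEntry], None],
    min_archivists: int = 2,
) -> List[str]:
    """Call preserve() on each entry that meets the criteria; return their titles."""
    preserved = []
    for entry in entries:
        if meets_criteria(entry, min_archivists):
            preserve(entry)
            preserved.append(entry.title)
    return preserved
```

Passing `preserve` as a callback keeps the appraisal logic separate from the ingestion step, so the same check could be exercised without a running data grid.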

12 e-Legacy RSS Tool

13 PoDRI: Policy-Driven Repository Interoperability
PoDRI (Digital Repository, Data Grid)
The PoDRI project investigates the requirements for policy-aware interoperability and demonstrates key features needed for its implementation. Combining iRODS and its rule engine with Fedora’s rich semantic object model for digital objects enables use of the best features of both products.
Collaborators:
– UNC: Richard Marciano, David Poclar, Alex Chassanoff, Chien-Yi Hou
– Duraspace/Cornell: Daniel Davis
– DICE/UCSD: Bing Zhu
Funded by IMLS

14 PoDRI Use Cases
Between Fedora (Digital Repository) and iRODS (Data Grid):
– New content ingested via iRODS
– Bulk registration from iRODS into Fedora
– Update of content or metadata via Fedora
– Update of content or metadata via iRODS
– New content ingested via Fedora

15 PoDRI Use Case 1: New content ingested via Fedora

16 T-RACES: Testbed for the Redlining Archives of California’s Exclusionary Spaces
T-RACES (Digital Library, GIS, Data Grid)
– Making 1930s redlining files of eight California cities from the Federal Home Owners’ Loan Corporation accessible to the public.
– Integrating the data and maps and providing an interface for users to access and query the data easily.
– Being one of the first projects to use the HASS (Humanities, Arts, and Social Sciences) Grid, a cyberinfrastructure initiative organized by the University of California Humanities Research Institute (UCHRI) and partners.
Collaborators:
– SALT/UNC: Richard Marciano, Chien-Yi Hou
– UCHRI: David Theo Goldberg
Funded by IMLS

17 HASS Data Grid

18 HOLC Area Description Example

19 Redlining Map

20 How to use the data?
1. Query the database
2. Browse the map
3. Browse the PDF
Diagram labels: Original Paper Documents; Scan, OCR, Parse; Database; GIS Maps; PDF Documents; User Queries; User Browsing; Get Results on maps; Get Results on docs; Get Corresponding docs

21 The Interface 1939

22 Thank you!
More information? http://salt.unc.edu or email salt@unc.edu

