Presentations Introduction Case Studies:

1 Presentations Introduction Case Studies:
Policies, Services, Interoperability, Mashups: BNF, DCAPE, PoDRI, e-Legacy
RENCI Federated Data Projects: NARA TPAP, RENCI VO, TIP
Interfaces: Islandora, Jargon, CDR

2 iRODS federates major collections From Ken Arnold, SHAMAN project
A unified Web interface for browsing or searching. User sees a single hierarchy.
Flickr file system: /flickr/commons/, using the flickr API, a RESTful web API.
YouTube: media accessible through API.
New service: mountable file system (Hulu, photobucket, etc.).
Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the flickr API, presented to iRODS as if it were a file system.
For a collection to integrate, it would need to have some remote API that we could write a driver for and one or more ways to map that collection into a tree.
Each mountable service is made into a resource with all relevant info (location, resource type, etc.).
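The mapping from a remote API to a tree can be sketched as follows. This is a minimal illustration in Python, not iRODS driver code: the class `MountedService`, the dictionary `FAKE_FLICKR_COMMONS`, and the institution and file names are all hypothetical stand-ins, and no real flickr API call is made.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote web API: institution name -> item names.
FAKE_FLICKR_COMMONS: Dict[str, List[str]] = {
    "Library of Congress": ["bridge.jpg", "harbor.jpg"],
    "Smithsonian": ["comet.jpg"],
}


class MountedService:
    """Presents a remote API as a read-only, two-level directory tree."""

    def __init__(
        self,
        mount_point: str,
        list_institutions: Callable[[], List[str]],
        list_items: Callable[[str], List[str]],
    ) -> None:
        self.mount_point = mount_point.rstrip("/")
        self._list_institutions = list_institutions
        self._list_items = list_items

    def listdir(self, path: str) -> List[str]:
        path = path.rstrip("/")
        # Listing the mount point itself is one API call: all institutions.
        if path == self.mount_point:
            return sorted(self._list_institutions())
        prefix = self.mount_point + "/"
        # Listing an institution "folder" is one API call for that institution.
        if path.startswith(prefix) and "/" not in path[len(prefix):]:
            return sorted(self._list_items(path[len(prefix):]))
        raise FileNotFoundError(path)


flickr = MountedService(
    "/flickr/commons",
    list_institutions=lambda: list(FAKE_FLICKR_COMMONS),
    list_items=lambda name: FAKE_FLICKR_COMMONS[name],
)
```

Each `listdir` call corresponds to one remote request, which is the sense in which a "folder" is the result of an API call rather than stored data.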

3 User With Client Views & Manages Data
iRODS shows a unified “Virtual Collection”; the user sees a single “Virtual Collection”.
My Data: disk, tape, database, filesystem, etc.
Partner’s Data: remote disk, tape, filesystem, etc.
The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

4 Accessing Data in the iRODS System
User With iRODS Client searches CATALOG to find and get Data “I need data!” “Finds the data.” “Gets data to user.” iRODS Data System iRODS Metadata Catalog Keeps track of data Data Server Disk, Tape, Database, Filesystem, etc. Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.

5 Overview of iRODS Components
User Interface: Web or GUI client to access and manage data and metadata*
iRODS Server: data on disk
iRODS Metadata Catalog: database that tracks the state of data
iRODS Rule Engine: implements policies
About iRODS and DICE: The Data Intensive Cyber Environments (DICE) group leads core development of the open source iRODS (Integrated Rule-Oriented Data System). With more than a decade of award-winning research that harnesses the power of cybertechnologies for managing, sharing, publishing, and preserving digital data, the group is based at the School of Information and Library Science and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, and the Institute for Neural Computation at the University of California, San Diego. Development of the core iRODS data grid system is funded by the National Science Foundation and the National Archives and Records Administration, with a growing open source iRODS community participating in development worldwide, based in the nonprofit Data Intensive Cyberinfrastructure Foundation. For more information see
*Access data with: Web-based browser, iRODS GUI, command line clients, DSpace, Fedora, Kepler workflow, WebDAV, user level file system, etc.

6 "Layers" in iRODS: From Users to Storage
Policies: Express goals for data access, sharing, preservation, etc.
Community: Decides how to manage shared Collection(s).
Administrator/User: Applies Rules.
Rules: Implement Policies in computer-actionable form.
Micro-services: Operate on remote data.
iRODS Server: Executes Micro-services.

7 Under the hood - a glimpse
Diagram: NC State, Duke, Chapel Hill; three iRODS Servers, each with a Rule Engine; Metadata Catalog (DB).
User asks for data (using logical properties).
Data request goes to the 1st server.
Server looks up information in the catalog.
Catalog reports that a 2nd federated server has the data.
1st server asks the 2nd server for the data.
2nd server applies Rules and serves the data.
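The request flow above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not iRODS code: the classes `Catalog` and `Server`, the server names, the logical path, and the "no anonymous access" rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Catalog:
    """Maps a logical path to the name of the server that holds the data."""
    locations: Dict[str, str] = field(default_factory=dict)

    def locate(self, logical_path: str) -> str:
        return self.locations[logical_path]


@dataclass
class Server:
    name: str
    catalog: Catalog
    storage: Dict[str, bytes] = field(default_factory=dict)
    peers: Dict[str, "Server"] = field(default_factory=dict)

    def serve_local(self, logical_path: str, user: str) -> bytes:
        # The server that holds the data applies its own rule before serving.
        # Hypothetical rule: anonymous users are refused.
        if user == "anonymous":
            raise PermissionError(f"{self.name} refuses anonymous access")
        return self.storage[logical_path]

    def get(self, logical_path: str, user: str) -> bytes:
        # 1st server looks up the location in the catalog.
        holder = self.catalog.locate(logical_path)
        if holder == self.name:
            return self.serve_local(logical_path, user)
        # 1st server asks the 2nd (holding) server for the data.
        return self.peers[holder].serve_local(logical_path, user)


catalog = Catalog({"/zone/home/report.txt": "duke"})
duke = Server("duke", catalog, storage={"/zone/home/report.txt": b"hello"})
ncsu = Server("ncsu", catalog, peers={"duke": duke})
data = ncsu.get("/zone/home/report.txt", user="alice")
```

The client only names the logical path; which server physically holds the bytes is resolved through the catalog.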

8 Policies in iRODS
Policies: Express community goals for data access and sharing, management, long-term preservation, uses, etc.
Policy examples:
Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
Automatically replicate a file added to a collection into 3 geographically distributed sites.
Automatically extract metadata for a file of a certain type and store in metadata catalog.
Periodically check integrity of files in a Collection and repair/replace if needed/possible.
Automatically pick a certain storage location based on user or collection or size or type.
Let a user access a collection only if using certificate-based login.
Send a notification when a certain file is ingested.
etc.
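Two of the policy examples above (replicate to 3 sites on ingest, notify on ingest of a certain file) can be sketched as event-triggered rules. This is a Python analogy only: iRODS rules are written in its own rule language, and the `RuleEngine` class, the `"ingest"` event name, the site names, and the `.tif` condition here are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class RuleEngine:
    """Toy engine: rules are functions registered against a named event."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Callable]] = defaultdict(list)

    def on(self, event: str) -> Callable:
        def register(func: Callable) -> Callable:
            self._rules[event].append(func)
            return func
        return register

    def fire(self, event: str, **context) -> None:
        # Every rule registered for the event runs, in registration order.
        for rule in self._rules[event]:
            rule(**context)


engine = RuleEngine()
SITES = ["site-a", "site-b", "site-c"]          # hypothetical storage sites
replicas: Dict[str, Set[str]] = defaultdict(set)
notifications: List[str] = []


@engine.on("ingest")
def replicate_to_three_sites(path: str, **_) -> None:
    # Policy: a file added to the collection is replicated to 3 sites.
    replicas[path].update(SITES)


@engine.on("ingest")
def notify_on_tiff(path: str, **_) -> None:
    # Policy: send a notification when a certain kind of file is ingested.
    if path.endswith(".tif"):
        notifications.append(f"ingested {path}")


engine.fire("ingest", path="/coll/scan.tif")
engine.fire("ingest", path="/coll/notes.txt")
```

The point of the sketch is the separation of concerns: the policy is stated once as a rule, and the small functions it calls play the role of micro-services.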

9 Policies, Services, Interoperability, Mashups:
Richard Marciano, SILS

10 e-Legacy Mashup RSS Feed Reader Data Grid (SRB/iRODS)

11 e-Legacy Demo
Appraisal Subscribe to RSS Review Received Entry Share and Tag Description Arrangement Preservation Meet Preservation Criteria Preserve to iRODS Yes

12 National Library of France: Distributed Archiving & Preservation System (SPAR)

13 BNF: French National Library
Three rules:
Import: Import an input document into iRODS. Add import date and checksum as AVU-triplet metadata. Replicate to other resources.
Get: Locate a copy of the record. Return it if the physical checksum equals the stored checksum. If not, delete the replica and copy a good one over it.
Audit: Locate all replicas of a data object. Compute a physical checksum using the system’s MD5. Compare the result with the checksum stored in user metadata. All stale copies are removed and then replicated from another good copy. When all copies are audited, a clean copy is staged onto a specific FS directory.
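The three rules can be sketched with an in-memory store. This is an illustrative Python model, not the BNF rules themselves: the `Store` class, the resource names, and the metadata keys `import_date` and `md5` are hypothetical, "locate a copy" is simplified to "take the first resource", and the final staging step of Audit is omitted. Only `hashlib.md5` is a real library call.

```python
import hashlib
from datetime import date
from typing import Dict, List


def md5sum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Store:
    def __init__(self, resources: List[str]) -> None:
        # resource name -> {logical path: bytes}
        self.replicas: Dict[str, Dict[str, bytes]] = {r: {} for r in resources}
        # logical path -> metadata attributes
        self.metadata: Dict[str, Dict[str, str]] = {}

    def import_object(self, path: str, data: bytes) -> None:
        # Import: record date and checksum as metadata, then replicate.
        self.metadata[path] = {
            "import_date": date.today().isoformat(),
            "md5": md5sum(data),
        }
        for contents in self.replicas.values():
            contents[path] = data

    def _is_good(self, resource: str, path: str) -> bool:
        data = self.replicas[resource].get(path)
        return data is not None and md5sum(data) == self.metadata[path]["md5"]

    def _good_copy(self, path: str) -> bytes:
        for resource in self.replicas:
            if self._is_good(resource, path):
                return self.replicas[resource][path]
        raise IOError(f"no valid replica of {path}")

    def get(self, path: str) -> bytes:
        # Get: return the copy if its checksum matches, else repair it first.
        resource = next(iter(self.replicas))
        if not self._is_good(resource, path):
            self.replicas[resource][path] = self._good_copy(path)
        return self.replicas[resource][path]

    def audit(self, path: str) -> List[str]:
        # Audit: check every replica, replace stale ones from a good copy.
        good = self._good_copy(path)
        repaired = []
        for resource in self.replicas:
            if not self._is_good(resource, path):
                self.replicas[resource][path] = good
                repaired.append(resource)
        return repaired


store = Store(["r1", "r2", "r3"])
store.import_object("/bnf/doc1", b"content")
```

Corrupting one replica and then calling `get` or `audit` restores it from a copy whose MD5 still matches the stored value.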

15 BNF: French National Library
Micro-Services:
Add metadata to an iRODS object.
Import an object into iRODS, compute MD5 checksum and validate against the supplied one. Once validated, add MD5SUM and import date as metadata. If invalid, content is removed from iRODS.
Return the value of an iRODS object metadata attribute.
Prepare to retrieve a metadata attribute for a resource.
Prepare to retrieve a metadata attribute for an object.
Get the input resources belonging to a zone name.
Get iCAT results regarding location info for a record.
Execute MD5SUM on the physical content and return value.
Return a pseudo random string of specified length.
Delete a stale replica and replicate over it from another fresh copy.
Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution).
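The difference between eager and lazy replacement of a stale replica can be sketched as immediate execution versus a queued task. This is a minimal Python illustration; the `Repairer` class and its method names are hypothetical and do not correspond to actual iRODS micro-service names.

```python
from collections import deque
from typing import Callable, Deque, Dict


class Repairer:
    def __init__(self) -> None:
        self.delayed: Deque[Callable[[], None]] = deque()

    def replace_stale(
        self,
        replicas: Dict[str, bytes],
        stale: str,
        fresh: str,
        lazy: bool = False,
    ) -> None:
        def task() -> None:
            # Overwrite the stale replica with the fresh copy.
            replicas[stale] = replicas[fresh]

        if lazy:
            # Lazy: queue the repair for delayed execution.
            self.delayed.append(task)
        else:
            # Eager: repair synchronously before returning.
            task()

    def run_delayed(self) -> int:
        """Runs all queued repairs and returns how many were executed."""
        count = 0
        while self.delayed:
            self.delayed.popleft()()
            count += 1
        return count


copies = {"r1": b"bad", "r2": b"good"}
repairer = Repairer()
repairer.replace_stale(copies, stale="r1", fresh="r2", lazy=True)
```

After the lazy call, `copies["r1"]` is still the stale value until `run_delayed()` is invoked; with `lazy=False` the replica is fixed before the call returns.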

16 DCAPE

17 DCAPE

18 DCAPE

19 PoDRI: Policy-Driven Repository Interoperability

20 RENCI Federated Data Projects
Leesa Brieger, RENCI

21 RENCI VO Data Grid
Diagram: iRODS Servers at Duke, NCSU, ECU, UNC-A, UNC-CH, and RENCI (Europa Center); Metadata Catalog (iCAT) DB.
Client asks for data.
Data request goes to an iRODS server.
Server looks up information in iCAT.
iCAT tells which iRODS server has the data.
Data is retrieved from its physical location and delivered to the client.

22 National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)
Federation of Seven Independent Data Grids, each with its own iCAT: NARA I, NARA II, Georgia Tech, Rocket Center, UNC, UMD, UCSD.
Extensible Environment: can federate with additional research and education sites. Each data grid uses different vendor products.

23 TUCASI Infrastructure Project (TIP): Federated Repositories

24 TUCASI Infrastructure Project (TIP) Goals
Leverage data resources for competitive research and leadership.
Support research and education efforts in a wide range of disciplines and domains.
National leadership in next-generation data management.
Model for long term campus storage.
Architecture and design; hardware, software.
Operations and support.
Data policies.
Selection and retention.
Ingest, curation and preservation.
Collections and repository management.

25 A Test: Classroom content on a DICE/RENCI data grid
Panopto, Elluminate

26 Interfaces Jargon, Web, REST, SOAP
Mike Conway, DICE Center Jargon, Java, Interface Developer

27 Goals
Make integration simple by creating a clear, familiar service API.
Make iRODS a familiar, easy-to-use resource for mid-tier Java developers.
Develop a REST/SOAP service model for common use cases using mature tools.
Create an out-of-the-box web interface that makes iRODS easy for administrators and archivists.

28 Currently...
Jargon is a pure-Java API that talks to iRODS over Java sockets.
Jargon is fairly low-level and can be tricky at first.
Used in multiple projects, including a WebDAV interface and integration with the Fedora repository via the irodsfedora library.

29 Jargon (next...)
Jargon-core: Jargon refactored. High-level service API, POJOs, Spring-friendly. Emphasis on testability.
Jargon-akubra: Implementation of an Akubra module for iRODS via Jargon.
Jargon-lingo: Application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to iRODS.

30 Conceptual Diagram: iRODS Service Model
Top layer: SOAP/REST, Web, DuraSpace, custom code (Java, Groovy, Jython, JRuby, etc.), frameworks.
Middle layers: Jargon-lingo, Jargon-akubra, Jargon-core.
Bottom layer: iRODS Grid.

31 TRLN Partners Questionnaire
Respondents: NC State: Jim Tuttle. Duke: Seth Shaw, Winston Atkins, Russell Koonts. UNC: Will Owen.
1. Preservation Projects: Geo NDIIPP Images e-Theses Dissertations records TRAC 30 criteria Fedora iRODS checksum 2 copies CDR
2. Status: Planned planned production ½ way testing phase near production
3. Preservation Challenges: permission auditing replication search/browse version control policies tiered storage getting the backlog generating meta. consolidating meta. prez. planning sys. reliability
4. iRODS: no yes
5. iRODS Challenges: NA none rules syntax documentation production configuration stable release
6. Questions: None working w. archivists maintenance releases iRODS book

