
Presentations
Introduction
Case Studies:
- Policies, Services, Interoperability, Mashups: BNF, DCAPE, PoDRI, e-Legacy
- RENCI Federated Data Projects: NARA TPAP, RENCI VO, TIP
- Interfaces: Islandora, Jargon, CDR

iRODS federates major collections (from Ken Arnold, SHAMAN project)
A unified web interface for browsing or searching: the user sees a single hierarchy.
- Flickr file system (/flickr/commons/): built on the Flickr API, a RESTful web API.
- YouTube: media accessible through its API.
- New services as mountable file systems: Hulu, Photobucket, etc.
Each /flickr/commons/Institution "folder" translates to the result of one or two calls to the Flickr API, presented to iRODS as if it were a file system. For a collection to integrate, it would need a remote API that we could write a driver for, plus one or more ways to map that collection into a tree. Each mountable service is made into a resource with all relevant information (location, resource type, etc.).
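
The driver idea above (one or two remote API calls per "folder", presented as a tree) can be sketched as follows. This is a minimal illustration, not the SHAMAN driver: the class MountedServiceDriver and the fake_* functions with their canned data are hypothetical stand-ins for real HTTP calls to a service API.

```python
from typing import Callable, List


# Hypothetical stand-ins for a remote service's API. A real driver would make
# HTTP requests here; these canned results are illustrative only.
def fake_list_institutions() -> List[str]:
    return ["InstitutionA", "InstitutionB"]


def fake_list_photos(institution: str) -> List[str]:
    catalog = {"InstitutionA": ["p1.jpg", "p2.jpg"], "InstitutionB": ["p3.jpg"]}
    return catalog[institution]


class MountedServiceDriver:
    """Presents a remote API as a read-only, two-level directory tree."""

    def __init__(self, root: str,
                 list_folders: Callable[[], List[str]],
                 list_items: Callable[[str], List[str]]) -> None:
        self.root = root.rstrip("/")
        self.list_folders = list_folders
        self.list_items = list_items

    def listdir(self, path: str) -> List[str]:
        path = path.rstrip("/")
        if path == self.root:
            # One API call: the top-level "folders" (e.g. institutions).
            return self.list_folders()
        parent, _, name = path.rpartition("/")
        if parent == self.root:
            # One more API call: the items inside a single "folder".
            return self.list_items(name)
        raise FileNotFoundError(path)


driver = MountedServiceDriver("/flickr/commons", fake_list_institutions, fake_list_photos)
print(driver.listdir("/flickr/commons/"))
print(driver.listdir("/flickr/commons/InstitutionA"))
```

Passing the two listing functions in as callables keeps the tree-mapping logic separate from any particular service, which mirrors the slide's point that any collection with a remote API can be integrated once a driver and a tree mapping exist.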

A user with a client views and manages data
iRODS shows a unified “virtual collection”: the user sees a single collection spanning
- My data: disk, tape, database, filesystem, etc.
- Partner’s data: remote disk, tape, filesystem, etc.
The iRODS data system can be installed as a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified collection.
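
A "layer" over existing data essentially means a logical namespace that records where each object physically lives, without moving it. The sketch below assumes a simple in-memory mapping; VirtualCollection, PhysicalLocation, and the resource names are hypothetical and do not reflect iRODS internals.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalLocation:
    resource: str       # e.g. a local disk or a partner's remote tape system
    physical_path: str  # where the bytes actually live on that resource


class VirtualCollection:
    """Single logical hierarchy over data held on heterogeneous storage."""

    def __init__(self) -> None:
        self._entries: Dict[str, PhysicalLocation] = {}

    def register(self, logical_path: str, location: PhysicalLocation) -> None:
        # Registering leaves the data in place; only the mapping is recorded.
        self._entries[logical_path] = location

    def list_under(self, prefix: str) -> List[str]:
        prefix = prefix.rstrip("/") + "/"
        return sorted(p for p in self._entries if p.startswith(prefix))

    def resolve(self, logical_path: str) -> PhysicalLocation:
        return self._entries[logical_path]


vc = VirtualCollection()
vc.register("/project/raw/a.dat", PhysicalLocation("myDisk", "/data/a.dat"))
vc.register("/project/raw/b.dat", PhysicalLocation("partnerTape", "/archive/0042/b.dat"))
print(vc.list_under("/project"))
print(vc.resolve("/project/raw/b.dat").resource)
```

The user only ever sees the logical paths; which disk, tape, or partner site holds the bytes is answered by resolve().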

Accessing data in the iRODS system
A user with an iRODS client searches the catalog to find and get data:
1. “I need data!” (user)
2. “Finds the data.” (the iRODS Metadata Catalog, which keeps track of data)
3. “Gets data to user.” (the data server: disk, tape, database, filesystem, etc.)
Users can search for, access, add/extract metadata, annotate, analyze and process, replicate, copy, and share data, manage and track access, subscribe, and more.
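
The find-then-get flow can be illustrated with a toy catalog that stores attribute-value-unit (AVU) triples per object and records which server holds it. This is a sketch under stated assumptions: MetadataCatalog, get_data, and the sample paths are hypothetical, and a real iCAT is a relational database queried by the iRODS server.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

AVU = Tuple[str, str, Optional[str]]  # (attribute, value, unit)


class MetadataCatalog:
    """Toy catalog: tracks where each object lives and its AVU metadata."""

    def __init__(self) -> None:
        self.location: Dict[str, str] = {}
        self.avus: Dict[str, Set[AVU]] = defaultdict(set)

    def add_object(self, path: str, server: str) -> None:
        self.location[path] = server

    def add_avu(self, path: str, attribute: str, value: str,
                unit: Optional[str] = None) -> None:
        self.avus[path].add((attribute, value, unit))

    def find(self, attribute: str, value: str) -> List[str]:
        # "Finds the data": query by metadata, not by physical location.
        return sorted(p for p, triples in self.avus.items()
                      if any(a == attribute and v == value for a, v, _ in triples))


def get_data(catalog: MetadataCatalog, servers: Dict[str, Dict[str, bytes]],
             path: str) -> bytes:
    # "Gets data to user": ask the catalog which server holds it, then read it there.
    return servers[catalog.location[path]][path]


catalog = MetadataCatalog()
servers = {"server1": {"/zone/home/obs1.csv": b"t,temp\n0,21.5\n"}}
catalog.add_object("/zone/home/obs1.csv", "server1")
catalog.add_avu("/zone/home/obs1.csv", "instrument", "thermometer")
hits = catalog.find("instrument", "thermometer")
print(hits, get_data(catalog, servers, hits[0]))
```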

Overview of iRODS components
- User interface: web or GUI client to access and manage data and metadata*
- iRODS server: data on disk
- iRODS Metadata Catalog: database that tracks the state of data
- iRODS Rule Engine: implements policies
About iRODS and DICE
The Data Intensive Cyber Environments (DICE) group leads core development of the open source iRODS Integrated Rule-Oriented Data System. With more than a decade of award-winning research that harnesses the power of cybertechnologies for managing, sharing, publishing, and preserving digital data, the group is based at the School of Information and Library Science and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, and at the Institute for Neural Computation at the University of California, San Diego. Development of the core iRODS data grid system is funded by the National Science Foundation and the National Archives and Records Administration, with a growing open source iRODS community participating in development worldwide, based in the nonprofit Data Intensive Cyberinfrastructure Foundation. For more information see http://diceresearch.org.
*Access data with: web-based browser, iRODS GUI, command-line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.

“Layers” in iRODS: from users to storage
- Policies: express goals for data access, sharing, preservation, etc.
- Community: decides how to manage the shared collection(s).
- Administrator/user: applies rules.
- Rules: implement policies in computer-actionable form.
- Micro-services: operate on remote data.
- iRODS server: executes micro-services.

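The layering (policy, then rule, then micro-services executed by the server) can be sketched as a table mapping events to ordered chains of small functions. This is only an analogy in Python: the event name "on_put", the RULES table, and the micro-service functions are hypothetical and are not the iRODS rule language.

```python
import hashlib
from typing import Callable, Dict, List

Context = Dict[str, object]
MicroService = Callable[[Context], None]


# Micro-services: small, reusable operations on data (hypothetical examples).
def compute_checksum(ctx: Context) -> None:
    ctx["checksum"] = hashlib.md5(ctx["data"]).hexdigest()


def record_collection(ctx: Context) -> None:
    ctx.setdefault("metadata", {})["collection"] = ctx["collection"]


# Rules: a policy in computer-actionable form, i.e. an ordered chain of
# micro-services attached to an event. Administrators/users edit this table.
RULES: Dict[str, List[MicroService]] = {
    "on_put": [compute_checksum, record_collection],
}


def server_execute(event: str, ctx: Context) -> Context:
    # The server runs whichever micro-services the rule for this event lists.
    for micro_service in RULES.get(event, []):
        micro_service(ctx)
    return ctx


result = server_execute("on_put", {"data": b"hello", "collection": "/demoZone/home/alice"})
print(result["checksum"], result["metadata"])
```

Keeping the policy in a data table rather than in server code is the point of the layering: the community can change what happens on an event without touching the server.
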
Under the hood: a glimpse
Components: a metadata catalog (DB), plus an iRODS server with a rule engine at each site (NC State, Duke, Chapel Hill).
1. The user asks for data (using logical properties).
2. The data request goes to the 1st server.
3. The server looks up information in the catalog.
4. The catalog tells it that a 2nd, federated server has the data.
5. The 1st server asks the 2nd server for the data.
6. The 2nd server applies rules and serves the data.
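
The six steps can be traced in a small simulation where the server receiving the request consults a shared catalog and forwards to the owning server, which applies its own rule (here, an access check) before serving. IrodsServer, the catalog dictionary, and the "/tri/..." path are hypothetical; real servers talk over the iRODS network protocol.

```python
from typing import Dict, Optional, Set


class IrodsServer:
    """Toy federated server: local storage plus an optional access rule."""

    def __init__(self, name: str, store: Dict[str, bytes],
                 allowed_users: Optional[Set[str]] = None) -> None:
        self.name = name
        self.store = store
        self.allowed_users = allowed_users  # None means "no restriction"

    def serve_local(self, user: str, logical_path: str) -> bytes:
        # The server holding the data applies its rules before serving it.
        if self.allowed_users is not None and user not in self.allowed_users:
            raise PermissionError(f"{user} may not read {logical_path} at {self.name}")
        return self.store[logical_path]

    def get(self, user: str, logical_path: str,
            catalog: Dict[str, str], peers: Dict[str, "IrodsServer"]) -> bytes:
        owner = catalog[logical_path]  # look up the object in the catalog
        if owner == self.name:
            return self.serve_local(user, logical_path)
        return peers[owner].serve_local(user, logical_path)  # ask the federated server


catalog = {"/tri/duke/report.txt": "Duke"}
servers = {
    "NC State": IrodsServer("NC State", {}),
    "Duke": IrodsServer("Duke", {"/tri/duke/report.txt": b"report"}, {"alice"}),
    "Chapel Hill": IrodsServer("Chapel Hill", {}),
}
# The request lands on the Chapel Hill server, which forwards it to Duke.
print(servers["Chapel Hill"].get("alice", "/tri/duke/report.txt", catalog, servers))
```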

Policies in iRODS
Policies express community goals for data access and sharing, management, long-term preservation, uses, etc. Policy examples:
- Run a particular workflow when a “set of files” is ingested into a collection (e.g., make thumbnails of images, post to a website).
- Automatically replicate a file added to a collection to 3 geographically distributed sites.
- Automatically extract metadata for a file of a certain type and store it in the metadata catalog.
- Periodically check the integrity of files in a collection and repair/replace them if needed and possible.
- Automatically pick a certain storage location based on user, collection, size, or type.
- Let a user access a collection only if using certificate-based login.
- Send a notification when a certain file is ingested.
- etc.
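
Three of the listed policies (storage selection by size, replication to 3 sites, and notification on ingest) can be combined in one ingest handler. This is a sketch only: the site names, the 1 GB threshold, the ".xml" trigger, and on_ingest itself are hypothetical choices, not iRODS defaults.

```python
from typing import Dict, List

# Hypothetical resource names and threshold, chosen only for illustration.
REPLICA_SITES = ["siteEast", "siteCentral", "siteWest"]
LARGE_FILE_BYTES = 1_000_000_000


def choose_resource(size_bytes: int) -> str:
    # Policy: pick a storage location based on file size.
    return "tapeArchive" if size_bytes >= LARGE_FILE_BYTES else "fastDisk"


def on_ingest(path: str, size_bytes: int, outbox: List[str]) -> Dict[str, object]:
    record: Dict[str, object] = {
        "path": path,
        "resource": choose_resource(size_bytes),
        # Policy: replicate every new file to 3 geographically distributed sites.
        "replicas": list(REPLICA_SITES),
    }
    # Policy: send a notification when a certain kind of file is ingested.
    if path.endswith(".xml"):
        outbox.append(f"ingested {path}")
    return record


outbox: List[str] = []
print(on_ingest("/demoZone/home/alice/finding-aid.xml", 2048, outbox))
print(outbox)
```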

Policies, Services, Interoperability, Mashups: Richard Marciano, SILS

e-Legacy mashup: RSS feed reader + data grid (SRB/iRODS)

e-Legacy demo
- Appraisal: subscribe to RSS, review the received entry, share and tag.
- Description.
- Arrangement.
- Preservation: if the entry meets the preservation criteria (yes), preserve it to iRODS.
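
The demo's workflow can be read as a pipeline that enriches each feed entry and preserves it only when it passes a criteria check. The slide does not state the actual criteria, so meets_preservation_criteria below is a hypothetical placeholder, and appending to a list stands in for "preserve to iRODS".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeedEntry:
    title: str
    link: str
    tags: List[str] = field(default_factory=list)
    description: str = ""
    series: str = ""


def meets_preservation_criteria(entry: FeedEntry) -> bool:
    # Hypothetical criteria: the entry has been tagged, described, and arranged.
    return bool(entry.tags and entry.description and entry.series)


def process_entry(entry: FeedEntry, tags: List[str], description: str,
                  series: str, preserved: List[FeedEntry]) -> bool:
    entry.tags.extend(tags)          # appraisal: share and tag
    entry.description = description  # description
    entry.series = series            # arrangement
    if meets_preservation_criteria(entry):
        preserved.append(entry)      # stand-in for "preserve to iRODS"
        return True
    return False


preserved: List[FeedEntry] = []
entry = FeedEntry("Press release", "https://example.org/feed/1")
print(process_entry(entry, ["press"], "Agency press release", "Public affairs", preserved))
```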

National Library of France: Distributed Archiving & Preservation System (SPAR)

BNF: French National Library
Three rules:
- Import: import an input document into iRODS; add the import date and checksum as AVU-triplet metadata; replicate to other resources.
- Get: locate a copy of the record; return it if its physical checksum equals the stored checksum; if not, delete the replica and copy a good one over it.
- Audit: locate all replicas of a data object; compute a physical checksum using the system’s MD5; compare the result with the checksum stored in user metadata. All stale copies are removed and then replicated from another good copy. When all copies are audited, a clean copy is staged onto a specific file system directory.
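
The Import, Get, and Audit rules can be simulated against an in-memory grid to show how the stored checksum drives repair. This is a hedged sketch, not the BNF implementation: Python's hashlib stands in for the system MD5 utility, the Grid class and function names are hypothetical, and the final staging step is omitted.

```python
import hashlib
from datetime import date
from typing import Dict, List


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Grid:
    """Toy data grid: resources map logical paths to bytes; AVUs kept per path."""

    def __init__(self, resources: List[str]) -> None:
        self.resources: Dict[str, Dict[str, bytes]] = {r: {} for r in resources}
        self.metadata: Dict[str, Dict[str, str]] = {}

    def good_copy(self, path: str) -> bytes:
        expected = self.metadata[path]["MD5SUM"]
        for store in self.resources.values():
            if path in store and md5(store[path]) == expected:
                return store[path]
        raise OSError(f"no valid replica of {path}")


def import_object(grid: Grid, path: str, data: bytes, supplied_md5: str, today: date) -> None:
    if md5(data) != supplied_md5:
        raise ValueError("checksum mismatch: object not imported")
    for store in grid.resources.values():  # replicate to every resource
        store[path] = data
    grid.metadata[path] = {"MD5SUM": supplied_md5, "importDate": today.isoformat()}


def get_object(grid: Grid, path: str, resource: str) -> bytes:
    store = grid.resources[resource]
    if md5(store[path]) != grid.metadata[path]["MD5SUM"]:
        store[path] = grid.good_copy(path)  # overwrite the stale replica
    return store[path]


def audit(grid: Grid, path: str) -> int:
    """Check every replica; replace stale ones from a good copy. Returns repairs made."""
    expected = grid.metadata[path]["MD5SUM"]
    fresh = grid.good_copy(path)
    repaired = 0
    for store in grid.resources.values():
        if path in store and md5(store[path]) != expected:
            store[path] = fresh
            repaired += 1
    return repaired


grid = Grid(["paris", "backupSite"])
payload = b"record 1"
import_object(grid, "/bnf/rec1", payload, md5(payload), date(2010, 1, 1))
grid.resources["backupSite"]["/bnf/rec1"] = b"bit rot"  # simulate corruption
print(get_object(grid, "/bnf/rec1", "backupSite"))       # repaired on read
grid.resources["paris"]["/bnf/rec1"] = b"bit rot"
print(audit(grid, "/bnf/rec1"))
```

Get repairs only the copy it was asked for, while Audit sweeps every replica; both rely on the checksum recorded once at import time.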

BNF: French National Library micro-services
- Add metadata to an iRODS object.
- Import an object into iRODS, compute its MD5 checksum, and validate it against the supplied one. Once validated, add MD5SUM and the import date as metadata. If invalid, the content is removed from iRODS.
- Return the value of an iRODS object metadata attribute.
- Prepare to retrieve a metadata attribute for a resource.
- Prepare to retrieve a metadata attribute for an object.
- Get the input resources belonging to a zone name.
- Get iCAT results regarding location info for a record.
- Execute MD5SUM on the physical content and return the value.
- Return a pseudo-random string of a specified length.
- Delete a stale replica and replicate over it from another fresh copy. Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution).
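
The eager/lazy distinction in the last micro-service can be shown with a small repair queue: eager mode performs the replacement before returning, lazy mode defers it until the queue is drained. StaleReplicaRepairer and its methods are hypothetical names; how iRODS actually schedules delayed execution is not modeled here.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StaleReplicaRepairer:
    """Replace a stale replica either immediately (eager) or later (lazy)."""

    def __init__(self, eager: bool) -> None:
        self.eager = eager
        self.pending: Deque[Callable[[], None]] = deque()

    def replace_stale(self, store: Dict[str, bytes], path: str, fresh: bytes) -> None:
        def task() -> None:
            store[path] = fresh  # drop the stale bytes, copy the fresh replica over

        if self.eager:
            task()                     # synchronous: done before this call returns
        else:
            self.pending.append(task)  # delayed: done when the queue is drained

    def run_pending(self) -> int:
        done = 0
        while self.pending:
            self.pending.popleft()()
            done += 1
        return done


lazy_store = {"/bnf/rec1": b"stale"}
lazy = StaleReplicaRepairer(eager=False)
lazy.replace_stale(lazy_store, "/bnf/rec1", b"fresh")
print(lazy_store["/bnf/rec1"])  # still stale until the queue runs
print(lazy.run_pending(), lazy_store["/bnf/rec1"])
```

Lazy mode trades a window of staleness for not blocking the caller, which matters when the fresh copy must come from a distant resource.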

DCAPE

PoDRI: Policy-Driven Repository Interoperability

RENCI Federated Data Projects Leesa Brieger, RENCI

RENCI VO data grid
Sites: Duke, NCSU, ECU, UNC-A, UNC-CH, and the RENCI Europa Center, each running an iRODS server; a metadata catalog (iCAT) database serves the grid.
1. The client asks for data.
2. The data request goes to an iRODS server.
3. The server looks up information in the iCAT.
4. The iCAT tells which iRODS server has the data.
5. The data is retrieved from its physical location and delivered to the client.

National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP): federation of seven independent data grids
Sites, each with its own iCAT: NARA I, NARA II, Georgia Tech, Rocket Center, UNC, UMD, UCSD.
Extensible environment: can federate with additional research and education sites. Each data grid uses different vendor products.
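
In a federation of independent grids, each grid keeps its own catalog, so a lookup must first decide which grid owns a path. The sketch assumes, as in iRODS, that a logical path begins with the owning zone's name; ZoneCatalog, Federation, and the zone names are hypothetical and do not represent the TPAP configuration.

```python
from typing import Dict


class ZoneCatalog:
    """Stand-in for one independent data grid's iCAT."""

    def __init__(self, zone: str) -> None:
        self.zone = zone
        self.objects: Dict[str, str] = {}  # logical path -> physical location

    def register(self, logical_path: str, physical: str) -> None:
        self.objects[logical_path] = physical


class Federation:
    def __init__(self) -> None:
        self.zones: Dict[str, ZoneCatalog] = {}

    def join(self, catalog: ZoneCatalog) -> None:
        # Extensible: another grid joins by registering its catalog.
        self.zones[catalog.zone] = catalog

    def lookup(self, logical_path: str) -> str:
        # Route on the first path component, which names the owning zone.
        zone = logical_path.strip("/").split("/", 1)[0]
        if zone not in self.zones:
            raise KeyError(f"zone {zone!r} is not federated")
        return self.zones[zone].objects[logical_path]


fed = Federation()
for name in ["zoneA", "zoneB"]:
    fed.join(ZoneCatalog(name))
fed.zones["zoneB"].register("/zoneB/records/box1.pdf", "tape://vendorX/vol7/box1.pdf")
print(fed.lookup("/zoneB/records/box1.pdf"))
```

Because each zone only answers for its own paths, grids built on different vendor storage can join without sharing a single catalog.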

TUCASI Infrastructure Project (TIP): Federated Repositories

TUCASI Infrastructure Project (TIP) goals
- Leverage data resources for competitive research and leadership.
- Support research and education efforts in a wide range of disciplines and domains.
- National leadership in next-generation data management.
- Model for long-term campus storage:
  - Architecture and design; hardware, software
  - Operations and support
  - Data policies
  - Selection and retention
  - Ingest, curation and preservation
  - Collections and repository management

A test: classroom content on a DICE/RENCI data grid (Panopto, Elluminate)

Interfaces: Jargon, Web, REST, SOAP
Mike Conway, DICE Center; Jargon/Java interface developer

Goals
- Make integration simple by creating a clear, familiar service API.
- Make iRODS a familiar, easy-to-use resource for mid-tier Java developers.
- Develop a REST/SOAP service model for common use cases using mature tools.
- Create an out-of-the-box web interface that makes iRODS easy for administrators and archivists.

Currently...
- Jargon is a pure-Java API that talks to iRODS over Java sockets.
- Jargon is fairly low-level and can be tricky at first.
- It is used in multiple projects, including a WebDAV interface and integration with the Fedora repository via the irodsfedora library.

Jargon (next...)
- Jargon-core: Jargon re-factored, with a high-level service API, POJOs, a Spring-friendly design, and an emphasis on testability.
- Jargon-akubra: an implementation of an Akubra module for iRODS via Jargon.
- Jargon-lingo: application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and web interfaces to iRODS.

Conceptual diagram: iRODS service model, from top to bottom
- Clients: SOAP/REST, Web, DuraSpace, custom code (Java, Groovy, Jython, JRuby, etc.)
- Frameworks
- Jargon-lingo, Jargon-akubra
- Jargon-core
- iRODS grid

TRLN partners questionnaire
Respondents: NC State (Jim Tuttle); Duke (Seth Shaw, Winston Atkins, Russell Koonts); UNC (Will Owen).
1. Preservation projects: Geo NDIIPP; images; e-theses and dissertations; records; TRAC (30 criteria); Fedora, iRODS, checksum, 2 copies; CDR.
2. Status: planned; planned; production; ½ way; testing phase; near production.
3. Preservation challenges: permission auditing; replication; search/browse; version control; policies; tiered storage; getting the backlog; generating metadata; consolidating metadata; preservation planning; system reliability.
4. Using iRODS: no; yes.
5. iRODS challenges: N/A; none; rules syntax; documentation; production configuration; stable release.
6. Questions: none; working with archivists; maintenance releases; iRODS book.