DCAPE Project Update
Richard Marciano, University of North Carolina (SALT)
Chien-Yi Hou, University of North Carolina (SALT)
Caryn Wojcik, State of Michigan, Records Management Services



NHPRC Issued a Call…
- Design a digital preservation service with a business model for the archival community
- Fill the needs of archival repositories that cannot build and sustain their own electronic records archive

DCAPE Project
- Distributed Custodial Archival Preservation Environments
- Project was funded by NHPRC in 2008 (RE )
- Officially started in December 2008
- Project extended through April 2012

What is Distributed Custodial Preservation?
- Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service
- Archival repository retains legal custody
- Archival repository remains responsible for archival functions, including preservation and access
- Access to collections is controlled by archival repository

DCAPE Partners
- 28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants
- Cultural Entity: Getty Research Institute
- Cyberinfrastructure: West Virginia University, Carleton University (Canada)
- State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York
- State Library: North Carolina
- University Archives: Tufts
- UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)

DCAPE Goals
- Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services.
- Services are based on policies (rules) that are defined by the archivist
- Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE
- A series of rules might “look” like this: When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
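The series of rules described on this slide can be illustrated with a minimal Python sketch. It is not iRODS code and not the DCAPE implementation: the function names (`ingest`, `bit_check`, `demo`), the choice of SHA-256, and the use of local directories as the three "locations" are all assumptions made for illustration. The monthly schedule is left to an external scheduler; only the check itself is shown.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest(source: Path, locations: list[Path]) -> dict[Path, str]:
    """Copy the file into each location and record a checksum per replica."""
    manifest = {}
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        replica = location / source.name
        shutil.copy2(source, replica)
        manifest[replica] = sha256_of(replica)
    return manifest


def bit_check(manifest: dict[Path, str]) -> list[str]:
    """Recompute checksums and return one alert per missing or changed replica."""
    alerts = []
    for replica, recorded in manifest.items():
        if not replica.exists():
            alerts.append(f"ALERT: {replica} is missing")
        elif sha256_of(replica) != recorded:
            alerts.append(f"ALERT: {replica} has changed")
    return alerts


def demo() -> list[str]:
    """Ingest one file into three hypothetical sites, damage one copy, re-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "record.txt"
        source.write_text("archival record")
        sites = [root / "site_a", root / "site_b", root / "site_c"]
        manifest = ingest(source, sites)
        assert bit_check(manifest) == []  # nothing has changed yet
        (root / "site_b" / "record.txt").write_text("tampered")
        return bit_check(manifest)
```

Calling `demo()` returns a single alert naming the replica under `site_b`, which is the "send an alert about any changes" step in miniature.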

DCAPE Goals
- The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.
- The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.
- Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.

Project Tasks
- Execute service agreements between UNC and partners to govern use of the test collections.
- Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.
- Ingest test collections into iRODS and validate the rules and services.
- Develop business model (including costs) for sustaining a repository service based on iRODS.
- Develop model service agreements that define the standard and optional services of the repository.

Role of iRODS
- Preservation environment provides rule-based automation of archival functions (repeatable services)
- Standard and optional services will be available
- Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities

Life Cycle of Data
[Diagram: data moves from the Virtual Loading Dock to the Preservation Area to the Reference Room, labeled with the packages SIP, AIP, and DIP.]

DCAPE Framework
[Diagram: within iRODS, the Virtual Loading Dock (V1, V2, V3), the Preservation Area (P1, P2, P3), and the Reference Room (R1, R2), labeled with the packages SIP, AIP, and DIP.]

DCAPE Capabilities
[Diagram: within iRODS, the Virtual Loading Dock (V1, V2, V3), the Preservation Area (P1, P2, P3), and the Reference Room (R1, R2), labeled with the packages SIP, AIP, and DIP. Numbered capabilities shown on the diagram, as far as they survive in the transcript: 3, 4, 5, 6, 7, 8; 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23; 25.]

DCAPE Capabilities: Replication
[Diagram repeated from the DCAPE Capabilities slide.]

Sample Rule
sampleRule||delayExec( 1m 2m,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp,*GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut,"COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,SamplePath2*c)##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
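Read step by step, the sample rule appears to select the collections under a parent path whose `DCAPE_COLL_TYPE` metadata is `AIP`, move each one to a new path, log the move, check a replication policy, and replicate and log again when the policy answers yes. The Python sketch below mirrors that control flow on in-memory objects. It does not call iRODS; the `Collection` class, the `run_sample_rule` function, and the callable standing in for `acCheckPolicy` are hypothetical names introduced for illustration, and the delayed execution is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    path: str                      # full collection path
    metadata: dict                 # collection-level attributes
    replicas: list = field(default_factory=list)


def run_sample_rule(collections, source_parent, target_parent,
                    replica_policy, resource):
    """Select AIPs under source_parent, move them, and replicate per policy."""
    log = []
    for coll in collections:
        parent, _, name = coll.path.rpartition("/")
        is_aip = coll.metadata.get("DCAPE_COLL_TYPE") == "AIP"
        if parent != source_parent or not is_aip:
            continue  # the query in the rule would not return this collection
        old_path = coll.path
        coll.path = f"{target_parent}/{name}"        # stands in for the rename
        log.append(("Move_Collection", old_path))
        if replica_policy(coll.path):                # stands in for the policy check
            coll.replicas.append(resource)           # stands in for replication
            log.append(("Replicate_Coll", coll.path))
    return log


# Hypothetical test data: only the first collection matches the query.
collections = [
    Collection("/samplePath/aip1", {"DCAPE_COLL_TYPE": "AIP"}),
    Collection("/samplePath/sip1", {"DCAPE_COLL_TYPE": "SIP"}),
    Collection("/elsewhere/aip2", {"DCAPE_COLL_TYPE": "AIP"}),
]
log = run_sample_rule(collections, "/samplePath", "/samplePath2",
                      lambda path: True, "resource")
```

With this data, `log` holds one `Move_Collection` entry for `/samplePath/aip1` followed by one `Replicate_Coll` entry for `/samplePath2/aip1`; the other two collections are untouched.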

24: An interface that makes the policies easy to manage!
[Diagram repeated from the DCAPE Capabilities slide.]

Interface - Requirements
- Hide the technical details
- Show the information that archivists want to know
- Be able to customize policies easily
- Web-based, no installation required

Demo I
- Checksum
- Replication
[Diagram repeated from the DCAPE Capabilities slide.]

Demo II
- Checksum & Virus Check
- No Replication
[Diagram repeated from the DCAPE Capabilities slide.]

DCAPE is More
- More than a storage service or environment
- More than a reference tool
- DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records

DCAPE Interface
- Platform
- Navigation
- Login
- Audit Trail
- Appearance
- Reporting
- Batch Operations
- Screen Views

DCAPE Metadata
- Follow Dublin Core model
- Allow customization
- Encourage standardization
- Define
  - Source: creator, system, archivist
  - Level: collection, accretion, item
  - Accessibility: internal vs. public
  - Fields: required vs. optional
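One way to picture the four properties this slide says each metadata field should define is a small Python data structure. This is only a sketch under stated assumptions: the class and function names are hypothetical, and the three example fields (Dublin Core's `title`, `creator`, and `description` elements) and their settings are illustrative choices, not DCAPE's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


@dataclass(frozen=True)
class FieldDefinition:
    name: str        # element name, e.g. a Dublin Core element
    source: Source   # who or what supplies the value
    level: Level     # where in the hierarchy the field applies
    public: bool     # True = public, False = internal
    required: bool   # True = required, False = optional


# Illustrative definitions only; the real field list is not given in the slides.
DEFINITIONS = [
    FieldDefinition("title", Source.ARCHIVIST, Level.COLLECTION, True, True),
    FieldDefinition("creator", Source.CREATOR, Level.COLLECTION, True, True),
    FieldDefinition("description", Source.ARCHIVIST, Level.COLLECTION, True, False),
]


def missing_required(definitions, record, level):
    """List required fields at the given level that the record leaves empty."""
    return [d.name for d in definitions
            if d.level is level and d.required and not record.get(d.name)]
```

For example, `missing_required(DEFINITIONS, {"title": "Board minutes"}, Level.COLLECTION)` reports that `creator` is still needed, while the optional `description` is not flagged.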

DCAPE Workflow
- Define functionality at each stage
- Virtual Loading Dock
  - Pre-accessioning
  - Ingestion
- Preservation Area
  - Archival storage
  - Data management
  - Administration
  - Preservation planning
- Reference Room
  - Access
  - Common services
  - Management

DCAPE Business Model
- Non-profit
- Fees for services
- Fees for storage
- Storage and disaster prevention services
- Software maintenance
- Access and connectivity

MetaArchive Cooperative
- Encourage organizations to build their own preservation infrastructures rather than outsourcing to external vendors
- 3 levels of membership: 3 yr commitment
- Basic costs:
  - Equipment: 1st year, $4.6K server purchase
  - Staffing: 2% of a sys. admin’s time + POC admin + software eng. for content ingestion preparation
  - Storage: $1.00 / GB / year for content stored in the network
- Yearly dues:
  - Sustaining Members: $5.5K / yr
  - Preservation Members: $3K / yr
  - Collaborative Members: varies
- Cost scenarios: 2TB of content
  - Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
  - Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
  - Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
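The three MetaArchive cost scenarios all follow the same formula: yearly dues plus yearly storage, times three years, plus the one-time server purchase. The short Python sketch below reproduces that arithmetic from the figures on the slide. The function name is hypothetical, and treating 1 TB as 1,000 GB is an assumption inferred from the slide's $2K yearly space charge for 2 TB.

```python
def three_year_cost(membership_per_year, storage_tb, years=3,
                    server=4600, storage_per_gb_year=1.00, gb_per_tb=1000):
    """Total cost in dollars: (dues + storage) per year, plus one server."""
    space_per_year = storage_tb * gb_per_tb * storage_per_gb_year
    return (membership_per_year + space_per_year) * years + server


# Dues per membership level, in dollars per year, as used on the slide.
# The Collaborative figure "varies"; $4K is the value used in the slide's scenario.
DUES = {"Sustaining": 5500, "Preservation": 3000, "Collaborative": 4000}

totals = {level: three_year_cost(dues, storage_tb=2)
          for level, dues in DUES.items()}
```

The resulting `totals` match the slide: $27,100 for a Sustaining Member, $19,600 for a Preservation Member, and $22,600 for a Collaborative Member.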

Archive-It
- Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born digital content
- Allows users to crawl, scope, catalog, manage, and browse their archived collections
- Collections are hosted at the IA data center and are available through URL and full-text search
- A minimum of 2 copies of each collection are kept online
- Cost scenarios

Storage Cost Model Scenarios
1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?
   Answer: $2,900 + $1,400 x 1.5 = $5,000
2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?
   Answer: $2,900 + $ x $870 + $5,165 = $13,835
3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?
   Answer: $2, x $ x $870 + $5,165 = $19,152
